Best practices for HP EVA, vSphere 4 and Round Robin multi-pathing

VMware vSphere and the HP EVA 4x00, 6x00 and 8x00 series are ALUA compliant. In simple terms, this means you do not need to manually identify preferred I/O paths between the VMware ESX hosts and the storage controllers.

When you create a new Vdisk on the HP EVA, the LUN is set to No Preference by default. The No Preference policy means the following:

– Controller ownership is non-deterministic. The unit ownership is alternated between controllers during initial presentation or when controllers are restarted
– On controller failover (owning controller fails), the units are owned by the surviving controller
– On controller failback (previous owning controller returns), the units remain on the surviving controller. No failback occurs unless explicitly triggered.

To get a good distribution between the controllers, the following VDisk policies can be used:

Path A-Failover/failback
– At presentation, the units are brought online to controller A
– On controller failover, the units are owned by the surviving controller B
– On controller failback, the units are brought online on controller A implicitly

Path B-Failover/failback
– At presentation, the units are brought online to controller B
– On controller failover, the units are owned by surviving controller A
– On controller failback, the units are brought online on controller B implicitly

In VMware vSphere the Most Recently Used (MRU) and Round Robin (RR) multi-pathing policies are ALUA compliant. Round Robin load balancing is now officially supported. These multi-path policies have the following characteristics:

MRU:
– Will give preference to an optimal path to the LUN
– When all optimal paths are unavailable, it will use a non-optimal path
– When an optimal path becomes available again, it will fail over to the optimal path
– Although each ESX server may use a different port through the optimal controller to the LUN, only a single controller port is used for LUN access per ESX server

Round Robin:
– Will queue I/O to LUNs on all ports of the owning controllers in a round robin fashion providing instant bandwidth improvement
– Will continue queuing I/O in a round robin fashion to optimal controller ports until none are available, and will then fail over to the non-optimal paths
– Once an optimal path returns it will failback to it
– Can be configured to round robin I/O to all controller ports for a LUN by ignoring optimal path preference. (May be suitable for a write intensive environment due to increased controller port bandwidth)

The Fixed multi-path policy is not ALUA compliant and is therefore not recommended.
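
As a rough sketch of how the Round Robin policy could be applied from the Service Console, the loop below switches every EVA LUN to VMW_PSP_RR with the ESX 4.x `esxcli nmp device setpolicy` command. The function name and the `DISK_DIR` and `ESXCLI` variables are assumptions added here so the loop can be dry-run. The `naa.600` filter is the one used for the IOPS commands in this post, and skipping entries that contain a colon assumes those are partitions rather than whole devices.

```shell
#!/bin/sh
# Hypothetical helper: set the Round Robin path selection policy on EVA LUNs.
# DISK_DIR and ESXCLI are illustrative overrides, not VMware-defined variables.
DISK_DIR="${DISK_DIR:-/vmfs/devices/disks}"
ESXCLI="${ESXCLI:-esxcli}"

set_rr_policy() {
    # Whole devices only; entries containing ':' are assumed to be partitions.
    for dev in $(ls "$DISK_DIR" 2>/dev/null | grep '^naa\.600' | grep -v ':'); do
        $ESXCLI nmp device setpolicy --device "$dev" --psp VMW_PSP_RR
    done
}
```

Calling `set_rr_policy` applies the policy. Setting `ESXCLI=echo` beforehand prints the commands instead of running them, which is a cheap way to check the device list first.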

Another HP best practice is to set the Round Robin IOPS value (default 1000) to 1 for every LUN by using the following command:

for i in `ls /vmfs/devices/disks/ | grep naa.600` ;
do esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i ;done

There is a bug: after rebooting the VMware ESX server, the IOPS value reverts to a random value.
To check the IOPS values on all LUNs use the following command:

for i in `ls /vmfs/devices/disks/ | grep naa.600` ;
do esxcli nmp roundrobin getconfig --device $i ;done

To work around this IOPS bug, edit the /etc/rc.local file on every VMware ESX host and add the IOPS=1 command. The rc.local file is executed after all init scripts have run.
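
A minimal sketch of what the addition to /etc/rc.local might look like is shown below. It reuses the setconfig command from above. The function wrapper, the `DISK_DIR` and `ESXCLI` variables and the filter that skips entries containing a colon (assumed to be partitions) are additions for illustration, not part of the HP recommendation.

```shell
#!/bin/sh
# Sketch of lines that could be appended to /etc/rc.local on an ESX 4.x host.
# DISK_DIR and ESXCLI are hypothetical overrides that allow a dry run.
DISK_DIR="${DISK_DIR:-/vmfs/devices/disks}"
ESXCLI="${ESXCLI:-esxcli}"

set_rr_iops() {
    # Re-apply IOPS=1 to every EVA LUN (whole devices only, no partitions).
    for dev in $(ls "$DISK_DIR" 2>/dev/null | grep '^naa\.600' | grep -v ':'); do
        $ESXCLI nmp roundrobin setconfig --type "iops" --iops=1 --device "$dev"
    done
}

# Runs at boot; does nothing if DISK_DIR does not exist.
set_rr_iops
```

After the next reboot the getconfig loop above can be used to confirm that the value stuck.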

The reason for the IOPS=1 recommendation is that, in lab tests within HP, this setting showed a nice even distribution of I/Os across all EVA ports used. If you experiment with this, you can see that the queue depth and the throughput are very even across all EVA ports used. In addition, overall performance with these workloads is noticeably better with this setting.

Best practices are not a case of “one recommendation fits all”.

VAAI support with SAN/iQ 9.0 on P4000

SAN/iQ for P4000 now supports the VMware VAAI vStorage offloads (Full Copy, Block Zeroing, and Hardware Assisted Locking) for faster VM deployment and less load on the ESX server.

This means more VMs can be run on the same environment. Up to SAN/iQ 8.1 a total of 16 VMs was recommended; now up to 50 VMs should be supported. I have not been able to test this theory, but this is what HP states at the moment.

To upgrade to SAN/iQ 9.0 your system should run SAN/iQ 8.1.

SQL on vSphere link collection

VMware released some whitepapers about running MS SQL Server on VMware vSphere.

The VMware communities site has a spot called VIOPS where you can see posts from members in the community about recently posted documents. There are some nice documents available about virtualizing SQL server 2008 on vSphere 4.1.

There are a couple on SQL Server, for each product a “best practices document” and an “availability and recovery options” document.

Microsoft SQL Server on VMware – Best Practices Guide

Microsoft SQL Server on VMware – Availability and Recovery Options

Performance and Scalability of Microsoft® SQL Server® on VMware vSphere™ 4

Design, Deploy & Optimize SQL Server on vSphere

Disk alignment

Disk alignment is very important in every operating system environment. When you are using a SAN together with VMware, you should take disk alignment into account as well.

When nothing is aligned, the guest OS (for example Windows) is not aligned with the VMFS and the VMFS is not aligned with the array blocks. This means that 1 I/O can result in 3 I/Os on the storage device.

Once the VMFS has been aligned but the guest OS has not, an I/O can result in 2 I/Os on the storage device. Better, but performance can still be improved.

When all file systems are aligned, one I/O results in 1 I/O, because all blocks begin at the same position.
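
The alignment check itself is simple arithmetic: a partition is aligned when its starting byte offset is a multiple of the block boundary. The sketch below illustrates this. The function name, the 512-byte sector size, the 64 KB default boundary and the sample start sectors are assumptions chosen for the example; check the real boundary for your array.

```shell
#!/bin/sh
# Hypothetical helper: report whether a partition start is aligned.
# $1: first sector of the partition (512-byte sectors assumed)
# $2: alignment boundary in KB (64 is only an example default)
is_aligned() {
    start_sector=$1
    boundary_kb=${2:-64}
    offset_bytes=$((start_sector * 512))
    if [ $((offset_bytes % (boundary_kb * 1024))) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

is_aligned 63      # 63 * 512 = 32256 bytes    -> misaligned
is_aligned 128     # 128 * 512 = 65536 bytes   -> aligned
is_aligned 2048    # 2048 * 512 = 1048576 bytes -> aligned
```

The same check applies at each layer: the VMFS partition against the array blocks, and the guest partition inside the virtual disk.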

Only Windows Vista, Windows 2008 and Windows XP have a trick to avoid this:

UseLunReset and UseDeviceReset VMware

If you are using a SAN attached ESX environment, make sure:
Disk.UseLunReset is set to 1 (default = 1)
Disk.UseDeviceReset is set to 0 (default = 1).

The reason to disable the Disk.UseDeviceReset parameter is that it does a complete SCSI bus reset. All SCSI reservations will be cleared, not for a specific LUN but for the complete device (being the whole SAN controller).

This could disrupt your SAN fabric. I would suggest putting the ESX host in maintenance mode and rebooting it afterwards.

Alternatively, you can also set this via the Service Console by issuing the following commands:

esxcfg-advcfg -s 1 /Disk/UseLunReset
esxcfg-advcfg -s 0 /Disk/UseDeviceReset
service mgmt-vmware restart
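
To confirm the values afterwards, the options can be read back with the `-g` flag of esxcfg-advcfg. The following is a minimal sketch. The function name and the `ADVCFG` variable are hypothetical and only exist so the loop can be dry-run.

```shell
#!/bin/sh
# Hypothetical check: print the current values of both reset options.
# ADVCFG is an illustrative override so the loop can be dry-run with echo.
ADVCFG="${ADVCFG:-esxcfg-advcfg}"

show_reset_settings() {
    for opt in /Disk/UseLunReset /Disk/UseDeviceReset; do
        $ADVCFG -g "$opt"
    done
}
```

Call `show_reset_settings` on each host after the change to check both options.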

Tuning ESX(i) for better storage performance

Many applications are designed to issue large I/O requests for higher bandwidth. ESX/ESXi 3.5 and ESX/ESXi 4.x support increased limits for the maximum I/O request size passed to storage devices. These versions of ESX pass I/O requests as large as 32MB directly to the storage device. I/O requests larger than this are split into several, smaller-sized I/O requests.

Some storage devices, like the EVA, have been found to exhibit reduced performance when passed large I/O requests (above 128KB). As a fix for this, you must lower the maximum I/O size ESX allows before splitting I/O requests.

To reduce the size of I/O requests passed to the storage device using the VMware Infrastructure/vSphere Client:

  1. Go to Host > Configuration.
  2. Click Advanced Settings.
  3. Go to Disk.
  4. Select Disk.DiskMaxIOSize.
  5. Change value to 128.
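
The same change can probably be made from the Service Console with esxcfg-advcfg, in the same way as the other Disk options in this post. The sketch below assumes the option path is /Disk/DiskMaxIOSize and that the value is given in KB. The function name and the `ADVCFG` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: cap the maximum I/O size (in KB) from the command line.
# ADVCFG is an illustrative override so the commands can be dry-run with echo.
ADVCFG="${ADVCFG:-esxcfg-advcfg}"

set_max_io_size() {
    size_kb=${1:-128}
    $ADVCFG -s "$size_kb" /Disk/DiskMaxIOSize   # set the new limit
    $ADVCFG -g /Disk/DiskMaxIOSize              # read it back to confirm
}
```

For example, `set_max_io_size 128` matches the value used in the client steps above.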

Advanced Settings for VMware HA

There are some Advanced Settings that you can change from within your VMware HA Cluster.

By default the Service Console portgroup is used for failure detection. By adding the das.AllowNetwork setting you can specify an additional portgroup to use.

By default VMware HA nodes check the heartbeats of other nodes. When this heartbeat is lost, the suspected isolated node pings its Service Console gateway to check whether it is truly isolated. By using the das.isolationAddress setting, you can add additional IP addresses for the server to check. These IPs must be on the Service Console portgroup, or on the portgroup you’ve added for das.AllowNetwork.

This is the time required before VMware HA considers a host to be isolated. The default setting is 14 seconds; this can be changed to a value that better fits your needs.

This is the rate of monitoring between VMware HA nodes; think of this as a heartbeat check.
The default setting is 1 second; if you feel that this is too often, you can change it.


Call Of Duty 4, move profile

I’m going to assume that you have COD4 installed on your C: drive.

Navigate to:

C:\Activision\Call of Duty 4 – Modern Warfare\players\profiles\(YOUR PROFILE NAME HERE)

You will then see two files.

config_mp.cfg: In here are your graphics and texture settings.
mpdata: This is where your gun and online status are saved.

Save both files. Reformat, reinstall COD4 and just move the files back in there.

NOTE: Keep in mind that each mpdata only works when you have the same serial key as before.

Dangerous default “bug” on ESX 4, regarding Ctrl-Alt-Del.

Be aware of a default setting on ESX 4 which is potentially dangerous.

If you hit Ctrl-Alt-Del on an ESX 4 console, it will reboot the server even if there are running VMs, and it doesn’t care whether the server is in Maintenance Mode. Even if you’re not logged on, it will capture the Ctrl-Alt-Del and reboot.

This is an old throwback which most modern Linux distributions disable these days.

To disable this yourself, open up /etc/inittab in your favourite editor and comment out the “ca::ctrlaltdel:/sbin/shutdown -t3 -r now” line with a # symbol so it looks like this:

# ca::ctrlaltdel:/sbin/shutdown -t3 -r now

Save and exit the file. For this to take effect without a reboot, run: init q

This is certainly disabled by default on ESX 3.5 hosts, so I assume that this was an oversight on VMware’s part in the new release.
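
If you have several hosts to fix, the edit can be scripted instead of done by hand. The sketch below comments out the line with sed and keeps a backup copy. The function name and the `.bak` suffix are assumptions, and the file path is a parameter so you can try it on a copy of /etc/inittab first.

```shell
#!/bin/sh
# Hypothetical helper: comment out the ctrlaltdel entry in an inittab file.
# $1: path to the inittab file (defaults to /etc/inittab)
disable_ctrlaltdel() {
    inittab=${1:-/etc/inittab}
    # Keep a backup, then rewrite the file with the active entry commented out.
    # Lines that are already commented do not match, so re-running is harmless.
    cp "$inittab" "$inittab.bak" &&
    sed 's|^ca::ctrlaltdel:|# &|' "$inittab.bak" > "$inittab"
}
```

On a host this would be `disable_ctrlaltdel /etc/inittab` followed by `init q`, as described above.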


Basic settings Brocade Switches

Settings for EVA[468][10]00 storage systems:

aptpolicy 1

Settings for EVA[468][4]00 storage systems with HP SCSI CA configured:

aptpolicy 3

The basics:

Change Fabric settings (domain ID == same as switch name)