View 3.1 USB Redirection Improvements

As you probably have seen, View 3.1 GA’d yesterday.  One of the improvements listed in the release notes was:

  • USB Improvements – View 3.1 offers more reliable and broader device support with reduced bandwidth consumption. A separate TCP/IP stream is used.

From what I understand from talking to a few people, a lot of time was spent on the USB redirection stack to further optimize and tune it.

ALSO, USB redirection traffic is now split out onto its own traffic stream.  USB redirection traffic will now communicate from the client to the host VM on TCP port 32111.  I imagine this opens up a few new opportunities to do some USB-specific traffic prioritization/throttling.  Very interesting!   In previous versions, the USB traffic was inside of the RDP stream (virtual channel).  This prevented us from ever REALLY seeing the USB-specific traffic or having any control over it.  Simply put, now we do.  Gotta love progress!
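For example, if your clients reach their desktops through a Linux gateway, a minimal sketch of port-based shaping might look like the following.  Only the TCP 32111 port number comes from the release notes; the interface name, firewall mark and the 5 Mbit/s cap are made-up values for illustration, so adjust for your environment.

# Hypothetical sketch: cap View USB-redirection traffic (TCP 32111) at 5 Mbit/s on a
# Linux gateway between the clients and the desktops.  eth0, the mark value (10) and
# the rates are placeholders; adjust for your environment.
iptables -t mangle -A FORWARD -p tcp --dport 32111 -j MARK --set-mark 10

tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:10 htb rate 5mbit ceil 5mbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 100mbit
tc filter add dev eth0 parent 1: protocol ip handle 10 fw flowid 1:10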

Scripted ESX Installation: Reconfiguring COS Networking with Kickstart

Frequently customers have specific NICs (like onboard NICs) that they’d like assigned to the COS, leaving the other NICs for VM traffic.  This is difficult, however, when using our automated kickstart deployment scripts, as there is no way to explicitly define the vmnic assigned to the COS.  And to make matters worse, the VMkernel is not yet available to us during the %post section of the kickstart script, which complicates COS networking configuration even further! Recently I had a customer who was getting frustrated because …

  1. They would “rack and stack” a physical server and wire up their NICs accordingly (i.e. onboard NICs on the management VLAN, remaining NICs on production VLANs)
  2. PXE boot the server
  3. When kickstart completed, they’d lose connection to the COS.

This happens because during installation, ESX assigns vmnic0 to the NIC with the lowest PCI number and then assigns vmnic0 to the COS. And this is often not the NIC the admin wants used for the COS. Of course, they could go back after the fact and reconfigure the COS networking, but this kind of defeats the purpose of a completely hands-free, automated deployment.

Here is one possible solution to the problem.  Below is a script I wrote to append to the %post section of a kickstart file.  Obviously, you’ll need to make modifications for your environment.

## This script should be appended to the %post section of an ESX kickstart file.
## For more info on kickstart and scripted ESX installations, see Appendix B of
## http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_installation_guide.pdf

##
##
## Essentially, this is a "script that creates a script."  Because the VMkernel is
## not yet available to us during the %post section of the scripted install, we use
## %post to generate a script called /tmp/esx_post_install.sh that will launch via
## rc.local upon first boot (and only first boot).
##
## The esx_post_install.sh script will first make a backup copy of esx.conf and then
## reconfigure the COS networking.  Please see the in-line comments below for
## tweaking esx_post_install.sh for your environment.
##
## If you have any questions, please email aaron [at] sweemer [dot] com.

%post

cat > /tmp/esx_post_install.sh << EOF
#!/bin/bash
cp /etc/vmware/esx.conf /etc/vmware/esx.conf.backup
/usr/sbin/esxcfg-vswitch -U vmnic0 vSwitch0
/usr/sbin/esxcfg-vswif -d vswif0

## If your kickstart file has vmportgroup=1, you *might* want to uncomment the
## next line

## /usr/sbin/esxcfg-vswitch -D "VM Network"

/usr/sbin/esxcfg-vswitch -A "VMkernel" vSwitch0

## You'll need to find which physical NICs you want assigned to your COS.  From
## the command line of an already installed ESX server, execute
## "/usr/sbin/esxcfg-nics -l" as root and look for something unique about the
## NICs.  For example, this could be the word "Broadcom" or it could be the
## actual PCI number.  In the next line, replace "search term" with this
## text.

/usr/sbin/esxcfg-nics -l | awk '\$0 ~ /search term/ {print \$1}' | xargs -n 1 /usr/sbin/esxcfg-vswitch vSwitch0 -L

## Note: if you want to test the line above from the command-line, you'll need
## to remove the leading "\" in front of $0 and $1.  The \'s need to be here so
## the esx_post_install.sh script gets properly written by kickstart.  But when
## executing directly on a command line, the \'s need to be removed.

## Replace the x.x.x.x after -i with the IP address and after -n with the
## subnet mask for your COS.

/usr/sbin/esxcfg-vswif -a vswif0 -p "Service Console" -i x.x.x.x -n x.x.x.x

## Replace the x.x.x.x after -i with the IP address and after -n with the subnet
## mask for your VMkernel port group.

/usr/sbin/esxcfg-vmknic -a -i x.x.x.x -n x.x.x.x VMkernel

## Replace x.x.x.x with the default gateway for the COS in both of the next two lines.
route add default gw x.x.x.x
echo "GATEWAY=x.x.x.x" >> /etc/sysconfig/network

mv /etc/rc.d/rc.local.save /etc/rc.d/rc.local
EOF

chmod +x /tmp/esx_post_install.sh
cp /etc/rc.d/rc.local /etc/rc.d/rc.local.save

cat >> /etc/rc.d/rc.local << EOF
cd /tmp/
/tmp/esx_post_install.sh
EOF

As an example, in my environment I have a server with 4 NICs and, by default, ESX assigns vmnic0, which is mapped to PCI 02:00.00, to the service console. However, what is actually physically wired to my management network is vmnic3, which is mapped to PCI 02:03.00.  In the script above, I simply searched for the number 3 (i.e. replaced “search term” with 3) and now my scripted ESX installation works properly.
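By the way, if you want to sanity-check your search term before kicking off a full scripted install, you can run the filter by hand (as root, without the backslashes) on an already-built host.  The PCI address below is just the one from my lab; substitute whatever is unique in your environment.

# Should print only the NIC(s) you expect to be linked to vSwitch0 (vmnic3 in my case)
/usr/sbin/esxcfg-nics -l | awk '$0 ~ /02:03.00/ {print $1}'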

Below is the configuration of my server before I redeployed with kickstart.  The vmnic3 line (PCI 02:03.00) is the NIC I want assigned to the COS; as the vSwitch0 output shows, ESX assigns vmnic0 to the COS by default.

BEFORE (without %post section)


[root@vesx7 root]# esxcfg-nics -l
Name    PCI      Driver      Link Speed    Duplex MTU    Description
vmnic1  02:01.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic2  02:02.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic3  02:03.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic0  02:00.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM

[root@vesx7 root]# esxcfg-vswitch -l
Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       64          4           64                1500    vmnic0

PortGroup Name      VLAN ID  Used Ports  Uplinks
VM Network          0        0           vmnic0
Service Console     0        1           vmnic0

Now, here is the same output after I redeployed the server with my modifications to the %post section of the kickstart file. The scripted deployment of ESX now properly assigns vmnic3 to my service console.

AFTER (with %post section)

[root@vesx7 root]# esxcfg-nics -l
Name    PCI      Driver      Link Speed    Duplex MTU    Description
vmnic1  02:01.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic2  02:02.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic0  02:00.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic3  02:03.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM

[root@vesx7 root]# esxcfg-vswitch -l
Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       64          5           64                1500    vmnic3

PortGroup Name      VLAN ID  Used Ports  Uplinks
Production          0        0           vmnic3
Service Console     0        1           vmnic3

I hope this was helpful.  Let me know if you have any questions.

Well, I’d better sign off and start packing because I leave for Omaha, NE in a few hours.



VCDX Admin Exam Notes – Section 1.4

I’m trying to get these out faster.  But I’m finding it’s a pain to compile, tweak and reformat my notes so they make sense and look right on the blog.  Here’s the next in the series.

Objective 1.4 – Implement and manage Storage VMotion.

Knowledge

Describe Storage VMotion operation

I like to think of Storage VMotion as the “inverse” of VMotion.  Instead of moving (live) the front end (i.e. CPU, memory, and network) from one physical device to another, Storage VMotion moves (live) the back end (i.e. disk) from one physical device to another.

The following was taken directly from the VMware.com website and accurately describes how Storage VMotion works.

 

[Image: diagram of the Storage VMotion migration process]

  1. Before moving disk files, Storage VMotion creates a new virtual machine home directory for the virtual machine in the destination datastore.

  2. Next, a new instance of the virtual machine is created. Its configuration is kept in the new datastore.
  3. Storage VMotion then creates a child disk for each virtual machine disk that is being moved to capture a copy of write activity, while the parent disk is in read only mode.
  4. The original parent disk is copied to the new storage location.
  5. The child disk is re-parented to the newly copied parent disk in the new location.
  6. When the transfer to the new copy of the virtual machine is completed, the original instance is shut down. Then, the original virtual machine home is deleted from VMware vStorage VMFS at the source location.

Explain implementation process for Storage VMotion

For ESX 3.5, Storage VMotion is a command-line-only implementation.  There are third-party GUI plugins for the VI Client that I have used and that work really well, but they are of course not supported by VMware.  And for the purposes of this post, I would imagine the VCDX exam will stick to VMware-supported implementations.

To execute a Storage VMotion, you’ll need the RCLI (Remote Command Line Interface) from VMware.com.  There is an RCLI for both Windows and Linux, and there’s an RCLI virtual appliance too.  Take your pick, and then be sure to review the Remote Command-Line Interface Installation and Reference Guide for more info on RCLI.  But specifically for Storage VMotion, the RCLI command comes in two flavors (from the guide):

 

To use the command in interactive mode, type svmotion --interactive. You are
prompted for all the information necessary to complete the storage migration.
When you invoke the command in interactive mode, all other parameters are
ignored.

In noninteractive mode, the svmotion command uses the following syntax:

svmotion [standard Remote CLI options] --datacenter=<datacenter name>
--vm <VM config datastore path>:<new datastore>
[--disks <virtual disk datastore path>:<new datastore>,
<virtual disk datastore path>:<new datastore>]

 

Identify Storage VMotion use cases

There are a number of reasons I can think of why you’d want to use Storage VMotion.  Some of the more obvious ones (at least to me, anyway) would be:

  1. Array maintenance
  2. Migrating to newer or different (e.g. FC, iSCSI, NAS) hardware
  3. Adding new storage
  4. To achieve optimal distribution of storage consumption across LUNs
  5. To resolve storage bottlenecks and other performance issues

I’m sure if I thought about it some more, I could come up with a few more.  But I think this is a decent list.

Understand performance implications for Storage VMotion

There are a couple of things that occur during a Storage VMotion that could affect performance and therefore need to be considered when planning to move storage.

  1. During the Storage VMotion, the VM actually does a “Self VMotion,” meaning the VM is VMotioned to the same host it’s already running on (review step #2 in the graphic above).  And therefore during this time, there is temporarily twice the amount of memory consumed.
  2. During the move, extra disk space is required on the source volume while all disk writes are redirected to the snapshot disk.
  3. All disk I/O for the copy (i.e. read from the source volume, write to the snapshot disk, and write to the destination volume) goes through the VMkernel of the host.

 

Skills and Abilities

Use Remote CLI to perform Storage VMotion operations

  • Interactive mode
    This is pretty easy.  Just enter svmotion --interactive at the command line and then follow the prompts.
  • Non-interactive mode
    In my environment, the command to move all the virtual disks associated with aaron-corp-xp (the name of an actual VM I use) from vol1 to vol2 would look like this:

svmotion --url=https://10.10.8.60/sdk --username=aaron --password=<yeah right> --datacenter=cincylab --vm='[vol1] aaron-corp-xp/aaron-corp-xp.vmx: vol2'

VCDX Admin Exam Notes – Section 1.3

Ugh.  My brain hurts.  I’ve spent the past few hours reviewing scripted ESX installations and working on a PowerShell script for a customer that will reorder vmnics after a scripted installation is complete (because I can’t find any other way to force their order during the install).  It’s been a few months since I’ve done a scripted installation, so I definitely needed a refresher.  Plus, according to the VMware Enterprise Administration Exam Blueprint v3.5, section 8.1 is all about automating ESX deployments.  The good news is that section 8.1 is the last section of the blueprint, so I believe I’m almost done preparing for the VCDX Admin Exam, which I’m scheduled to take in a few days. 

Anyway, going back to the beginning of the Blueprint, and continuing from where I left off, here is the next section of my study notes.

Objective 1.3 – Troubleshoot Virtual Infrastructure storage components.

Knowledge

Identify storage related events and log entries.  Analyze storage events to determine related issues.

All storage related events will be recorded in the /var/log/vmkernel log file.  Most of the messages in this log file are fairly cryptic and can be difficult to interpret.  Furthermore, this log file contains all messages from the vmkernel, not just storage related messages, so you’ll have to filter through it.  An easy way to do this is simply to search for SCSI.  For example, the command cat /var/log/vmkernel | grep SCSI on one of my servers produces the following output (only showing the last 10 lines) …

[root@cincylab-esx3 root]# cat /var/log/vmkernel | grep SCSI | tail -10
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1069)iSCSI: login phase for session 0x8603f90 (rx 1071, tx 1070) timed out at 23051576, timeout was set for 23051576
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)iSCSI: session 0x8603f90 connect timed out at 23051576
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)<5>iSCSI: session 0x8603f90 iSCSI: session 0x8603f90 retrying all the portals again, since the portal list got exhausted
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)iSCSI: session 0x8603f90 to iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1 waiting 1 seconds before next login attempt
May 18 12:57:48 cincylab-esx3 vmkernel: 2:16:01:56.748 cpu2:1071)iSCSI: bus 0 target 0 trying to establish session 0x8603f90 to portal 0, address 10.10.8.200 port 3260 group 1
May 18 12:58:00 cincylab-esx3 vmkernel: 2:16:02:09.355 cpu2:1071)iSCSI: bus 0 target 0 established session 0x8603f90 #3, portal 0, address 10.10.8.200 port 3260 group 1
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba35:C0:T0:L0 : 0x0 0x80 0x83
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Device id info for vmhba35:C0:T0:L0: 0x1 0x1 0x0 0x18 0x42 0x55 0x46 0x46 0x41 0x4c 0x4f 0x0 0x0 0x0 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x2 0x0 0x0 0x0
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Id for vmhba35:C0:T0:L0 0x20 0x20 0x20 0x20 0x56 0x49 0x52 0x54 0x55 0x41
[root@cincylab-esx3 root]#

If you look closely, I clearly had some issues with my iSCSI appliance a few hours ago.  I decided to make some configuration changes to the switch and then, all of a sudden, the ESX server lost connectivity to its storage.  Weird! :)

Anyway, what does all this mean?  There’s a really good VMworld Europe 2008 presentation (which you can get from www.vmworld.com) titled VI3 Advanced Log Analysis, which goes into detail about how to interpret VMware log files.  From that presentation, I found this diagram, which describes the components of a message in the vmkernel log file.

[Image: diagram of the components of a vmkernel log message, from the VI3 Advanced Log Analysis presentation]

 

Skills and Abilities

Verify storage configuration and troubleshoot storage connection issues using CLI, VI Client and logs

  • Rescan events
    A rescan event can be initiated with esxcfg-rescan at the command line.  The output should look like the following …

    [root@cincylab-esx3 root]# esxcfg-rescan vmhba32
    Rescanning vmhba32 …
    On scsi1, removing: 0:0.
    On scsi1, adding: 0:0.
    Done.
    [root@cincylab-esx3 root]# cat /var/log/vmkernel | grep SCSI | tail -3
    May 18 19:13:09 cincylab-esx3 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T0:L0 : 0x0 0x80 0x83
    May 18 19:13:10 cincylab-esx3 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T0:L0: 0x2 0x0 0x0 0x18 0x4c 0x69 0x6e 0x75 0x78 0x20 0x41 0x54 0x41 0x2d 0x53 0x43 0x53 0x49 0x20 0x73 0x69 0x6d 0x75 0x
    May 18 19:13:10 cincylab-esx3 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T0:L0 0x36 0x52 0x58 0x36 0x4a 0x39 0x39 0x58 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x47 0x42 0x30 0x31 0x36 0x
    [root@cincylab-esx3 root]#


  • Failover events
    I don’t have redundant paths in my lab to simulate this.  So, again from the VMworld presentation, VI3 Advanced Log Analysis, here is a screen shot from the slide that covers this topic.

[Image: failover event log messages, from the VI3 Advanced Log Analysis presentation]

There will obviously be a lot of different types of error and event messages in /var/log/vmkernel.  And I’m certainly not going to try and list every possible combination here.  I highly suggest you download the VMworld preso because it does a great job of explaining how to further decipher the log files (like defining SCSI error codes). 

Well, that’s about it for this section.  Back to PowerShell scripting for another hour or so.

VCDX Admin Exam Notes – Section 1.2

Last week I was in San Francisco with most (maybe all) of the VMware field technical folks for a three day technical summit.  One of the evenings we had an awards ceremony and dinner.  And guess what?  The first eight VCDX certifications ever to be awarded were announced.

Now, VMware is a pretty big company, so I didn’t recognize seven of the eight names.  But I definitely recognized one of them.  Well, that is, I should say I recognized his name.  Having never officially met him face to face, I couldn’t pick him out of a crowd of two.  You might know him as the rock star blogger from Yellow Bricks.  Congratulations Duncan Epping!  I believe he said he is VCDX number seven and the first VCDX in Europe.  Very cool.  And well deserved, for sure.  :)

I’m a little behind Duncan.  I’m scheduled to take the Admin Exam later this month, which is the first of two exams.  Then I’ll have to present and defend a successful design and deployment before a jury … er, I mean, panel of my peers.

Anyway, here are my notes for section 1.2 of the VMware Enterprise Administration Exam Blueprint v3.5. (Section 1.1 can be found here).  Everything in Blue is a direct cut and paste from the exam blueprint.

Objective 1.2 – Implement and manage complex data security and replication configurations.

Knowledge

Describe methods to secure access to virtual disks and related storage devices

  • Distributed Lock Handling

    In the graphic below, notice how each ESX server sees and has access to the same LUN? This is achieved via VMFS, a clustered file system which leverages distributed file locking to allow multiple hosts to access the same storage.  When a Virtual Machine is powered on, VMFS places a lock on its files, ensuring no other ESX server can access them.

    [Image: VMFS distributed file locking diagram]
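As a side note, if you ever need to figure out which host is actually holding the lock on a VM’s files, one trick is to dump the file’s metadata with vmkfstools and then look for the owner (the MAC address of the locking host) in the vmkernel log.  The datastore and file names below are made up for illustration.

# Dump metadata/lock info for a file; the lock owner shows up in /var/log/vmkernel
vmkfstools -D /vmfs/volumes/myvmfs/myvm/myvm.vmdk
tail -20 /var/log/vmkernel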

Identify tools and steps necessary to manage replicated VMFS volumes

  • Resignaturing
    First, there’s a really good article on VMFS resignaturing by Duncan (go figure).  Also, Chad Sakac over at Virtual Geek has a great article too.  I’m not going to reinvent the wheel, so make sure you read their posts.  You’ll need to understand this.  For the exam, you’ll certainly need to know the following …

    The following is from the Fibre Channel SAN Configuration Guide:

EnableResignature=0, DisallowSnapshotLUN=1 (default)
In this state:

  • You cannot bring snapshots or replicas of VMFS volumes made by the array into the ESX Server host regardless of whether or not the ESX Server has access to the original LUN.
  • LUNs formatted with VMFS must have the same ID for each ESX Server host.

EnableResignature=1, (DisallowSnapshotLUN is not relevant)
In this state, you can safely bring snapshots or replicas of VMFS volumes into the same servers as the original and they are automatically resignatured.

EnableResignature=0, DisallowSnapshotLUN=0
This is similar to ESX 2.x behavior.  In this state, the ESX Server assumes that it sees only one replica or snapshot of a given LUN and never tries to resignature. This is ideal in a DR scenario where you are bringing a replica of a LUN to a new cluster of ESX Servers, possibly on another site that does not have access to the source LUN. In such a case, the ESX Server uses the replica as if it is the original.

Do not use this setting if you are bringing snapshots or replicas of a LUN into a server
with access to the original LUN. This can have destructive results including:

  • If you create snapshots of a VMFS volume one or more times and dynamically
    bring one or more of those snapshots into an ESX Server, only the first copy is
    usable. The usable copy is most likely the primary copy. After reboot, it is
    impossible to determine which volume (the source or one of the snapshots) is
    usable. This nondeterministic behavior is dangerous.
  • If you create a snapshot of a spanned VMFS volume, an ESX Server host might
    reassemble the volume from fragments that belong to different snapshots. This can
    corrupt your file system.

Skills and Abilities

Configure storage network segmentation

  • FC Zoning
    Zoning delivers access control in the SAN, restricting visibility to devices in the zone solely to other members of that zone.  It is a common technique used to do things like group ESX servers into production/test/dev, increase security and decrease traffic, among other things.
  • iSCSI/NFS VLAN
    Storage segmentation for IP storage can be accomplished in one of two ways:  VLANs or physical segmentation (i.e. separate layer 2 switches for storage).
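As a rough sketch of the VLAN approach, putting an iSCSI VMkernel port group on its own VLAN from the service console would look something like the following.  The vSwitch, port group name, VLAN ID, uplink and IP address are all made-up examples.

# Dedicated vSwitch and VLAN-tagged port group for iSCSI traffic (names/IDs are examples)
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -A "iSCSI" vSwitch1
esxcfg-vswitch -v 100 -p "iSCSI" vSwitch1
esxcfg-vmknic -a -i 10.10.20.5 -n 255.255.255.0 "iSCSI"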

Configure LUN masking

The Disk.MaskLUNs parameter should be used when you want to mask specific LUNs from your ESX host.  This is a useful option when you don’t want your ESX server to access a particular LUN but are unwilling (or unable) to configure your FC switch.

To configure LUN masking in the VI Client go to Configuration –> Advanced Settings for the host you want to configure. You’ll find the Disk.MaskLUNs parameter under the section Disk.  It looks like this in my VI Client.

[Image: the Disk.MaskLUNs parameter under Advanced Settings in the VI Client]

Enter a value in the following format … <adapter>:<target>:<comma separated LUN range list>. Be sure to rescan when you’re done and verify the mask has been properly applied.
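The same parameter can also be set from the service console with esxcfg-advcfg (more on that command in the next section).  The value string below is only meant to illustrate the <adapter>:<target>:<LUN range list> format, so double-check the exact syntax in the SAN Configuration Guide before trying it on a real host.

# Example only: mask LUNs 5, 7 and 10-20 behind vmhba1, target 0
esxcfg-advcfg -s "vmhba1:0:5,7,10-20" /Disk/MaskLUNs
# Verify the value, then rescan
esxcfg-advcfg -g /Disk/MaskLUNs
esxcfg-rescan vmhba1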

Use esxcfg-advcfg
This one’s easy.  Just use the man page (type “man esxcfg-advcfg” at the command prompt).  It’ll tell you everything you need to know :)

Set Resignaturing and Snapshot LUN options
So, following along with the man page above, here is a cut and paste from my server …


[asweemer@cincylab-esx3 config]$ su -
Password:
[root@cincylab-esx3 root]# esxcfg-advcfg -s 0 /LVM/EnableResignature
Value of EnableResignature is 0
[root@cincylab-esx3 root]# esxcfg-advcfg -s 1 /LVM/EnableResignature
Value of EnableResignature is 1
[root@cincylab-esx3 root]#
[root@cincylab-esx3 root]# esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
Value of DisallowSnapshotLun is 0
[root@cincylab-esx3 root]# esxcfg-advcfg -s 1 /LVM/DisallowSnapshotLun
Value of DisallowSnapshotLun is 1
[root@cincylab-esx3 root]#


Manage RDMs in a replicated environment
RDMs can be created via the CLI with the following command …

vmkfstools -r /vmfs/devices/disks/vmhbaX:Y:Z:0 my-vm.vmdk

By default, the RDM will be created in Virtual Compatibility Mode.  But should you need and/or prefer Physical Compatibility Mode, you can change this by editing the VMDK file and changing the createType value to vmfsPassthroughRawDeviceMap.
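As a side note, I believe vmkfstools can also create the passthrough (Physical Compatibility Mode) RDM directly with -z, which saves you from hand-editing the descriptor.  The device path and file names below are placeholders.

# Virtual compatibility mode RDM (same as above)
vmkfstools -r /vmfs/devices/disks/vmhbaX:Y:Z:0 my-vm.vmdk
# Physical compatibility (passthrough) RDM
vmkfstools -z /vmfs/devices/disks/vmhbaX:Y:Z:0 my-vm-rdmp.vmdk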

Use proc nodes to identify driver configuration and options
The proc filesystem is a pseudo filesystem; it’s not “real.”  It consumes no storage space and is used to access process information from the kernel.  You’ll find quite a bit of valuable data and configuration options in the many subdirectories of /proc/vmware/config.  Here’s a quick example from my ESX server …


[asweemer@cincylab-esx3 LVM]$ pwd
/proc/vmware/config/LVM
[asweemer@cincylab-esx3 LVM]$ ls
DisallowSnapshotLun  EnableResignature
[asweemer@cincylab-esx3 LVM]$ cat EnableResignature
EnableResignature (Enable Volume Resignaturing) [0-1: default = 0]: 0
[asweemer@cincylab-esx3 LVM]$


Use esxcfg-module

Just like esxcfg-advcfg, use the man page.
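For reference, a few of the invocations I find myself using most often are below.  The QLogic module name and the queue-depth option are just examples; check the man page for the exact flags on your build.

# List the loaded VMkernel modules
esxcfg-module -l
# Show the options currently set for a module (module name is an example)
esxcfg-module -g qla2300_707
# Set module options (takes effect the next time the module loads)
esxcfg-module -s "ql2xmaxqdepth=64" qla2300_707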


Thinking about upgrading to vSphere 4? It’s a no brainer.

Since I was in a meeting during the launch of vSphere 4 on April 22nd, and since I found myself wide awake at 3AM, I decided to watch the recording of the webcast early this morning. And as I was watching, I heard Steve Herrod (VMware CTO) make the following statement …

… So if you’re an existing customer today and you have a 100 host deployment using our vi3.5 product, simply upgrading the software will save you $2 Million dollars a year …

Wow, that’s pretty powerful.  In an age when words like costly, frustration, and BSOD (Blue Screen of Death) are often associated with software upgrades, it’s no wonder many companies are taking an “if it ain’t broke, don’t fix it” approach.  But here’s a software upgrade that, if for no other reason, should be considered for the economics alone.

How can VMware make such a bold claim?  Steve’s statement was based on the following efficiencies you’ll achieve with vSphere 4 (over and above what you’re already seeing with VI3):

30% Greater Consolidation

Most people know that you’ll get the greatest VM density with VMware due to superior technologies like memory overcommitment and Distributed Resource Scheduling.  But did you know that VM density is a critical metric when determining TCO?  There’s a great blog post over at VMware: Virtual Reality which goes into detail.  But the following graphic sums it up pretty well …

[Image: cost-per-VM comparison chart]

Simply put, with VMware you’ll have fewer physical servers to buy, fewer network and storage connections, less floor space, and less power and cooling to support your virtual infrastructure … AND you’ll have superior functionality like VMotion, DRS, HA, etc.

If you’re an existing VMware customer, then you are already benefiting from the efficiencies afforded to you by VI3.  And upgrading to vSphere 4 is going to give you even greater efficiencies, allowing you to achieve an even greater VM density, as well as capture a greater number of high-I/O applications that were previously considered non-VM candidates.  The following table, taken from the webcast, summarizes these performance improvements in vSphere 4.

[Image: table of vSphere 4 performance improvements]

50% Storage Savings

There are over 150 new features in vSphere 4.  One of the more exciting features is Thin Provisioning.  This feature is already included in VMware’s virtual desktop offering, VMware View, and I have a blog post about storage savings with View if you’re looking for more technical detail.  But for this post, know that the technology has been applied to vSphere 4 and allows for significant storage savings.

Basically, Thin Provisioning allows a VM to consume no more space than its data requires.  So, for example, if you have a VM with a 100G virtual drive but only 20G of data within the virtual drive, then only 20G will actually be consumed.  When applied across all your VMs, you’ll achieve economies of scale and you’ll likely see a 50% reduction in storage, if not more.
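If you like the command line, you can see the same idea with vmkfstools.  The sizes, datastore and file names below are made up for illustration.

# Create a new 100 GB thin-provisioned disk; space is consumed only as data is written
vmkfstools -c 100G -d thin /vmfs/volumes/datastore1/myvm/myvm_thin.vmdk
# Clone an existing thick disk to a thin copy on another datastore
vmkfstools -i /vmfs/volumes/datastore1/myvm/myvm.vmdk -d thin /vmfs/volumes/datastore2/myvm/myvm.vmdk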

20% Power Savings with Distributed Power Management

What is Distributed Power Management (DPM)?  Steve Herrod calls it “VM Tetris” or “Server Defrag,” which I thought was clever.  During periods of low server utilization, DPM will intelligently VMotion workloads down to the smallest acceptable number of physical servers and then power off the unused servers.  As traffic increases during peak hours, DPM will power the servers back on and again redistribute the workloads with VMotion.

Distributed Power Management isn’t new; it was introduced over a year ago in VI3.  However, up until vSphere 4, this feature was only experimentally supported, and with the lack of full support in VI3, I don’t believe many customers actually used DPM.  But VMware fully supports DPM in vSphere 4, assuming your hardware has IPMI, WOL or iLO.  And it can deliver significant savings in your power and cooling costs.  Plus you get the added bonus of doing your part to save the environment.

Here is an awesome video some of the VMware engineers created showing DPM in action …

At this point, you’re probably asking the following questions …

  • Sounds great, but how much is it going to cost me?  Nothing.  Your software maintenance covers like-for-like upgrades.  So, if you have VI3 Enterprise, then you can upgrade to vSphere 4 Enterprise at no additional cost.
  • Is it a difficult process to upgrade?  Will it require massive configuration changes?  Nope.  The upgrade is actually rather simple.  I upgraded my three lab VI3 servers to vSphere 4 in under an hour with no downtime of any of my VMs.  Basically, Update Manager handled just about everything for me.
  • Do I get anything else with vSphere?  Heck yeah!  Remember, there are 150 new features in vSphere 4, which I’m sure I’ll address in future posts.  I only addressed the ones that will save you money.

So let’s see if I can summarize this properly … a zero cost, easy upgrade = 30% Greater Consolidation + 50% Storage Savings + 20% Power Savings.  To me, that’s a no brainer.  What other software company in the world offers that kind of value?