Managing Virtual Infrastructure

VMware vSphere Capacity IQ Overview – I’m Impressed!

ciq-icon

With the launch of VMware vSphere came some new products that I hadn’t really paid much attention to (busy upgrading I guess).  One of the newer products is a Virtual Center reporting tool called Capacity IQ.  This product  gives an administrator the ability to analyze, forecast and plan for future growth across your ESX environment.  I have had a lot of experience with monitoring/reporting tools in the past, I won’t bore you with the details, so I was quite skeptical of a 1.0 reporting tool for Virtual Center.  I must admit I was blown away by the immediate relevant reports the product was able to produce.

After pulling down the trial install and obtaining the demo key, I loaded it up for a spin.  I am not going to document the installation steps needed as Eric Gray has done this for us already.  It by far is the easiest reporting application I have ever installed.  If your interested in taking it for a trial run, download the virtual appliance from VMware’s website here (OVF format).  Once you import the virtual appliance and give it a static IP address, it will need to collect data about your environment for a while.

There are three basic views that CIQ gives you once you install the plug-in, dashboard, views and reports.

Dashboard

The dashboard tab is designed to give you a quick overview of the item you have selected.  Capacity IQ uses the same approach as virtual center does, whatever object you have selected will be reported and focused on.  Here is a view of one of our clusters, notice January 11th on the Trend and Forecast graph on top.

Dashboard

One of our clusters was out of resources, I added two more physical hosts to the cluster.  You can see CIQ picks up the new physical host resources for the cluster and reflects this by increasing the number of virtual machines it believes the cluster can accommodate.  Want to see something even more interesting, check out the pink graph on the 17th.  Capacity IQ is already using a prebuilt formula to assume what it thinks we will have (or won’t have) a week out.  Pretty impressive.

Views

The views tab is designed  to give you a more detailed look on some of the specific data points.  Here is a screenshot of the various reports you can execute:

Views

So here is where you can get some great visual reports to present to either upper management, or a potential customer.  This gives you a nice interface that you can customize with data points that you can tweak.  Check out the first report on this cluster:

image

This gives you a graphical historical view of your cluster, how many virtual machines you have added over the course of time.  Notice the horizontal sliding bar at the bottom of the chart.  This allows you to adjust your variable time/date window.  The lighter shaded line to the right is the projected or forecasted growth of how the cluster might continue to grow.  The views tab is a great place to run some ad-hoc reports, gives you the ability to select the type of report, and even allows you to export the data.

Reports

The reports tab is the “pre-canned” reports that can be executed by the administrator.  The one thing I was disappointed to not see here was the ability to schedule these reports to run at a particular interval (weekly/monthly).  This is something that I assume will probably be introduced in future releases of the product.

Reports

After the report is executed and compiled, you are then provided with a .pdf or .csv version of your dataset to download and review.  The first report totaled 17 pages and provided some great technical information.  Here is the table of contents:

image

Conclusion

I am very impressed with Capacity IQ.  There are no agents you need to install across the virtual machines you wish to report against.  The installation was very straight forward, I think I had it up and running in about 15 minutes.  Once the virtual appliance was in place, all it needed was a little bit of time to start crunching some data.  The reports are well written and very relevant to what an administrator would desire and wish to see.  If your looking for a nice reporting tool to help you forecast, give this one a test to see if it fits your needs.

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

Mass upgrade of VMware Tools in Linux guests

linuximage

Installing and/or upgrading VMware tools has always been a bit more complicated for Linux guests than for Windows guests.  After the installation of the package binaries, the vmware-config-tools.pl script must be run to configure the tools for your environment.  This script has to be run from the console, which is a pain when you’ve got more then just one or two Linux VMs.  And may the good Lord help you if the modules aren’t suitable for your running kernel and you don’t have a compiler (or the C header files for your running kernel) already installed.

When VMware added the Automatic Tools Upgrade …

image

The situation certainly improved, but it is by no means a fool proof solution.  In my experience, it doesn’t work 100% of the time for Linux guests (though this *could* be due to the heavy modification I’ve done in my distro).  And furthermore, what if you want to automatically upgrade 100’s of Linux guests, not just one?  Or what if you’ve already got a deployment tool that you’d like to use to push the tools out?  (Kind of tough when the script needs to be run directly in the console)

So, I looked to see if there was a way to improve the situation.  First, I needed to find a way to run vmware-config-tools.pl remotely in an automated fashion.  And by the way, it’s not that you can’t run this script remotely via SSH because you can.  The problem is that when you do so, you immediately get following question …

 

It looks like you are trying to run this program in a remote session. This program will temporarily shut down your network connection, so you should only run it from a local console session.  Are you SURE you want to continue?

 

Unfortunately, to run vmware-config-tools.pl remotely, we need to include the –d flag so that the script will automatically select the default answers to all of the questions for us.  And the problem is, the default answer to this question is “no.”  

So I looked through the vmware-config-tools.pl and I found that it’s really only checking to see if the SSH_CONNECTION environment variable is set.  Well, that’s easy … simply executing vmware-config-tools.pl in a different shell allows us to side step this. 

Next I just created a simple bash script that gets pushed out to the /tmp directory along with the vmware tools installation package (also pushed to the /tmp directory) and gets executed remotely by my deployment tools (which for me are are just more bash scripts, but this should work with any enterprise deployment tool).  Here’s the simple script I used for my guests …

 

#!/bin/bash

RPM=`ls /tmp | grep VMwareTools`

rpm -e VMwareTools
echo "Old VMwareTools removed" > /tmp/vmware_tools_upgrade.log

rpm -i /tmp/$RPM
echo "$RPM installed" > /tmp/vmware_tools_upgrade.log

sh -l root -c /usr/bin/vmware-config-tools.pl -d
echo "vmware-config-tools.pl -d executed" >> /tmp/vmware_tools_upgrade.log

service vmware-tools restart
echo "vmware-tools restarted" >> /tmp/vmware_tools_upgrade.log

service network restart
echo "network restarted" >> /tmp/vmware_tools_upgrade.log

exit

 

This is obviously a very basic script and could easily be enhanced with better logging and error handling.  Also, for Debian distros, such as Ubuntu, you’d need to modify this script to handle the tar.gz installation package … unless, of course, you’ve modified your distro to handle RPMs (as I have).

The good news is that, at least for my environment:

  1. This works 100% of the time and a restart of the VMs is not necessary. 
  2. I no longer have to upgrade many guests by hand.

However keep in mind, there is still a network outage during the upgrade (usually just about a minute or two), so be sure to continue using a maintenance window for your upgrades. 

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

vSphere 4 Update 1 with Update Manager Shenanigans

New Year Lights 2010

Happy New Year!  I hope everyone enjoyed the holiday’s and got to spend some time with friends and family.  If your reading this I suggest you pay tribute to the quality of Virtual Insanity, and give the gift of voting.  Eric Siebert has released a “best of 2009 blog contest”.  If Virtual Insanity has helped you out in some way in the past I suggest casting a vote for this great virtualization blog space!  Ok onto the real reason for this post…

I ran into an oddity while bringing a new host online today into our vSphere environment.  And thought it best to publish my findings.  Hopefully this might save someone a support call.  With vSphere 4 update 1 came a couple of technical issues, which are detailed here and here.  Personally we don’t use ESXi so only the first one was a major issue for us.  We are an HP shop, so the issue around the HP agents and update 1 was a major concern (basically would render the host unbootable).  Luckily VMware support is proactive about announcing issues like this to the community and most people were aware of the problem right away.

The problem I hit today was strange and I thought it was just being off from work for a week.  I went to apply our update 1 baseline to a new host I was bringing up, rescanned, and then got this:

compliant1

What the?  I know this isn’t compliant, our base build is still at 4.0  Check out the build number, that’s proof.  I have used the update 1 baseline for 50+ hosts so I know it’s not that.  So maybe update manager is still on holiday as well, I restart the service and life is good?  Nope.  Same thing.

To make a long story longer, I poke around in the repository and check the update 1 patch and see it’s valid, yep 11/19/09 that’s the right release date.  Why is this thing not working?

update1-first

I kept poking and prodding thinking maybe they released an update to the update?  Sure enough it slipped by me when I wasn’t looking, or it went to my spam mail.  Check the date 12/9/09.

update1-second

I created a new test baseline, and dropped the 12/9/09 update 1 into it and applied it to my new host.  Low and behold:

compliant2

That’s much better.  Strange the older update 1 patch didn’t reflect anything and showed compliant.  As an end user I would have liked to have seen some type of error message, or a reference to the newer released update 1.  Ran the new update, (still stopped the HP agents just in case).  And now things look good again (build number):

looksbetter

Conclusion

Go vote for this site, and make sure you update your update manager, update 1 baseline.  That’s a lot of updates.  See you online!

Scott

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

A handy new addition to the Command Line Tool for View 4

First things first

Thanks to Scott Sauer (@ssauer) and John Blessing (@vTrooper) for holding down the fort here at Virtual Insanity while I’ve been finishing up some unfinished projects and preparing for the VCDX Design Exam (which I take later this month).  One of Scott’s posts actually won a vSphere blog contest.  Nice work Scott!  These two guys are becoming pretty good friends of mine here in the Cincinnati area, so hopefully I can convince them to keep the content flowin’.

An itch I couldn’t scratch

I’ve mentioned here on this blog, at least once or twice, that I “eat the dog food” and actually run my primary XP desktop as a VMware View image.  Since the conversion almost a year ago, everything has been running pretty well with only a few minor bumps along the way.  And with the recent addition of PCoIP, I can’t imaging ever going back.

But there was one little reoccurring problem I was having for which I couldn’t seem to find an answer.  It wasn’t a show stopper of an issue, but it was just an “itch I couldn’t scratch,” if you know what I mean.  And the problem went something like this …

  1. Inside my desktop VM I have a Cisco VPN client, necessary for a secure connection back to corporate HQ in Palo Alto, CA. 
  2. When connecting to my desktop with the VPN client inside the VM inactive, I had no issue.
  3. However, if I disconnect from my desktop while the VPN session was active, then I couldn’t reconnect to my desktop via VMware View. 

The reason?  The broker was sending me the new IP address of the Cisco VPN Adapter, which is an IP address on the VPN, and an IP address my local computer didn’t know about. 

Now, if I were to log off instead of disconnect from my desktop, this would terminate the VPN session and therefore wouldn’t be a problem.  But who wants to log off every time?  More often than not, I have things open on my desktop (e.g. half written emails, documents, browsers with many many open tabs, etc.) that I don’t want to bother saving and closing every time I step away from the computer.  And really the bigger issue is with unintentional disconnects that result from local power/network/OS issues.

I tried all sorts of things to fix this.  Among other thins, I tried …

  1. Reordering the NICs, hoping the broker was just grabbing the first NIC. 
  2. Poking around the broker and agent install files, hoping to find a way to force the IP address. 
  3. I even tried uninstalling and reinstalling the View agent and the Cisco client, hoping the order of installation might do the trick (admittedly, this was a random shot in the dark)

But nothing seemed to work.  So until recently, to reconnect I would have to connect directly to my desktop via RDP, or connect to the console via the VMware Infrastructure Client, then disconnect the Cisco VPN and then reconnect via the View client. 

See what I mean?  Not a show stopper, but man what a pain in the butt! 

The solution

Well I found a way around this with a handy new addition to the Command Line Tool in View4.  Check out page 12 of the Command Line Tool for View Manager titled “Override IP Address.”  On the broker from a DOS prompt, in the c:\Program Files\VMware\VMware View\Server\bin directory, execute the following …

vdmadmin.exe –A –d <desktop name> –m <machine name> –override –i hostname

The “desktop name” is the name of the VM in the broker.  The “machine name” is the name of the VM in vCenter.  It’s likely they’ll be the same, but they don’t have to be and in fact, in my case they weren’t the same.  The “hostname” can be either a FQDN or an IP address.  Oh, and I can tell you that all parameters must be present or the command won’t execute. 

But that was all there was too it.  Now I can disconnect and reconnect to my desktop, regardless of the state of my VPN client.

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

Capacity Conundrum Part Deux

 

– The vTrooper Report –

 

This is a continuation of the Capacity Conundrum, if you missed the first part start here.

$ per Compute VM

So let’s cut to the chase.  In the case of the compute tiles of our Quad we have a price per vCPU and $ per GB of RAM to settle. Keeping our example 2U server in play we could expect to spend approximately $15,000 for a 2U fully loaded with 4GB DIMMs.   Well unfortunately a small part of that 15K  is consumed in I/O cards and maintenance which needs to be pulled out to get the compute number.  For our argument we will use $10K for the compute system without the I/O cards and maint. costs;  This is the CAPEX we will offset in our $/per values.

vCOMPUTE – FIREPOWAH!

We know how a CPU works right? Move process into memory , execute CPU cycles, churn, churn more, back to the I/O guys, rinse and repeat.  Basically, this is where the hardware container happens in our data centers.   I say container because it’s easy to show it as a box; It’s hard to define what it will always be in physical form.  1U, 2U, 4U, half blade, Full Blade, appliance , PC ; you name it, it is probably in some one’s ‘datacenter’.  The lowest common denominator I have been able to settle on for a common form factor is Cores per Ram.  Grouping per socket fits because you are measuring the type of memory that is close to the CPU socket.  The NUMA architectures of AMD and Intel with memory controllers on-board and transports to the memory DIMMs without access through the I/O  controllers (eg. Northbridge) help define the grouping.

TECHNOTE:  Every core has associated memory banks it will use and every container(physical server) has a series of sockets that it controls.   A hypervisor has a limit to how well it can control the associated memory space to the nearest vCPU.   Generally the hypervisor will always schedule available vCPU’s from the same socket and swap the corresponding memory for those processes to the memory banks of the corresponding socket.  It does this is for efficiencies of the x86 architectures.  It can move the vm to another socket and readdress the memory but it has a ‘cost’ associated with such a move.  Path of least resistance is to stay in the same socket.

If you create a 4 vCPU VM and run it on a 1 core  processor it gets bogged down.   If you have the same VM on a two socket Quad Core (8 Cores)  the four cores utilized by the VM are likely to be on socket 1 or socket 2 .  The cost of splitting the vCPU between the two physical sockets by the scheduler is greater than running the vCPU in the same socket.    AMD delivered this earlier than Intel and sustained higher levels of virtualization consolidation “Per Host” than similar class systems of Intel could provide through the Northbridge.   Core i7 is a new game for Intel and the results of Nehalem show the improvements.

For more indepth information here is a good read:  CPU Scheduler in VMware ESX

We have a host of $10K  CapX charge that has two sockets at a 4/45GB Socket Ratio with approx $5k spend in each socket.  Looking at our Hardware invoice the CPU Cores are about 25% of the cost of a socket so we can assume that our per socket cost is broken down into 25% Core and 75% Memory.   So our Socket Ratio yields a $1250 cost for 4 cores and $3750 for 45GB of memory:

Per Core CapX = $312.50;  Per GB RAM CapX = $83.33

That gives us a bare metal cost without a hypervisor charge on top, but we need a hypervisor to get a VM running.  Adding in the ESX cost for a per socket license of ESX Enterprise Plus (worst case) you can add $3500 each socket.

ESX Lic. Cost per socket CapX = $3500

Raw burn rate of the host would be $8500 per socket if we never loaded a VM on the Host.  Well, we did it for a reason, so let’s get our money back. If we target the standard allocation for this host (4/45GB socket ratio) we get our target VM count of 16 per socket(1 vCPU/2.8 GB RAM).  Also, keep in mind that we broke the socket cost down by 25%  to CPU and 75% to Memory so we will keep that  same  split here.  If we don’t do the split, then any VM that is deployed to the socket will bear the same cost regardless of its size.

ESX Lic. Cost per VM= $218.75   ( 3500 / 16 )

-Or-

Split by the 25/75 % we did previously for the cost of the CPU and Memory and you get a little different calculation.

3500 * .25 = 875 / 16  = $55 AND     3500 *.75 = 2625 / 45  = $58

per vCPU=$55

per vMEM=$58

Adding it up with our target ratios in tow we get the burn rate of the $ per Compute on a VM basis.

($312/4  = 4:1 ratio) + (83*2.8) + {(55*1) + (58 * 2.8) } = $530

Or Summarized: (vCPU = $78)+(vMEM = $233)+(Hyper$=219)=(vCOMPUTE = $530)

Assuming 8760 Hours (1 year) this VM would cost $.06/hr in vCOMPUTE.

Lets apply that to some other VM systems and see if it sticks.  If we plan for the following VM deployment on our socket:

vmGrid

The costs would spit out as such:

vCompute

Or slice it up into a per hour number:

perhour

So based on this analysis some of my VM’s probably only cost $.05 per hour for vCompute.  Interesting. What is more interesting is the fact that the memory cost associated with a VM scales more accurately to the consumption.  You can have as much memory you like for your new 4 and 8 GB aspirations; (eg. memory leaks) you just need to pay for it accordingly.

Too bad that only pays for the top part of my total cost model.  That said, the benefit here is that this model can span across hypervisors and any market hypervisor can be split up to show the cost of a VM consumed on a Xen , KVM, VirtualIron, Parallels’, or Hyper-V infrastructure.

I will be working on a few powershell scripts and excel calculators that one can use to make this model more repeatable. At the very least, it is a model that I will use to consider CapacityIQ and third party products like the offering from VKernel; and the output they measure.  Especially if they consume additional costs on a per socket basis.  Which I can now calculate as Overhead.

Alas there is more to consider, stay tuned for Part III – “the I/O that binds”

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

Capacity Conundrum Part 1

 

–The vTrooper Report –

 

In an effort to gel up an internal billing and allocation model (GaaS – Goughing as a Service) I’ve been struggling with the concept of cost per vm.  I was asking a simple question in the twittersphere about that idea and it turned into a discussion and well…got out of hand.  I apologize for that, as this is a better format to explain. (Special thanks to @asweemer for a dumping ground)

If I had a Nickel for each VM…

At VMWorld 2009 there was a presentation in the keynote that showed the price of a vm hosted with Terramark that was $.05 per hour.  I thought  wow.   A nickel per hour.  If I had a nickel per vm/hour; How much would I have available to spend on coffee?

Then I thought wait.  I have VM’s. How much do they really cost me per hour?  Well the answer is … it depends.  Old servers with high power consumption and low density versus a new system with Intel 5500’s and packed in blades have different burn rates visible to different systems(power, cooling, depreciation).  I haven’t found a great model to break those units down to my satisfaction yet.  I need another way.

As a general practice I create in my mind some maxims that I follow in the creation of a VM.

  1. S  – 1vCPU, 1GB Ram, 1GB Net, 10GB Disk
  2. M -2vCPU, 2GB Ram, 1GB Net, 20GB Disk
  3. L – 4vCPU, 4GB Ram, 2GB Net, 40GB Disk

Seems simple enough, but it doesn’ t really generate a cost model on a consistant basis.   Hardware continues to change and each VM that consumes resources does it at different rates and times of the day.   A VM that isn’t doing anything isn’t really ‘consuming’ anything, right?  I thought I would try break it down further by creating a 4 quadrant block with two macro categories:  Compute (CPU and Memory) and I/O (Network and Disk)

cmnd

Each resource area could increase\decrease for a reason without changing the size of the original maxim it was created under.  This allows for small variations of size without having a customer yell that their bill when up by $2 this month.

The Measureable Unit

Use a unit of measure to identify the four quadrants:   vCPU : vMEM : vNET : vDISK  or C:M:N:D  .   Then overlay the VM creation to count up the units.  This way the growth of a ‘VM’ during its lifecycle can be adequately allocated back into the proper IT metric.  Using the VM creation maxims up above this may be:

  1. S  – 1:1:1:1
  2. M -2:2:1:2
  3. L – 4:4:2:4

This isn’t perfect but it at least allows for the average cpu cost to be allocated seperately from a memory, network, and disk cost.  Afterall, you don’t get to upgrade all four parts of the quadrant in the same fiscal year usually.  This also allows a way to trend an average of your cost rate per unit over a period of months and years to see which cost areas are improving.  It is an interesting metric for the business and IT.  Win-Win in my book.  Even if no-one internally ever has to pay the values back (Showback).  It also helps police which VM is consuming too much of a specific value which would skew the numbers if you simply took the cost of the esx hosts and divide by the number of VM’s.

Apples , Oranges, Lemons, and Grapes = Frutti Results

So you have a unit of measure and a type of system to match the measurement up towards over a period of time.  Here’s where the fruit cart and the horse get hooked up.

This is all very complex, why can’t I just buy the same server I have purchased for the last 5 years?

Sorry Kids. They don’t build’em like they used to.  But in todays market, the UCS system from Cisco has a new buzz to the original players of IBM, HP, and Dell.   How do you sort any of that out among the offerings, and how do you select the right platform for your new ESX System?     By the Socket !   Every system of the x86 family has them from both the Intel and AMD families.  And now that you have to pay for your hypervisor and additional tools (Capacity IQ, AppSpeed,  Nexus1000v) per socket  it matters more.  I need to squeeze the value out of those sockets.

Still staying in the upper half of the Quad;  lets measure cores and RAM as a ratio assuming dual rank 4GB Dimms and measure them to some of the standard 2 socket servers.

Standard Intel x5450

2 Socket – 4 Core – 16 Dimms (8 per socket) produces 4 cores/ 32 GB Ram

Standard Intel Nehalem  x5500

2 Socket – 4 Core – 18 Dimms (9 per socket)  produces 4 cores/ 45 GB Ram

Cisco UCS extention  on x5500

2 Socket – 4 Core – 48 Dimms (24 per socket)   produces 4 cores/ 96 GB ram

What this shows is that for every license of ESX consumed in the environment there are different amounts of memory available for a VM to use.  The approach by the UCS system allows for a much higher allowance of memory to a VM at the same licensing cost.   Sure you could buy 4 way servers and claim that the 256 GB of RAM gives the VM more allowance but in reality the vm will have ratios of contention to the vCPU and Memory within each of the 4 sockets. You can change the size of the container by moving to a 4 way,  but it won’t change the value of the ratio  for that container in regards to the cores and memory.

CPU Contention

The idea of CPU contention is becoming more visible to most administrators of virtualized environments because the desire to pack the vm’s onto a host is so strong.  If I can get 10 VM’s on a host for $5000 then getting 25 VM’s on the same host is lowering my cost per vm.  It could also be cheating your customers of the performance they paid. Especially if you have multiple vCPU’s assigned to those 25 VM’s.    This is where the ratio of VM per host becomes obsolete and vCPU/core  makes more sense.

Using the example containers above you can generate an expected number of VM’s per socket.  There is no reason to do a 1:1 ratio of cores to VM because the point of virtualization is to run more with less.  I think a good ratio to start with is 4:1 for a production VM and 16:1 for a VDI implementation:

Standard Intel x5450  -  (4 /32 GB SocketRatio)   yields 16  VM’s with a 1 vCPU/ 2GB ram configuration per socket

Standard Intel Nehalem  x5500  -  (4 /45GB SocketRatio) yields 16  VM’s with a 1 vCPU/ 2.8GB ram configuration per socket

Cisco UCS  -   (4 /96GB SocketRatio) yields 16  VM’s with a 1 vCPU/ 6GB ram configuration per socket

You can always adjust your actual deployment if these ratios don’t match up for your environment.   The expected deployment number helps determine how large the pizza slices are for the team.  Not how many slices each of them consume.  In these configurations you can see where the density of the RAM per socket (SocketRatio) of the UCS allows for much larger VM configurations before overcommitment. A nice fit for the new 64bit installations. These expected numbers of VM per socket help determine what the burn rate of a C:M:N:D value is for the CapX  spend you made.

BurnRate

To fully understand how much a VM costs, one has to look at what was spent in the CapX of the host and agree on the measuring stick to measure the C:M:N:D value of the created VM.  If a series of hosts are in service from different families and are at different parts of lifecycle there may have to be some averaging.  The SocketRatio of Cores/RAM is a consistent way to measure systems from different form factors and families and levelset the expected allocation of VM’s.  The expected allocation of VM’s for a host helps determine what density ratio is desired for vCPU:vMEM.

This is the end of Part 1 –  In Part Deux I will take a deeper dive into the Compute and I/O areas and assign a more detail cost per VM model.

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

How vCenter handles custom Sysprep.inf files

I ran into an issue the other day as I was trying to deploy a VM from from a template using the vCenter Customization Specification Manager.  I was trying to use a custom Sysprep.inf file that would automatically have the newly created VM join my AD domain and placed in a specific OU. 

Now, when I Sysprep’d the VM normally with my custom .inf file, everything worked fine.  But importing that exact .inf file into vCenter and deploying the VM from a template, the Sysprep failed.  So I had some digging to do, and here’s what I found out.

First, vCenter doesn’t actually store Sysprep.inf files.  Rather, vCenter stores the configuration parameters in XML and then generates the Sysprep.inf file on the fly during the deployment process.  (This part I actually already knew, the next part I didn’t).

Second, and most importantly, when importing a customized Sysprep.inf file, vCenter does not store each parameter as a separate XML element.  So for example, given the following custom text …

[Identification]
JoinDomain=mydomain.com

DomainAdmin=administrator

DomainPassword=1234

MachineObjectOU="OU = MyOU,DC = mydomain,DC = com"

 
I thought this would be stored in the normal, expected format, like this …

<JoinDomain>mydomain.com</JoinDomain>

<DomainAdmin>administrator</DomainAdmin>

<DomainPassword>1234</DomainPassword>

<MachineObjectOU>"OU = MyOU,DC = mydomain,DC = com" </MachineObejctOU>

But it turns out, when importing sysprep.inf files, vCenter stores the parameters as a single XML element with a modified <_type> element like this …

<_type>vim.vm.customization.SysprepText</_type>

<value> [Identification] JoinDomain=mydomain.com DomainAdmin=administrator DomainAdminPassword=1234 MachineObjectOU="OU = MyOU,DC = mydomain,DC = com"</value>

There’s a couple important points to note here:

  1. vCenter only stores the XML this way when importing a sysprep.inf file.  When using the customization wizard, vCenter generates XML which is formatted the normal way. 
  2. The first element, <_type>, contains the value vim.vm.customization.SysprepText.  When using the wizard, the value for this element is vim.vm.customization.Sysprep (without the trailing “Text”).
  3. When the XML is stored this way, whitespace matters!  Notice how whitespace is the delimiter in the <value> element?  And notice the spaces in the MachineObjectOU parameter?  Removing the spaces did the trick.

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

VCDX Admin Exam Notes – Section 1.4

I’m trying to get these out faster.  But I’m finding it’s a pain to compile, tweak and reformat my notes so they make sense and look right on the blog.  Here’s the next in the series.

Objective 1.4 – Implement and manage Storage VMotion.

Knowledge

Describe Storage VMotion operation

I like to think of Storage VMotion as the “inverse” of VMotion.  Instead of moving (live) the front end (i.e. CPU, memory, and network) from one physical device to another, Storage VMotion moves (live) the back end (i.e. disk) from one physical device to another.

The following was taken directly from the VMware.com website and acurately describes how a Storage VMotion works.

 

image

  1. Before moving disk files, Storage VMotion creates a new virtual machine home directory for the virtual machine in the destination datastore.

  2. Next, a new instance of the virtual machine is created. Its configuration is kept in the new datastore.
  3. Storage VMotion then creates a child disk for each virtual machine disk that is being moved to capture a copy of write activity, while the parent disk is in read only mode.
  4. The original parent disk is copied to the new storage location.
  5. The child disk is re-parented to the newly copied parent disk in the new location.
  6. When the transfer to the new copy of the virtual machine is completed, and the original instance is shut down. Then, the original virtual machine home is deleted from VMware vStorage VMFS at the source location.

Explain implementation process for Storage VMotion

For ESX 3.5, Storage VMotion is a command-line only implementation.  There are third party GUI plugins to the VI client that I have used and work really well, but they are of course not supported by VMware.  And for the purposes of this post, I would imagine the VCDX exam will stick to VMware only supported implementations.

To execute a Storage VMotion, you’ll need the RCLI (Remote Command Line Interface) from VMware.com.  There is an RCLI for both Windows and Linux, and there’s an RCLI virtual appliance too.  Take your pick, and then be sure to review the Remote Command-Line Interface Installation and Reference Guide for more info on RCLI.  But specifically for Storage VMotion, the RCLI command comes in two flavors (from the guide):

 

To use the command in interactive mode, type svmotion –interactive. You are
prompted for all the information necessary to complete the storage migration.
When you invoke the command in interactive mode, all other parameters are
ignored.

In noninteractive mode, the svmotion command uses the following syntax:

svmotion [standard Remote CLI options] –datacenter=<datacenter name>
–vm <VM config datastore path>:<new datastore>
[--disks <virtual disk datastore path>:<new datastore>,
<virtual disk datastore path>:<new datastore>]

 

    Identify Storage VMotion use cases

There are a number of reasons that I can think of where  you’d want to use Storage VMotion.  Some of the more obvious reasons (at least to me, anyway) would be:

  1. Array maintenance
  2. Migrating to newer or different (i.e. FC, iSCSI, NAS) hardware
  3. Adding new storage
  4. To achieve optimal distribution of storage consumption across LUNs
  5. To resolve storage bottlenecks and other performance issues

I’m sure if I thought about it some more, I could come up with a few more.  But I think this is a decent list.

Understand performance implications for Storage VMotion

There are a couple things that occur during a Storage VMotion that could affect performance, and therefore need to be considered when planning to move storage. 

  1. During the Storage VMotion, the VM actually does a “Self VMotion,” meaning the VM is VMotioned to the same host it’s already running on (review step #2 in the graphic above).  And therefore during this time, there is temporarily twice the amount of memory consumed.
  2. During the move, extra disk space is required on the source volume while all disk writes are redirected to the snapshot disk.
  3. All disk I/0 for the copy (i.e. read from the source volume, write to the snapshot disk and write to the destination volume) is going through the VMkernel of the host.

 

Skills and Abilities

Use Remote CLI to perform Storage VMotion operations

  • Interactive mode
    This is pretty easy.  Just enter svmotion –interactive at the command line and then follow the prompts.
  • Non-interactive mode
    In my environment, the command
    to move all the virtual disks associated with aaron-corp-xp (the name of an actual VM I use) from vol1 to vol2, would look like this

svmotion –url=https://10.10.8.60/sdk –username=aaron –password=<yeah right> –datacenter=cincylab –vm=’[vol1] aaron-corp-xp/aaron-corp-xp.vmx: vol2’

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

VCDX Admin Exam Notes – Section 1.3

Ugh.  My brain hurts.  I’ve spent the past few hours reviewing scripted ESX installations and working on a PowerShell script for a customer that will reorder vmnics after a scripted installation is complete (because I can’t find any other way to force their order during the install).  It’s been a few months since I’ve done a scripted installation, so I definitely needed a refresher.  Plus, according to the VMware Enterprise Administration Exam Blueprint v3.5, section 8.1 is all about automating ESX deployments.  The good news is that section 8.1 is the last section of the blueprint, so I believe I’m almost done preparing for the VCDX Admin Exam, which I’m scheduled to take in a few days. 

Anyway, going back to the beginning of the Blueprint, and continuing from where I left off, here is the next section of my study notes.

Objective 1.3 – Troubleshoot Virtual Infrastructure storage components.

Knowledge

Identify storage related events and log entries.  Analyze storage events to determine related issues.

All storage related events will be recorded in the /var/log/vmkernel log file.  Most of the messages in this log file are fairly cryptic and can be difficult to interpret.  Furthermore, this log file contains all messages from the vmkernel, not just storage related messages, so you’ll have to filter through it.  An easy way to do this is simply to search for SCSI.  For example, the command cat /var/log/vmkernel | grep SCSI on one of my servers produces the following output (only showing the last 10 lines) …

[root@cincylab-esx3 root]# cat /var/log/vmkernel | grep SCSI | tail -10
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1069)iSCSI: login phase for session 0×8603f90 (rx 1071, tx 1070) timed out at 23051576, timeout was set for 23051576
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)iSCSI: session 0×8603f90 connect timed out at 23051576
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)<5>iSCSI: session 0×8603f90 iSCSI: session 0×8603f90 retrying all the portals again, since the portal list got exhausted
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)iSCSI: session 0×8603f90 to iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1 waiting 1 seconds before next login attempt
May 18 12:57:48 cincylab-esx3 vmkernel: 2:16:01:56.748 cpu2:1071)iSCSI: bus 0 target 0 trying to establish session 0×8603f90 to portal 0, address 10.10.8.200 port 3260 group 1
May 18 12:58:00 cincylab-esx3 vmkernel: 2:16:02:09.355 cpu2:1071)iSCSI: bus 0 target 0 established session 0×8603f90 #3, portal 0, address 10.10.8.200 port 3260 group 1
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba35:C0:T0:L0 : 0×0 0×80 0×83 
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Device id info for vmhba35:C0:T0:L0: 0×1 0×1 0×0 0×18 0×42 0×55 0×46 0×46 0×41 0×4c 0×4f 0×0 0×0 0×0 0×0 0×0 0×1 0×0 0×0 0×0 0×0 0×0 0×0 0×0 0×2 0×0 0×0 0×0 
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Id for vmhba35:C0:T0:L0 0×20 0×20 0×20 0×20 0×56 0×49 0×52 0×54 0×55 0×41 
[root@cincylab-esx3 root]#

If you look closely, I clearly had some issues with my iSCSI appliance a few hours ago.  I decided make some configuration changes to the switch and then, all of a sudden, the ESX server lost connectivity to its storage.  Weird! :)

Anyway, what does all this mean?  There’s an really good VMworld Europe 2008 presentation (which you can get from www.vmworld.com) titled VI3 Advanced Log Analysis, which goes into detail about how to interpret VMware log files.  From that presentation, I found this diagram which describes the components of a message in the vmkernel log file.

image

 

Skills and Abilities

Verify storage configuration and troubleshoot storage connection issues using CLI , VI Client and logs

  • Rescan events
    A rescan event can be initiated with the esxcfg-rescan at the command line.  The output should look like the following …
  • [root@cincylab-esx3 root]# esxcfg-rescan vmhba32
    Rescanning vmhba32 …
    On scsi1, removing: 0:0.
    On scsi1, adding: 0:0.
    Done.
    [root@cincylab-esx3 root]# cat /var/log/vmkernel | grep SCSI | tail -3
    May 18 19:13:09 cincylab-esx3 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T0:L0 : 0×0 0×80 0×83 
    May 18 19:13:10 cincylab-esx3 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T0:L0: 0×2 0×0 0×0 0×18 0×4c 0×69 0×6e 0×75 0×78 0×20 0×41 0×54 0×41 0×2d 0×53 0×43 0×53 0×49 0×20 0×73 0×69 0×6d 0×75 0x
    May 18 19:13:10 cincylab-esx3 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T0:L0 0×36 0×52 0×58 0×36 0×4a 0×39 0×39 0×58 0×20 0×20 0×20 0×20 0×20 0×20 0×20 0×20 0×20 0×20 0×20 0×20 0×47 0×42 0×30 0×31 0×36 0x
    [root@cincylab-esx3 root]#


  • Failover events
    I don’t have redundant paths in my lab to simulate this.  So, again from the VMworld presentation, VI3 Advanced Log Analysis, here is a screen shot from the slide that covers this topic.

image

There will obviously be a lot of different types of error and event messages in /var/log/vmkernel.  And I’m certainly not going to try and list every possible combination here.  I highly suggest you download the VMworld preso because it does a great job of explaining how to further decipher the log files (like defining SCSI error codes). 

Well, that’s about it for this section.  Back to PowerShell scripting for another hour or so.

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

VCDX Admin Exam Notes – Section 1.2

Last week I was in San Francisco with most (maybe all) of the VMware field technical folks for a three day technical summit.  One of the evenings we had an awards ceremony and dinner.  And guess what?  The first eight VCDX certifications ever to be awarded were announced.

Now, VMware is a pretty big company, so I didn’t recognize seven of the eight names.  But I definitely recognized one of them.  Well, that is, I should say I recognized his name.  Having never officially met him fact to face, I couldn’t pick him out of a crowd of two.  You might know him as the rock star blogger from Yellow Bricks.  Congratulations Duncan Epping!  I believe he said he is VCDX number seven and the first VCDX in Europe.  Very cool.  And well deserved, for sure.  :)

I’m a little behind Duncan.  I’m scheduled to take the Admin Exam later this month, which is the first of two exams.  Then I’ll have to present and defend a successful design and deployment before a jury … er, I mean, panel of my peers.

Anyway, here are my notes for section 1.2 of the VMware Enterprise Administration Exam Blueprint v3.5. (Section 1.1 can be found here).  Everything in Blue is a direct cut and paste from the exam blueprint.

Objective 1.2 – Implement and manage complex data security and replication
configurations.

Knowledge

Describe methods to secure access to virtual disks and related storage devices

  • Distributed Lock Handling

    vmfs_dfl
    In the graphic below, notice how each ESX server sees and has access to the same LUN? This is achieved via VMFS, a clustered file system which leverages distributed file locking to allow multiple hosts to access the same storage.  When a Virtual Machine is powered on, VMFS places a lock on its files, ensuring no other ESX server can access them.

Identify tools and steps necessary to manage replicated VMFS volumes

  • Resignaturing
    First, there’s a really good article on VMFS resignaturing by Duncan (go figure).  Also, Chad Sakac over at Virtual Geek has a great article too.  I’m not going to reinvent the wheel, so make sure you read their posts.  You’ll need to understand this.  For the exam, you’ll certainly need to know the following …

    The following is from the Fibre Channel SAN Configuration Guide:

EnableResignature=0, DisallowSnapshotLUN=1 (default)
In this state:

  • You cannot bring snapshots or replicas of VMFS volumes by the array into the ESX Server host regardless of whether or not the ESX Server has access to the original LUN.
  • LUNs formatted with VMFS must have the same ID for each ESX Server host.

EnableResignature=1, (DisallowSnapshotLUN is not relevant)
In this state, you can safely bring snapshots or replicas of VMFS volumes into the same servers as the original and they are automatically resignatured.

    EnableResignature=0, DisallowSnapshotLUN=0
    This is similar to ESX 2.x behavior.  In this state, the ESX Server assumes that it sees only one replica or snapshot of a given LUN and never tries to resignature. This is ideal in a DR scenario where you are bringing a replica of a LUN to a new cluster of ESX Servers, possibly on another site that does not have access to the source LUN. In such a case, the ESX Server uses the replica as if it is the original.

Do not use this setting if you are bringing snapshots or replicas of a LUN into a server
with access to the original LUN. This can have destructive results including:

  • If you create snapshots of a VMFS volume one or more times and dynamically
    bring one or more of those snapshots into an ESX Server, only the first copy is
    usable. The usable copy is most likely the primary copy. After reboot, it is
    impossible to determine which volume (the source or one of the snapshots) is
    usable. This nondeterministic behavior is dangerous.
  • If you create a snapshot of a spanned VMFS volume, an ESX Server host might
    reassemble the volume from fragments that belong to different snapshots. This can
    corrupt your file system.

Skills and Abilities

Configure storage network segmentation

  • FC Zoning
    Zoning delivers access control in the SAN, restricting visibility to devices in the zone solely to other members of that zone.  It is a common technique used to do things like group ESX servers into production/test/dev, increase security and decrease traffic, among other things.
  • iSCSI/NFS VLAN
    Storage segmentation for IP storage can be accomplished in one of two ways:  VLANs or physical segmentation (i.e. separate layer 2 switches for storage).

Configure LUN masking

The Disk.MaskLUNs parameter should be used when you’re trying to mask specific LUNs to your ESX host.  This is a useful option when you don’t want  your ESX server to access a particular LUN, but are unwilling (or unable) to configure your FC switch.

To configure LUN masking in the VI Client go to Configuration –> Advanced Settings for the host you want to configure. You’ll find the Disk.MaskLUNs parameter under the section Disk.  It looks like this in my VI Client.

disk.maskluns

Enter a value in the following format … <adapter>:<target>:<comma separated LUN range list>. Be sure to rescan when your done and verify the Mask has been properly applied.

Use esxcfg-advcfg
This one’s easy.  Just use the man page (type “man esxcfg-advcfg” at the command prompt).  It’ll tell you everything you need to know :)

Set Resignaturing and Snapshot LUN options
So, following along with the man page above, here is a cut and paste from my server …


[asweemer@cincylab-esx3 config]$ su -
Password:
[root@cincylab-esx3 root]# esxcfg-advcfg -s 0 /LVM/EnableResignature
Value of EnableResignature is 0
[root@cincylab-esx3 root]# esxcfg-advcfg -s 1 /LVM/EnableResignature
Value of EnableResignature is 1
[root@cincylab-esx3 root]#
[root@cincylab-esx3 root]# esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
Value of DisallowSnapshotLun is 0
[root@cincylab-esx3 root]# esxcfg-advcfg -s 1 /LVM/DisallowSnapshotLun
Value of DisallowSnapshotLun is 1
[root@cincylab-esx3 root]#


Manage RDMs in a replicated environment
RDMs can be created via the CLI with the following command …

vmkfstools -r /vmfs/devices/disks/vmhbaX:Y:Z:0 my-vm.vmdk

By default, the RDM will be created in Virtual Compatibility Mode.  But should you need and/or prefer Physical Compatibility Mode, you can change this by editing the VMDK file and changing the createType value to vmfsPassthroughRawDeviceMap.

Use proc nodes to identify driver configuration and options
The proc filesystem is a pseudo filesystem, it’s not “real.”  It consumes no storage space and is used to access process information from the kernel.  You’ll find quite a bit of valuable data and configuration options in the many subdirectories of /proc/vmware/config.  Here’s a quick example from my ESX server …


[asweemer@cincylab-esx3 LVM]$ pwd
/proc/vmware/config/LVM
[asweemer@cincylab-esx3 LVM]$ ls
DisallowSnapshotLun  EnableResignature
[asweemer@cincylab-esx3 LVM]$ cat EnableResignature
EnableResignature (Enable Volume Resignaturing) [0-1: default = 0]: 0
[asweemer@cincylab-esx3 LVM]$


Use esxcfg-module

Just like esxcfg-adv, use the man page.


Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon