Archive for January, 2010
I am attempting to pull together a blog post series around performance: a four-part segment covering each component of VMware performance, namely CPU, memory, storage, and networking. My goal is to cover the various tools you can use to help troubleshoot performance problems you might experience in your virtual environment.
While I was going through some of the methods, I wanted to illustrate how VMware now includes Windows Performance Counters inside a guest virtual machine to assist with performance monitoring and troubleshooting. I jumped on a test virtual machine I have and pulled up Windows perfmon. To my dismay, the VMware counters were missing! We are currently running VMware vSphere 4.0 Update 1, so I checked with a few other people online, like Rick Vanover (@RickVanover), which confirmed the problem seemed to be related to this specific release of vSphere.
I reached out to Scott Drummonds via Twitter (@drummonds), a performance systems engineer who works for VMware, and also opened a service request with support. Scott validated that he saw the same issue and was launching an investigation. Unfortunately the SR didn’t get very far as I was instructed that this was an “experimental feature and was removed from vSphere”. Uhhh ok, I knew that wasn’t right so I waited to hear back from Scott.
Scott has since written a blog post that discusses this issue. It looks like a complete uninstall of the VMware Tools on the guest, followed by a re-install, resolves the issue. For those not familiar with this process, it does require a reboot. The problem appears to be related to mofcomp, which is a tool that Microsoft provides to register WMI information (such as the VMware performance counters) with Windows.
Thanks to Scott for jumping on this so quickly and posting a fix to the issue, it’s great to see social media paying off in the real world. Thanks to Rick for helping me figure out what was going on and validating some of my assumptions. Rick has also written up an excellent blog post on this same issue. Hopefully a patch will be rolled into the next minor release of vSphere 4 that will resolve this bug going forward.
A fellow engineer extraordinaire (Mike Evans) inspired me to write up this blog post. Mike and I have been using the "notes" attribute for virtual machines for a few years. It has come in very handy for tracking who requested the virtual machine and the date it was provisioned. If you're not familiar with the notes field, it's at the bottom of the summary page of a virtual machine's properties.
This little piece of information might seem trivial to the layperson, but the larger your virtual environment grows, the more complex it becomes. Having a way to track this fluid, ever-changing infrastructure becomes more and more important as you begin to scale up and out.
The "Notes" field was great for us, except that we began to notice variations in the details we had entered into each virtual machine's properties. Probably not a huge deal if you have a small VMware environment, but when you start tracking several hundred virtual machines, it really starts having an effect on reporting. A newer feature added to Virtual Center is the ability to use custom attributes: pre-defined fields that can be populated consistently. This gives a VMware administrator a common format for reporting on the Virtual Infrastructure. Below is a screenshot of the Custom Attributes dialog you can find in Virtual Center:
Notice there are three different categories displayed in this view: Global, Host, and Virtual Machine. You can set attributes at multiple places in your VI environment, wherever you wish to track them. You can see we are interested in Virtual Machine attributes for certain variables (Owner, Provision Date, Provisioned By, Purpose). We have a different interest at the host level (Build Date) for maintenance tracking of physical hardware assets.
Here is where all your hard work starts to pay off.
It's audit time: you are tasked with trimming the fat in your environment because once again you are out of capacity, and the budget just got crushed for the rest of the year because (insert your reason here) the UPS batteries just exploded! Go into Virtual Center and generate a report of your virtual infrastructure so you can see who owns what and what date it was deployed. Go to your Virtual Machine view, select your datacenter, go to the menu option "Export" and then select "Export List". Save the export as an Excel spreadsheet and view your results. Notice the highlighted columns K through N; these are the custom attributes that we added above.
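Once the list is exported, it also lends itself to quick scripted audits. Here's a hedged sketch using awk; the file name, two-column layout, and attribute values are fabricated for illustration (a real export would have many more columns, with the custom attributes out in columns K through N, so the field number would need adjusting):

```shell
# Fabricate a tiny two-column sample of an exported VM list; a real
# Virtual Center export would have many more columns.
cat > /tmp/vm_export.csv <<'EOF'
Name,Owner
vm-app01,jsmith
vm-db02,
EOF

# Print every virtual machine whose Owner custom attribute is empty
awk -F, 'NR > 1 && $2 == "" {print $1}' /tmp/vm_export.csv
```

Run against a real export (with the field number pointed at the actual Owner column), the same one-liner gives you an instant list of unowned virtual machines to chase down during the audit.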
Virtual Center custom attributes are a great way to help manage your growing environment. Sit down with your team, or your potential customer, and find out what values matter most in your environment. Create the custom attributes at the various places in Virtual Center. Make sure you are diligent about filling out the details when you bring up new systems, and make it part of your internal process and documentation. You will thank yourself down the road.
I like to try and save my employer money when possible. I am of the opinion I would be doing them a disservice if I didn’t examine and evaluate a product that we paid for. Our company decided to take the plunge and upgrade all of our licensing to vSphere Enterprise Plus. There is a new backup/data protection product that was introduced with this recent release. Here is the technical definition of VMware Data Recovery from the administration guide:
VMware® Data Recovery creates backups of virtual machines without interrupting their use or the data and services they provide. Data Recovery manages existing backups, removing backups as they become older. It also supports deduplication to remove redundant data.

Data Recovery is built on the VMware vStorage API for Data Protection. It is integrated with VMware vCenter Server, allowing you to centralize the scheduling of backup jobs. Integration with vCenter Server also enables virtual machines to be backed up, even when they are moved using VMware VMotion™ or VMware Distributed Resource Scheduler (DRS).
Sounds pretty good, right? You get a backup application (with de-dupe!) built specifically for VMware that could possibly displace your primary method of backups. I did a little digging in the community and was disappointed to learn that vDR is not exactly an enterprise product. A lot of the feedback from other VMware engineers was "it's a 1.0 product and is designed for small installations". The maximum supported backup configuration is 100 virtual machines.
I decided to check it out for myself and see if it was a fit for our environment, and whether it might alleviate some of our backup problems. Our primary site is rather large, but we are now implementing vSphere at our smaller satellite locations, and this might be a fit for a smaller office configuration.
The installation was quite easy; VMware has provided another great virtual appliance that can be downloaded from their website. After you import the virtual appliance via Virtual Center and assign it a static IP address, you then need to install the VC plug-in so you can manage your newly installed appliance.
After I ran through the installation and configuration, I was disappointed to discover that I couldn't get vDR to launch. I kept getting prompted for authentication credentials, which was odd. I thought maybe I had incorrectly set something up, so I went back and reviewed the administration guide. Upon closer examination (RTFM) I discovered that vDR doesn't support Virtual Center running in linked mode. To my dismay, we are running in VC linked mode in anticipation of a Site Recovery Manager implementation; our remote sites are managed by our primary site's Virtual Center to save costs. I discovered a workaround by pointing the VC client at the ESX server that hosts the vDR appliance. This only gives access to back up other virtual machines on the same ESX host, but it let me continue my testing. I hope this is something that future versions of the product will address and fix.
Once you launch the vDR console, you are immediately prompted by a configuration wizard to begin setting up your environment. Here are the following steps in the order they are presented:
- Select your Virtual Machines to backup.
- Select your destination (CIFS share, attached vmdk, or RDM).
- Select your backup window.
- Select your retention policy.
All of these are straightforward and don't require much discussion. The only step I found a little confusing was the retention policy. Personally, I would have preferred something a little more technical than "Few/More/Many".
The Retention Policy radio buttons are pre-defined settings that change the policy details below. Toggle the buttons and you can see the variables change, along with what each setting will mean in terms of your destination data storage. Use caution here, as each vDR appliance can only support data stores up to 1TB in size, with a maximum of two stores.
The underlying backup technology behind vDR is the new vStorage API (not VCB), which takes advantage of a new feature called changed block tracking. After the first full backup is performed, changed block tracking examines the virtual disk being backed up and only backs up the blocks that have changed since the previous backup. This means less backup traffic going across your network.
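As a toy illustration of the idea (this is not the vStorage API itself, just a sketch of block-level change detection using standard shell tools and throwaway files in /tmp):

```shell
# Create an 8-block (4 KB blocks) "virtual disk" and take a full copy
dd if=/dev/zero of=/tmp/disk.img bs=4096 count=8 2>/dev/null
cp /tmp/disk.img /tmp/disk.full

# Modify a few bytes inside block 5 of the live disk
printf 'changed' | dd of=/tmp/disk.img bs=4096 seek=5 conv=notrunc 2>/dev/null

# cmp -l lists differing byte offsets; map each offset to its 4 KB
# block number. Only block 5 differs, so only that block would need
# to be shipped across the network on the next backup pass.
cmp -l /tmp/disk.img /tmp/disk.full | awk '{print int(($1-1)/4096)}' | sort -u
```

The real feature does this tracking inside the hypervisor as writes happen, rather than by scanning afterward, but the payoff is the same: the incremental backup touches only the changed blocks.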
I selected a CIFS/Windows share at our disaster recovery site to perform the backup testing. The test share was ~600GB of locally attached SCSI storage (a 5+1 array of 10K disks) on an HP DL380. I selected a couple of Windows virtual machines to test with and kicked off the backup jobs. Below is a screenshot of the reporting window for vDR (sorry for all the censorship).
The jobs seemed to run pretty slowly in general, but completed successfully without errors (the error listed above is because of my linked Virtual Center configuration). In my opinion, the reporting interface is lacking some details. I would have liked to see what throughput I was getting during the backups; the only way I could see it was by monitoring the Windows host that was housing the data store. I would also have liked a more detailed task status, so I could tell what was going on throughout the backup operation. The data de-duplication ratio would have been another great detail to see, as it could help determine the total backup size and estimated completion time of each virtual machine, which is another variable I found to be missing.
There are two approaches to restoring data using vDR. The first method is a full system restore, which recovers the entire virtual machine: system state and all corresponding data. When performing a full system restore, you can restore the data to an alternate ESX host or data store, and decide whether you want the network interface connected or not. I even found that you can change the virtual disk node and select an alternate SCSI path to recover your disk to.
The second option is a file level restore (FLR), which most people would typically use on a more regular basis. Unfortunately, the vDR console can't recover individual files without some additional configuration. You need to install the "VMwareRestoreClient.exe" executable on a virtual machine, which then gives you the ability to browse your data store contents and select individual files to recover. I anticipate that we will see the FLR components get rolled into the vDR console in a future release.
VMware Data Recovery lacks a lot of the critical pieces that an enterprise backup application needs to provide. The product is a great start for smaller VMware implementations, but even then I could see it quickly being outgrown. Here are the areas I would love to see improved in future releases of the product:
- Need support for linked Virtual Centers. Personally, I could use this product at some of our smaller locations, but can't leverage vDR since we are running in linked mode.
- Need to support a larger number of virtual machines. 100 virtual machines is not enough; the product needs to scale to support a larger VMware implementation (not necessarily enterprise).
- Need support for larger data stores. 1TB is not a lot of space when you are backing up multiple virtual machines and retaining their data for longer periods of time.
- Need support for more data stores per vDR appliance. Again, this goes back to scale; storage growth is exponential in our current environment.
- Support for a global vDR manager. I would love to see VMware develop a central master or parent vDR console that would allow you to manage your child appliances and the data stores they manage.
- Single console for both full system restores and file level restores.
VMware Data Recovery comes with all versions of VMware vSphere except vSphere Standard. It is a great entry-level backup solution with de-duplication included. I am excited to watch it mature into a product that can scale to bigger environments. I also feel that including vDR in the Standard version of vSphere would only help the SMB market embrace virtualization at a higher adoption rate.
With the launch of VMware vSphere came some new products that I hadn't really paid much attention to (busy upgrading, I guess). One of the newer products is a Virtual Center reporting tool called Capacity IQ. This product gives an administrator the ability to analyze, forecast, and plan for future growth across an ESX environment. I have had a lot of experience with monitoring/reporting tools in the past (I won't bore you with the details), so I was quite skeptical of a 1.0 reporting tool for Virtual Center. I must admit I was blown away by the immediately relevant reports the product was able to produce.
After pulling down the trial install and obtaining the demo key, I loaded it up for a spin. I am not going to document the installation steps, as Eric Gray has done this for us already. It is by far the easiest reporting application I have ever installed. If you're interested in taking it for a trial run, download the virtual appliance from VMware's website here (OVF format). Once you import the virtual appliance and give it a static IP address, it will need to collect data about your environment for a while.
There are three basic views that CIQ gives you once you install the plug-in: dashboard, views, and reports.
The dashboard tab is designed to give you a quick overview of the item you have selected. Capacity IQ uses the same approach as Virtual Center: whatever object you have selected is what gets reported on. Here is a view of one of our clusters; notice January 11th on the Trend and Forecast graph on top.
One of our clusters was out of resources, so I added two more physical hosts. You can see CIQ picks up the new physical host resources and reflects this by increasing the number of virtual machines it believes the cluster can accommodate. Want to see something even more interesting? Check out the pink graph on the 17th. Capacity IQ is already using a prebuilt formula to project what it thinks we will have (or won't have) a week out. Pretty impressive.
The views tab is designed to give you a more detailed look at some of the specific data points. Here is a screenshot of the various reports you can execute:
So here is where you can get some great visual reports to present to either upper management, or a potential customer. This gives you a nice interface that you can customize with data points that you can tweak. Check out the first report on this cluster:
This gives you a graphical, historical view of your cluster: how many virtual machines you have added over the course of time. Notice the horizontal sliding bar at the bottom of the chart; this allows you to adjust your time/date window. The lighter shaded line to the right is the projected, or forecasted, growth of the cluster. The views tab is a great place to run ad-hoc reports; it gives you the ability to select the type of report and even allows you to export the data.
The reports tab contains the "pre-canned" reports that can be executed by the administrator. The one thing I was disappointed not to see here was the ability to schedule these reports to run at a particular interval (weekly/monthly). This is something that I assume will be introduced in future releases of the product.
After the report is executed and compiled, you are then provided with a .pdf or .csv version of your dataset to download and review. The first report totaled 17 pages and provided some great technical information. Here is the table of contents:
I am very impressed with Capacity IQ. There are no agents to install across the virtual machines you wish to report against. The installation was very straightforward; I think I had it up and running in about 15 minutes. Once the virtual appliance was in place, all it needed was a little bit of time to start crunching data. The reports are well written and very relevant to what an administrator would wish to see. If you're looking for a nice reporting tool to help you forecast, give this one a test to see if it fits your needs.
Installing and/or upgrading VMware Tools has always been a bit more complicated for Linux guests than for Windows guests. After the installation of the package binaries, the vmware-config-tools.pl script must be run to configure the tools for your environment. This script has to be run from the console, which is a pain when you've got more than just one or two Linux VMs. And may the good Lord help you if the modules aren't suitable for your running kernel and you don't have a compiler (or the C header files for your running kernel) already installed.
When VMware added the Automatic Tools Upgrade …
The situation certainly improved, but it is by no means a foolproof solution. In my experience, it doesn't work 100% of the time for Linux guests (though this *could* be due to the heavy modification I've done in my distro). And furthermore, what if you want to automatically upgrade hundreds of Linux guests, not just one? Or what if you've already got a deployment tool that you'd like to use to push the tools out? (Kind of tough when the script needs to be run directly in the console.)
So, I looked to see if there was a way to improve the situation. First, I needed a way to run vmware-config-tools.pl remotely in an automated fashion. And by the way, it's not that you can't run this script remotely via SSH, because you can. The problem is that when you do so, you immediately get the following question …
It looks like you are trying to run this program in a remote session. This program will temporarily shut down your network connection, so you should only run it from a local console session. Are you SURE you want to continue?
Unfortunately, to run vmware-config-tools.pl remotely, we need to include the -d flag so that the script will automatically select the default answers to all of the questions for us. And the problem is, the default answer to this question is "no."
So I looked through vmware-config-tools.pl and found that it's really only checking to see if the SSH_CONNECTION environment variable is set. Well, that's easy … launching the script from a shell where that variable isn't set allows us to side-step the check.
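A quick way to see this in action (the addresses are made up; env -u simulates what a fresh login shell such as `su -l` gives you, namely an environment where SSH_CONNECTION is no longer set):

```shell
# Pretend we arrived over SSH
export SSH_CONNECTION="192.0.2.1 22 192.0.2.2 22"

# A plain subshell still inherits the variable...
sh -c 'echo "subshell sees: ${SSH_CONNECTION:-unset}"'

# ...but scrubbing it from the environment (as a fresh login shell
# effectively does) means the remote-session check never fires
env -u SSH_CONNECTION sh -c 'echo "scrubbed env sees: ${SSH_CONNECTION:-unset}"'
```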
Next, I created a simple bash script that gets pushed out to the /tmp directory along with the VMware Tools installation package (also pushed to /tmp) and gets executed remotely by my deployment tools (which for me are just more bash scripts, but this should work with any enterprise deployment tool). Here's the simple script I used for my guests …
#!/bin/sh
# Find the VMware Tools RPM that was pushed out to /tmp
RPM=`ls /tmp | grep VMwareTools`

# Remove the old tools package, then install the new one
rpm -e VMwareTools
echo "Old VMwareTools removed" > /tmp/vmware_tools_upgrade.log
rpm -i /tmp/$RPM
echo "$RPM installed" >> /tmp/vmware_tools_upgrade.log

# Run the configuration script from a fresh login shell (so the
# SSH_CONNECTION check doesn't trip), taking the default answers (-d)
su -l root -c "/usr/bin/vmware-config-tools.pl -d"
echo "vmware-config-tools.pl -d executed" >> /tmp/vmware_tools_upgrade.log

# Restart the tools, then networking
service vmware-tools restart
echo "vmware-tools restarted" >> /tmp/vmware_tools_upgrade.log
service network restart
echo "network restarted" >> /tmp/vmware_tools_upgrade.log
This is obviously a very basic script and could easily be enhanced with better logging and error handling. Also, for Debian distros, such as Ubuntu, you’d need to modify this script to handle the tar.gz installation package … unless, of course, you’ve modified your distro to handle RPMs (as I have).
The good news is that, at least for my environment:
- This works 100% of the time and a restart of the VMs is not necessary.
- I no longer have to upgrade many guests by hand.
However, keep in mind there is still a network outage during the upgrade (usually just a minute or two), so be sure to continue using a maintenance window for your upgrades.
Happy New Year! I hope everyone enjoyed the holidays and got to spend some time with friends and family. If you're reading this, I suggest you pay tribute to the quality of Virtual Insanity and give the gift of voting. Eric Siebert has released a "best of 2009 blog contest". If Virtual Insanity has helped you out in some way in the past, I suggest casting a vote for this great virtualization blog! Ok, onto the real reason for this post…
I ran into an oddity today while bringing a new host online in our vSphere environment, and thought it best to publish my findings. Hopefully this might save someone a support call. vSphere 4 Update 1 came with a couple of technical issues, which are detailed here and here. Personally, we don't use ESXi, so only the first one was a major issue for us. We are an HP shop, so the issue with the HP agents and Update 1 was a major concern (it would basically render the host unbootable). Luckily, VMware support is proactive about announcing issues like this to the community, and most people were aware of the problem right away.
The problem I hit today was strange, and at first I thought it was just me being off from work for a week. I went to apply our Update 1 baseline to a new host I was bringing up, rescanned, and then got this:
What the? I know this isn't compliant; our base build is still at 4.0. Check out the build number, that's proof. I have used the Update 1 baseline for 50+ hosts, so I know it's not that. So maybe update manager is still on holiday as well; I restart the service and life is good? Nope. Same thing.
To make a long story longer, I poked around in the repository and checked the Update 1 patch, and it looked valid: yep, 11/19/09, that's the right release date. Why is this thing not working?
I kept poking and prodding, thinking maybe they had released an update to the update. Sure enough, it slipped by me when I wasn't looking, or it went to my spam mail. Check the date: 12/9/09.
I created a new test baseline, dropped the 12/9/09 Update 1 patch into it, and applied it to my new host. Lo and behold:
That's much better. It's strange that the older Update 1 patch didn't detect anything and simply showed the host as compliant. As an end user, I would have liked to see some type of error message, or a reference to the newer Update 1 release. I ran the new update (and still stopped the HP agents, just in case), and now things look good again (build number):
Go vote for this site, and make sure you update your Update Manager Update 1 baseline. That's a lot of updates. See you online!