Archive for the ‘Scott Sauer’ Category
Introduction
Personally, I find storage to be one of the most interesting components of the VMware I/O stack. There is a plethora of diverse storage solutions in the industry today that offer unique ways of addressing storage performance, as well as the increase in capacity demands. Storage problems are the most common misconfiguration affecting performance in VMware today. An oversaturated LUN will affect all virtual machines that share that same data store. Take this concept up a level: a group of disks (RAID group) that is saturated with I/O will negatively impact all LUNs that share those same physical spindles. Storage traditionally has been the “red headed step child” in VMware and hasn’t gotten a lot of visibility. Storage I/O bottlenecks can create serious virtual machine problems, and yet it wasn’t until ESX 3.x that graphical visibility was even offered to VMware administrators; the 2.x MUI (Management User Interface, for those newer to VMware) reflects CPU and memory.
Introduction
I attended last month’s Cincinnati VMUG (VMware User Group) and was surprised to hear from the audience how many customers had not yet taken the plunge and upgraded to vSphere. I think there were a handful of users that had just completed the upgrade. Sometimes I forget to step out of my own personal space and consider what others have going on in their own environments. If you’re still wondering about the upgrade, Aaron has a post on some of the benefits of going from VI3 to vSphere.
Part of the process of upgrading your existing investment is the need to upgrade all of the virtual machines to the latest and greatest virtual machine hardware version 7. Someone mentioned to me how much of a pain this was since you had to touch each virtual machine, and my response to them was “It only takes a couple of minutes”. I wanted to prove this theory in a different way, so I mulled it over and came up with a timed video clip. The song I chose is 2 minutes and 39 seconds, so I figured if I could knock this out within the amount of time it takes for the song to play, well then, mission accomplished.
Introduction
A co-worker asked me yesterday if I knew of a way to find out who was watching your console session inside Virtual Center. I wasn’t quite sure what he meant by this at first. But after doing some digging I discovered that yes, you can find out who is watching your console session. Don’t forget that security permissions that are set up correctly will keep these snoopers from even getting to the console in Virtual Center in the first place.
Introduction
As memory prices continue to drop and the 64-bit (x64) architecture is embraced and adopted more in the industry, we continue to see a rise in memory demands. Only a few years ago, 1-2 GB virtual machines were the norm, 95% of these being 32-bit operating systems. From my personal experience I have seen this trend change to 2-4 GB as a norm, with the higher performing virtual machines consuming anywhere from 4-16 GB of memory. VMware has answered this demand with vSphere now delivering up to 1TB of addressable memory per physical host, and up to 255GB per virtual machine.
With processors now more powerful than ever, the general shift of virtual machine limitations is changing from compute to memory. This is reflected in our industry today as we see an increase in the memory footprint on traditional servers (Intel Nehalem), and vendors such as Cisco introducing extended memory technology which can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and was impressed with what I saw. The extended memory technology is quite unique because it not only allows you to scale out your memory configuration, it uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage to having this many DIMM sockets is that you can use lower capacity DIMMs (2 GB or 4GB) to achieve the same memory configuration that would require 8GB DIMMs in a standard server.
Yes a bold topic I know, but I wanted to tackle this subject because it’s such an important aspect that everyone typically deals with at some point. I also find it personally useful to document some of my thoughts so I can solidify my own understanding of these processes and tools. I will admit, this was a challenge for me to write up. There is so much material and information that I had to really focus on keeping it simple and to the point. Performance problems can span such a wide array of possibilities that there is never typically one easy answer. Hopefully by highlighting some of the tools that are available for use, and offering some of my personal thoughts and experiences, I might be able to help when problems arise in your infrastructure.
There is so much useful information floating around in PDFs, blogs, websites and PowerPoint decks that one could easily get consumed by this topic. Since this is such a broad topic, I wanted to try and set the stage. The focus of this series of blog posts is to highlight some key components to examine, and then provide tools that will give you insight into your own environment and/or situation. This page will be the launch point for the various categories. Each blog post will cover a different category relating to the possible points of I/O in a VMware ESX environment.
Performance Troubleshooting VMware vSphere – CPU
Performance Troubleshooting VMware vSphere – Memory
Performance Troubleshooting VMware vSphere – Storage
Performance Troubleshooting VMware vSphere – Network
There is one last I/O component that I will not be covering, and that is the human factor. These posts will assume that your installation or upgrade is of sound mind and body. If there are underlying installation issues or post upgrade issues, I suggest engaging VMware support before examining conventional performance problems.
Acknowledgments/References:
VMware vSphere 4 Performance Troubleshooting Guide – Hal Rosenberg
Performance Monitoring and Analysis – Scott Drummonds
VMworld 2009 TA2963 ESXtop for Advanced Users – Krishna Raj Raja
http://www.vmware.com/support/
http://www.yellow-bricks.com/esxtop/ – Duncan Epping
Introduction
Processors have come a long way in a very short time, and over the past few years we have seen the industry embrace the multi-core x86 architectures (Intel and AMD) which is allowing us to consolidate with even greater efficiencies than previous processor architectures. Ensuring available compute cycles to virtual machine workloads is critical, and should be monitored closely as you scale out your infrastructure.
What to look for
- Check for physical cpu utilization that is consistently above 80-90%. Getting high consolidation rates is a wonderful thing, but don’t over tax the physical server. Maybe it’s time to purchase another host for your DRS cluster and let the software balance your workloads better.
- Watch pCPU0 on non-ESXi hosts. If pCPU0 is consistently saturated, this will negatively impact performance of the overall system. If you are using third party agents, ensure they are functioning properly. A couple of years ago we had issues with HP System Insight management agents (Pegasus process) which were creating a heavy load on our COS. All of the virtual machines looked fine from a performance perspective, but once we dug a little bit deeper, we discovered this was our root cause.
- Watch for high CPU ready times. Ready time is the time a virtual machine was ready to run but had to wait for the hypervisor to schedule it on a physical CPU, so high values point to contention for physical CPU on the host. If ready time is high while overall host utilization looks modest, look at how many vCPUs are allocated across the virtual machines on that host, and check the other I/O components (Memory/Network/Storage) before assuming the bottleneck is CPU alone.
- Watch for virtual machines that are consistently at 80-100% utilization. This is not a typical pattern of a conventional server. Most likely if you login to the guest you will find a runaway process that is consuming all of the cpu cycles. I actually found an offshore contractor running Rosetta@home (a cancer research screen saver) inside one of our virtual machines! If something doesn’t look right, it’s worth checking it out.
- Watch for virtual machines where the kernel or HAL is not set to use more than one CPU (SMP) and the VM is allocated multiple processors via Virtual Center. I was approached by a Linux administrator who told me he wasn’t seeing any performance improvements after he added a second processor. After I poked around a little bit I discovered he was running a uniprocessor kernel and hadn’t recompiled his operating system for SMP. If the operating system doesn’t have the ability to recognize more than one processor, you won’t see any performance gains by throwing more vCPUs at a larger workload.
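For the ready-time item in the list above, it helps to know that the Virtual Center charts report CPU ready as a summation in milliseconds per sample, not as a percentage. Here is a minimal Python sketch of the usual conversion, assuming the real-time chart interval of 20 seconds. The VM names and sample values are made up for illustration.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready summation (ms) into a percentage of the sample interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms / (interval_s * 1000.0) * 100.0


# Hypothetical samples: VM name -> ready summation in ms for one 20 second sample.
samples = {"web01": 400.0, "sql01": 2600.0, "file01": 150.0}

for vm, ready_ms in samples.items():
    print(f"{vm}: {ready_percent(ready_ms):.1f}% ready")
```

Historical charts use longer intervals than the real-time view, so pass the matching `interval_s` when you convert those values.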
Monitoring with Virtual Center
Virtual Center is a great place to start for CPU performance monitoring, both at a physical level and a virtual machine level. Before getting into too much detail I wanted to explain Virtual Center statistics logging. There are various levels of logging that can be set for the VC database. Beware! You can easily overrun your database and fill up your existing disk space by setting all of these to the maximum setting. Think of this as a debug level: the higher you set it, the more information will be captured to the database for analysis (and the more disk space consumed). If you need to get to some of the more detailed performance statistics, VC performance counters and their corresponding levels can be found here. To change these settings, click Administration –> vCenter Server Settings –> Statistics.
Let’s take a look at a physical ESX host performance metrics through Virtual Center. vSphere now includes a nice graphical summary in the performance tab of the physical host. This gives you a quick dashboard type view of the overall health of the system over a 24 hour period. Here is the CPU sample:
Selecting the Advanced tab gives you a much more granular way of viewing performance data. At first glance this might look like overkill, but with a little bit of fine tuning, you can make it report on some great historical information. Here is a snapshot of physical CPU utilization across all processors:
The virtual center performance statistics by default display the past hour of statistics, and show a more detailed analysis of what’s currently happening on your host. Select the option “Chart Options” to change values such as time/date range and which counters you would like to display.
Virtual Center Alarms are an excellent tool that can sometimes be overlooked. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up CPU alerts so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host CPU utilization is at 75% for 5 minutes or greater.
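To make the “75% for 5 minutes” condition concrete, here is a small Python sketch of that kind of sustained-threshold check. It is not how Virtual Center evaluates alarms internally; it only illustrates the logic, assuming one utilization sample every 20 seconds. The sample data is invented.

```python
from typing import Iterable


def sustained_breach(samples: Iterable[float], threshold: float, min_samples: int) -> bool:
    """Return True if min_samples consecutive samples are at or above threshold."""
    if min_samples < 1:
        raise ValueError("min_samples must be at least 1")
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise start over.
        run = run + 1 if value >= threshold else 0
        if run >= min_samples:
            return True
    return False


# Assumption: one sample every 20 seconds, so 5 minutes is 15 samples.
SAMPLES_PER_5_MIN = (5 * 60) // 20

# Hypothetical host CPU utilization (%): a short spike, then a sustained load.
cpu = [50.0] * 10 + [90.0] * 5 + [40.0] * 3 + [80.0] * 15

print(sustained_breach(cpu, threshold=75.0, min_samples=SAMPLES_PER_5_MIN))
```

The short spike alone would not trigger anything; only the later sustained run does, which is the reason for putting a duration on the alarm.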
Monitoring with ESXTOP
Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and execute the command “esxtop”. The default screen that you should see is the CPU screen; if you ever need to get back to this screen, just hit the “c” key on your keyboard. Esxtop gives you great real-time information and can even be set to log data over a longer time period; try “esxtop -a -b > performance.csv”. Check your PCPU and CCPU (Physical/Console) here. To examine what your virtual machines are doing, hit the “V” key to display only the virtual machine worlds.
A detailed list of ESXTOP counters can be found here:
http://communities.vmware.com/docs/DOC-5240
http://communities.vmware.com/docs/DOC-9279
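The CSV produced by esxtop batch mode can get very wide, so a small script helps when you only care about a few counters. The Python sketch below averages every column whose header contains a given substring. The header layout in the sample (a timestamp column followed by perfmon-style counter paths) and the counter names are assumptions for illustration; check the header row of your own file for the exact names.

```python
import csv
import io
from statistics import mean
from typing import Dict, List


def average_columns(csv_text: str, needle: str) -> Dict[str, float]:
    """Average every column whose header contains `needle`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    values: Dict[str, List[float]] = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                values[header[i]].append(float(row[i]))
            except (IndexError, ValueError):
                continue  # skip short rows and non-numeric cells
    return {name: mean(vals) for name, vals in values.items() if vals}


# Hypothetical two-sample excerpt; real files have far more columns.
SAMPLE = r'''"Time","\\esx01\Physical Cpu(_Total)\% Util Time","\\esx01\Memory\Free MBytes"
"01/01/2010 00:00:00","40.0","2048"
"01/01/2010 00:00:05","60.0","2040"
'''

for name, avg in average_columns(SAMPLE, "Physical Cpu").items():
    print(f"{name}: {avg:.1f}")
```

For a real capture, read the file with `open("performance.csv", newline="").read()` and pass the text in.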
Monitor inside the Virtual Machine
A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor or “perfmon” tool. If you’re running vSphere 4 update 1, make sure you read this post first, as there is a bug with the VMware Tools that will prevent the counters from showing up. Check your % Processor Time, which is the current load of the virtual processor.
Monitoring with PowerCLI
Another great place to go for finding potential CPU problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a powerpack from Alan Renouf. If you’re not a command line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find virtual machines with CPU ready time? How about virtual machines that have CPU reservations, shares or limits configured? You can pull all of this information using Alan’s powerpack.
Conclusion
If you’re using VMware vSphere, there are many different ways to monitor for CPU problems. The Virtual Center database is the first place you should start. Check your physical host CPU contention, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop: check physical CPU, console CPU, then the VM worlds that are running on the ESX host.
Look for the outliers in your environment. If something doesn’t look right, that’s probably the case. Scratch away at the surface and see if something pops up. Use all possible tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes bring light to a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for new technical problems that VMware support had not seen before.
Introduction
Hopefully you have read my previous blog posts on pvSCSI. It describes what the driver is, how it works, and how it can positively impact your performance and workloads. Part two covers the process of installing the pvSCSI driver on an existing system and a new system. Both can be found here on the site and you might find them useful:
http://www.virtualinsanity.com/index.php/2009/11/21/more-bang-for-your-buck-with-pvscsi-part-1/
http://www.virtualinsanity.com/index.php/2009/12/01/more-bang-for-your-buck-with-pvscsi-part-2/
Interrupt Coalescing
VMware recently published a KB article that answers a question that has been floating around the community for a while. The pvSCSI driver sounds superior to the LSI driver with direct I/O access to the hypervisor, so why not use it in all cases? The article states that you should only use the newer driver when driving higher workloads, those that are typically 2000 IOPS or greater. For those that don’t know, 2000 IOPS is a pretty big workload. Consider this: a standard Fibre Channel 10,000 RPM drive averages around 125 IOPS per disk.
I didn’t really understand this, and the knowledge base article is lacking any detail on the rationale behind the statement. I reached out to VMware performance engineer Scott Drummonds to see if he had anything he could publish to help clarify the KB article. Scott was nice enough to research this and posted his findings here.
So it appears that the technical explanation is interrupt coalescing or buffering. The paravirtual SCSI driver was designed to handle receiving multiple requests at a high rate and then “batching” the requests together for better efficiencies in throughput. If you aren’t generating high enough workloads on the virtual machine, the I/O request could unnecessarily sit in the queue while the “batch” waits to be filled up for the next transaction. This could cause storage performance problems which would typically be seen as higher latency and would negatively impact the virtual machine.
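To put the 2000 IOPS figure in perspective, here is the arithmetic as a tiny Python sketch, using the 125 IOPS per disk figure quoted above. It deliberately ignores RAID write penalties, cache and mixed read/write ratios, so treat it as a rough lower bound.

```python
import math


def disks_needed(target_iops: float, iops_per_disk: float) -> int:
    """Minimum number of disks needed to deliver target_iops, ignoring RAID overhead."""
    if target_iops < 0 or iops_per_disk <= 0:
        raise ValueError("target_iops must be >= 0 and iops_per_disk must be > 0")
    return math.ceil(target_iops / iops_per_disk)


print(disks_needed(2000, 125))  # prints 16
```

In other words, a single virtual machine at that threshold is already keeping roughly sixteen 10K spindles busy.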
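A toy model makes the latency effect easier to see. The Python sketch below is not the pvSCSI driver’s algorithm. It assumes evenly spaced requests and a completion that is only delivered once a fixed-size batch is full, with a hypothetical batch size of 8, and computes the average extra wait that batching adds at different request rates.

```python
def mean_batch_wait_ms(iops: float, batch_size: int) -> float:
    """Average extra wait (ms) per request while a fixed-size batch fills.

    Toy model: requests arrive evenly spaced and the batch is released
    when its last request arrives.
    """
    if iops <= 0 or batch_size < 1:
        raise ValueError("iops must be > 0 and batch_size must be >= 1")
    gap_ms = 1000.0 / iops  # time between consecutive requests
    # Request k in the batch waits for the remaining (batch_size - 1 - k) arrivals.
    waits = [(batch_size - 1 - k) * gap_ms for k in range(batch_size)]
    return sum(waits) / batch_size


BATCH_SIZE = 8  # hypothetical value, for illustration only

for rate in (100, 500, 2000):
    print(f"{rate} IOPS: {mean_batch_wait_ms(rate, BATCH_SIZE):.2f} ms average added wait")
```

Under these assumptions the added wait is 35 ms at 100 IOPS and under 2 ms at 2000 IOPS, which is the shape of the problem described above: the lighter the workload, the longer each request sits waiting for company.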
Now and Then
The great news is the current release of the driver is optimized for heavy workloads. If you are starting to virtualize SQL/Oracle systems and need the performance, go for the pvSCSI driver and get better throughput. If you’re deploying standard virtual machines that are doing lower workloads, continue to embrace the existing LSI Logic driver.
If you are new to vSphere 4, or have just upgraded from 3.5 and are starting to rebuild your templates to embrace virtual hardware version 7, don’t use the pvSCSI driver as part of your standard template. VMware is working on the driver and will be introducing advanced coalescing functionality. When this is built into the driver stack pvSCSI will then be able to be utilized for all workloads as it will understand when it needs to ramp up for higher workloads.
Thanks again to Scott Drummonds for taking the time out of his busy schedule to track this one down.
I am attempting to pull together a blog post around performance, it’s going to be a four part segment on each I/O component of VMware, CPU, Memory, Storage and Networking. My goal is to try and cover the various tools that you can use to help troubleshoot performance problems that you might experience in your virtual environment.
While I was going through some of the methods, I wanted to illustrate how VMware now includes Windows Performance Counters inside a guest virtual machine to assist with performance monitoring/troubleshooting. I jumped on a test virtual machine I have, and pulled up Windows perfmon. To my dismay the VMware counters were missing! We are currently running VMware vSphere 4.0 update 1, so I checked with a few other people online, like Rick Vanover (@RickVanover). He confirmed that it seemed to be related to this specific release of vSphere.
I reached out to Scott Drummonds via Twitter (@drummonds), a performance systems engineer who works for VMware, and also opened a service request with support. Scott validated that he saw the same issue and was launching an investigation. Unfortunately the SR didn’t get very far as I was instructed that this was an “experimental feature and was removed from vSphere”. Uhhh ok, I knew that wasn’t right so I waited to hear back from Scott.
Scott has since written a blog post that discusses this issue. It looks like a complete uninstall of the VMware Tools on the guest followed by a re-install resolves the issue. For those that are not familiar with this process, it does require a reboot. The problem appears to be related to mofcomp, which is a tool that Microsoft provides to register WMI information (such as VMware performance counters) with Windows.
Thanks to Scott for jumping on this so quickly and posting a fix to the issue, it’s great to see social media paying off in the real world. Thanks to Rick for helping me figure out what was going on and validating some of my assumptions. Rick has also written up an excellent blog post on this same issue. Hopefully a patch will be rolled into the next minor release of vSphere 4 that will resolve this bug going forward.
Overview
A fellow engineer extraordinaire (Mike Evans) inspired me to write up this blog post. Mike and I have been using the “notes” attribute for virtual machines for a few years. It has come in very handy to track who requested the virtual machine resource, and the date the virtual machine was provisioned. If you’re not familiar with the notes field, it’s at the bottom of the summary page of a virtual machine properties page.
This little piece of information might seem trivial to the layperson, but the larger your virtual environment grows, the more complex it becomes. Having a way to track this fluid, ever changing infrastructure becomes more and more important as you begin to scale up and out.
Attributes
The “Notes” field was great for us, except that we began to notice variations in the details that we had entered into each virtual machine’s properties. Probably not a huge deal if you have a small VMware environment, but when you start tracking several hundred virtual machines, it really starts having an effect on reporting. A newer feature that was added to Virtual Center was the ability to use attributes, or pre-defined fields that can be populated. This gives a VMware administrator the ability to have a common format for reporting on Virtual Infrastructure. Below is a screen shot of the Custom Attributes you can find in Virtual Center:
Notice there are three different categories I have displayed in this view, Global, Host, and Virtual Machine. You can set attributes at multiple places in your VI environment that you wish to track. You can see we are interested in Virtual Machines attributes for certain variables (Owner, Provision Date, Provisioned By, Purpose). We have a different interest at the host level (Build Date) for maintenance tracking purposes of physical hardware assets.
Reporting
Here is where all your hard work starts to pay off.
It’s audit time. You are tasked with trimming the fat in your environment because once again you are out of capacity, and the budget just got crushed for the rest of the year because (insert your reason here) the UPS batteries just exploded! Go into Virtual Center and generate a report of your virtual infrastructure so you can see who owns what, and what date it was deployed. Go to your Virtual Machine view, select your datacenter, go to the menu option “Export” and then select “Export List”. Save the export as an Excel spreadsheet, and view your results. Notice the highlighted columns K through N; these are the custom attributes that we added above.
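Once the list is exported, a few lines of scripting can turn it into the “who owns what” summary. The Python sketch below assumes you saved the export as CSV rather than Excel, and that the file has a “Name” column alongside the “Owner” and “Provision Date” attributes described above. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def vms_by_owner(csv_text: str) -> Dict[str, List[str]]:
    """Group virtual machine names by the value of the Owner attribute."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Blank or missing owners are grouped so they are easy to chase down.
        owner = (row.get("Owner") or "").strip() or "(unassigned)"
        owners[owner].append(row.get("Name") or "")
    return dict(owners)


# Hypothetical export; the column names are assumptions.
SAMPLE = """Name,Owner,Provision Date
web01,Finance,2009-03-14
web02,Finance,2009-06-02
test07,,2008-11-20
sql01,HR,2009-01-05
"""

for owner, names in sorted(vms_by_owner(SAMPLE).items()):
    print(f"{owner}: {len(names)} VM(s) -> {', '.join(names)}")
```

The “(unassigned)” bucket is usually the most interesting line in the output at audit time.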
Conclusion
Virtual Center custom attributes are a great way to help manage your growing environment. Sit down with your team, or your potential customer, and find out what values matter most in your environment. Create the custom attributes at the various places in Virtual Center. Make sure you are diligent about filling out the details when you bring up new systems, and make it part of your internal process and documentation. You will thank yourself down the road.
Overview
I like to try and save my employer money when possible. I am of the opinion I would be doing them a disservice if I didn’t examine and evaluate a product that we paid for. Our company decided to take the plunge and upgrade all of our licensing to vSphere Enterprise Plus. There is a new backup/data protection product that was introduced with this recent release. Here is the technical definition of VMware Data Recovery from the administration guide:
VMware® Data Recovery creates backups of virtual machines without interrupting their use or the data and services they provide. Data Recovery manages existing backups, removing backups as they become older. It also supports deduplication to remove redundant data. Data Recovery is built on the VMware vStorage API for Data Protection. It is integrated with VMware vCenter Server, allowing you to centralize the scheduling of backup jobs. Integration with vCenter Server also enables virtual machines to be backed up, even when they are moved using VMware VMotion™ or VMware Distributed Resource Scheduler (DRS).
Sounds pretty good, right? You get a backup application (with de-dup!), built specifically for VMware, that could possibly displace your primary method of backups? I did a little digging in the community and was disappointed to learn that vDR is not exactly an enterprise product. A lot of the feedback from other VMware engineers was “it’s a 1.0 product and is designed for small installations”. The maximum supported virtual machine backup configuration is 100 virtual machines.
I decided to check it out for myself and see if it was a fit for our environment and might possibly alleviate some of our backup problems. Our primary site is rather large, but we are now implementing vSphere at our smaller satellite locations and this might be a fit for a smaller office configuration.
Installation
The installation was quite easy. VMware has provided another great virtual appliance that can be downloaded from their website. After you import the virtual appliance via Virtual Center and assign the host a static IP address, you then need to install the VC plug-in so you can manage your newly installed appliance.
After I ran through the installation and configuration I was disappointed to discover that I couldn’t get vDR to launch. I kept getting prompted for authentication credentials, which was odd. I thought maybe I had set something up incorrectly, so I went back and reviewed the administration guide. Upon closer examination (RTFM) I discovered that vDR doesn’t support Virtual Center running in linked mode. To my dismay, we are running in VC linked mode in anticipation of a Site Recovery Manager implementation. Our remote sites are managed by our primary site Virtual Center to save costs. I discovered a workaround by pointing the VC client to the ESX server that hosts the vDR appliance. This would only give me access to back up other virtual machines hosted on the same ESX host, but it let me continue my testing. I hope this is something that future versions of the product will address and fix.
The Console
Once you launch the vDR console, you are immediately prompted by a configuration wizard to begin setting up your environment. Here are the following steps in the order they are presented:
- Select your Virtual Machines to backup.
- Select your destination (CIFS share, attached vmdk, or RDM).
- Select your backup window.
- Select your retention policy.
All of these are straightforward and don’t require much discussion. The only step I found a little confusing was the retention policy. Personally I would have preferred something a little more technical than “Few/More/Many”.
The retention Policy radio buttons are pre-defined settings and will change the policy details below. Change the buttons and you can see the variables change and what each setting will mean in terms of your destination data storage. Use caution here as each vDR appliance can only support up to 1TB in data store size, with a maximum of two stores.
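Since these limits are easy to trip over when sizing a deployment, here is a small Python sketch that checks a planned configuration against the numbers quoted in this post (100 virtual machines, two data stores, 1TB per data store). The function name and the sample plans are hypothetical, and the limits should be re-checked against the administration guide for your release.

```python
from typing import List

# Limits as quoted in this post for a single vDR appliance.
MAX_VMS = 100
MAX_STORES = 2
MAX_STORE_TB = 1.0


def check_plan(vm_count: int, store_sizes_tb: List[float]) -> List[str]:
    """Return a list of problems with a planned vDR appliance configuration."""
    problems = []
    if vm_count > MAX_VMS:
        problems.append(f"{vm_count} VMs exceeds the limit of {MAX_VMS}")
    if len(store_sizes_tb) > MAX_STORES:
        problems.append(f"{len(store_sizes_tb)} data stores exceeds the limit of {MAX_STORES}")
    for number, size in enumerate(store_sizes_tb, start=1):
        if size > MAX_STORE_TB:
            problems.append(f"data store {number} is {size}TB, limit is {MAX_STORE_TB}TB")
    return problems


print(check_plan(40, [1.0, 0.5]))        # a plan that fits
print(check_plan(120, [1.0, 1.0, 2.0]))  # a plan that breaks all three limits
```

An empty list means the plan fits on one appliance; anything else is a hint that you need more appliances or a different product.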
The Backup
The underlying backup technology behind vDR is the new vStorage API (not VCB), and it takes advantage of a new feature called Changed Block Tracking. After the first full backup is performed, Changed Block Tracking identifies which blocks of the virtual disk have changed, so only the differences since the previous backup are backed up. This means less backup traffic going across your network.
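The idea behind changed block tracking is simple enough to show in a few lines. The Python sketch below is a toy illustration only, not the vStorage API: a pretend disk remembers which block numbers have been written, and each backup copies just those blocks and then resets the list. The class name and block size are made up.

```python
from typing import Dict, List, Set


class TrackedDisk:
    """Toy disk that remembers which blocks changed since the last backup."""

    def __init__(self, block_count: int, block_size: int = 4) -> None:
        self.blocks: List[bytes] = [bytes(block_size) for _ in range(block_count)]
        # Before the first backup every block counts as changed (a full backup).
        self.changed: Set[int] = set(range(block_count))

    def write(self, index: int, data: bytes) -> None:
        """Store data in a block and mark the block as changed."""
        self.blocks[index] = data
        self.changed.add(index)

    def backup(self) -> Dict[int, bytes]:
        """Return only the changed blocks, then reset the tracking."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(block_count=8)
full = disk.backup()          # first backup copies all 8 blocks
disk.write(3, b"abcd")
disk.write(5, b"wxyz")
incremental = disk.backup()   # second backup copies only blocks 3 and 5
print(len(full), sorted(incremental))
```

The key design point is that the list of changed blocks is maintained as writes happen, so the backup does not have to scan the whole disk to find the differences.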
I selected a CIFS/Windows share at our disaster recovery site to perform the backup testing. The test share was a ~600GB, 5+1 (10K) of locally attached SCSI storage on a HP DL380. I selected a couple of Windows virtual machines to test with and kicked off the backup jobs. Below is a screenshot of the reporting window for vDR (sorry for all the censorship).
The jobs seemed to run pretty slowly in general, but completed successfully without errors (the error listed above is because of my linked Virtual Center configuration). In my opinion the reporting interface is lacking some details. I would have liked to have seen what throughput I was getting during the backups. The only way I could see the throughput was by monitoring the Windows host that was housing the data store. I would have liked a more detailed task status, so I could tell what was going on throughout the backup operation. The data de-duplication ratio would have been another great detail to see. This could help determine the total backup and estimated completion time of each virtual machine, which is another variable I found to be missing.
The Restore
There are two approaches to restoring data using vDR. The first method is a full system restore. This method will recover the entire virtual machine, system state and all corresponding data. When performing a full system restore you can restore the data to an alternate ESX host or data store, and decide whether you want the network interface connected or not. I even found that you can change the virtual disk node, and select an alternate SCSI path to recover your disk to.
The second option is a file level restore (FLR), which most people would tend to use on a more regular basis. Unfortunately the vDR console can’t recover individual files without some additional configuration. You need to install the “VMwareRestoreClient.exe” executable on a virtual machine, which will then give you the ability to browse your data store contents and select individual files to recover. I anticipate that we will see the FLR components get rolled into the vDR console in a future release.
Conclusion
VMware Data Recovery lacks a lot of critical pieces that an enterprise backup application should and needs to provide. The product is a great start for smaller VMware implementations, but even at that I could see it quickly being outgrown. Here are the areas I would love to see improved on in future releases of the product:
- Need support for linked Virtual Centers. Personally I could use this product at some of our smaller locations but can’t leverage vDR since we are running in linked mode.
- Need to support larger capacity of virtual machines. 100 virtual machines is not enough, the product needs to scale to support a larger VMware implementation (not necessarily Enterprise).
- Need support for larger data stores. 1TB is not a lot of space when you are going to be backing multiple virtual machines up and retaining their data for longer periods of time.
- Need support for more data stores per vDR appliance. Again this goes back to scale, storage growth is exponential in our current environment.
- Support for a global vDR manager. I would love to see VMware develop a central master or parent vDR console that would allow you to manage your children appliances, and the data stores that they manage.
- Single console for both full system restores and file level restores.
VMware Data Recovery comes with all versions of VMware vSphere except for vSphere standard. This is a great entry level backup solution with de-duplication included. I am excited to see the product develop into a more mature product that can scale with some bigger environments. I also feel that including vDR in the standard version of vSphere would only help the SMB market embrace virtualization at a higher adoption rate.
With the launch of VMware vSphere came some new products that I hadn’t really paid much attention to (busy upgrading I guess). One of the newer products is a Virtual Center reporting tool called Capacity IQ. This product gives an administrator the ability to analyze, forecast and plan for future growth across your ESX environment. I have had a lot of experience with monitoring/reporting tools in the past, I won’t bore you with the details, so I was quite skeptical of a 1.0 reporting tool for Virtual Center. I must admit I was blown away by the immediate relevant reports the product was able to produce.
After pulling down the trial install and obtaining the demo key, I took it for a spin. I am not going to document the installation steps needed, as Eric Gray has done this for us already. It is by far the easiest reporting application I have ever installed. If you’re interested in taking it for a trial run, download the virtual appliance from VMware’s website here (OVF format). Once you import the virtual appliance and give it a static IP address, it will need to collect data about your environment for a while.
There are three basic views that CIQ gives you once you install the plug-in, dashboard, views and reports.
Dashboard
The dashboard tab is designed to give you a quick overview of the item you have selected. Capacity IQ uses the same approach as virtual center does, whatever object you have selected will be reported and focused on. Here is a view of one of our clusters, notice January 11th on the Trend and Forecast graph on top.
One of our clusters was out of resources, I added two more physical hosts to the cluster. You can see CIQ picks up the new physical host resources for the cluster and reflects this by increasing the number of virtual machines it believes the cluster can accommodate. Want to see something even more interesting, check out the pink graph on the 17th. Capacity IQ is already using a prebuilt formula to assume what it thinks we will have (or won’t have) a week out. Pretty impressive.
Views
The views tab is designed to give you a more detailed look at some of the specific data points. Here is a screenshot of the various reports you can execute:
So here is where you can get some great visual reports to present to either upper management, or a potential customer. This gives you a nice interface that you can customize with data points that you can tweak. Check out the first report on this cluster:
This gives you a graphical historical view of your cluster: how many virtual machines you have added over the course of time. Notice the horizontal sliding bar at the bottom of the chart. This allows you to adjust your time/date window. The lighter shaded line to the right is the forecasted growth of the cluster. The views tab is a great place to run some ad-hoc reports: it gives you the ability to select the type of report, and even allows you to export the data.
Reports
The reports tab holds the “pre-canned” reports that can be executed by the administrator. The one thing I was disappointed not to see here was the ability to schedule these reports to run at a particular interval (weekly/monthly). This is something that I assume will be introduced in future releases of the product.
After the report is executed and compiled, you are then provided with a .pdf or .csv version of your dataset to download and review. The first report totaled 17 pages and provided some great technical information. Here is the table of contents:
Conclusion
I am very impressed with Capacity IQ. There are no agents you need to install across the virtual machines you wish to report against. The installation was very straightforward; I think I had it up and running in about 15 minutes. Once the virtual appliance was in place, all it needed was a little bit of time to start crunching some data. The reports are well written and very relevant to what an administrator would want to see. If you’re looking for a nice reporting tool to help you forecast, give this one a test to see if it fits your needs.
Happy New Year! I hope everyone enjoyed the holidays and got to spend some time with friends and family. If you’re reading this I suggest you pay tribute to the quality of Virtual Insanity, and give the gift of voting. Eric Siebert has released a “best of 2009 blog contest”. If Virtual Insanity has helped you out in some way in the past I suggest casting a vote for this great virtualization blog space! Ok onto the real reason for this post…
I ran into an oddity while bringing a new host online today in our vSphere environment, and thought it best to publish my findings. Hopefully this might save someone a support call. With vSphere 4 update 1 came a couple of technical issues, which are detailed here and here. Personally we don’t use ESXi so only the first one was a major issue for us. We are an HP shop, so the issue around the HP agents and update 1 was a major concern (basically it would render the host unbootable). Luckily VMware support is proactive about announcing issues like this to the community and most people were aware of the problem right away.
The problem I hit today was strange, and at first I blamed it on having been off work for a week. I went to apply our update 1 baseline to a new host I was bringing up, rescanned, and then got this:
What the? I know this isn’t compliant, our base build is still at 4.0. Check out the build number, that’s proof. I have used the update 1 baseline for 50+ hosts so I know it’s not that. So maybe update manager is still on holiday as well, I restart the service and life is good? Nope. Same thing.
To make a long story longer, I poke around in the repository and check the update 1 patch and see it’s valid, yep 11/19/09 that’s the right release date. Why is this thing not working?
I kept poking and prodding thinking maybe they released an update to the update? Sure enough it slipped by me when I wasn’t looking, or it went to my spam mail. Check the date 12/9/09.
I created a new test baseline, and dropped the 12/9/09 update 1 into it and applied it to my new host. Lo and behold:
That’s much better. Strange that the older update 1 patch didn’t reflect anything and showed compliant. As an end user I would have liked to have seen some type of error message, or a reference to the newer released update 1. I ran the new update (still stopping the HP agents first, just in case), and now things look good again (build number):
Conclusion
Go vote for this site, and make sure you update your update manager, update 1 baseline. That’s a lot of updates. See you online!
Scott
Part 2 Doing the work
As you might have noticed, this blog post is a continuation of my first post about PVSCSI; you can access Part 1 here.
Hopefully now you have a better understanding of what the Paravirtual SCSI driver is all about, and we can prove there are tangible reasons to move in this direction. Let’s get on with the important part, the implementation phase.
(I need to finish off this blog post, I am running out of pictures of SCSI cables)
There are some caveats I need to start out with. In case you missed it, PVSCSI drivers on virtual machines aren’t supported on operating system disks unless you are running vSphere 4 update 1. You can use the driver on a secondary data disk if you so desire, but for this post I am going to assume you are running vSphere 4 update 1 (Virtual Center and ESX Hosts) and want to know how to get the driver working on all disks.
In most cases, it’s easier to build new. You know you have a clean install, the drivers are updated, the configuration is solid. I would suggest updating your templates to include the new Paravirtual SCSI driver. Your existing virtual machines run fine with their existing configurations, and depending on your environment, it might be a lot of work to go back and target all of your virtual machines. For an upgrade path, my personal opinion would be to target your heavy I/O virtual machines. Upgrade the VMs that will make a difference, and you will see some immediate benefits. Reducing the I/O on the disk subsystem will only benefit the other virtual machines that might share those same physical disk spindles.
Clean install
This section will walk you through the process of installing the driver with a Microsoft Windows 2003/2008 operating system. Currently these two operating systems are the only ones supported. Hopefully we will see some added support for the various Linux operating systems down the road.
Walk through the “New virtual machine Wizard” as you normally would. On step 9, ensure you select the “VMware Paravirtual” option as seen below.
Before powering your new VM up, you need to connect the virtual floppy image file that has the driver for your desired guest operating system. This is not on the VMware.com website under downloads; it already exists on your ESX host. You will need to browse to the following location on your ESX host: [Datastores]\vmimages\floppies. I would wait to connect the floppy disk image until after you boot off the Windows CD-ROM, so it doesn’t try to boot off the floppy drive.
When you power up your new virtual machine, select the F6 option to tell the operating system you need to use a third party SCSI driver:
Now connect your floppy disk image to your virtual machine under the “edit settings” option. You should now be able to point the operating system to the driver as seen below:
Continue on with your normal installation, and you are complete. Your new virtual machine is now utilizing the Paravirtual SCSI drivers. I suggest now converting this image you created to a template for future deployments with this configuration.
Upgrading an Existing Virtual Machine
To upgrade an existing virtual machine, the process is pretty straightforward. Assuming you have already upgraded to the latest virtual hardware (Version 7), make sure your VMware Tools are upgraded post Update 1. Shut down the VM, edit its settings, and select “Change Type” on the SCSI controller as shown below:
You will get another window that will allow you to change the type of controller as seen below:
Select the “VMware Paravirtual” and then select ok. Boot up your virtual machine and you are all set. Your system is now running with the updated drivers and you can take advantage of the newer drivers that provide better throughput and less latency!
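If you have a long list of candidates, it can help to write the prerequisites above down as a checklist before you touch anything. Here is a minimal sketch, assuming you have already gathered each VM’s details into a plain dictionary; the key names (`hardware_version`, `tools_current`, `guest_os`, `powered_on`) are hypothetical and don’t come from any VMware API.

```python
def pvscsi_blockers(vm):
    """Return the reasons a VM is not yet ready for the controller change.

    vm is a plain dict with hypothetical keys: hardware_version (int),
    tools_current (bool), guest_os (str) and powered_on (bool).
    """
    blockers = []
    if vm["hardware_version"] < 7:
        blockers.append("upgrade virtual hardware to version 7")
    if not vm["tools_current"]:
        blockers.append("upgrade VMware Tools (post Update 1)")
    guest = vm["guest_os"].lower()
    # Boot-disk support in Update 1 covers Windows 2003 and 2008 guests.
    if "windows" not in guest or not ("2003" in guest or "2008" in guest):
        blockers.append("guest OS is not Windows 2003/2008")
    if vm["powered_on"]:
        blockers.append("shut down the VM before changing the controller type")
    return blockers


# Made-up example VM, for illustration only.
example = {
    "hardware_version": 4,
    "tools_current": True,
    "guest_os": "Microsoft Windows Server 2003",
    "powered_on": True,
}
for reason in pvscsi_blockers(example):
    print(reason)
```

An empty list means the VM meets the prerequisites described in this post; it says nothing about whether the change is worth making for that workload.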
Hope you found this post useful. Good luck!
Scott
One of the new features that was added in the release of VMware vSphere 4.0 was a new SCSI subsystem driver that allows more I/O and less latency per virtual machine. What the heck is PVSCSI? Here is the technical definition taken right from the vSphere storage guide. (RTFM).
“VMware Paravirtualized SCSI (PVSCSI) is a special purpose driver for high-performance storage adapters that offer greater throughput and lower CPU utilization for virtual machines. They are best suited for environments in which guest applications are very I/O intensive. VMware requires that you create a primary adapter for use with a disk that will host the system software (boot disk) and a separate PVSCSI adapter for the disk that will store user data, such as a database. The primary adapter will be the default for the guest operating system on the virtual machine. For example, a virtual machine with Microsoft Windows 2008 guest operating systems, LSI Logic is the default primary adapter. The PVSCSI driver is similar to vmxnet in that it is an enhanced and optimized special purpose driver for VM traffic and works with only certain Guest OS versions that currently include Windows Server 2003, 2008 and RHEL 5. It can also be shared by multiple VMs running on a single ESX, unlike the VMDirectPath I/O which will dedicate a single adaptor to a single VM.”
So what does all that mean for you? Better disk performance and fewer CPU cycles spent on processing these disk requests. I took some notes at VMworld 2009 during a few different sessions that discussed PVSCSI. Here is my logical diagram of what PVSCSI is. Download the PDF version here so you can print it out and frame it on your cube wall!
With the release of vSphere 4.0 PVSCSI was only supported on disks other than the operating system (secondary data drives). For more information on this, reference KB Article: 1010398.
vSphere 4 update 1 is now released and it’s exciting news for those looking at utilizing PVSCSI. Support for boot disk devices attached to a Paravirtualized SCSI ( PVSCSI) adapter has been added for Windows 2003 and 2008 guest operating systems.
So let’s first find out if it’s all that. We need to do some testing to validate the hype. I created two virtual machines, one with the traditional LSI Logic SCSI driver, and one with the new PVSCSI driver. The host is the same for each VM: a 4 socket Intel Xeon system with 64 GB of RAM, connected to EMC Clariion CX3-80 storage. The RAID configuration is a 4+1 RAID 5 set (10K spindles), with the default Clariion Active/Passive MRU setup (No PPVE). Each VM has 2 vCPUs and 4 GB of RAM and both are running 32 bit Microsoft Windows 2003 R2. Both virtual machines’ data disks were formatted using diskpart and the tracks were correctly aligned. Anti-virus real time scanning was disabled on both systems. This test is meant to get as close as possible to a standard configuration that we can benchmark from.
I used IOMETER as my testing engine. I didn’t go too deep on the various settings. The first run is 32K 50%R 0%W.
Non-PVSCSI
With-PVSCSI
Quite the difference, no? To be honest, I was seeing a lot of fluctuation while doing my tests. I probably should have segregated things out a little more, but the screen captures were the average of the results. I was thinking maybe I should use the built-in random IOmeter combined results. So here you go.
Non-PVSCSI
With-PVSCSI
I believe the results speak for themselves. I need to do a little more testing for my own personal preferences. I want to get more insight into what the differences are on the reads/writes and the various sizes. I am certain cache has a lot to do with the results, but I think IOmeter can bypass cache since you force the randomizer. I’m also curious about the sweet spot on the block sizes and how that plays out with read vs write.
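When comparing runs at different block sizes, it helps to remember that the I/O rate and the throughput are tied together by the block size. Here is a small sketch of that arithmetic; the helper names are my own and the numbers are hypothetical, not the results from the screen captures.

```python
def throughput_mb_per_s(iops, block_size_kb):
    """Convert an I/O rate to throughput: MB/s = IOPS * block size (KB) / 1024."""
    return iops * block_size_kb / 1024


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100


# Hypothetical figures for illustration only; NOT the measured results.
lsi_iops, pvscsi_iops = 2000, 2500
print(throughput_mb_per_s(lsi_iops, 32))      # throughput at 32K blocks
print(percent_change(lsi_iops, pvscsi_iops))  # relative gain in I/O rate
```

The same I/O rate at a larger block size means proportionally more throughput, which is why a block-size sweep is needed before talking about a sweet spot.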
Conclusion
PVSCSI is a technology worth moving towards. There is no cost involved, and it can deliver better disk performance across your ESX environment. It also can bring your host CPU utilization down, which can provide you with better consolidation ratios across your clusters. Stay tuned for part 2, when I am going to provide the “how to do it” aspect so you can begin to leverage this technology you are already paying for!
Hope you found this information helpful. Thanks goes out to Aaron Sweemer (@asweemer) for allowing me to abuse his website and not having to deal with bringing up my own site.
Thanks!
Scott Sauer
Going Thin and not looking back.
Yes, I am slowly losing my hair like many other aging men out there, but it wouldn’t be virtual insanity if I were blogging about my personal male pattern baldness issues. With the latest release of VMware vSphere comes a lot of new features and functionality that can be leveraged to make our lives easier. One of these features, that I personally have been looking forward to for a while, is Thin Provisioning. If you aren’t familiar with this technology, jump over to Gestalt IT for a great explanation of what it is and how it works.
One of the exciting promises of thin provisioning is getting more “bang for your buck” out of the expensive enterprise storage you have been investing in for your ESX environment. But, as Bret Michaels once said, “Every rose has its thorn”, and there are some things to look out for and considerations to make before implementing thin disk technologies.
Efficiencies are great if they work right and don’t overcomplicate the environment.
Do your homework and make sure you understand the characteristics of the virtual machine that you are considering migrating into a thin disk configuration. The last thing you want to do is convert every VM to thin disk, and four months down the road all of your datastores are filling up and you’re scrambling for storage CAPEX. Some people are of the opinion that you should do thin provisioning either on the host side (VMware) or on the storage array side, but not both. Take a gander at Chad Sakac’s blog that discusses thin on thin and some thoughts around each of these approaches. I’m not going to go into all of the pluses and minuses of thin provisioning but rather focus on how to make it work for you.
Coffee Talk
So now that we have some of the basics out of the way, I wanted to share my thoughts on thin provisioning. Like many organizations, we get requests from our customers that err on the side of caution. They want to plan for the worst case and ensure that their project and/or application isn’t set up for failure. I don’t blame them really, I do it myself all the time when I make coffee at home. I always end up making more coffee than I typically drink, just in case I might need that extra charge. The best way to do that is pad it, request more than what you might really need, just in case something comes up down the road. Virtual machine disk storage in some cases fits this same profile. If my coffee maker granted me access to hot coffee on demand, I would stop making extra coffee. Thin disks can give your end users that capacity on demand so you can gain control of the padding effect that typically takes place in most corporate organizations.
Take it back…
So now you have done your research, you’re starting to get a feel for what this thin stuff is and how it might play out in your shop. It’s go time. If you’re a smaller VMware customer, you probably already have an idea of what are good target disks to convert. If you’re a larger environment, it might be a little more difficult to gauge where the bloated pigs are hiding.
I worked at GE for a couple of years and was exposed to some of the Six Sigma methodologies they preach as well as practice. Sounds boring, right? Not really. You can really leverage DMAIC for a lot of IT related problems/issues/projects. You don’t have to take it to the extreme, use the framework to help guide you on your quest:
DMAIC
The DMAIC project methodology has five phases:
- Define high-level project goals and the current process.
- Measure key aspects of the current process and collect relevant data.
- Analyze the data to verify cause-and-effect relationships. Determine what the relationships are and attempt to ensure that all factors have been considered.
- Improve or optimize the process based upon data analysis using techniques like Design of experiments.
- Control to ensure that any deviations from target are corrected before they result in defects. Set up pilot runs to establish process capability, move on to production, set up control mechanisms and continuously monitor the process.
We have already defined our project goals and what we are trying to accomplish. We need a good “Measure” tool to really find where we might benefit from thin provisioning. PowerShell is a great tool that most VMware administrators use, or have at least heard of. So this was the first place I turned to for assistance.
Alan Renouf of “Virtu-AL” http://www.virtu-al.net/ gave me a hand in writing the PowerShell script needed. (Thanks again, Alan!). Alan already had a one liner script to produce a list of VMs, their disks assigned, and how much data each disk was consuming. I needed the ability to see this data outside a PowerShell window and be able to analyze it in a better format. We have a decent-sized VMware environment and exporting this out to a .csv for analysis is extremely helpful. Here is the script!
************************************************************************
# Set the filename for the exported data
$Filename = "C:\VMDisks.csv"

# Connect to Virtual Center (replace MYVIServer with your own server name)
Connect-VIServer MYVIServer

# Pull back every virtual machine as a view object
$AllVMs = Get-View -ViewType VirtualMachine

# Export-Csv takes its columns from the first object it receives,
# so sort the VM with the most guest disks to the top
$SortedVMs = $AllVMs | Select-Object *, @{N="NumDisks";E={@($_.Guest.Disk.Length)}} | Sort-Object NumDisks -Descending

$VMDisks = @()
ForEach ($VM in $SortedVMs) {
    # Build one row per VM, adding three columns for each guest disk
    $Details = New-Object PSObject
    $Details | Add-Member -Name Name -Value $VM.Name -MemberType NoteProperty
    $DiskNum = 0
    ForEach ($Disk in $VM.Guest.Disk) {
        $Details | Add-Member -Name "Disk$($DiskNum)path" -MemberType NoteProperty -Value $Disk.DiskPath
        $Details | Add-Member -Name "Disk$($DiskNum)Capacity(MB)" -MemberType NoteProperty -Value ([math]::Round($Disk.Capacity / 1MB))
        $Details | Add-Member -Name "Disk$($DiskNum)FreeSpace(MB)" -MemberType NoteProperty -Value ([math]::Round($Disk.FreeSpace / 1MB))
        $DiskNum++
    }
    $VMDisks += $Details
    Remove-Variable Details
}

# Write the report out for analysis in Excel
$VMDisks | Export-Csv -NoTypeInformation $Filename
***********************************************************************
So now that you have this great spreadsheet, you can do all sorts of crazy sorting and reporting within Excel. Take some time on phase 3, “Analyze”, and look at what you’re seeing. Talk to your VM stakeholders to see how things might be changing from their perspective. Try to plan for the surprises and position yourself accordingly.
Next is the “Improve” phase of DMAIC (see, it’s easy!). This is the part where you actually do the work. It’s time to start leveraging the Storage VMotion APIs, and reclaim some of that unused disk.
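If Excel isn’t your thing, the same first-pass ranking can be scripted. The following is a minimal sketch, assuming the column layout produced by the script above (`Name`, `Disk0path`, `Disk0Capacity(MB)`, `Disk0FreeSpace(MB)` and so on); the function name and the sample rows are made up. It treats free space inside the guest as a rough upper bound on what a thin disk might give back, not a guarantee.

```python
import csv
import io


def reclaimable_by_vm(csv_file):
    """Sum guest-reported free space (MB) per VM from the exported disk report."""
    totals = {}
    for row in csv.DictReader(csv_file):
        free_mb = 0
        for column, value in row.items():
            # Columns look like "Disk0FreeSpace(MB)"; cells are empty
            # for VMs that have fewer disks than the first row.
            if column and column.endswith("FreeSpace(MB)") and value:
                free_mb += int(value)
        totals[row["Name"]] = free_mb
    # Largest potential savings first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample in the same layout as VMDisks.csv (not real data).
sample = io.StringIO(
    "Name,Disk0path,Disk0Capacity(MB),Disk0FreeSpace(MB),"
    "Disk1path,Disk1Capacity(MB),Disk1FreeSpace(MB)\n"
    "web01,C:\\,20480,5120,,,\n"
    "sql01,C:\\,40960,10240,D:\\,204800,150000\n"
)
for name, free_mb in reclaimable_by_vm(sample):
    print(name, free_mb)
```

To run it against the real export, pass an open file instead of the sample, for example `open(r"C:\VMDisks.csv", newline="")`.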
- Select the target VM in the VC client.
- Right click on the VM and select the option “Migrate”.
- Select the option “Change Datastore”.
- Select the destination, or click advanced if you are targeting one particular disk.
- Select “Thin provisioned format”.
- Select Finish.
Rinse and Repeat for the rest of that spreadsheet you have worked so hard on.
The last phase of DMAIC is “Control”. This is one of the most important pieces of thin provisioning in my opinion. At a minimum you need to set up Virtual Center alerts to monitor when your datastores are approaching critical levels. You can’t implement thin disks in your vSphere environment and walk away. The smart people over at VMware have given us the ability to monitor datastore disk space usage and over-allocation with the latest release of Virtual Center. Set up your monitors so you are e-mailed when some of these thin disks begin to grow and you need to take some action.
Eric Gray of VMware takes this to the next level; check out his blog post on utilizing PowerShell to prevent datastore emergencies. My personal approach to this concept is to set up a “hot spare” datastore for your environment. A good practice here would be to try reclaiming enough storage from your migrations to thin disks to free up that hot spare datastore. Implementing an automated recovery solution like Eric’s will help you sleep easier at night. Worried about what might happen if your script doesn’t work or you do hit the perfect storm and end up with a full VMFS volume? Intelligence has been built into vSphere to automatically pause the virtual machines. Impressive. Check out Eric’s video:
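To make the two numbers worth watching concrete, here is a small sketch of the arithmetic behind a usage alert and an over-allocation figure. The function name, the 75% and 85% thresholds and the sample sizes are assumptions for illustration; they are not Virtual Center defaults, and this is not how Virtual Center itself computes its alarms.

```python
def datastore_status(capacity_gb, used_gb, provisioned_gb, warn_pct=75, crit_pct=85):
    """Classify a datastore by usage and report how over-allocated it is.

    provisioned_gb is the sum of the maximum sizes of every disk on the
    datastore, thin or thick. Thresholds are illustrative assumptions.
    """
    used_pct = used_gb / capacity_gb * 100
    # Over-allocation: how far the promised space exceeds the real capacity.
    overalloc_pct = max(0.0, (provisioned_gb - capacity_gb) / capacity_gb * 100)
    if used_pct >= crit_pct:
        level = "critical"
    elif used_pct >= warn_pct:
        level = "warning"
    else:
        level = "ok"
    return level, round(used_pct, 1), round(overalloc_pct, 1)


# Hypothetical 500 GB datastore with 750 GB of disks provisioned on it.
print(datastore_status(capacity_gb=500, used_gb=400, provisioned_gb=750))
```

The higher the over-allocation, the earlier you want the usage warning to fire, since more thin disks can grow at the same time.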
Wrapping it all up
Thin disk provisioning is a great feature that you should consider leveraging in your environment. With some forward thinking and best practices you can achieve higher ROI for your ESX storage. VMware vSphere offers the ability for you to migrate from thick to thin with no downtime, so you can begin reclaiming storage on the fly. Keep it simple, start out with a high level analysis of your infrastructure. Identify the candidates that are a good fit and worth focusing on. Set up your alerts on the datastores as soon as you migrate your first virtual machine so you are protecting yourself from problems down the road. Consider taking automated actions if your datastores are reaching critical thresholds.
I hope you found this article helpful, good luck!
Scott Sauer
