Archive for February, 2010
A co-worker asked me yesterday if I knew of a way to find out who was watching your console session inside Virtual Center. I wasn’t quite sure what he meant by this at first, but after doing some digging I discovered that yes, you can find out who is watching your console session. Don’t forget that properly configured security permissions will keep these snoopers from even reaching the console in Virtual Center in the first place.
As memory prices continue to drop and the x64 architecture is embraced and adopted more in the industry, we continue to see a rise in memory demands. Only a few years ago, 1-2 GB virtual machines were the norm, 95% of them running 32-bit operating systems. From my personal experience I have seen this trend shift to 2-4 GB as the norm, with the higher-performing virtual machines consuming anywhere from 4-16 GB of memory. VMware has answered this demand: vSphere now delivers up to 1TB of addressable memory per physical host, and up to 255GB per virtual machine.
With processors now more powerful than ever, the general virtual machine bottleneck is shifting from compute to memory. This is reflected in our industry today as we see an increase in the memory footprint of traditional servers (Intel Nehalem), and vendors such as Cisco introducing extended memory technology that can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and was impressed with what I saw. The extended memory technology is unique because it not only allows you to scale out your memory configuration, it uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage to having this many DIMM sockets is that you can use lower-capacity DIMMs (2 GB or 4 GB) to reach a memory configuration that would require 8 GB DIMMs in a standard server.
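To put rough numbers on that DIMM-cost advantage, here is a quick sketch. The prices and socket counts are my own assumptions for illustration, not Cisco's figures:

```python
# Hypothetical street prices; the point is the ratio, not the exact numbers.
DIMM_PRICES = {2: 40, 4: 90, 8: 350}  # GB -> assumed price per DIMM

def cheapest_fill(target_gb, sockets):
    """Cheapest way to reach target_gb given a socket count, populating
    with a single DIMM size throughout (how servers are usually built)."""
    options = []
    for size, price in DIMM_PRICES.items():
        count = -(-target_gb // size)  # ceiling division
        if count <= sockets:
            options.append((count * price, size, count))
    return min(options)  # (total cost, DIMM size in GB, DIMM count)

# A standard 12-socket server vs. a 48-socket extended-memory blade, both at 96 GB:
print(cheapest_fill(96, 12))  # forced onto expensive 8 GB DIMMs
print(cheapest_fill(96, 48))  # free to use cheap 2 GB DIMMs
```

With these assumed prices, the socket-rich configuration hits the same 96 GB at less than half the memory cost.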
Yes, a bold topic I know, but I wanted to tackle this subject because it’s such an important aspect that everyone typically deals with at some point. I also find it personally useful to document some of my thoughts so I can solidify my own understanding of these processes and tools. I will admit, this was a challenge for me to write up. There is so much material and information that I had to really focus on keeping it simple and to the point. Performance problems can span such a wide array of possibilities that there is rarely one easy answer. Hopefully, by highlighting some of the tools that are available and offering some of my personal thoughts and experiences, I might be able to help when problems arise in your infrastructure.
There is so much useful information floating around in PDFs, blogs, websites, and PowerPoint decks that one could easily get consumed by this topic. Since it is such a broad topic, I wanted to set the stage. The focus of this series of blog posts is to highlight some key components to examine, and then provide tools that will give you insight into your own environment and/or situation. This page will be the launch point for the various categories. Each blog post will cover a different category relating to the possible points of I/O in a VMware ESX environment.
There is one last I/O component that I will not be covering, and that is the human factor. These posts will assume that your installation or upgrade is of sound mind and body. If there are underlying installation issues or post upgrade issues, I suggest engaging VMware support before examining conventional performance problems.
- VMware vSphere 4 Performance Troubleshooting Guide – Hal Rosenberg
- Performance Monitoring and Analysis – Scott Drummonds
- VMworld 2009 TA2963: ESXtop for Advanced Users – Krishna Raj Raja
- http://www.yellow-bricks.com/esxtop/ – Duncan Epping
Processors have come a long way in a very short time, and over the past few years we have seen the industry embrace the multi-core x86 architectures (Intel and AMD), which allow us to consolidate with even greater efficiency than previous processor architectures. Ensuring available compute cycles for virtual machine workloads is critical, and should be monitored closely as you scale out your infrastructure.
What to look for
- Check for physical CPU utilization that is consistently above 80-90%. Getting high consolidation ratios is a wonderful thing, but don’t overtax the physical server. Maybe it’s time to purchase another host for your DRS cluster and let the software balance your workloads better.
- Watch pCPU0 on non-ESXi hosts. If pCPU0 is consistently saturated, it will negatively impact performance of the overall system. If you are using third-party agents, ensure they are functioning properly. A couple of years ago we had issues with the HP Systems Insight Manager agents (the Pegasus process), which were creating a heavy load on our COS. All of the virtual machines looked fine from a performance perspective, but once we dug a little deeper, we discovered this was our root cause.
- Watch for high CPU ready times. Ready time means a virtual machine has work to do but is waiting for the VMkernel scheduler to free up physical CPU time. Consistently high ready values point to CPU contention on the host, and can hurt virtual machine performance even when raw utilization numbers look reasonable.
- Watch for virtual machines that are consistently at 80-100% utilization. This is not a typical pattern for a conventional server. Most likely, if you log in to the guest you will find a runaway process consuming all of the CPU cycles. I actually found an offshore contractor running Rosetta@home (a cancer research screen saver) inside one of our virtual machines! If something doesn’t look right, it’s worth checking out.
- Watch for virtual machines where the kernel or HAL is not set to use more than one CPU (SMP) but the VM is allocated multiple processors via Virtual Center. I was approached by a Linux administrator who told me he wasn’t seeing any performance improvement after adding a second processor. After I poked around a little, I discovered he was running a uniprocessor kernel and hadn’t recompiled his operating system for SMP. If the operating system can’t recognize more than one processor, you won’t see any performance gain by throwing more vCPUs at a larger workload.
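As a rough way to act on the ready-time advice above, here is a small Python sketch that converts vCenter's real-time CPU Ready values (reported as milliseconds per 20-second sample) into a percentage and flags outliers. The 5% and 10% thresholds are common rules of thumb, not VMware-published limits:

```python
# Sketch: turn vCenter "CPU Ready" summation values (ms per 20 s real-time
# sample) into a percentage per vCPU and flag VMs worth a closer look.
SAMPLE_MS = 20_000  # real-time chart sample interval: 20 seconds

def ready_pct(ready_ms, vcpus):
    """Percentage of the sample interval the VM spent ready-but-not-running."""
    return 100.0 * ready_ms / (SAMPLE_MS * vcpus)

def triage(vms):
    """vms: list of (name, ready_ms, vcpus). Returns flagged VMs."""
    flagged = []
    for name, ready_ms, vcpus in vms:
        pct = round(ready_pct(ready_ms, vcpus), 1)
        if pct >= 10:
            flagged.append((name, pct, "investigate now"))
        elif pct >= 5:
            flagged.append((name, pct, "watch"))
    return flagged

sample = [("web01", 400, 1), ("sql01", 2400, 2), ("app02", 4600, 2)]
print(triage(sample))  # [('sql01', 6.0, 'watch'), ('app02', 11.5, 'investigate now')]
```

The VM names and sample values are invented for the demo; in practice the inputs would come from the vCenter performance charts or esxtop's %RDY column.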
Monitoring with Virtual Center
Virtual Center is a great place to start for CPU performance monitoring, at both the physical and the virtual machine level. Before getting into too much detail I wanted to explain Virtual Center statistics logging. There are various levels of logging that can be set for the VC database. Beware! You can easily overrun your database and fill up your existing disk space by setting all of these to the maximum. Think of this as a debug level: the higher you set it, the more information is captured to the database for analysis (and the more disk space is consumed). If you need some of the more detailed performance statistics, VC performance counters and their corresponding levels can be found here. To change these settings, click Administration –> vCenter Server Settings –> Statistics.
Let’s take a look at a physical ESX host’s performance metrics through Virtual Center. vSphere now includes a nice graphical summary in the performance tab of the physical host. This gives you a quick dashboard-type view of the overall health of the system over a 24-hour period. Here is the CPU sample:
Selecting the Advanced tab gives you a much more granular way of viewing performance data. At first glance this might look like overkill, but with a little fine-tuning you can make it report some great historical information. Here is a snapshot of physical CPU utilization across all processors:
The Virtual Center performance statistics display the past hour by default, and show a more detailed analysis of what’s currently happening on your host. Select “Chart Options” to change values such as the time/date range and which counters you would like to display.
Virtual Center alarms are an excellent tool that is sometimes overlooked. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up CPU alarms so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host CPU utilization stays at 75% or higher for 5 minutes:
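The alarm above boils down to "threshold sustained for a whole window". Here is a minimal sketch of that trigger behavior; the sample interval and window length are my assumptions, not vCenter internals:

```python
from collections import deque

class SustainedAlarm:
    """Fire only when every sample in the window meets the threshold,
    mirroring 'CPU at 75% for 5 minutes' (e.g. 20 s samples -> window of 15)."""

    def __init__(self, threshold=75.0, window=15):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # rolling window of recent samples

    def update(self, pct):
        """Feed one utilization sample; return True if the alarm should fire."""
        self.samples.append(pct)
        full = len(self.samples) == self.samples.maxlen
        return full and min(self.samples) >= self.threshold

alarm = SustainedAlarm(threshold=75.0, window=3)  # short window for the demo
print([alarm.update(v) for v in [80, 90, 70, 85, 88, 91]])
# -> [False, False, False, False, False, True]; the 70 resets the streak
```

A single spike never fires; only the last three consecutive samples (85, 88, 91) all clear 75%, so the alarm triggers on the final sample.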
Monitoring with ESXTOP
Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and execute the command “esxtop”. The default screen you should see is the CPU screen; if you ever need to get back to it, just hit the “c” key. Esxtop gives you great real-time information and can even be set to log data over a longer period; try “esxtop -a -b > performance.csv”. Check your PCPU and CCPU (physical/console) here. Examine what your virtual machines are doing; if you want to display just the virtual machine worlds, hit the “V” key.
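Once you have a batch-mode CSV, you can slice out individual counters with a few lines of Python. This sketch runs against a made-up two-sample file in the same perfmon-style format esxtop emits; real captures have hundreds of counter columns, and the host name and values here are invented:

```python
import csv
import io

# Tiny stand-in for `esxtop -a -b > performance.csv` output (assumed data).
raw = """\
"(PDH-CSV 4.0)","\\\\esx01\\Physical Cpu(_Total)\\% Processor Time","\\\\esx01\\Physical Cpu(0)\\% Processor Time"
"02/15/2010 10:00:00","42.5","61.0"
"02/15/2010 10:00:05","88.1","95.2"
"""

def column(csv_text, counter):
    """Pull one counter's samples out of an esxtop batch-mode CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    idx = next(i for i, h in enumerate(rows[0]) if counter in h)  # first match
    return [float(r[idx]) for r in rows[1:]]

total = column(raw, "Physical Cpu(_Total)")
print(total)       # [42.5, 88.1]
print(max(total))  # 88.1 -- worst sample in the capture
```

The same `column` helper works for memory, network, or storage counters; only the substring you search the header row for changes.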
Monitor inside the Virtual Machine
A great feature VMware introduced for Windows virtual machines is the integration of VMware performance counters right into the Performance Monitor (“perfmon”) tool. If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug with VMware Tools that will prevent the counters from showing up. Check your “% Processor Time”, which is the current load of the virtual processor.
Monitoring with PowerCLI
Another great place to look for potential CPU problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a powerpack from Alan Renouf. If you’re not a command-line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find virtual machines with CPU ready time? How about virtual machines that have CPU reservations, shares or limits configured? You can pull all of this information using Alan’s powerpack.
If you’re using VMware vSphere, there are many different ways to monitor for CPU problems. The Virtual Center database is the first place to start. Check your physical host for CPU contention, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop: check the physical CPU and console CPU, then the VM worlds running on the ESX host.
Look for the outliers in your environment. If something doesn’t look right, that’s probably the case. Scratch away at the surface and see if something pops up. Use all the tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes shed light on a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for technical problems that VMware support had never seen before.
I’ve got to tell you, I’m pretty darn excited right now. Why? I’m typing this to you from 30,000 feet on a Delta flight from Cincinnati to Las Vegas (for VMware Partner Exchange). And why is that so special? Because, as the title suggests, I’m typing this on my VDI image which resides hundreds of miles away and thousands of feet below me.
Delta has a fairly new service from gogo called “gogo inflight … wi-fi with wings.” This is my first time using the service because on the past few flights I’ve taken, I’ve either not had the need to connect or the aircraft I happened to be on did not yet have the service. But this time I have some work to do (i.e. my next “confessions” article for VSM), so I figured I’d give it a whirl. And, being a glutton for punishment, I decided to see if I could push the limits of PCoIP. After a quick sign-up form (gogo isn’t free) and firing off a VPN connection back to my home office, I launched the View client and crossed my fingers.
And I can tell you that I am thoroughly impressed! Windows are snappy, Flash is decent and low-end multimedia is adequate. I was watching a youtube.com video with full sound and, while the picture was a little blurry and the sound/video sync was slightly off, it was totally watchable. And furthermore, it didn’t cripple my session. Not bad, considering my latency is between 150ms and 250ms, averaging about 200ms.
Is this a glimpse of things to come? Right now it may seem pretty far-fetched. After all, the process of connecting to my desktop image was fairly painful. I had to …
- Boot into my local OS
- Connect to the gogo inflight wireless access point
- Launch my Firefox browser and walk through the gogo signup form
- Dig through my briefcase for my wallet and pay for the service
- Fire off my OpenVPN client to my home VPN server
- Launch the VMware View Client
Not exactly what I’d call a seamless user experience. And I believe that conquering this experience – that is, the mobile user – will be the coup de grâce for traditional desktop infrastructure. Until then, virtual desktop infrastructure will certainly happen in pockets, but massive, wide-scale adoption will continue to elude us. So what has to happen? In my mind, the following things …
True ubiquity of wireless Internet
This means two things. First, the Internet has to be everywhere at all times. I’m a true mobile user and I need to know that no matter where I am – whether it be on a puddle jumper, or in a remote country hotel – that when I power on my laptop, I will have access.
And second, this also means the connection to the Internet has to be completely integrated and transparent. I don’t want to have to dig for my credit card every time. But even more than that, I want the connection to happen for me automatically, in the background, as part of the boot process. My software client should auto-detect the available wireless networks, connect, and debit my account. Will I have a single unified account that works across all providers? Or will I have multiple accounts that my software client will handle? Or will it be a single wireless / satellite provider that can reach me anytime, anywhere? I don’t know and I don’t really care. The point is, I don’t want to deal with it. I want to press power and, after a short boot (maybe even zero boot?), have access. Period.
A purpose built Thin OS
Booting into a local OS just to launch a client and connect to a remote OS just isn’t going to cut it. The boot process needs to be fast and do nothing more than present me with a login GUI. If I’m remote, the VPN connection (and any necessary login parameters) needs to be part of the login process. There’s no need for a full-blown local OS if our goal is to do little more than connect to our primary desktop environment. Sure, we hardcore tech weenies will almost always want some sort of backdoor access to the local OS. But 99% of the users out there don’t care and just want a seamless desktop experience. In fact, if done correctly, they shouldn’t even know there is a local OS, or that their desktop is actually running in a remote datacenter.
Does this actually exist yet? Sort of. ThinClients typically deliver this kind of user experience. But for the most part, ThinClients aren’t mobile devices. I’ve seen a ThinClient laptop model before, but I don’t know a single person actually using one. I’ve actually seen far more cases of customers converting PCs and laptops to ThinClients. Theron Conrey gives us a great example with his blog post VMware View Linux Live CD How-to. And there are enterprise solutions for converting PCs to ThinClients from both Wyse and DevonIT. So, we’re pretty darn close on this front, but still not 100%.
A rich user experience in low bandwidth, high latency environments
Like I stated earlier, my current PCoIP experience is pretty darn impressive. It is, by far, the best remote desktop experience I’ve witnessed. But I’m not sure the average in-flight user would be ecstatic about it. Sure, all things considered, you can’t beat it. But I recognize all the variables working against me right now. The typical user will not know or even care; they just want it to work. The good news is that PCoIP will continue to improve and brings the promise of delivering a rich user experience, whether at 30,000 feet or a single switch port away.
So, I ask again, is this a true glimpse of the not-too-distant future? Ten years ago, I was the only one of my friends and family to have a cell phone. Five years ago, mainstream virtualization in the datacenter was laughed at. And a few short months ago, typing this blog post on my VMware View image was impossible. So, you tell me.
Hopefully you have read my previous blog posts on pvSCSI. Part one describes what the driver is, how it works, and how it can positively impact your performance and workloads. Part two covers the process of installing the pvSCSI driver on both an existing system and a new system. Both can be found here on the site and you might find them useful:
VMware recently published a KB article that answers a question that has been floating around the community for a while. The pvSCSI driver sounds superior to the LSI driver, with direct I/O access to the hypervisor, so why not use it in all cases? The article states that you should only use the newer driver for higher workloads, typically 2000 IOPS or greater. For those who don’t know, 2000 IOPS is a pretty big workload. Consider this: a standard Fibre Channel 10,000 RPM drive averages around 125 IOPS per disk.
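To see why 2000 IOPS is a big number, here is the back-of-the-napkin spindle math. The per-disk IOPS figures are ballpark numbers for rotational drives of that era, not vendor specs:

```python
import math

# Rough per-spindle random IOPS estimates (assumed, not vendor figures).
IOPS_PER_DISK = {"10k FC": 125, "15k FC": 175, "7.2k SATA": 80}

def spindles_needed(target_iops, disk_type):
    """How many disks must back a LUN to sustain target_iops, ignoring
    RAID write penalties and cache, which only make the number bigger."""
    return math.ceil(target_iops / IOPS_PER_DISK[disk_type])

for disk in IOPS_PER_DISK:
    print(disk, spindles_needed(2000, disk))
# 10k FC needs 16 spindles; even 15k FC needs 12
```

In other words, a VM driving 2000 IOPS needs a 16-spindle 10K RPM array behind it just to keep up, which is why most garden-variety VMs never get near the threshold.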
I didn’t really understand this, and the knowledge base article lacks any detail on the rationale behind the statement. I reached out to VMware performance engineer Scott Drummonds to see if he had anything he could publish to help clarify the KB article. Scott was nice enough to research this and posted his findings here.
So it appears the technical explanation is interrupt coalescing, or buffering. The paravirtual SCSI driver was designed to receive multiple requests at a high rate and then “batch” the requests together for better throughput efficiency. If you aren’t generating a high enough workload on the virtual machine, an I/O request can sit in the queue unnecessarily while the “batch” waits to be filled for the next transaction. This causes storage performance problems, typically seen as higher latency, which negatively impacts the virtual machine.
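A toy model makes the latency effect easy to see: if requests are held until a batch fills, a lightly loaded VM waits far longer per request than a busy one. The batch size and arrival rates here are illustrative assumptions, not the driver's real parameters:

```python
# Toy model of coalescing: requests are held until a batch of N accumulates.
def avg_wait(iops, batch_size):
    """Average time (ms) a request waits for its batch to fill, assuming
    evenly spaced arrivals. Request i in a batch waits for the
    (batch_size - 1 - i) arrivals that come after it."""
    gap_ms = 1000.0 / iops  # time between arrivals
    waits = [(batch_size - 1 - i) * gap_ms for i in range(batch_size)]
    return sum(waits) / batch_size

print(avg_wait(4000, 8))  # busy VM: 0.875 ms -- batches fill almost instantly
print(avg_wait(200, 8))   # light VM: 17.5 ms -- requests sit in the queue
```

Same driver, same batch size, but at 200 IOPS the queueing delay is twenty times worse, which is the shape of the problem the KB article is warning about.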
Now and Then
The great news is that the current release of the driver is optimized for heavy workloads. If you are starting to virtualize SQL/Oracle systems and need the performance, go for the pvSCSI driver and get better throughput. If you’re deploying standard virtual machines with lower workloads, continue to embrace the existing LSI Logic driver.
If you are new to vSphere 4, or have just upgraded from 3.5 and are starting to rebuild your templates to embrace virtual hardware version 7, don’t use the pvSCSI driver as part of your standard template. VMware is working on the driver and will be introducing advanced coalescing functionality. Once this is built into the driver stack, pvSCSI can be used for all workloads, as it will understand when it needs to ramp up for higher demand.
Thanks again to Scott Drummonds for taking the time out of his busy schedule to track this one down.