Introduction
There has been a lot of activity here at VMware with acquisitions and partnerships over the past few months. A fellow engineer at VMware summarized a lot of these acquisitions and how they are meaningful to VMware as an organization. I wanted to share this information because I think it provides people with a better understanding of where we are going as a company and the overall strategic vision of VMware (Thanks again Andy!).
There are plenty of announcements to talk about from the second half of Day 1 and Day 2 of EMC World. Keynotes from Pat Gelsinger, Brian Gallagher, and Rich Napolitano all covered the “Why, What, and How” of the datacenter changes coming over the next year. After the quick reference to VPLEX by Joe Tucci in the morning keynote of Day 1, the official announcement and demo by Pat Gelsinger in the afternoon keynote explained the VPLEX technology and its purpose in today’s datacenters.
– vTrooper Report — from EMC World
My self-imposed gag order has been lifted since my arrival at EMC.
I’m ready to share some of the great news from EMC World 2010. Stay tuned for all the announcements in detail but for now:
It’s my first time at EMC World. I’m an infrastructure guy at heart and have been to Cisco World and VM World a few times, but not the big storage show. That’s good because EMC is not just a storage company. My “Firehose” treatment over the past few weeks made certain of that fact. So I’ll warm up to EMC World with a little mood and follow it with content. All good fun starts with a party:
Introduction
Being only three weeks into my career at VMware, I haven’t had much time to do any technical blog posts due to the fact that I have been drinking from the fire hose and trying to ramp up as quickly as possible. I am writing this post from VMware’s annual tech summit, and I joked with a few people here that I am so new that I haven’t even gotten a paycheck yet. My wife called to reassure me that I actually did get paid, so no more jokes about a “virtual paycheck” I guess.
Everything has been great so far. The people I am working with are awesome, the job is going to be fun but challenging, and to be honest I think I am working for the coolest software company in the world. VMware’s breadth of products is really quite amazing. They are literally covering the stack with many different applications and driving a change in the industry that many people are excited about: revolutionizing IT.
A few weeks ago I wrote a post with a very similar title, “Easy VMware Development with VI Java API and Groovy.” Today I want to expand on that a little bit and show you a cool way to quickly stand up web apps for VMware vSphere using Grails. What is Grails? If you’re familiar with the popular Ruby on Rails web application framework, then you can think of Grails as the Java (well, Groovy actually) equivalent of Rails. From the Grails official website …
Grails is an advanced and innovative open source web application platform that delivers new levels of developer productivity by applying principles like Convention over Configuration. Grails helps development teams embrace agile methodologies, deliver quality applications in reduced amounts of time, and focus on what really matters: creating high quality, easy to use applications that delight users.
What does all this mean? The short and sweet answer is Grails will take care of all the pain-in-the-a$$ “stuff” required to get a web app up and running. A good analogy would be cake mix.
Have you ever wanted to write a script or an application that automates your VMware VI3.x / vSphere environment, but lack the development skills to do so? Or, maybe you have development skills, but you’re looking for ways to simplify your code and improve your productivity? In either case, I’ve stumbled across something you’ll definitely want to check out.
Before we start, I should probably clarify something. If you have zero development experience, then the title of this post could be a little misleading. An absolute beginner probably wouldn’t consider this “easy.” There are certainly easier ways to develop VMware scripts which are targeted at VMware Administrators, such as the vSphere PowerCLI. And if you want to do some VMware scripting without learning a programming language and/or acquiring some development skills, then you should stop reading now and go check out the vSphere PowerCLI. However, if you’re a little adventurous and want a “fast track” for creating VMware applications, then by all means, read on.
Introduction
“Going Green” has been a buzzword in the IT community for years, but the more I deal with this topic the more I consider it a black and white issue. I never thought I would be covering an energy blog topic, but there are some real world examples I wanted to write about. Datacenters are enormous consumers of energy, from the IT infrastructure itself all the way down to the HVAC that is needed to cool these power-thirsty systems. While I think green initiatives are much needed in our industry, large corporations typically don’t consider these initiatives unless there is some intrinsic value associated with them, i.e., money. Business drivers outweigh the political pressures of saving the environment, and in all fairness isn’t that what a company should be about, their own salvation?
Maybe that sounds harsh to all of the eco-friendly readers out there, but don’t get me wrong, I am all about saving natural resources and respecting our environment. Understanding the underlying issue of the current state of our industry is critical if one is going to offer solutions to a problem. If corporations can save operational costs on power and cooling and say they are a “green company” then we have just killed those two birds with one stone.
Introduction
Networking is the fourth I/O component that I will be covering in this series of performance write-ups. Networking is another important component in the stack that, if not well thought out, can lead to performance problems down the road. Security is an important design consideration when planning your network configuration. One might argue that with a virtual environment you are more prone to risks, since at times there is no longer a physical cabling restriction in place. If someone has the appropriate rights in Virtual Center, they could bridge two logical networks together, or place a virtual machine into a DMZ. VMware introduced vShield Zones to protect your virtualized environment from some of these risks. By creating zones you can enforce policies that can bridge, firewall, or isolate virtual machines between network segments. When designing or upgrading your VMware environment, work closely with your network team to understand their design considerations. If possible, leverage VLAN tagging (802.1Q) to eliminate excessive physical cabling to different segments.
Introduction
Personally, I find storage one of the most interesting components of the VMware architecture I/O stack. There is a plethora of diverse storage solutions in the industry today that offer different ways of addressing storage performance as well as increasing capacity demands. Storage problems are the most common misconfiguration affecting performance in VMware environments today. An oversaturated LUN will affect all virtual machines that share that same datastore. Take this concept up a level: a group of disks (RAID group) saturated with I/O will negatively impact all LUNs that share those same physical spindles. Storage has traditionally been the “red headed step child” in VMware and hasn’t gotten a lot of visibility. Storage I/O bottlenecks can create serious virtual machine problems, and yet it wasn’t until ESX 3.x that graphical visibility was even available to VMware administrators; the 2.x MUI (Management User Interface, for those newer to VMware) reflects only CPU and memory.
Introduction
I attended last month’s Cincinnati VMUG (VMware User Group) and was surprised to hear from the audience how many customers had not yet taken the plunge and upgraded to vSphere. I think there were a handful of users that had just completed the upgrade. Sometimes I forget to step out of my own personal space and consider what others have going on in their own environments. If you’re still wondering about the upgrade, Aaron has a post on some of the benefits of going from VI3 to vSphere.
Part of the process of upgrading your existing investment is the need to upgrade all of the virtual machines to the latest and greatest virtual machine hardware version 7. Someone mentioned to me how much of a pain this was since you had to touch each virtual machine, and my response to them was “It only takes a couple of minutes”. I wanted to prove this theory in a different way, so I mulled it over and came up with a timed video clip. The song I chose is 2 minutes and 39 seconds, so I figured if I can knock this out within the amount of time it takes for the song to play, well then, mission accomplished.
Introduction
A co-worker asked me yesterday if I knew of a way to find out who was watching your console session inside Virtual Center. I wasn’t quite sure what he meant by this at first. But after doing some digging I discovered that yes, you can find out who is watching your console session. Don’t forget that security permissions that are set up correctly will keep these snoopers from even getting to the console in Virtual Center in the first place.
Introduction
As memory prices continue to drop and the x64 architecture is embraced and adopted more in the industry, we continue to see a rise in memory demands. Only a few years ago, 1-2 GB virtual machines were the norm, 95% of these being 32-bit operating systems. From my personal experience I have seen this trend change to 2-4 GB as a norm, with the higher performing virtual machines consuming anywhere from 4-16 GB of memory. VMware has answered this demand with vSphere now delivering up to 1 TB of addressable memory per physical host, and up to 255 GB per virtual machine.
With processors now more powerful than ever, the general shift of virtual machine limitations is changing from compute to memory. This is reflected in our industry today as we see an increase in the memory footprint on traditional servers (Intel Nehalem), and vendors such as Cisco introducing extended memory technology which can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and was impressed with what I saw. The extended memory technology is quite unique because it not only allows you to scale out your memory configuration, it uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage to having this many DIMM sockets is that you can use lower capacity DIMMs (2 GB or 4 GB) to achieve the same memory configuration that would require 8 GB DIMMs in a standard server.
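The DIMM arithmetic above can be sketched in a few lines. This is only an illustration: the socket counts (12 for a standard server, 48 for an extended memory server), the 96 GB target, and the function name are assumptions chosen for the example, not figures from the Cisco class.

```python
from typing import Iterable, Optional

# Hypothetical DIMM capacities (GB) available for purchase.
DIMM_SIZES_GB = (1, 2, 4, 8, 16)


def smallest_dimm_gb(target_gb: int, sockets: int,
                     sizes: Iterable[int] = DIMM_SIZES_GB) -> Optional[int]:
    """Return the smallest DIMM size that reaches target_gb when every
    socket is populated, or None if no listed size is large enough."""
    for size in sorted(sizes):
        if size * sockets >= target_gb:
            return size
    return None


# Assumed example: a 96 GB target on 12 sockets versus 48 sockets.
standard = smallest_dimm_gb(96, sockets=12)
extended = smallest_dimm_gb(96, sockets=48)
print(f"12 sockets need {standard} GB DIMMs; 48 sockets need {extended} GB DIMMs")
```

With more sockets the same total can be reached with smaller DIMMs, which is where the cost advantage comes from.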
Yes a bold topic I know, but I wanted to tackle this subject because it’s such an important aspect that everyone typically deals with at some point. I also find it personally useful to document some of my thoughts so I can solidify my own understanding of these processes and tools. I will admit, this was a challenge for me to write up. There is so much material and information that I had to really focus on keeping it simple and to the point. Performance problems can span such a wide array of possibilities that there is never typically one easy answer. Hopefully by highlighting some of the tools that are available for use, and offering some of my personal thoughts and experiences, I might be able to help when problems arise in your infrastructure.
There is so much useful information floating around in PDFs, blogs, websites, and PowerPoint decks that one could easily get consumed by this topic. Since this is such a broad topic, I wanted to try and set the stage. The focus of this series of blog posts is to highlight some key components to examine, and then provide tools that will give you insight into your own environment and/or situation. This page will be the launch point for the various categories. Each blog post will cover a different category relating to the possible points of I/O in a VMware ESX environment.
Performance Troubleshooting VMware vSphere – CPU
Performance Troubleshooting VMware vSphere – Memory
Performance Troubleshooting VMware vSphere – Storage
Performance Troubleshooting VMware vSphere – Network
There is one last I/O component that I will not be covering, and that is the human factor. These posts will assume that your installation or upgrade is of sound mind and body. If there are underlying installation issues or post upgrade issues, I suggest engaging VMware support before examining conventional performance problems.
Acknowledgments/References:
VMware vSphere 4 Performance Troubleshooting Guide – Hal Rosenberg
Performance Monitoring and Analysis – Scott Drummonds
VMworld 2009 TA2963 ESXtop for Advanced Users – Krishna Raj Raja
http://www.vmware.com/support/
http://www.yellow-bricks.com/esxtop/ – Duncan Epping
Introduction
Processors have come a long way in a very short time, and over the past few years we have seen the industry embrace the multi-core x86 architectures (Intel and AMD), which are allowing us to consolidate with even greater efficiency than previous processor architectures. Ensuring available compute cycles for virtual machine workloads is critical, and should be monitored closely as you scale out your infrastructure.
What to look for
- Check for physical CPU utilization that is consistently above 80-90%. Getting high consolidation rates is a wonderful thing, but don’t overtax the physical server. Maybe it’s time to purchase another host for your DRS cluster and let the software balance your workloads better.
- Watch pCPU0 on non-ESXi hosts. If pCPU0 is consistently saturated, this will negatively impact performance of the overall system. If you are using third party agents, ensure they are functioning properly. A couple of years ago we had issues with HP System Insight management agents (Pegasus process) which were creating a heavy load on our COS. All of the virtual machines looked fine from a performance perspective, but once we dug a little bit deeper, we discovered this was our root cause.
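- Watch for high CPU ready times. Ready time is the time a virtual machine was ready to run but had to wait to be scheduled on a physical CPU, so consistently high values indicate contention for physical CPU on the host, often from too many vCPUs competing for the same cores. If ready time is low but the virtual machine is still slow, look toward another possible bottleneck in your infrastructure outside of CPU (Memory/Network/Storage).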
- Watch for virtual machines that are consistently at 80-100% utilization. This is not a typical pattern of a conventional server. Most likely if you log in to the guest you will find a runaway process that is consuming all of the CPU cycles. I actually found an offshore contractor running Rosetta@home (a cancer research screen saver) inside one of our virtual machines! If something doesn’t look right, it’s worth checking it out.
- Watch for virtual machines where the Kernel or HAL is not set to use more than one CPU (SMP) and the VM is allocated multiple processors via Virtual Center. I was approached by a Linux administrator who told me he wasn’t seeing any performance improvements after he added a second processor. After I poked around a little bit I discovered he was running a uniprocessor kernel and hadn’t recompiled his operating system for SMP. If the operating system doesn’t have the ability to recognize more than one processor, you won’t be seeing any performance gains by throwing more vCPUs at a larger workload.
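Virtual Center charts report CPU ready as a summation in milliseconds per sample interval, which is hard to compare against a threshold. Here is a minimal sketch of the usual conversion to a percentage. It assumes the real-time chart's 20-second sample interval and that the value is summed across all of the virtual machine's vCPUs; the function name and the sample numbers are made up for illustration.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0,
                      vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) into an average per-vCPU percentage.

    ready_ms   -- ready time summed over the sample interval, in milliseconds
    interval_s -- length of the sample interval in seconds (20 s assumed,
                  matching the real-time chart)
    vcpus      -- number of vCPUs the summation covers
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0) / vcpus * 100.0


# Hypothetical sample: 2000 ms of ready time in one 20 s interval.
print(f"1 vCPU:  {cpu_ready_percent(2000):.1f}%")
print(f"2 vCPUs: {cpu_ready_percent(2000, vcpus=2):.1f}%")
```

Historical charts use longer intervals, so pass the matching interval_s when working with those.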
Monitoring with Virtual Center
Virtual Center is a great place to start for CPU performance monitoring, both at a physical level and a virtual machine level. Before getting into too much detail I wanted to explain Virtual Center statistics logging. There are various levels of logging that can be set for the VC database. Beware! You can easily overrun your database and fill up your existing disk space by setting all of these to the maximum setting. Think of this as a debug level: the higher you set it, the more information will be captured to the database for analysis (and the more disk space consumed). If you need to get to some of the more detailed performance statistics, VC performance counters and their corresponding levels can be found here. To change these settings, click Administration –> vCenter Server Settings –> Statistics.
Let’s take a look at a physical ESX host’s performance metrics through Virtual Center. vSphere now includes a nice graphical summary in the performance tab of the physical host. This gives you a quick dashboard type view of the overall health of the system over a 24 hour period. Here is the CPU sample:
Selecting the Advanced tab gives you a much more granular way of viewing performance data. At first glance this might look like overkill, but with a little bit of fine tuning, you can make it report on some great historical information. Here is a snapshot of physical CPU utilization across all processors:
The Virtual Center performance statistics by default display the past hour of statistics, and show a more detailed analysis of what’s currently happening on your host. Select “Chart Options” to change values such as time/date range and which counters you would like to display.
Virtual Center Alarms are an excellent tool that can sometimes be overlooked. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up CPU alerts so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host CPU utilization is at 75% or greater for 5 minutes.
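The idea behind an alarm like this, a threshold that must hold for a sustained period, can be sketched in a few lines. This is not how Virtual Center implements alarms; it is a minimal illustration that assumes one utilization sample every 20 seconds, and the function name is hypothetical.

```python
from typing import Iterable


def sustained_breach(samples: Iterable[float], threshold: float = 75.0,
                     sample_interval_s: int = 20,
                     duration_s: int = 300) -> bool:
    """Return True if utilization stays at or above threshold for at least
    duration_s worth of consecutive samples."""
    # Number of consecutive samples needed to cover the duration (ceiling).
    needed = -(-duration_s // sample_interval_s)
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= needed:
            return True
    return False


# Hypothetical data: a short spike does not fire, a sustained load does.
spike = [40.0] * 10 + [95.0] * 3 + [40.0] * 10
sustained = [40.0] * 5 + [80.0] * 15
print(sustained_breach(spike), sustained_breach(sustained))
```

Requiring consecutive samples is what keeps a brief spike from paging you at 2 a.m.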
Monitoring with ESXTOP
Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and execute the command “esxtop”. The default screen that you should see is the CPU screen; if you ever need to get back to this screen, just hit the “c” key on your keyboard. Esxtop gives you great real-time information and can even be set to log data over a longer time period: try “esxtop -a -b > performance.csv”. Check your PCPU and CCPU (Physical/Console) here. To examine what your virtual machines are doing and display only the virtual machine worlds, hit the “V” key.
A detailed list of ESXTOP counters can be found here:
http://communities.vmware.com/docs/DOC-5240
http://communities.vmware.com/docs/DOC-9279
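Once esxtop has written a batch file such as performance.csv, a short script can summarize it. The sketch below assumes the file is a perfmon-style CSV whose first row holds counter names ending in labels like “% Ready”; the exact column names in the sample are illustrative, so check them against your own file. The function name is hypothetical.

```python
import csv
import io
from statistics import mean
from typing import Dict, List


def average_by_counter(csv_text: str,
                       counter_suffix: str = "% Ready") -> Dict[str, float]:
    """Average every column whose header ends with counter_suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header)
              if name.endswith(counter_suffix)]
    values: Dict[str, List[float]] = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                values[header[i]].append(float(row[i]))
            except (ValueError, IndexError):
                continue  # skip blank or malformed cells
    return {name: mean(vals) for name, vals in values.items() if vals}


# Made-up two-sample file with assumed column names.
sample = "\n".join([
    r'"Time","\\esx01\Group Cpu(10:vm-a)\% Ready","\\esx01\Group Cpu(10:vm-a)\% Used"',
    '"01/01/2010 00:00:00","2.0","50.0"',
    '"01/01/2010 00:00:20","4.0","60.0"',
])
for name, avg in average_by_counter(sample).items():
    print(f"{name}: {avg:.1f}")
```

To run it against a real capture, read the file first, for example average_by_counter(open("performance.csv").read()). Batch files with all counters enabled can be very wide, so expect this to take a moment on long captures.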
Monitor inside the Virtual Machine
A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor or “perfmon” tool. If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug in VMware Tools that will prevent the counters from showing up. Check your % Processor Time, which is the current load of the virtual processor.
Monitoring with PowerCLI
Another great place to go for finding potential CPU problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a powerpack from Alan Renouf. If you’re not a command line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find virtual machines with CPU ready time? How about virtual machines that have CPU reservations, shares or limits configured? You can pull all of this information using Alan’s powerpack.
Conclusion
If you’re using VMware vSphere, there are many different ways to monitor for CPU problems. The Virtual Center database is the first place you should start. Check your physical host CPU contention, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop: check physical CPU, console CPU, then the VM worlds that are running on the ESX host.
Look for the outliers in your environment. If something doesn’t look right, it probably isn’t. Scratch away at the surface and see if something pops up. Use all possible tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes bring light to a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for new technical problems that had never been discovered by VMware support.
I’ve got to tell you, I’m pretty darn excited right now. Why? I’m typing this to you from 30,000 feet on a Delta flight from Cincinnati to Las Vegas (for VMware Partner Exchange). And why is that so special? Because, as the title suggests, I’m typing this on my VDI image which resides hundreds of miles away and thousands of feet below me.
Delta has a fairly new service from gogo called “gogo inflight … wi-fi with wings.” This is my first time using the service because on the past few flights I’ve taken, I’ve either not had the need to connect or the aircraft I happened to be on did not yet have the service. But this time I have some work to do (i.e. my next “confessions” article for VSM), so I figured I’d give it a whirl. And, being a glutton for punishment, I decided to see if I could push the limits of PCoIP. After a quick sign up form (gogo isn’t free) and firing off a VPN connection back to my home office, I launched the View client and crossed my fingers.
And I can tell you that I am thoroughly impressed! The windows are snappy, Flash is decent and low-end multimedia is adequate. I was watching a youtube.com video with full sound and, while the picture was a little blurry and sound/video sync was slightly off, it was totally watchable. And furthermore, it didn’t cripple my session. Not bad, considering my latency is between 150ms and 250ms, with an estimated average of about 200ms.
Is this a glimpse of things to come? Right now it may seem pretty far fetched. After all, the process to connect to my desktop image was fairly painful. I had to …
- Boot into my local OS
- Connect to the gogo inflight wireless access point
- Launch my Firefox browser and walk through the gogo signup form
- Dig through my briefcase for my wallet and pay for the service
- Fire off my OpenVPN client to my home VPN server
- Launch the VMware View Client
Not exactly what I’d call a seamless user experience. And I believe that conquering this experience – that is, the mobile user – will be the coup de grace for traditional desktop infrastructure. Until then, virtual desktop infrastructure will certainly happen in pockets, but massive, wide scale adoption will continue to elude us. So what has to happen here? In my mind, I see the following things need to happen …
True ubiquity of wireless Internet
This means two things. First, the Internet has to be everywhere at all times. I’m a true mobile user and I need to know that no matter where I am – whether it be on a puddle jumper, or in a remote country hotel – that when I power on my laptop, I will have access.
And second, this also means the connection to the Internet has to be completely integrated and transparent. I don’t want to have to dig for my credit card every time. But even more than that, I want the connection to happen for me automatically, in the background, as part of the boot process. My software client should auto detect the available wireless networks, connect, and debit my account. Will I have a single unified account that works across all providers? Or will I have multiple accounts that my software client will handle? Or will it be a single, wireless / satellite provider that can reach me anytime, anywhere? I don’t know and I don’t really care. The point is, I don’t want to deal with it. I want to press power and, after a short boot (maybe even zero boot?), have access. Period.
A purpose built Thin OS
Booting into a local OS just to launch a client and connect to a remote OS just isn’t going to cut it. The boot process needs to be fast and do nothing more than present me with a login GUI. If I’m remote, the VPN connection (and any necessary login parameters) needs to be part of the login process. There’s no need for a full blown local OS if our goal is to do little more than connect to our primary desktop environment. Sure, we hardcore tech weenies will almost always want some sort of backdoor access to the local OS. But 99% of the users out there don’t care and just want a seamless desktop experience. In fact, if done correctly, they shouldn’t even know there is a local OS and that their desktop is actually running in a remote datacenter.
Does this actually exist yet? Sort of. ThinClients typically deliver this kind of user experience. But for the most part, ThinClients aren’t mobile devices. I’ve seen a ThinClient laptop model before, but I don’t know a single person actually using one. I’ve actually seen far more cases of customers converting PCs and laptops to ThinClients. Theron Conrey gives us a great example with his blog post VMware View Linux Live CD How-to. And there are enterprise solutions for converting PCs to ThinClients from both Wyse and DevonIT. So, we’re pretty darn close on this front, but still not 100%.
A rich user experience in low bandwidth, high latency environments
Like I stated earlier, my current PCoIP experience is pretty darn impressive. It is, by far, the best remote desktop experience I’ve witnessed. But I’m not sure the average in-flight user would be ecstatic about it. Sure, all things considered, you can’t beat it. But I recognize all the variables working against me right now. The typical user will not know or even care. They just want it to work. The good news is that PCoIP will continue to improve and brings the promise of delivering a rich user experience, whether at 30k feet or a single switch port away.
So, I ask again, is this a true glimpse of the not-too-distant future? Ten years ago, I was the only one of my friends and family to have a cell phone. Five years ago, mainstream virtualization in the datacenter was laughed at. And a few short months ago, typing this blog post on my VMware View image was impossible. So, you tell me.