Posts tagged Performance
Performance Troubleshooting VMware vSphere – Memory
Feb 19th
Introduction
As memory prices continue to drop and the 64-bit x86 architecture is adopted more widely in the industry, we continue to see a rise in memory demands. Only a few years ago, 1-2 GB virtual machines were the norm, 95% of these being 32-bit operating systems. From my personal experience I have seen this trend change to 2-4 GB as the norm, with the higher-performing virtual machines consuming anywhere from 4-16 GB of memory. VMware has answered this demand with vSphere now delivering up to 1 TB of addressable memory per physical host, and up to 255 GB per virtual machine.
With processors now more powerful than ever, the limiting factor for virtual machines is shifting from compute to memory. This is reflected in our industry today as we see an increase in the memory footprint on traditional servers (Intel Nehalem), and vendors such as Cisco introducing extended memory technology which can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and was impressed with what I saw. The extended memory technology is quite unique because it not only allows you to scale out your memory configuration, it uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage of having this many DIMM sockets is that you can use lower capacity DIMMs (2 GB or 4 GB) to achieve the same memory configuration that would require 8 GB DIMMs in a standard server.
Memory Technologies in VMware vSphere
There are some major benefits of virtualization when it comes to memory. VMware implements some sophisticated and unique ways of maximizing physical memory workloads within an ESX host. All of these features work out of the box with no advanced configuration necessary. To understand problems that might occur in your environment you need to be familiar with these basic memory concepts.
- Transparent Page Sharing – The VMkernel compares physical memory pages to find duplicates, then frees up the redundant space and replaces it with a pointer. If multiple operating systems are running on one physical host, why should you load the same files multiple times? Think of this as similar to the data de-duplication process we are seeing in a majority of backup solutions in the industry.
- Memory Overcommitment – The act of assigning more memory to powered on virtual machines than the physical server has available. This allows for virtual machines that have heavier memory demands to utilize the memory that is not actively being used on under utilized machines.
- Memory Overhead – Once a virtual machine is powered on, the ESX host reserves memory for the normal operations of the VMware infrastructure. This memory can’t be used for swapping or ballooning, and is reserved for the system.
- Memory Balloon Driver – When VMware Tools is installed on a virtual machine, it provides device drivers that connect the guest operating system to the host virtualization layer. Part of this package is the balloon driver, or “vmmemctl”, which can be observed inside the guest. The balloon driver communicates with the hypervisor to reclaim memory inside the guest when it’s no longer valuable to the operating system. If the physical ESX server begins to run low on memory, it will grow the balloon driver to reclaim memory from the guest. This process reduces the chance that the physical ESX host will begin to swap, which will cause performance degradation. Here is an illustration of ballooning in ESX:
What to look for
- Check ESX host swapping. If you are overcommitting memory on the physical ESX host, you can run into a situation where every virtual machine needs the full amount of memory it has been granted. When the host is out of memory it will begin to page out. Keep an eye on the oversubscription rates of your physical hosts, or ensure you have enough memory resources across your DRS clusters so they can balance the load more effectively. Swapping will occur when the following condition is met:
Total_active_memory > (Memory_Capacity – Memory_Overhead) + Total_balloonable_memory + Page_sharing_savings
- Check for virtual machine swapping. Make sure your virtual machines have enough memory for the application workload that they are supporting. If virtual machine swapping starts to occur, this can put a strain on the disk subsystem.
- Check to ensure VMware Tools is installed and updated. VMware Tools not only provides drivers from the guest to the hypervisor, it also installs the balloon driver. For proper memory management the ESX host relies on the balloon driver.
- Check memory reservation settings. By default VMware ESX dynamically tries to reclaim memory when it is not needed. There are situations when you might choose to utilize memory reservations. If you set memory reservations in your environment, be aware that this memory is permanently assigned to the virtual machine and cannot be reallocated when it’s not being used. Don’t sell the balloon driver short: many third-party application vendors over-spec their configurations for personal safety, and ballooning can help counteract some of that wasted “fluff factor”.
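To make the swapping condition above concrete, here is a minimal Python sketch that plugs numbers into the formula. The function name and the sample figures (a hypothetical 64 GB host) are made up for illustration; real values would come from Virtual Center or esxtop.

```python
def host_will_swap(total_active_mb, capacity_mb, overhead_mb,
                   balloonable_mb, page_sharing_savings_mb):
    """Evaluate the swapping condition quoted in the post.

    Swapping is expected when active memory exceeds what the host can
    supply after overhead, plus what ballooning and page sharing give back.
    """
    available_mb = ((capacity_mb - overhead_mb)
                    + balloonable_mb
                    + page_sharing_savings_mb)
    return total_active_mb > available_mb


# Hypothetical 64 GB host: 4 GB overhead, 8 GB balloonable, 6 GB shared.
for active_mb in (70000, 80000):
    swapping = host_will_swap(active_mb, 65536, 4096, 8192, 6144)
    print(active_mb, "MB active -> swapping expected:", swapping)
```

With these sample numbers the host can supply 75,776 MB, so 70,000 MB of active memory fits and 80,000 MB does not.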
Monitoring with Virtual Center
The first place I would start with checking memory configurations is Virtual Center. Virtual Center provides excellent reporting and gives you granular control over which metrics you would like to report against. VMware vSphere now includes a nice graphical summary in the performance tab of the physical host. This gives you a quick dashboard type view of the overall health of the system over a 24 hour period. Here are some memory samples:
Check your overall % usage (lower is better)
Check your Ballooning (lower is better)
Selecting the Advanced tab gives you a much more granular way of viewing performance data. At first glance this might look like overkill, but with a little bit of fine tuning, you can make it report on some great historical information. Here is a snapshot of memory utilization with many of the variables we just discussed, which gives a great view of what’s going on (it looks healthy below):
Check your various metrics, mainly for swapping activity
The virtual center performance statistics by default display the past hour of statistics, and show a more detailed analysis of what’s currently happening on your host. Select the option “Chart Options” to change values such as time/date range and which counters you would like to display.
Virtual Center Alarms are an excellent tool that can sometimes be overlooked and forgotten about. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up memory alerts so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host memory usage is above 90% for 5 minutes or greater. A lot of these alerts are built into Virtual Center, so you don’t have to do a lot of pre-configuration work. You do need to make sure you set up the e-mail notifications under the “Actions” tab.
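The “above 90% for 5 minutes” rule is easy to reason about with a small sketch. This is not how Virtual Center implements alarms; it is only an illustration of the sustained-threshold idea, and the 20-second sample interval and the function name are assumptions.

```python
def alarm_triggered(samples_pct, threshold_pct=90.0,
                    interval_s=20, hold_s=300):
    """Return True once usage has stayed above the threshold for hold_s.

    samples_pct: usage percentages, one per interval_s seconds (assumed).
    A single sample at or below the threshold resets the count.
    """
    needed = hold_s // interval_s      # consecutive samples required
    run = 0
    for pct in samples_pct:
        run = run + 1 if pct > threshold_pct else 0
        if run >= needed:
            return True
    return False


# A short spike does not fire the alarm; sustained pressure does.
print(alarm_triggered([95.0] * 5 + [60.0] * 20))
print(alarm_triggered([95.0] * 15))
```

Requiring the condition to hold for a while is what keeps a brief spike from filling your inbox.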
Monitoring with ESXTOP
Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and execute the command “esxtop”. The default screen that you should see is the CPU screen; to monitor memory, press the “m” key. Esxtop gives you great real-time information and can even be set to log data over a longer time period: try “esxtop -a -b > performance.csv”. Check your total physical memory here, and make sure you aren’t overcommitting and causing swapping. Examine what your virtual machines are doing; if you want to display only the virtual machine worlds, press the “V” key.
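Once you have a batch-mode CSV, a few lines of Python can summarize the columns you care about. The sketch below is deliberately generic: it matches columns by a substring of the header rather than relying on exact counter names, and the header names and values in SAMPLE are invented placeholders, not real esxtop output.

```python
import csv
import io

# Invented sample data; real batch files have many more columns.
SAMPLE = r'''"Timestamp","\\esx01\Memory\Free MBytes","\\esx01\Memory\Swap Used MBytes"
"02/19/2010 10:00:00","2048","0"
"02/19/2010 10:00:05","1024","0"
"02/19/2010 10:00:10","512","128"
'''


def column_stats(csv_text, name_fragment):
    """Return {header: (min, mean, max)} for columns whose header
    contains name_fragment (case-insensitive)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    wanted = [i for i, name in enumerate(header)
              if name_fragment.lower() in name.lower()]
    stats = {}
    for i in wanted:
        values = [float(row[i]) for row in rows if row[i] != ""]
        stats[header[i]] = (min(values), sum(values) / len(values), max(values))
    return stats


for name, (low, mean, high) in column_stats(SAMPLE, "Free MBytes").items():
    print(name, "min", low, "mean", round(mean, 1), "max", high)
```

To run it against a real capture, read performance.csv into a string and pass it in place of SAMPLE.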
Monitor inside the Virtual Machine
A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor or “perfmon” tool. If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug with VMware Tools that will prevent the counters from showing up. You can monitor the same metrics found in Virtual Center and esxtop here. This is just another way of getting at the data, especially if you have a background in Microsoft Windows and are familiar with perfmon.
Monitoring with PowerCLI
Another great place to go for finding potential memory problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a powerpack from Alan Renouf. If you’re not a command line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find out what your ESX host service console memory is set to? How about virtual machines that have memory reservations, shares or limits configured? You can pull all of this information using Alan’s powerpack.
Conclusion
If you’re using VMware vSphere, there are many different ways to monitor for memory problems. The Virtual Center database is the first place you should start. Check your physical host memory conditions, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop and check some of the key metrics that we discussed above.
Look for the outliers in your environment. If something doesn’t look right, that’s probably the case. Scratch away at the surface and see if something pops up. Use all possible tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes bring light to a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs that turned out to be new technical problems VMware support had never seen before.
Performance Troubleshooting VMware vSphere – The Tetralogy
Feb 15th
Yes a bold topic I know, but I wanted to tackle this subject because it’s such an important aspect that everyone typically deals with at some point. I also find it personally useful to document some of my thoughts so I can solidify my own understanding of these processes and tools. I will admit, this was a challenge for me to write up. There is so much material and information that I had to really focus on keeping it simple and to the point. Performance problems can span such a wide array of possibilities that there is never typically one easy answer. Hopefully by highlighting some of the tools that are available for use, and offering some of my personal thoughts and experiences, I might be able to help when problems arise in your infrastructure.
There is so much useful information floating around in PDFs, blogs, websites and PowerPoint decks that one could easily get consumed by this topic. Since this is such a broad topic, I wanted to try and set the stage. The focus of this series of blog posts is to highlight some key components to examine, and then provide tools that will give you insight into your own environment and/or situation. This page will be the launch point for the various categories. Each blog post will cover a different category relating to the possible points of I/O in a VMware ESX environment.
Performance Troubleshooting VMware vSphere – CPU
Performance Troubleshooting VMware vSphere – Memory
Performance Troubleshooting VMware vSphere – Storage
Performance Troubleshooting VMware vSphere – Network
There is one last I/O component that I will not be covering, and that is the human factor. These posts will assume that your installation or upgrade is of sound mind and body. If there are underlying installation issues or post upgrade issues, I suggest engaging VMware support before examining conventional performance problems.
Acknowledgments/References:
VMware vSphere 4 Performance Troubleshooting Guide – Hal Rosenberg
Performance Monitoring and Analysis – Scott Drummonds
VMworld 2009 TA2963 ESXtop for Advanced Users – Krishna Raj Raja
http://www.vmware.com/support/
http://www.yellow-bricks.com/esxtop/ – Duncan Epping
Performance Troubleshooting VMware vSphere – CPU
Feb 15th
Introduction
Processors have come a long way in a very short time, and over the past few years we have seen the industry embrace the multi-core x86 architectures (Intel and AMD) which is allowing us to consolidate with even greater efficiencies than previous processor architectures. Ensuring available compute cycles to virtual machine workloads is critical, and should be monitored closely as you scale out your infrastructure.
What to look for
- Check for physical CPU utilization that is consistently above 80-90%. Getting high consolidation rates is a wonderful thing, but don’t overtax the physical server. Maybe it’s time to purchase another host for your DRS cluster and let the software balance your workloads better.
- Watch pCPU0 on non-ESXi hosts. If pCPU0 is consistently saturated, this will negatively impact performance of the overall system. If you are using third-party agents, ensure they are functioning properly. A couple of years ago we had issues with the HP System Insight management agents (Pegasus process), which were creating a heavy load on our COS. All of the virtual machines looked fine from a performance perspective, but once we dug a little bit deeper, we discovered this was our root cause.
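- Watch for high CPU ready times. Ready time is the time a virtual machine was ready to run but had to wait for the host to schedule it on a physical CPU, so high values indicate contention for physical CPU (for example, too many vCPUs per core or a restrictive CPU limit). If ready time looks normal but performance is still poor, that can help point you towards another possible bottleneck in your infrastructure outside of CPU (Memory/Network/Storage).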
- Watch for virtual machines that are consistently at 80-100% utilization. This is not a typical pattern of a conventional server. Most likely if you log in to the guest you will find a runaway process that is consuming all of the CPU cycles. I actually found an offshore contractor running Rosetta@home (a cancer research screen saver) inside one of our virtual machines! If something doesn’t look right, it’s worth checking it out.
- Watch for virtual machines where the kernel or HAL is not set to use more than one CPU (SMP) and the VM is allocated multiple processors via Virtual Center. I was approached by a Linux administrator who told me he wasn’t seeing any performance improvements after he added a second processor. After I poked around a little bit I discovered he was running a uniprocessor kernel and hadn’t recompiled his operating system for SMP. If the operating system doesn’t have the ability to recognize more than one processor, you won’t see any performance gains by throwing more vCPUs at a larger workload.
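One practical wrinkle with CPU ready time: esxtop shows it as a percentage (%RDY), while the Virtual Center charts report it as a summation in milliseconds per sample. A small conversion makes the two comparable. The sketch below assumes the 20-second real-time sampling interval; the function name is made up, and historical charts use longer intervals that you would pass in instead.

```python
def cpu_ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation (milliseconds accumulated during
    one sample) into a percentage of that sample interval.

    interval_s=20 assumes the real-time chart interval.
    """
    return ready_ms / (interval_s * 1000.0) * 100.0


# 2,000 ms of ready time in a 20-second sample is about 10% ready.
print(round(cpu_ready_percent(2000), 1))
```

For a multi-vCPU virtual machine the summation may be aggregated across vCPUs, so check whether you are looking at a per-vCPU or a per-VM value before comparing it to a threshold.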
Monitoring with Virtual Center
Virtual Center is a great place to start for CPU performance monitoring, both at a physical level and a virtual machine level. Before getting into too much detail I wanted to explain Virtual Center statistics logging. There are various levels of logging that can be set for the VC database. Beware! You can easily overrun your database and fill up your existing disk space by setting all of these to the maximum setting. Think of this as a debug level: the higher you set it, the more information will be captured to the database for analysis (and the more disk space consumed). If you need to get to some of the more detailed performance statistics, VC performance counters and their corresponding levels can be found here. To change these settings, click Administration –> vCenter Server Settings –> Statistics.
Let’s take a look at a physical ESX host performance metrics through Virtual Center. vSphere now includes a nice graphical summary in the performance tab of the physical host. This gives you a quick dashboard type view of the overall health of the system over a 24 hour period. Here is the CPU sample:
Selecting the Advanced tab gives you a much more granular way of viewing performance data. At first glance this might look like overkill, but with a little bit of fine tuning, you can make it report on some great historical information. Here is a snapshot of physical CPU utilization across all processors:
The virtual center performance statistics by default display the past hour of statistics, and show a more detailed analysis of what’s currently happening on your host. Select the option “Chart Options” to change values such as time/date range and which counters you would like to display.
Virtual Center Alarms are an excellent tool that can sometimes be overlooked. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up CPU alerts so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host CPU utilization is at 75% for 5 minutes or greater.
Monitoring with ESXTOP
Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and execute the command “esxtop”. The default screen that you should see is the CPU screen; if you ever need to get back to this screen in the future, just press the “c” key on your keyboard. Esxtop gives you great real-time information and can even be set to log data over a longer time period: try “esxtop -a -b > performance.csv”. Check your PCPU and CCPU (Physical/Console) here. Examine what your virtual machines are doing; if you want to display only the virtual machine worlds, press the “V” key.
A detailed list of ESXTOP counters can be found here:
http://communities.vmware.com/docs/DOC-5240
http://communities.vmware.com/docs/DOC-9279
Monitor inside the Virtual Machine
A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor or “perfmon” tool. If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug with VMware Tools that will prevent the counters from showing up. Check your % Processor Time, which is the current load of the virtual processor.
Monitoring with PowerCLI
Another great place to go for finding potential CPU problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a powerpack from Alan Renouf. If you’re not a command line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find virtual machines with CPU ready time? How about virtual machines that have CPU reservations, shares or limits configured? You can pull all of this information using Alan’s powerpack.
Conclusion
If you’re using VMware vSphere, there are many different ways to monitor for CPU problems. The Virtual Center database is the first place you should start. Check your physical host CPU contention, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop, and check physical CPU, console CPU, then the VM worlds that are running on the ESX host.
Look for the outliers in your environment. If something doesn’t look right, that’s probably the case. Scratch away at the surface and see if something pops up. Use all possible tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes bring light to a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs that turned out to be new technical problems VMware support had never seen before.
VMware pvSCSI – When and when not to use it
Feb 5th
Introduction
Hopefully you have read my previous blog posts on pvSCSI. It describes what the driver is, how it works, and how it can positively impact your performance and workloads. Part two covers the process of installing the pvSCSI driver on an existing system and a new system. Both can be found here on the site and you might find them useful:
http://www.virtualinsanity.com/index.php/2009/11/21/more-bang-for-your-buck-with-pvscsi-part-1/
http://www.virtualinsanity.com/index.php/2009/12/01/more-bang-for-your-buck-with-pvscsi-part-2/
Interrupt Coalescing
VMware recently published a KB article that answers a question that has been floating around the community for a while. The pvSCSI driver sounds superior to the LSI driver, with direct I/O access to the hypervisor, so why not use it in all cases? The article states that you should only use the newer driver when driving higher workloads, typically those of 2000 IOPS or greater. For those who don’t know, 2000 IOPS is a pretty big workload. Consider this: a standard Fibre Channel 10,000 RPM drive averages around 125 IOPS per disk.
I didn’t really understand this, and the knowledge base article is lacking any detail on the rationale behind the statement. I reached out to VMware performance engineer Scott Drummonds to see if he had anything he could publish to help clarify the KB article. Scott was nice enough to research this and posted his findings here.
So it appears that the technical explanation is interrupt coalescing, or buffering. The paravirtual SCSI driver was designed to handle receiving multiple requests at a high rate and then “batching” the requests together for better efficiency in throughput. If you aren’t generating high enough workloads on the virtual machine, the I/O request could unnecessarily sit in the queue while the “batch” waits to be filled up for the next transaction. This could cause storage performance problems, which would typically be seen as higher latency and would negatively impact the virtual machine.
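To put 2000 IOPS in perspective, here is the back-of-the-envelope arithmetic as a short Python sketch. It uses only the two figures quoted above and ignores RAID write penalties and array cache, so treat it as a rough lower bound; the function name is made up.

```python
import math


def disks_needed(target_iops, iops_per_disk=125):
    """Spindles required to sustain target_iops, ignoring RAID
    overhead and cache (a deliberately rough estimate)."""
    return math.ceil(target_iops / iops_per_disk)


print(disks_needed(2000))  # 16
```

In other words, a single virtual machine at the KB article’s threshold would keep roughly sixteen 10,000 RPM drives busy on its own.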
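A toy model shows why batching hurts at low I/O rates. The sketch below is not the pvSCSI algorithm: it assumes completions arrive evenly spaced and that the guest is only notified when a full batch has accumulated, with no timeout. The batch size of 8 is an arbitrary illustration.

```python
def mean_added_latency_ms(iops, batch_size, completions=960):
    """Average time a completion waits for its batch to fill.

    Toy model: evenly spaced completions, notification only when
    batch_size completions have accumulated (assumed, no timer).
    """
    gap_ms = 1000.0 / iops
    arrival = [i * gap_ms for i in range(completions)]
    waited = 0.0
    for start in range(0, completions, batch_size):
        batch = arrival[start:start + batch_size]
        notify_at = batch[-1]   # interrupt fires with the last completion
        waited += sum(notify_at - t for t in batch)
    return waited / completions


for iops in (200, 2000, 8000):
    added = mean_added_latency_ms(iops, batch_size=8)
    print(iops, "IOPS ->", round(added, 3), "ms added per I/O")
```

In this model the added delay shrinks in proportion to the I/O rate, which matches the intuition that a busy virtual machine fills each batch almost immediately while a quiet one leaves requests waiting.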
Now and Then
The great news is the current release of the driver is optimized for heavy workloads. If you are starting to virtualize SQL/Oracle systems and need the performance, go for the pvSCSI driver and get better throughput. If you’re deploying standard virtual machines that are doing lower workloads, continue to embrace the existing LSI Logic driver.
If you are new to vSphere 4, or have just upgraded from 3.5 and are starting to rebuild your templates to embrace virtual hardware version 7, don’t use the pvSCSI driver as part of your standard template. VMware is working on the driver and will be introducing advanced coalescing functionality. When this is built into the driver stack pvSCSI will then be able to be utilized for all workloads as it will understand when it needs to ramp up for higher workloads.
Thanks again to Scott Drummonds for taking the time out of his busy schedule to track this one down.
VMware Windows Perfmon counters missing in vSphere 4u1
Jan 31st
I am attempting to pull together a blog post around performance. It’s going to be a four-part series, one part for each I/O component of VMware: CPU, Memory, Storage and Networking. My goal is to try and cover the various tools that you can use to help troubleshoot performance problems that you might experience in your virtual environment.
While I was going through some of the methods, I wanted to illustrate how VMware now includes Windows Performance Counters inside a guest virtual machine to assist with performance monitoring/troubleshooting. I jumped on a test virtual machine I have, and pulled up Windows perfmon. To my dismay the VMware counters were missing! We are currently running VMware vSphere 4.0 Update 1, so I checked with a few other people online like Rick Vanover (@RickVanover). He confirmed that it seemed to be related to this specific release of vSphere.
I reached out to Scott Drummonds via Twitter (@drummonds), a performance systems engineer who works for VMware, and also opened a service request with support. Scott validated that he saw the same issue and was launching an investigation. Unfortunately the SR didn’t get very far as I was instructed that this was an “experimental feature and was removed from vSphere”. Uhhh ok, I knew that wasn’t right so I waited to hear back from Scott.
Scott has since written a blog post that discusses this issue. It looks like a complete uninstall of VMware Tools on the client followed by a re-install resolves the issue. This does require a reboot, for those who are not familiar with this process. The problem appears to be related to mofcomp, which is a tool that Microsoft provides to register WMI information (such as VMware performance counters) with Windows.
Thanks to Scott for jumping on this so quickly and posting a fix to the issue, it’s great to see social media paying off in the real world. Thanks to Rick for helping me figure out what was going on and validating some of my assumptions. Rick has also written up an excellent blog post on this same issue. Hopefully a patch will be rolled into the next minor release of vSphere 4 that will resolve this bug going forward.
Bring on the 10Gig Ethernet!
Nov 17th
VMware recently updated its networking performance tests to see if the ESX hypervisor could efficiently leverage the ever-expanding bandwidth available at the Ethernet level. In short, it sure can! A single VM can effectively saturate a 10Gbps link when jumbo frames are enabled. But that’s not to say it can’t perform well with multiple virtual machines. Things scaled nicely and equitably for all VMs. This type of scalable performance is reassuring as customers continue to raise consolidation ratios within their datacenters and virtualize the largest of workloads.
To save you some reading, here is the summary from the whitepaper, which can be found at: http://www.vmware.com/pdf/10GigE_performance.pdf
Conclusion: The results presented in the previous sections show that virtual machines running on ESX 3.5 Update 1 can efficiently share and saturate 10Gbps Ethernet links. A single uniprocessor virtual machine can push as much as 8Gbps of traffic with frames that use the standard MTU size and can saturate a 10Gbps link when using jumbo frames. Jumbo frames can also boost receive throughput by up to 40 percent, allowing a single virtual machine to receive traffic at rates up to 5.7Gbps.
Our detailed scaling tests show that ESX scales very well with increasing load on the system and fairly allocates bandwidth to all the booted virtual machines. Two virtual machines can easily saturate a 10Gbps link (the practical limit is 9.3Gbps for packets that use the standard MTU size because of protocol overheads), and the throughput remains constant as we add more virtual machines. Scaling on the receive path is similar, with throughput increasing linearly until we achieve line rate and then gracefully decreasing as system load and resource contention increase.
Thus, ESX 3.5 Update 1 supports the latest generation of 10Gbps NICs with minimal overheads and allows high virtual machine consolidation ratios while being fair to all virtual machines sharing the NICs and maintaining 10Gbps line rates.
Virtualizing Tier 1 Applications
Aug 31st
With virtualization finding its way into every nook and cranny of the data center, it would seem that tier 1 applications are the only safe harbor for the few remaining “Server Huggers” out there. Their mantra usually sounds something like this …
“My application is too I/O intensive for virtualization,” or “My xyz application vendor doesn’t support VMware” or possibly “My application is too important to be virtualized” (this is one of my favorites). Believe it or not, I even heard one guy say “you can virtualize my server when you pry it from my cold dead hands” … um, wow. He has issues. Last I heard, he was de-virtualizing a server farm at the NRA. Hehehe.
Anyway, for the rest of us with our heads NOT buried in the sand, I’m here to tell you that tier 1 applications can and should be virtualized. I’ll go so far as to say that if you’re not virtualizing tier 1 applications, you are doing your company a major disservice.
Below is a brief overview of a presentation I gave in Cincinnati a few weeks ago to a group of about 75 professionals. The topic was “Virtualizing Microsoft Exchange.” And while the content that follows is geared towards the Microsoft Exchange application, it can really apply to any tier 1 application.
Performance
I’ll start with performance because this is typically the first objection to virtualizing a Tier 1 app. The perception is that virtualization creates too much overhead and therefore applications in a VM will certainly underperform applications running on a physical server. This current perception was born out of a previous reality. In the early days, virtualization really did introduce enough overhead to warrant physical servers for applications with high I/O. But a perfect storm is a-brewin’ and I summarize it with the following equation:
hypervisor improvements + server hardware improvements + application improvements =
better than native performance
That’s right. Mileage will vary, but given a properly architected solution, virtual can actually outperform physical. And even in scenarios where physical outperforms virtual, the delta is probably measurable, but not observable. So let’s take a closer look at the three areas I mentioned in the equation above.
Hypervisor Improvements
The hypervisor (AKA, the virtualization layer, AKA the Server Hugger’s worst nightmare) has come a long way in the past few years. And in VMware’s ESX product, the latest version has the following performance improvements over previous versions:
- Increased guest OS memory to 64GB
- Increased physical RAM on ESX to 256GB
- TCP segment offload to further lower CPU utilization
- NUMA optimizations improve multiple VM performance
- Support for 64-bit clustering with boot from SAN
These improvements alone can capture almost all tier 1 applications, but combined with the next two, almost no tier 1 app can hide from becoming a candidate for virtualization.
Server Hardware Improvements
We’re now seeing server hardware with 256GB+ of physical RAM. Multi-core CPUs with 2 and 4 cores are running in production today and 6/8/12 cores are coming soon. And best of all, hardware-assisted virtualization technologies are emerging, pushing the virtualization overhead down to the hardware, getting the hypervisor ever closer to native performance.
And because the vast majority of applications simply can’t fully utilize hardware with this much horsepower, ironically, virtualization is the only way to truly capture the full ROI of these physical investments.
Application Improvements
As applications continue to evolve, bugs are fixed and bad code is optimized, performance improvements within the application are being realized, further reducing the need for a physical server. Speaking specifically about Microsoft Exchange, the following performance improvements exist in 2007 over 2003:
| Exchange 2003 | Exchange 2007 |
| 32-bit Windows | 64-bit Windows |
| 900 MB database cache | Multi-GB database cache |
| 4 KB block size | 8 KB block size |
| High read/write ratio | 1:1 read/write ratio |
| Requires high-end storage | Affordable storage (iSCSI) |
| Storage is common pain point | Eliminates storage pain point |
| | 50% reduction in disk I/O |
Of course the improvements for this piece of the equation will vary from one app to the next.
Bottom Line: Performance should not be a barrier to virtualizing an application.
A Virtual Server is Better than a Physical Server
Tier 1 applications are the most critical, important applications in your organization and therefore they need to run on the best infrastructure possible. So almost by definition, tier 1 applications need to run in a VM. Here are a few of my favorite reasons why a VM is better than a physical server. Keep in mind, these aren’t the only reasons, just my favorites.
Reason #1: Better up time
The “eggs in one basket” argument no longer applies. And for those of you who don’t know what I’m talking about, the objection usually sounds something like this … “If I put 30 VMs on a single physical server, and that physical server crashes, then I’ve just lost 30 applications instead of one!” This was a very legitimate concern five years ago. But today you can get better uptime in a VM than you can with a physical machine. In the worst case scenario, if a physical server dies, those VMs are automatically powered up on a different physical server. In my experience, the VMs are usually back up and taking requests in under two minutes (and yes, I’ve timed it with a stopwatch). And this is worst case scenario for a VM today! What’s best case scenario for restoring a physical server after a hardware crash? Weeks? Days? Hours (if you’re lucky and really prepared)?
So with today’s technology (and it’s only going to get better with what’s coming soon), worst case scenario for a VM is better than best case scenario for a physical server. And you might ask, what’s best case scenario for a VM? Even with hardware maintenance, you can achieve 100% uptime with VMs. How? Check out a few of VMware’s features like VMotion, DRS and Update Manager.
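To put rough numbers on that comparison, here is a minimal sketch of the arithmetic. The two-minute restart comes from my stopwatch test above; the single failure per year and the four-hour physical restore are assumptions I picked purely for illustration, so plug in your own figures.

```python
# Back-of-the-envelope availability comparison.
# Assumptions (illustrative only): one hardware failure per year,
# ~2 minutes to restart a VM on another host, and an optimistic
# 4 hours to restore a physical server.

MINUTES_PER_YEAR = 365 * 24 * 60


def availability(outages_per_year, minutes_per_outage):
    """Return yearly availability as a percentage."""
    downtime = outages_per_year * minutes_per_outage
    return 100.0 * (1 - downtime / MINUTES_PER_YEAR)


vm_restart = availability(1, 2)          # VM restarted on a surviving host
physical_4h = availability(1, 4 * 60)    # physical server restored in 4 hours

print(f"VM (2 min restart):    {vm_restart:.4f}%")
print(f"Physical (4 h restore): {physical_4h:.4f}%")
```

Under these assumptions the VM lands above 99.999% for the year while the physical server sits around 99.95%, and that is with a restore time most shops would be thrilled to hit.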
Reason #2: Better hardware utilization
The average server utilization across the globe is less than 10% and in my experience, it’s often less than 5%. Why? A single application can rarely harness the power of the hardware it’s running on. And for a ton of different reasons (which I won’t go into here), critical applications typically require a dedicated server. That is like buying a Ferrari and never driving it more than 5 mph … what an awful waste! Get the most for your money by putting each app in a VM, running multiple VMs per physical server. Open that baby up and let it do what it was built to do! I think the following two screenshots do a great job of showing you what I’m talking about.
CPU Utilization Before VMware
CPU Utilization After VMware
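If you want a quick feel for what those utilization numbers mean for consolidation, here is a rough sketch. It assumes every server is the same size, looks at CPU only, and ignores memory, hypervisor overhead and peak load, so treat it as a starting point and not a sizing tool. The function names and the 60% target are my own choices for the example.

```python
import math


def vms_per_host(avg_util_pct, target_host_util_pct):
    """Workloads that fit on one host, CPU only, equal-sized servers assumed."""
    return target_host_util_pct // avg_util_pct


def hosts_needed(server_count, avg_util_pct, target_host_util_pct):
    """Hosts required to carry server_count workloads at the target utilization."""
    per_host = vms_per_host(avg_util_pct, target_host_util_pct)
    return math.ceil(server_count / per_host)


# 30 dedicated servers idling at 5% CPU, consolidated onto hosts
# that we are willing to run at 60% CPU (hypothetical target).
print(vms_per_host(5, 60))       # workloads per host
print(hosts_needed(30, 5, 60))   # hosts needed for all 30
```

With these inputs, 30 servers running at 5% collapse onto 3 hosts. Real environments will need headroom for failover and spikes, but the order of magnitude is the point.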
Reason #4: Avoid over provisioning
Why waste time and energy planning for future capacity (which is really nothing more than an educated guess based upon a ton of assumptions)? The tendency has been to over provision hardware to account for future growth, but this often leads to underutilized hardware. With virtual machines, additional CPU and RAM can be added at any time with a few clicks of a mouse. And moving to more powerful systems in the future can be done in real time with VMotion and/or Storage VMotion. With virtualization, it only makes sense to simply build your application for the capacity you need and then throttle as necessary.
Reason #5: Better Security
Typically, protection engines come in two forms, host based and network based. The problem with network based security software is that it has no (or very limited) visibility into the host. And the problem with host based security software is that it’s running in the same context as the malware that it’s trying to protect against. And the creators of malware are not stupid! They continually find new ways to hide their malware and/or attack the protection engine, creating a never ending, vicious game of cat and mouse.
But we now have a new, trusted layer with the much smaller codebase of the hypervisor, where we can provide protection from outside of the operating system. A protection engine at this layer provides a much stronger defense because it’s “underneath” the VM, completely isolated from the malware. And this is a great place for a protection engine to live because it can see all I/O of the VM and inspect each of the virtual components (CPU, Memory, Network and Storage). Better yet, we now have the ability to do things like:
- Intercept, view, modify and replicate I/O traffic from one, many or all VMs
- Provide inline protection or passive monitoring
- Mount and read virtual disks
Reason #6: DR made easy
In the physical world, DR is a pain in the butt and super expensive. The reason is that DR solutions for physical servers often require similar hardware at the DR site to avoid issues with driver, hardware, and software compatibility. These dependencies are eliminated in a virtual world, which means any VM can run on any physical server with an ESX hypervisor. And because a VM is completely encapsulated, the entire VM exists in a small set of files. This simplifies replication and therefore simplifies the process of keeping your production and your DR environment in sync. And finally, servers at the DR site can be used for other purposes, like test and development, until they are required for DR purposes. That means an investment in a DR infrastructure will not sit idle.
Support
I love it when I hear someone say “my application vendor says they won’t support VMware.” Hmmmmm. Here’s a crazy question for ya, isn’t it VMware’s job to support VMware? Now, I’m sure what they really mean is that the vendor won’t support their application in a virtualized environment. But just to make things clear, if you have a problem with VMware … call VMware.
And support for applications in a virtualized environment is rapidly changing. Examples are numerous, but two big ones that come to mind are SAP and Microsoft. Earlier this year, SAP announced full support for their software on VMware. And just recently, Microsoft announced the Server Virtualization Validation Program (SVVP), where they will support their operating systems and a good list of their applications in a virtualized environment. And VMware’s ESX is the industry’s first hypervisor to be validated by Microsoft.
What about those vendors who still don’t support their applications in a virtualized environment? Most of my customers do two things. First, they put pressure on the vendor to start providing support. For large companies, this can be very effective since the software providers want to keep their big customers happy. Second, many of them have a “swing server.” So when a vendor’s support team requires them to reproduce the problem on physical hardware, they simply V2P the VM onto the swing server and continue on their merry way. (Yes, I know, this isn’t always as easy as I make it sound, though it often can be just that easy.)
Still not convinced?
The table above shows the results of a survey of 500 VMware customers taken over a year ago, and the numbers are growing rapidly. Simply put, customers are virtualizing tier 1 applications today.