Posts tagged Performance

Performance Troubleshooting VMware vSphere – Memory


Introduction

As memory prices continue to drop and the 64-bit x86 (x64) architecture is adopted more widely in the industry, we continue to see a rise in memory demands.  Only a few years ago, 1-2 GB virtual machines were the norm, with 95% of them running 32-bit operating systems.  In my personal experience this norm has shifted to 2-4 GB, with the more demanding virtual machines consuming anywhere from 4-16 GB of memory.  VMware has answered this demand with vSphere, which now delivers up to 1 TB of addressable memory per physical host and up to 255 GB per virtual machine.

With processors now more powerful than ever, the limiting factor for virtual machines is shifting from compute to memory.  This is reflected in our industry today as we see an increase in the memory footprint on traditional servers (Intel Nehalem), and vendors such as Cisco introducing extended memory technology which can more than double the standard memory configuration.  I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and was impressed with what I saw.  The extended memory technology is quite unique because it not only allows you to scale out your memory configuration, it also uses a special ASIC to virtualize the memory so there is no reduction in bus speed.  A financial advantage of having this many DIMM sockets is that you can use lower capacity DIMMs (2 GB or 4 GB) to achieve the same memory configuration that would require 8 GB DIMMs in a standard server.

Memory Technologies in VMware vSphere

There are some major benefits of virtualization when it comes to memory.  VMware implements some sophisticated and unique ways of maximizing physical memory workloads within an ESX host.  All of these features work out of the box with no advanced configuration necessary.  To understand problems that might occur in your environment you need to be familiar with these basic memory concepts.

  • Transparent Page Sharing – The VMkernel compares physical memory pages to find duplicates, then frees the redundant copies and replaces them with a pointer to a single shared page.  If multiple operating systems are running on one physical host, why should you load the same files multiple times?  Think of this as the data de-duplication process we are seeing in a majority of backup solutions in the industry.
  • Memory Overcommitment – The act of assigning more memory to powered-on virtual machines than the physical server has available.  This allows virtual machines with heavier memory demands to use memory that is not actively being used by underutilized machines.
  • Memory Overhead – Once a virtual machine is powered on, the ESX host reserves memory for the normal operations of the VMware infrastructure.  This memory can’t be used for swapping or ballooning, and is reserved for the system.
  • Memory Balloon Driver – When VMware Tools is installed on a virtual machine, it provides device drivers inside the guest operating system that communicate with the host virtualization layer.  Part of this package is the balloon driver, or “vmmemctl”, which can be observed inside the guest.  The balloon driver works with the hypervisor to reclaim guest memory that is no longer valuable to the operating system.  If the physical ESX server begins to run low on memory, it will inflate the balloon driver to reclaim memory from the guest.  This process reduces the chance that the physical ESX host will begin to swap, which would cause performance degradation.  Here is an illustration of ballooning in ESX:

image

What to look for
  • Check ESX host swapping.  If you are overcommitting memory on the physical ESX host, you can run into a situation where every virtual machine needs the full amount of memory it has been granted.  When the host runs out of memory it will begin to page out.  Keep an eye on the oversubscription rates of your physical hosts, and ensure you have enough memory resources across your DRS clusters so DRS can balance the load effectively.  Swapping will occur when the following condition is met:

Total_active_memory > (Memory_Capacity – Memory_Overhead) + Total_balloonable_memory + Page_sharing_savings

  • Check for virtual machine swapping.  Make sure your virtual machines have enough memory for the application workload that they are supporting.  If virtual machine swapping starts to occur, it can put a strain on the disk subsystem.
  • Check to ensure VMware Tools is installed and updated.  VMware Tools provides the guest drivers that talk to the hypervisor, and it also installs the balloon driver.  The ESX host relies on the balloon driver for proper memory management.
  • Check memory reservation settings.  By default VMware ESX dynamically tries to reclaim memory when it is not needed.  There are situations when you might choose to use memory reservations.  If you set memory reservations in your environment, be aware that this memory is guaranteed to the virtual machine and cannot be reclaimed by the host for other workloads, even when it’s not being used.  Don’t sell the balloon driver short: many third-party application vendors over-spec their configurations to be on the safe side, and ballooning can help counteract some of that wasted “fluff factor”.
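
The swapping condition in the formula above can be checked with a few lines of Python.  This is only a minimal sketch: the function name, the parameter names and the sample figures are illustrative assumptions, not values read from a real host.

```python
def host_will_swap(total_active_mb, capacity_mb, overhead_mb,
                   balloonable_mb, page_sharing_savings_mb):
    """Return True when active memory exceeds what the host can supply.

    Mirrors the formula above:
    Total_active_memory > (Memory_Capacity - Memory_Overhead)
                          + Total_balloonable_memory + Page_sharing_savings
    """
    available_mb = ((capacity_mb - overhead_mb)
                    + balloonable_mb
                    + page_sharing_savings_mb)
    return total_active_mb > available_mb


# Hypothetical host (all numbers are made up for illustration):
# 64 GB of RAM, 4 GB of overhead, 8 GB reclaimable by ballooning,
# 6 GB saved by page sharing.
# Available = (65536 - 4096) + 8192 + 6144 = 75776 MB.
print(host_will_swap(70000, 65536, 4096, 8192, 6144))  # False
print(host_will_swap(80000, 65536, 4096, 8192, 6144))  # True
```

In practice the inputs would come from the counters discussed in the monitoring sections below; the sketch only shows how the terms of the formula combine.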
Monitoring with Virtual Center

The first place I would start with checking memory configurations is Virtual Center.  Virtual Center provides excellent reporting and gives you granular control over which metrics you would like to report against.  VMware vSphere now includes a nice graphical summary in the performance tab of the physical host.  This gives you a quick dashboard type view of the overall health of the system over a 24 hour period.  Here are some memory samples:

Check your overall % usage (lower is better)

image

Check your ballooning (lower is better)

image

Selecting the Advanced tab gives you a much more granular way of viewing performance data.  At first glance this might look like overkill, but with a little bit of fine tuning, you can make it report some great historical information.  Here is a snapshot of memory utilization with many of the counters we just discussed, which gives a good picture of what’s going on (the host below looks healthy):

Check your various metrics, mainly for swapping activity

image

The Virtual Center performance statistics by default display the past hour of statistics, and show a more detailed analysis of what’s currently happening on your host.  Select the option “Chart Options” to change values such as time/date range and which counters you would like to display.

Virtual Center Alarms are an excellent tool that can sometimes be overlooked and forgotten about.  While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning.  Set up memory alerts so you will be notified via e-mail if a problem starts to manifest itself.  Here is an alarm configured to trigger if physical host memory usage is above 90% for 5 minutes or longer.  A lot of these alerts are built into Virtual Center so you don’t have to do a lot of pre-configuration work.  You do need to make sure you set up the e-mail notifications under the “Actions” tab.
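
To make the "above 90% for 5 minutes" condition concrete, here is a small Python sketch of a sustained-threshold check.  It is not how Virtual Center evaluates alarms internally; the function name, the 20 second sample interval and the sample data are assumptions chosen for illustration.

```python
def alarm_triggered(samples, threshold_pct=90.0, duration_s=300, interval_s=20):
    """Return True if usage stayed above threshold_pct for duration_s seconds.

    samples: usage percentages, oldest first, one every interval_s seconds.
    """
    needed = duration_s // interval_s  # consecutive samples required
    run = 0
    for usage in samples:
        # Extend the current run, or reset it when usage dips back down.
        run = run + 1 if usage > threshold_pct else 0
        if run >= needed:
            return True
    return False


# Hypothetical data: 15 samples at 20 seconds each cover 5 minutes.
sustained = [95.0] * 15
interrupted = [95.0] * 14 + [50.0] + [95.0] * 14
print(alarm_triggered(sustained))    # True
print(alarm_triggered(interrupted))  # False
```

The point of the duration is visible in the second case: a single dip below the threshold resets the clock, so short spikes do not generate e-mail.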

image

Monitoring with ESXTOP

Esxtop is another excellent way to monitor performance metrics on an ESX host.  Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing.  SSH to one of your ESX servers and execute the command “esxtop”.  The default screen that you should see is the CPU screen; to monitor memory, press the “m” key.  Esxtop gives you great real-time information and can even be set to log data over a longer time period; try “esxtop -a -b > performance.csv”.  Check your total physical memory here, and make sure you aren’t overcommitting and causing swapping.  Examine what your virtual machines are doing; if you want to display only the virtual machine worlds, hit the “V” key.
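
Once you have a batch-mode CSV, a short script can pull out just the columns you care about.  The Python sketch below filters columns by a keyword in the header.  The helper name and the sample header strings are assumptions made up for this example, so check the header row of your own file for the real counter names.

```python
import csv
import io


def columns_matching(csv_text, keyword):
    """Return {column_name: [values, ...]} for headers containing keyword."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header)
              if keyword.lower() in name.lower()]
    result = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i < len(row):  # tolerate short or truncated rows
                result[header[i]].append(row[i])
    return result


# Made-up sample in the general shape of a batch-mode export.
# The host name and counter names are hypothetical.
sample = (
    '"Time","\\\\esx01\\Memory\\Free MBytes",'
    '"\\\\esx01\\Physical Cpu(_Total)\\% Util Time"\n'
    '"01/01/2010 10:00:00","2048","35.1"\n'
    '"01/01/2010 10:00:05","1990","41.7"\n'
)
for name, values in columns_matching(sample, "memory").items():
    print(name, values)
```

For a real capture you would read the file instead, for example with open("performance.csv", newline="") and passing its contents to the same function.  Files captured with every counter enabled can be very wide, so filtering by keyword keeps the output manageable.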

image

Monitor inside the Virtual Machine

A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor or “perfmon” tool.  If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug in VMware Tools that will prevent the counters from showing up.  You can monitor the same metrics found in Virtual Center and esxtop here.  It is just another way of getting at the data, especially if you have a background in Microsoft Windows and are familiar with perfmon.

image

Monitoring with PowerCLI

Another great place to go for finding potential memory problems and bottlenecks is PowerCLI.  I have been using PowerGUI from Quest, accompanied by a powerpack from Alan Renouf.  If you’re not a command line guru, don’t let this discourage you.  PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts.  Want to find out what your ESX host service console memory is set to?  How about virtual machines that have memory reservations, shares or limits configured?  You can pull all of this information using Alan’s powerpack.

image

Conclusion

If you’re using VMware vSphere, there are many different ways to monitor for memory problems.  The Virtual Center database is the first place you should start.  Check your physical host memory conditions, then work your way down the stack to the virtual machine(s) that might be indicating a problem.  Take a look at esxtop and check some of the key metrics that we discussed above.

Look for the outliers in your environment.  If something doesn’t look right, it probably isn’t.  Scratch away at the surface and see if something pops up.  Use all of the tools available to you, such as PowerCLI.  Approaching problems from a different perspective will sometimes shed light on a situation you weren’t aware of.  If all else fails, engage VMware support and open a service request.  Support contracts exist for a reason, and I have opened many SRs that turned out to be new technical problems VMware support had not seen before.


Performance Troubleshooting VMware vSphere – The Tetralogy


Yes a bold topic I know, but I wanted to tackle this subject because it’s such an important aspect that everyone typically deals with at some point.  I also find it personally useful to document some of my thoughts so I can solidify my own understanding of these processes and tools.  I will admit, this was a challenge for me to write up.  There is so much material and information that I had to really focus on keeping it simple and to the point.  Performance problems can span such a wide array of possibilities that there is never typically one easy answer.  Hopefully by highlighting some of the tools that are available for use, and offering some of my personal thoughts and experiences, I might be able to help when problems arise in your infrastructure.

There is so much useful information floating around in PDFs, blogs, websites and PowerPoint decks that one could easily get consumed by this topic.  Since this is such a broad topic, I wanted to try and set the stage.  The focus of this series of blog posts is to highlight some key components to examine, and then provide tools that will give you insight into your own environment and/or situation.  This page will be the launch point for the various categories.  Each blog post will cover a different category relating to the possible points of I/O in a VMware ESX environment.

Performance Troubleshooting VMware vSphere – CPU

Performance Troubleshooting VMware vSphere – Memory

Performance Troubleshooting VMware vSphere – Storage

Performance Troubleshooting VMware vSphere – Network

There is one last I/O component that I will not be covering, and that is the human factor.  These posts will assume that your installation or upgrade is of sound mind and body.  If there are underlying installation issues or post upgrade issues, I suggest engaging VMware support before examining conventional performance problems.

Acknowledgments/References:

VMware vSphere 4 Performance Troubleshooting Guide – Hal Rosenberg

Performance Monitoring and Analysis – Scott Drummonds

VMworld 2009 TA2963 ESXtop for Advanced Users – Krishna Raj Raja

http://www.vmware.com/support/

http://www.yellow-bricks.com/esxtop/ – Duncan Epping


Performance Troubleshooting VMware vSphere – CPU


Introduction

Processors have come a long way in a very short time, and over the past few years we have seen the industry embrace the multi-core x86 architectures (Intel and AMD) which is allowing us to consolidate with even greater efficiencies than previous processor architectures.  Ensuring available compute cycles to virtual machine workloads is critical, and should be monitored closely as you scale out your infrastructure.

What to look for
  • Check for physical cpu utilization that is consistently above 80-90%.  Getting high consolidation rates is a wonderful thing, but don’t over tax the physical server.  Maybe it’s time to purchase another host for your DRS cluster and let the software balance your workloads better.
  • Watch pCPU0 on non-ESXi hosts.  If pCPU0 is consistently saturated, this will negatively impact performance of the overall system.  If you are using third-party agents, ensure they are functioning properly.  A couple of years ago we had issues with the HP System Insight management agents (Pegasus process), which were creating a heavy load on our COS.  All of the virtual machines looked fine from a performance perspective, but once we dug a little bit deeper, we discovered this was our root cause.
  • Watch for high CPU ready times.  Ready time is the time a virtual machine was ready to run but had to wait for the hypervisor to schedule it on a physical CPU.  Sustained high values point to CPU contention on the host, often from too many vCPUs competing for the available cores, even when utilization inside the guest looks normal.
  • Watch for virtual machines that are consistently at 80-100% utilization.  This is not a typical pattern of a conventional server.  Most likely if you login to the guest you will find a runaway process that is consuming all of the cpu cycles.  I actually found an offshore contractor running Rosetta@home (a cancer research screen saver) inside one of our virtual machines!  If something doesn’t look right, it’s worth checking it out.
  • Watch for virtual machines where the kernel or HAL is not set to use more than one CPU (SMP) and the VM is allocated multiple processors via Virtual Center.  I was approached by a Linux administrator who told me he wasn’t seeing any performance improvement after he added a second processor.  After I poked around a little bit I discovered he was running a uniprocessor kernel and hadn’t recompiled his operating system for SMP.  If the operating system doesn’t have the ability to recognize more than one processor, you won’t see any performance gains by throwing more vCPUs at a larger workload.
Monitoring with Virtual Center

Virtual Center is a great place to start for CPU performance monitoring, both at a physical level and a virtual machine level.  Before getting into too much detail I wanted to explain Virtual Center statistics logging.  There are various levels of logging that can be set for the VC database.  Beware! You can easily overrun your database and fill up your existing disk space by setting all of these to the maximum setting.  Think of this as a debug level: the higher you set it, the more information will be captured to the database for analysis (and the more disk space consumed).  If you need to get to some of the more detailed performance statistics, VC performance counters and their corresponding levels can be found here.  To change these settings, click Administration –> vCenter Server Settings –> Statistics.

image

Let’s take a look at a physical ESX host performance metrics through Virtual Center.  vSphere now includes a nice graphical summary in the performance tab of the physical host.  This gives you a quick dashboard type view of the overall health of the system over a 24 hour period.  Here is the CPU sample:

image

Selecting the Advanced tab gives you a much more granular way of viewing performance data.  At first glance this might look like overkill, but with a little bit of fine tuning, you can make it report on some great historical information.  Here is a snapshot of physical CPU utilization across all processors:

image

The virtual center performance statistics by default display the past hour of statistics, and show a more detailed analysis of what’s currently happening on your host.  Select the option “Chart Options” to change values such as time/date range and which counters you would like to display.

Virtual Center Alarms are an excellent tool that can sometimes be overlooked.  While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning.  Set up CPU alerts so you will be notified via e-mail if a problem starts to manifest itself.  Here is an alarm configured to trigger if physical host CPU utilization is at 75% for 5 minutes or longer.

image

Monitoring with ESXTOP

Esxtop is another excellent way to monitor performance metrics on an ESX host.  Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing.  SSH to one of your ESX servers and execute the command “esxtop”.  The default screen that you should see is the CPU screen; if you ever need to get back to this screen in the future, just hit the “c” key on your keyboard.  Esxtop gives you great real-time information and can even be set to log data over a longer time period; try “esxtop -a -b > performance.csv”.  Check your PCPU and CCPU (Physical/Console) here.  Examine what your virtual machines are doing; if you want to display only the virtual machine worlds, hit the “V” key.

image

A detailed list of ESXTOP counters can be found here:

http://communities.vmware.com/docs/DOC-5240

http://communities.vmware.com/docs/DOC-9279

Monitor inside the Virtual Machine

A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor or “perfmon” tool.  If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug in VMware Tools that will prevent the counters from showing up.  Check your % Processor Time, which is the current load of the virtual processor.

image

Monitoring with PowerCLI

Another great place to go for finding potential CPU problems and bottlenecks is PowerCLI.  I have been using PowerGUI from Quest, accompanied by a powerpack from Alan Renouf.  If you’re not a command line guru, don’t let this discourage you.  PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts.  Want to find virtual machines with CPU ready time?  How about virtual machines that have CPU reservations, shares or limits configured?  You can pull all of this information using Alan’s powerpack.

image

Conclusion

If you’re using VMware vSphere, there are many different ways to monitor for CPU problems.  The Virtual Center database is the first place you should start.  Check your physical host CPU contention, then work your way down the stack to the virtual machine(s) that might be indicating a problem.  Take a look at esxtop: check physical CPU, console CPU, then the VM worlds that are running on the ESX host.

Look for the outliers in your environment.  If something doesn’t look right, it probably isn’t.  Scratch away at the surface and see if something pops up.  Use all of the tools available to you, such as PowerCLI.  Approaching problems from a different perspective will sometimes shed light on a situation you weren’t aware of.  If all else fails, engage VMware support and open a service request.  Support contracts exist for a reason, and I have opened many SRs that turned out to be new technical problems VMware support had not seen before.


VMware pvSCSI – When and when not to use it


Introduction

Hopefully you have read my previous blog posts on pvSCSI.  It describes what the driver is, how it works, and how it can positively impact your performance and workloads.  Part two covers the process of installing the pvSCSI driver on an existing system and a new system.  Both can be found here on the site and you might find them useful:

http://www.virtualinsanity.com/index.php/2009/11/21/more-bang-for-your-buck-with-pvscsi-part-1/

http://www.virtualinsanity.com/index.php/2009/12/01/more-bang-for-your-buck-with-pvscsi-part-2/

Interrupt Coalescing

VMware recently published a KB article that answers a question that has been floating around the community for a while.  The pvSCSI driver sounds superior to the LSI driver, with direct I/O access to the hypervisor, so why not use it in all cases?  The article states that you should only use the newer driver when driving higher workloads, those that are typically 2000 IOPS or greater.  For those who don’t know, 2000 IOPS is a pretty big workload.  Consider this: a standard Fibre Channel 10,000 RPM drive averages around 125 IOPS per disk.
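
To put that number in perspective, here is a quick back-of-the-envelope calculation in Python.  It is only a rough sketch that uses the 125 IOPS per disk figure above and ignores RAID write penalties and array cache; the function name is made up for this example.

```python
import math


def disks_needed(target_iops, iops_per_disk=125):
    """Rough spindle count required to sustain target_iops."""
    return math.ceil(target_iops / iops_per_disk)


# 2000 IOPS at roughly 125 IOPS per 10,000 RPM disk.
print(disks_needed(2000))  # 16
```

In other words, a single virtual machine would need to keep about sixteen spindles busy before it crosses the threshold mentioned in the KB article.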

I didn’t really understand this, and the knowledge base article lacks any detail on the rationale behind the statement.  I reached out to VMware performance engineer Scott Drummonds to see if he had anything he could publish to help clarify the KB article.  Scott was nice enough to research this and posted his findings here.

So it appears that the technical explanation is interrupt coalescing, or buffering.  The paravirtual SCSI driver was designed to receive multiple requests at a high rate and then “batch” the requests together for better throughput efficiency.  If you aren’t generating a high enough workload on the virtual machine, an I/O request could sit in the queue unnecessarily while the “batch” waits to be filled for the next transaction.  This could cause storage performance problems, which would typically be seen as higher latency and would negatively impact the virtual machine.
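
The effect of batching on a lightly loaded virtual machine can be illustrated with a toy model in Python.  To be clear, this is not how the pvSCSI driver is implemented: it assumes evenly spaced requests, a fixed batch size (8 is an arbitrary, hypothetical value) and a batch that is dispatched the moment it fills.

```python
def avg_batch_wait_ms(iops, batch_size):
    """Average time a request waits for its batch to fill, in milliseconds.

    Toy model: requests arrive evenly spaced at `iops` per second and the
    batch is dispatched as soon as the last request arrives.
    """
    if iops <= 0 or batch_size < 1:
        raise ValueError("iops must be positive and batch_size at least 1")
    gap_ms = 1000.0 / iops  # time between consecutive arrivals
    # Request i (0-based) waits for the remaining (batch_size - 1 - i) arrivals.
    waits = [(batch_size - 1 - i) * gap_ms for i in range(batch_size)]
    return sum(waits) / batch_size


# Compare a light, a medium and a heavy workload (hypothetical rates).
for iops in (100, 2000, 20000):
    print(iops, "IOPS:", round(avg_batch_wait_ms(iops, batch_size=8), 3), "ms")
```

Under these assumptions the added wait shrinks in direct proportion to the request rate, which matches the explanation above: the busier the virtual machine, the faster each batch fills and the less time any single request spends waiting.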

Now and Then

The great news is the current release of the driver is optimized for heavy workloads.  If you are starting to virtualize SQL/Oracle systems and need the performance, go for the pvSCSI driver and get better throughput.  If you’re deploying standard virtual machines that are doing lower workloads, continue to embrace the existing LSI Logic driver.

If you are new to vSphere 4, or have just upgraded from 3.5 and are starting to rebuild your templates to embrace virtual hardware version 7, don’t use the pvSCSI driver as part of your standard template.  VMware is working on the driver and will be introducing advanced coalescing functionality.  When this is built into the driver stack pvSCSI will then be able to be utilized for all workloads as it will understand when it needs to ramp up for higher workloads.

Thanks again to Scott Drummonds for taking the time out of his busy schedule to track this one down.


VMware Windows Perfmon counters missing in vSphere 4u1

I am attempting to pull together a blog post around performance.  It’s going to be a four-part series, one part for each I/O component of VMware: CPU, Memory, Storage and Networking.  My goal is to try and cover the various tools that you can use to help troubleshoot performance problems that you might experience in your virtual environment.

image

While I was going through some of the methods, I wanted to illustrate how VMware now includes Windows Performance Counters inside a guest virtual machine to assist with performance monitoring/troubleshooting.  I jumped on a test virtual machine I have, and pulled up Windows perfmon.  To my dismay the VMware counters were missing! We are currently running VMware vSphere 4.0 Update 1, so I checked with a few other people online like Rick Vanover (@RickVanover).  He confirmed that it seemed to be related to this specific release of vSphere.

I reached out to Scott Drummonds via Twitter (@drummonds), a performance systems engineer who works for VMware, and also opened a service request with support.  Scott validated that he saw the same issue and was launching an investigation.  Unfortunately the SR didn’t get very far as I was instructed that this was an “experimental feature and was removed from vSphere”.  Uhhh ok, I knew that wasn’t right so I waited to hear back from Scott.

Scott has since written a blog post that discusses this issue.  It looks like a complete uninstall of VMware Tools in the guest followed by a reinstall resolves the issue.  For those not familiar with this process, it does require a reboot.  The problem appears to be related to mofcomp, a tool that Microsoft provides to register WMI information (such as the VMware performance counters) with Windows.

Thanks to Scott for jumping on this so quickly and posting a fix to the issue, it’s great to see social media paying off in the real world.  Thanks to Rick for helping me figure out what was going on and validating some of my assumptions.  Rick has also written up an excellent blog post on this same issue.  Hopefully a patch will be rolled into the next minor release of vSphere 4 that will resolve this bug going forward.


Bring on the 10Gig Ethernet!

VMware recently updated its networking performance tests to see if the ESX hypervisor could efficiently leverage the ever-expanding bandwidth available at the Ethernet level. In short, it sure can! A single VM can effectively saturate a 10Gbps link when jumbo frames are enabled. But that’s not to say it can’t perform well with multiple virtual machines. Things scaled nicely and equitably for all VMs. This type of scalable performance is reassuring as customers continue to raise consolidation ratios within their datacenters and virtualize the largest of workloads.

To save you some reading, here is the summary from the whitepaper, which can be found at: http://www.vmware.com/pdf/10GigE_performance.pdf

Conclusion: The results presented in the previous sections show that virtual machines running on ESX 3.5 Update 1 can efficiently share and saturate 10Gbps Ethernet links. A single uniprocessor virtual machine can push as much as 8Gbps of traffic with frames that use the standard MTU size and can saturate a 10Gbps link when using jumbo frames. Jumbo frames can also boost receive throughput by up to 40 percent, allowing a single virtual machine to receive traffic at rates up to 5.7Gbps.

Our detailed scaling tests show that ESX scales very well with increasing load on the system and fairly allocates bandwidth to all the booted virtual machines. Two virtual machines can easily saturate a 10Gbps link (the practical limit is 9.3Gbps for packets that use the standard MTU size because of protocol overheads), and the throughput remains constant as we add more virtual machines. Scaling on the receive path is similar, with throughput increasing linearly until we achieve line rate and then gracefully decreasing as system load and resource contention increase.

Thus, ESX 3.5 Update 1 supports the latest generation of 10Gbps NICs with minimal overheads and allows high virtual machine consolidation ratios while being fair to all virtual machines sharing the NICs and maintaining 10Gbps line rates.


Virtualizing Tier 1 Applications

With virtualization finding its way into every nook and cranny of the data center, it would seem that tier 1 applications are the only safe harbor for the few remaining “Server Huggers” out there.  Their mantra usually sounds something like this …

“My application is too I/O intensive for virtualization,” or “MY xyz application vendor doesn’t support VMware” or possibly “My application is too important to be virtualized” (this is one of my favorites).  Believe it or not, I even heard one guy say “you can virtualize my server when you pry it from my cold dead hands” … um, wow.  He has issues.  Last I heard, he was de-virtualizing a server farm at the NRA.  Hehehe.

Anyway, for the rest of us with our heads NOT buried in the sand, I’m here to tell you that tier 1 applications can and should be virtualized.  I’ll go so far to say that if you’re not virtualizing tier 1 applications, you are doing your company a major disservice.

Below is a brief overview of a presentation I gave in Cincinnati a few weeks ago to a group of about 75 professionals.  The topic was “Virtualizing Microsoft Exchange.” And while the content that follows is geared towards the Microsoft Exchange application, it can really apply to any tier 1 application.

Performance

I’ll start with performance because this is typically the first objection to virtualizing a Tier 1 app.  The perception is that virtualization creates too much overhead and therefore applications in a VM will certainly underperform applications running on a physical server.  This current perception was born out of a previous reality.  In the early days, virtualization really did introduce enough overhead to warrant physical servers for applications with high I/O. But a perfect storm is a-brewin’ and I summarize it with the following equation:

hypervisor improvements + server hardware improvements + application improvements =
better than native performance

That’s right.  Mileage will vary, but given a properly architected solution, virtual can actually outperform physical. And even in scenarios where physical outperforms virtual, the delta is probably measurable, but not observable.  So let’s take a closer look at the three areas I mentioned in the equation above.

Hypervisor Improvements

The hypervisor (AKA, the virtualization layer, AKA the Server Hugger’s worst nightmare) has come a long way in the past few years.  And in VMware’s ESX product, the latest version has the following performance improvements over previous versions:

  • Increased guest OS memory to 64GB
  • Increased physical RAM on ESX to 256GB
  • TCP segment offload to further lower CPU utilization
  • NUMA optimizations improve multiple VM performance
  • Support for 64-bit clustering with boot from SAN

These improvements alone can capture almost all tier 1 applications, but combined with the next two, almost no tier 1 app can hide from becoming a candidate for virtualization.



Server Hardware Improvements

We’re now seeing server hardware with 256GB+ of physical RAM. Multi-core CPUs with 2 and 4 cores are running in production today and 6/8/12 cores are coming soon. And best of all, hardware-assisted virtualization technologies are emerging, pushing the virtualization overhead down to the hardware, getting the hypervisor ever closer to native performance.

And because the vast majority of applications simply can’t fully utilize hardware with this much horsepower, ironically, virtualization is the only way to truly capture the full ROI of these physical investments.



Application Improvements

As applications continue to evolve, bugs are fixed and bad code is optimized, performance improvements within the application are being realized, further reducing the need for a physical server. Speaking specifically about Microsoft Exchange, the following performance improvements exist in 2007 over 2003:

Exchange 2003                  | Exchange 2007
32-bit Windows                 | 64-bit Windows
900 MB database cache          | Multi-GB database cache
4 KB block size                | 8 KB block size
High read/write ratio          | 1:1 read/write ratio
Requires high-end storage      | Affordable storage (iSCSI)
Storage is common pain point   | Eliminates storage pain point
-                              | 50% reduction in disk I/O

Of course the improvements for this piece of the equation will vary from one app to the next.



Bottom Line: Performance should not be a barrier to virtualizing an application.


A Virtual Server is Better than a Physical Server

Tier 1 applications are the most critical, important applications in your organization and therefore they need to run on the best infrastructure possible.  So almost by definition, tier 1 applications need to run in a VM.  Here are a few of my favorite reasons why a VM is better than a physical server.  Keep in mind, these aren’t the only reasons, just my favorites.

Reason #1: Better uptime

The “eggs in one basket” argument no longer applies.  And for those of you who don’t know what I’m talking about, the objection usually sounds something like this … “If I put 30 VMs on a single physical server, and that physical server crashes, then I’ve just lost 30 applications instead of one!”  This was a very legitimate concern five years ago.  But today you can get better uptime in a VM than you can with a physical machine.  In the worst case scenario, if a physical server dies, those VMs are automatically powered up on a different physical server.  In my experience, the VMs are usually back up and taking requests in under two minutes (and yes, I’ve timed it with a stopwatch).  And this is worst case scenario for a VM today!  What’s best case scenario for restoring a physical server after a hardware crash?  Weeks?  Days?  Hours (if you’re lucky and really prepared)?
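
To put rough numbers on that comparison, here is a small Python sketch that turns downtime per failure into yearly availability.  The one-failure-per-year rate and the four-hour physical restore time are assumptions picked purely for illustration; only the two-minute VM restart figure comes from the paragraph above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(failures_per_year, minutes_down_per_failure):
    """Return the fraction of the year a service is up."""
    downtime = failures_per_year * minutes_down_per_failure
    return 1 - downtime / MINUTES_PER_YEAR


# Assumption: one host hardware failure per year in both cases.
vm = availability(1, 2)             # VM restarted on another host in ~2 minutes
physical = availability(1, 4 * 60)  # hypothetical 4-hour physical restore

print(f"VM availability:       {vm:.5%}")
print(f"Physical availability: {physical:.5%}")
```

Swap in your own failure rate and restore times; the gap widens quickly once a physical restore is measured in days rather than hours.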

So with today’s technology (and it’s only going to get better with what’s coming soon), worst case scenario for a VM is better than best case scenario for a physical server.  And you might ask, what’s best case scenario?  Even with hardware maintenance, you can achieve 100% uptime with VMs.  How?  Check out a few of VMware’s features like VMotion, DRS and Update Manager.


Reason #2: Better hardware utilization

The average server utilization across the globe is less than 10% and in my experience, it’s often less than 5%.  Why?  A single application can rarely harness the power of the hardware it’s running on.  And for a ton of different reasons (which I won’t go into here), critical applications typically require a dedicated server.  That is like buying a Ferrari and never driving it more than 5 mph … what an awful waste!  Get the most for your money by putting each app in a VM, running multiple VMs per physical server.  Open that baby up and let it do what it was built to do!  I think the following two screen shots do a great job of showing you what I’m talking about.

[Figure: CPU Utilization Before VMware]

[Figure: CPU Utilization After VMware]
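
The consolidation math behind this is simple enough to sketch in a few lines of Python.  This is a back-of-the-envelope estimate only: it assumes every workload runs on hardware the same size as the new host, that CPU is the only constraint, and that you want to stay under a 60% target utilization (a number picked for illustration).  The 5% average and the 30 servers echo the figures used earlier in this post.

```python
import math


def vms_per_host(avg_util_pct, target_util_pct):
    """How many equally sized workloads fit on one host under the target.

    Whole-number percentages keep the division exact.
    """
    return target_util_pct // avg_util_pct


def hosts_needed(server_count, avg_util_pct, target_util_pct):
    """Hosts required to consolidate server_count physical servers."""
    per_host = vms_per_host(avg_util_pct, target_util_pct)
    if per_host == 0:
        raise ValueError("a single workload already exceeds the target")
    return math.ceil(server_count / per_host)


# Assumptions: 5% average utilization per server, 60% target per host.
print(vms_per_host(5, 60))      # 12
print(hosts_needed(30, 5, 60))  # 3
```

Real sizing also has to account for memory, peak load and hypervisor overhead, so treat the result as a starting point rather than a design.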



Reason #4: Avoid over provisioning

Why waste time and energy planning for future capacity (which is really nothing more than an educated guess based upon a ton of assumptions)?  The tendency has been to overprovision hardware to account for future growth, but this often leads to underutilized hardware.  With Virtual Machines, additional CPU and RAM can be added at any time with a few clicks of a mouse.  And moving to more powerful systems in the future can be done in real time with VMotion and/or Storage VMotion.  With virtualization, it only makes sense to simply build your application for the capacity you need and then throttle as necessary.



Reason #5:  Better Security

Typically, protection engines come in two forms, host based and network based.  The problem with network based security software is that it has no (or very limited) visibility into the host.  And the problem with host based security software is that it’s running in the same context as the malware that it’s trying to protect against.  And the creators of malware are not stupid! They continually find new ways to hide their malware and/or attack the protection engine, creating a never ending vicious circle of cat-and-mouse.

But we now have a new, trusted layer with the much smaller codebase of the hypervisor where we can provide protection from outside of the operating system.  A protection engine from this layer provides a much stronger defense because it’s “underneath” the VM, completely isolated from the malware.  And this is a great place for a protection engine to live because it can see all I/O of the VM and inspect each of the virtual components (CPU, Memory, Network and Storage).  Better yet, we now have the ability to do things like:

  • Intercept, view, modify and replicate I/O traffic from one, many or all VMs
  • Provide inline protection or passive monitoring
  • Mount and read virtual disks

[Figure: Securing a Virtual Machine]



Reason #6: DR made easy

In the physical world, DR is a pain in the butt and super expensive.  The reason is that DR solutions for physical servers often require similar hardware at the DR site to avoid issues with driver, hardware, and software compatibility.  These dependencies are eliminated in a virtual world, which means any VM can run on any physical server with an ESX hypervisor.  And because a VM is completely encapsulated, the entire VM exists in a small set of files.  This simplifies replication and therefore simplifies the process of keeping your production and your DR environment in sync.  And finally, servers at the DR site can be used for other purposes, like test and development, until they are required for DR purposes.  Which means an investment in a DR infrastructure will not sit idle.


Support

I love it when I hear someone say “my application vendor says they won’t support VMware.” Hmmmmm.  Here’s a crazy question for ya, isn’t it VMware’s job to support VMware?  Now, I’m sure what they really mean is that the vendor won’t support their application in a virtualized environment.  But just to make things clear, if you have a problem with VMware … call VMware.

And support for applications in a virtualized environment is rapidly changing.  Examples are numerous, but two big ones that come to mind are SAP and Microsoft.  In the earlier part of the year, SAP announced full support for their software on VMware.  And just recently, Microsoft announced the Server Virtualization Validation Program (SVVP) where they will support their OS’s and a good list of their applications in a virtualized environment. And VMware’s ESX is the industry’s first hypervisor to be validated by Microsoft.

What about those vendors who still don’t support their applications in a virtualized environment?  Most of my customers do two things.  First, they put pressure on the vendor to start providing support.  For large companies, this can be very effective since the software providers want to keep their big customers happy.  Second, many of them have a “swing server.”  So when a vendor’s support team requires them to reproduce the problem on physical hardware, they simply V2P the VM onto the swing server and continue on their merry way.  (Yes, I know, this isn’t always as easy as I make it sound.  Though it often can be just that easy.)


Still not convinced?

The table above shows the results of a survey of 500 VMware customers taken over a year ago, and the numbers are growing rapidly.  Simply put, customers are virtualizing tier 1 applications today.


