
Upgrade your Virtual Hardware in a few minutes, with a twist.


Introduction

I attended last month’s Cincinnati VMUG (VMware User Group) and was surprised by the audience’s responses about how many customers had not yet taken the plunge and upgraded to vSphere. I think only a handful of users had just completed the upgrade. Sometimes I forget to step out of my own space and consider what others have going on in their environments. If you’re still wondering about the upgrade, Aaron has a post on some of the benefits of going from VI3 to vSphere.

Part of upgrading your existing investment is upgrading all of your virtual machines to the latest and greatest virtual machine hardware, version 7. Someone mentioned to me how much of a pain this was, since you have to touch each virtual machine, and my response was “It only takes a couple of minutes.” I wanted to prove this in a different way, so I mulled it over and came up with a timed video clip. The song I chose is 2 minutes and 39 seconds long, so I figured if I could knock this out in the time it takes the song to play, well then, mission accomplished.

vSphere Upgrade Thoughts

Before getting into my bizarre video clip challenge, here are some quick thoughts from my personal experience with the upgrade.

  • Make sure you check the new HCL for vSphere prior to the upgrade; some of your older server hardware might not be compatible with, or supported by, the new release.
  • Understand the licensing changes that have taken place before you begin your upgrade. Work with your account team or VAR to understand the features and functionality that fit your environment. You need to ensure your current licenses get ported over so the new licensing server will be able to register your newer ESX hosts.
  • If you’re going to transition to vSphere gradually, you will need to maintain a legacy license server for the older VI3 hosts until your migration is complete.
  • Testing your upgrade in a lab environment is always a good approach if you have the infrastructure.
  • If you are running hardware management agents or third-party backup software on your ESX hosts, make sure you get the latest agents that support the current release of vSphere.
  • If you are upgrading your existing Virtual Center database, make sure you back it up prior to the upgrade. We chose to “leap frog” into our new environment: we built the new Virtual Center server from the ground up, then disconnected the ESX hosts from the old server and added them to the new one.
Virtual Hardware

So what is virtual hardware anyway, and why should you care? Virtual hardware is an important component of your infrastructure, and you should understand what it means to you. You must be running version 7 to take advantage of some of the new features in vSphere, such as the paravirtual storage driver (pvSCSI) and the paravirtual network driver (VMXNET3). Here is the technical definition straight out of the admin guide.

The hardware version of a virtual machine indicates the lower-level virtual hardware features supported by the virtual machine, such as BIOS, number of virtual slots, maximum number of CPUs, maximum memory configuration, and other characteristics typical to hardware.  Virtual machines with hardware versions lower than 4 can run on ESX/ESXi 4.x hosts but have reduced performance and capabilities. In particular, you cannot add or remove virtual devices on virtual machines with hardware versions lower than 4 when they reside on an ESX/ESXi 4.x host.

Here is a table that lists what each version of the virtual hardware can support and what limitations you might experience:


Get your Groove on

Here are the steps I take in the video to upgrade a Windows virtual machine’s hardware from version 4 to 7. Many thanks to Scott Lowe for posting these upgrade instructions on his blog; they helped our efforts tremendously, as we were early adopters of vSphere. Don’t forget to upgrade your templates so that all future virtual machines you deploy will run version 7.

  1. Upgrade VMware Tools in the guest operating system.
  2. Once the upgrade is complete, shut down the guest operating system.
  3. Upgrade the virtual machine hardware (right-click the virtual machine and choose the upgrade option).
  4. Add the new VMXNET3 network adapter (now available as an option).
  5. Remove the old network adapter.
  6. Power on the virtual machine.
  7. Let the hardware discovery run and install the new devices.
  8. Reboot the system.
  9. Finished.
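If you have a lot of virtual machines to get through, it can help to track where each one stands in the checklist. Below is a minimal Python sketch of that bookkeeping, assuming a hypothetical `VMState` record you would fill in from your own inventory; it does not talk to Virtual Center, and the field names and NIC type strings are illustrative only. It returns the steps that still apply, in the order listed above.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class VMState:
    # Hypothetical inventory record; populate it from your own tooling.
    name: str
    tools_current: bool   # VMware Tools already upgraded?
    powered_on: bool
    hw_version: int       # 4 for VI3-era VMs, 7 for vSphere
    nic_type: str         # illustrative values, e.g. "e1000" or "vmxnet3"


def remaining_steps(vm: VMState) -> list[str]:
    """Return the checklist steps that still apply to this VM, in order."""
    steps: list[str] = []
    # The Tools upgrade comes first, while the guest is still running (assumed here).
    if not vm.tools_current:
        steps.append("upgrade VMware Tools")
    needs_hw = vm.hw_version < 7
    needs_nic = vm.nic_type != "vmxnet3"
    if needs_hw or needs_nic:
        # The hardware upgrade and NIC swap are done with the guest shut down.
        if vm.powered_on:
            steps.append("shut down guest OS")
        if needs_hw:
            steps.append("upgrade virtual hardware to version 7")
        if needs_nic:
            steps += ["add VMXNET3 adapter", "remove old network adapter"]
        steps += ["power on", "let hardware discovery install new devices", "reboot"]
    return steps


if __name__ == "__main__":
    vm = VMState("web01", tools_current=False, powered_on=True, hw_version=4, nic_type="e1000")
    for number, step in enumerate(remaining_steps(vm), start=1):
        print(number, step)
```

A virtual machine that is already on version 7 with a VMXNET3 adapter and current Tools gets an empty list, so you can filter your inventory down to the machines that still need attention.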

Matt Costa is the featured artist here; the song is titled “Sweet Rose”. Enjoy!!


Performance Troubleshooting VMware vSphere – Memory


Introduction

As memory prices continue to drop and the 64-bit x86 (x64) architecture is more widely adopted across the industry, we continue to see memory demands rise. Only a few years ago, 1-2 GB virtual machines were the norm, 95% of them running 32-bit operating systems. In my experience the norm has shifted to 2-4 GB, with higher-performing virtual machines consuming anywhere from 4-16 GB of memory. VMware has answered this demand: vSphere now supports up to 1 TB of addressable memory per physical host and up to 255 GB per virtual machine.

With processors now more powerful than ever, the general limitation on virtual machines is shifting from compute to memory. This is reflected in the industry as we see larger memory footprints on traditional servers (Intel Nehalem) and vendors such as Cisco introducing extended memory technology, which can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class and was impressed with what I saw. The extended memory technology is quite unique: not only does it allow you to scale out your memory configuration, it uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage of having this many DIMM sockets is that you can use lower-capacity DIMMs (2 GB or 4 GB) to reach the same memory configuration for which a standard server would need 8 GB DIMMs.
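To make the DIMM arithmetic concrete, here is a small Python sketch. The slot counts (18 for a standard two-socket server, 48 for an extended-memory system) are assumptions for illustration only; check your vendor’s specifications for real numbers.

```python
import math
from typing import Optional

# Hypothetical slot counts, for illustration only.
STANDARD_SLOTS = 18
EXTENDED_SLOTS = 48


def dimms_needed(target_gb: int, dimm_gb: int) -> int:
    """Number of identical DIMMs required to reach at least target_gb."""
    return math.ceil(target_gb / dimm_gb)


def smallest_dimm_that_fits(target_gb: int, slots: int, sizes=(2, 4, 8)) -> Optional[int]:
    """Smallest DIMM size (GB) that reaches target_gb without exceeding the slot count."""
    for size in sorted(sizes):
        if dimms_needed(target_gb, size) <= slots:
            return size
    return None  # none of the listed DIMM sizes is large enough


if __name__ == "__main__":
    for slots in (STANDARD_SLOTS, EXTENDED_SLOTS):
        size = smallest_dimm_that_fits(96, slots)
        print(f"{slots} slots: 96 GB needs {dimms_needed(96, size)} x {size} GB DIMMs")
```

With these assumed slot counts, reaching 96 GB takes twelve 8 GB DIMMs in the standard server, while the extended-memory configuration can get there with forty-eight 2 GB DIMMs.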

Memory Technologies in VMware vSphere

There are some major benefits to virtualization when it comes to memory. VMware implements some sophisticated and unique ways of making the most of physical memory within an ESX host. All of these features work out of the box, with no advanced configuration necessary. To understand problems that might occur in your environment, you need to be familiar with these basic memory concepts.

  • Transparent Page Sharing – The VMkernel compares physical memory pages to find duplicates, then frees the redundant copies and maps them to a single shared copy. If multiple operating systems are running on one physical host, why should you load the same data into memory multiple times? Think of this as the data de-duplication process we are seeing in the majority of backup solutions in the industry.
  • Memory Overcommitment – The act of assigning more memory to powered-on virtual machines than the physical server has available. This allows virtual machines with heavier memory demands to use memory that is not actively being used by underutilized machines.
  • Memory Overhead – Once a virtual machine is powered on, the ESX host reserves memory for the normal operation of the virtualization layer. This memory can’t be reclaimed by swapping or ballooning; it is reserved for the system.
  • Memory Balloon Driver – When VMware Tools is installed in a virtual machine, it provides device drivers that connect the guest operating system to the virtualization layer. Part of this package is the balloon driver, “vmmemctl”, which can be observed inside the guest. The balloon driver communicates with the hypervisor so that memory the guest operating system no longer values can be reclaimed from inside the guest. If the physical ESX server begins to run low on memory, it will inflate the balloon to reclaim memory from the guest. This reduces the chance that the physical ESX host will begin to swap, which would cause performance degradation. Here is an illustration of ballooning in ESX:

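The idea behind transparent page sharing can be illustrated with a simplified Python model: hash each page, treat a hash match only as a candidate, and confirm with a full comparison before collapsing duplicates onto one backing copy. This is a sketch of the general technique, not VMware’s implementation; the 4 KB page size is an assumption, and copy-on-write (what happens when a guest later writes to a shared page) is not modeled.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def page_sharing_savings(pages):
    """Return (backing_pages, bytes_saved) after collapsing identical pages.

    Each page is hashed to find sharing candidates; a full byte-for-byte
    comparison confirms the match, so a hash collision never merges
    pages with different contents.
    """
    buckets = {}  # hash digest -> list of distinct page contents with that hash
    backing_pages = 0
    for page in pages:
        bucket = buckets.setdefault(hashlib.sha1(page).digest(), [])
        if page not in bucket:  # full comparison against earlier pages
            bucket.append(page)
            backing_pages += 1
    return backing_pages, (len(pages) - backing_pages) * PAGE_SIZE


if __name__ == "__main__":
    zero_page = bytes(PAGE_SIZE)      # zeroed page, a typical sharing candidate
    code_page = b"\x90" * PAGE_SIZE   # stand-in for identical OS code pages
    guest_pages = [zero_page, code_page, zero_page, code_page, zero_page]
    print(page_sharing_savings(guest_pages))  # (2, 12288)
```

Five guest pages end up backed by two physical pages, which is the same effect the VMkernel is after when several virtual machines run the same operating system.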

What to look for
  • Check ESX host swapping. If you are overcommitting memory on the physical ESX host, you can run into a situation where every virtual machine needs the full amount of memory it has been granted. When the host runs out of memory, it will begin to swap. Keep an eye on the oversubscription rates of your physical hosts, or ensure you have enough memory resources across your DRS clusters so DRS can balance the load more effectively. Swapping will occur when the following condition is met:

Total_active_memory > (Memory_Capacity – Memory_Overhead) + Total_balloonable_memory + Page_sharing_savings

  • Check for virtual machine swapping. Make sure your virtual machines have enough memory for the application workloads they support. If virtual machine swapping starts to occur, it can put a strain on the disk subsystem.
  • Check that VMware Tools is installed and up to date. VMware Tools not only provides drivers for the guest, it also installs the balloon driver, which the ESX host relies on for proper memory management.
  • Check memory reservation settings. By default, VMware ESX dynamically tries to reclaim memory when it is not needed. There are situations where you might choose to use memory reservations. If you set memory reservations in your environment, be aware that this memory is guaranteed to the virtual machine and cannot be reclaimed by the host, even when it’s not being used. Don’t sell the balloon driver short: many third-party application vendors over-spec their configurations for safety, and ballooning can help counteract some of that wasted “fluff factor”.
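The swapping condition from the first bullet can be turned into a quick calculation. Here is a minimal Python sketch, assuming all values are in MB and that you have gathered the inputs yourself (for example from Virtual Center or esxtop); the function names and example numbers are illustrative, not measurements. It also computes a simple overcommitment ratio: memory granted to virtual machines divided by physical memory.

```python
def host_will_swap(total_active_mb: float, capacity_mb: float, overhead_mb: float,
                   balloonable_mb: float, page_sharing_savings_mb: float) -> bool:
    """Swap condition from the formula above:
    Total_active_memory > (Memory_Capacity - Memory_Overhead)
                          + Total_balloonable_memory + Page_sharing_savings
    """
    headroom_mb = (capacity_mb - overhead_mb) + balloonable_mb + page_sharing_savings_mb
    return total_active_mb > headroom_mb


def overcommit_ratio(configured_vm_memory_mb, capacity_mb: float) -> float:
    """Memory granted to powered-on VMs divided by physical memory (>1.0 means overcommitted)."""
    return sum(configured_vm_memory_mb) / capacity_mb


if __name__ == "__main__":
    # Illustrative 64 GB host running six 16 GB virtual machines.
    capacity = 64 * 1024
    print(overcommit_ratio([16 * 1024] * 6, capacity))           # 1.5
    print(host_will_swap(70_000, capacity, 4_096, 6_144, 8_192))  # False: headroom is 75,776 MB
    print(host_will_swap(80_000, capacity, 4_096, 6_144, 8_192))  # True
```

The point of the exercise is that a 1.5:1 overcommitment is only safe while active memory stays below the headroom that overhead, ballooning and page sharing leave you.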
Monitoring with Virtual Center

The first place I would start checking memory is Virtual Center. Virtual Center provides excellent reporting and gives you granular control over which metrics you report against. VMware vSphere now includes a nice graphical summary on the Performance tab of the physical host, giving you a quick dashboard view of the overall health of the system over a 24-hour period. Here are some memory samples:

Check your overall % usage (lower is better)

Check your ballooning (lower is better)


Selecting the Advanced tab gives you a much more granular view of performance data. At first glance this might look like overkill, but with a little fine-tuning you can make it report some great historical information. Here is a snapshot of memory utilization with many of the variables we just discussed, which gives a great picture of what’s going on (this host looks healthy):

Check your various metrics, mainly for swapping activity

By default, the Virtual Center performance charts display the past hour of statistics and give a more detailed analysis of what’s currently happening on your host. Select “Chart Options” to change values such as the time/date range and which counters you would like to display.

Virtual Center alarms are an excellent tool that is sometimes overlooked and forgotten. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up memory alarms so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host memory usage is above 90% for 5 minutes or longer. Many of these alarms are built into Virtual Center, so you don’t have to do a lot of configuration work. You do need to make sure you set up the e-mail notifications under the “Actions” tab.
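An alarm such as “above 90% for 5 minutes” fires on a sustained condition, not on a single spike. Here is a Python sketch of that logic, assuming evenly spaced samples 20 seconds apart (the real-time chart interval); it illustrates the idea and is not how Virtual Center evaluates alarms internally.

```python
import math


def alarm_triggers(samples_pct, threshold_pct=90.0, sustain_s=300, interval_s=20):
    """Return True if usage stays above threshold_pct for at least sustain_s seconds.

    samples_pct: usage percentages, oldest first, taken every interval_s seconds.
    """
    needed = max(1, math.ceil(sustain_s / interval_s))  # consecutive samples required
    run = 0
    for value in samples_pct:
        run = run + 1 if value > threshold_pct else 0  # a dip resets the streak
        if run >= needed:
            return True
    return False


if __name__ == "__main__":
    spike = [50] * 10 + [99] * 3 + [50] * 10   # one minute above 90%: no alarm
    sustained = [50] * 5 + [95] * 15            # five minutes above 90%: alarm
    print(alarm_triggers(spike), alarm_triggers(sustained))  # False True
```

Changing `threshold_pct` and `sustain_s` is all it takes to model a different alarm definition.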


Monitoring with ESXTOP

Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and run the command “esxtop”. The default screen is the CPU screen; to monitor memory, press the “m” key. Esxtop gives you great real-time information and can even log data over a longer time period; try “esxtop -a -b > performance.csv”. Check your total physical memory here and make sure you aren’t overcommitting to the point of swapping. To examine what your virtual machines are doing, press the “V” key to display only the virtual machine worlds.
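Batch mode writes a perfmon-style CSV file with one column per counter, which quickly gets too wide to read by eye. Here is a minimal Python sketch that reports the peak value of every column whose header contains a given substring. The counter name used in the usage example (“Swap Used”) is an assumption for illustration; check the header row of your own performance.csv, since counter names vary between versions.

```python
import csv
import io


def column_peaks(csv_text: str, substring: str) -> dict:
    """Return {header: peak value} for every column whose header contains substring."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if substring in name]
    peaks = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or non-numeric cells and short rows
            peaks[header[i]] = max(value, peaks.get(header[i], value))
    return peaks
```

Usage: `column_peaks(open("performance.csv", newline="").read(), "Swap Used")`. To keep the file to a manageable size, esxtop also accepts `-d` (seconds between samples) and `-n` (number of iterations) in batch mode.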


Monitor inside the Virtual Machine

A great feature VMware introduced for Windows virtual machines is the integration of VMware performance counters right into the Performance Monitor (“perfmon”) tool. If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug in VMware Tools that will prevent the counters from showing up. You can monitor the same metrics found in Virtual Center and esxtop here. It’s just another way of getting at the data, especially if you have a background in Microsoft Windows and are familiar with perfmon.


Monitoring with PowerCLI

Another great place to look for potential memory problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a PowerPack from Alan Renouf. If you’re not a command-line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run predefined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find out what your ESX host service console memory is set to? How about virtual machines that have memory reservations, shares, or limits configured? You can pull all of this information using Alan’s PowerPack.


Conclusion

If you’re using VMware vSphere, there are many different ways to monitor for memory problems. Virtual Center is the first place you should start. Check your physical host memory conditions, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop and check the key metrics we discussed above.

Look for the outliers in your environment. If something doesn’t look right, it probably isn’t. Scratch away at the surface and see if something pops up. Use all the tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes shed light on a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for technical problems that VMware support had never seen before.


Performance Troubleshooting VMware vSphere – The Tetralogy


Yes, a bold topic, I know, but I wanted to tackle this subject because it’s such an important area that everyone deals with at some point. I also find it personally useful to document my thoughts so I can solidify my own understanding of these processes and tools. I will admit this was a challenge to write up. There is so much material that I had to really focus on keeping it simple and to the point. Performance problems span such a wide array of possibilities that there is rarely one easy answer. Hopefully, by highlighting some of the tools available and offering some of my personal thoughts and experiences, I can help when problems arise in your infrastructure.

There is so much useful information floating around in PDFs, blogs, websites, and PowerPoint decks that one could easily get consumed by this topic. Since it is so broad, I wanted to try to set the stage. The focus of this series is to highlight some key components to examine, and then provide tools that will give you insight into your own environment and situation. This page will be the launch point for the various categories. Each post will cover a different category relating to the possible points of I/O in a VMware ESX environment.

Performance Troubleshooting VMware vSphere – CPU

Performance Troubleshooting VMware vSphere – Memory

Performance Troubleshooting VMware vSphere – Storage

Performance Troubleshooting VMware vSphere – Network

There is one last I/O component that I will not be covering, and that is the human factor. These posts assume that your installation or upgrade is of sound mind and body. If there are underlying installation or post-upgrade issues, I suggest engaging VMware support before examining conventional performance problems.

Acknowledgments/References:

VMware vSphere 4 Performance Troubleshooting Guide – Hal Rosenberg

Performance Monitoring and Analysis – Scott Drummonds

VMworld 2009 TA2963 ESXtop for Advanced Users – Krishna Raj Raja

http://www.vmware.com/support/

http://www.yellow-bricks.com/esxtop/ – Duncan Epping


Performance Troubleshooting VMware vSphere – CPU


Introduction

Processors have come a long way in a very short time. Over the past few years the industry has embraced multi-core x86 architectures (Intel and AMD), which allow us to consolidate with even greater efficiency than previous processor architectures. Ensuring that compute cycles are available to virtual machine workloads is critical, and should be monitored closely as you scale out your infrastructure.

What to look for
  • Check for physical CPU utilization that is consistently above 80-90%. Getting high consolidation ratios is a wonderful thing, but don’t overtax the physical server. Maybe it’s time to purchase another host for your DRS cluster and let DRS balance your workloads better.
  • Watch pCPU0 on ESX (non-ESXi) hosts. The service console (COS) runs on pCPU0, so if pCPU0 is consistently saturated, the performance of the overall system will suffer. If you are using third-party agents, ensure they are functioning properly. A couple of years ago we had issues with the HP System Insight management agents (the Pegasus process), which were creating a heavy load on our COS. All of the virtual machines looked fine from a performance perspective, but once we dug a little deeper, we discovered this was our root cause.
  • Watch for high CPU ready times. Ready time is the time a virtual machine was ready to run but had to wait for the scheduler to give it a physical CPU, so consistently high values point to CPU contention on the host, for example too many vCPUs competing for too few physical cores, or large SMP virtual machines that are hard to co-schedule.
  • Watch for virtual machines that are consistently at 80-100% utilization. This is not a typical pattern for a conventional server. Most likely, if you log in to the guest you will find a runaway process consuming all of the CPU cycles. I actually found an offshore contractor running Rosetta@home (a cancer research screen saver) inside one of our virtual machines! If something doesn’t look right, it’s worth checking out.
  • Watch for virtual machines whose kernel or HAL is not configured to use more than one CPU (SMP) even though the VM is allocated multiple processors in Virtual Center. I was approached by a Linux administrator who told me he wasn’t seeing any performance improvement after adding a second processor. After poking around a little, I discovered he was running a uniprocessor kernel and hadn’t recompiled his kernel for SMP. If the operating system can’t recognize more than one processor, you won’t see any performance gains by throwing more vCPUs at a larger workload.
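Virtual Center reports CPU ready as a summation in milliseconds per sample interval, which is easier to reason about as a percentage. Here is a minimal Python sketch of the usual conversion, assuming the 20-second real-time interval (use 300 for the five-minute rollups in the past-day view); dividing by the number of vCPUs gives an average per vCPU, since the VM-level value sums all of them. The second helper, also a sketch, is something you could run inside a guest to confirm the operating system actually sees every vCPU it was given, which would have caught the uniprocessor-kernel case above.

```python
import os


def ready_percent(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms per sample) into an average % per vCPU.

    interval_s: sample length in seconds (20 for real-time charts is assumed).
    vcpus: number of vCPUs whose ready time is summed into ready_ms.
    """
    return 100.0 * ready_ms / (interval_s * 1000.0 * vcpus)


def guest_sees_all_vcpus(allocated_vcpus: int) -> bool:
    """Run inside the guest: does the OS see as many CPUs as were allocated?"""
    visible = os.cpu_count() or 1  # os.cpu_count() may return None
    return visible >= allocated_vcpus


if __name__ == "__main__":
    # 2,000 ms of ready time in a 20-second sample for a 2-vCPU VM
    print(ready_percent(2000, interval_s=20, vcpus=2))  # 5.0
```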
Monitoring with Virtual Center

Virtual Center is a great place to start CPU performance monitoring, at both the physical and the virtual machine level. Before getting into too much detail, I want to explain Virtual Center statistics logging. There are various logging levels that can be set for the VC database. Beware! You can easily overrun your database and fill up your existing disk space by setting all of these to the maximum. Think of this as a debug level: the higher you set it, the more information is captured to the database for analysis (and the more disk space is consumed). If you need some of the more detailed performance statistics, VC performance counters and their corresponding levels can be found here. To change these settings, click Administration -> vCenter Server Settings -> Statistics.


Let’s take a look at a physical ESX host’s performance metrics through Virtual Center. vSphere now includes a nice graphical summary on the Performance tab of the physical host, giving you a quick dashboard view of the overall health of the system over a 24-hour period. Here is the CPU sample:

Selecting the Advanced tab gives you a much more granular view of performance data. At first glance this might look like overkill, but with a little fine-tuning you can make it report some great historical information. Here is a snapshot of physical CPU utilization across all processors:


By default, the Virtual Center performance charts display the past hour of statistics and give a more detailed analysis of what’s currently happening on your host. Select “Chart Options” to change values such as the time/date range and which counters you would like to display.

Virtual Center alarms are an excellent tool that is sometimes overlooked. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up CPU alarms so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host CPU utilization is at 75% for 5 minutes or longer.


Monitoring with ESXTOP

Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and run the command “esxtop”. The default screen is the CPU screen; if you ever need to get back to it, just press the “c” key. Esxtop gives you great real-time information and can even log data over a longer time period; try “esxtop -a -b > performance.csv”. Check your PCPU and CCPU (physical/console) utilization here. To examine what your virtual machines are doing, press the “V” key to display only the virtual machine worlds.

A detailed list of esxtop counters can be found here:

http://communities.vmware.com/docs/DOC-5240

http://communities.vmware.com/docs/DOC-9279

Monitor inside the Virtual Machine

A great feature VMware introduced for Windows virtual machines is the integration of VMware performance counters right into the Performance Monitor (“perfmon”) tool. If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug in VMware Tools that will prevent the counters from showing up. Check % Processor Time, which is the current load on the virtual processor.


Monitoring with PowerCLI

Another great place to look for potential CPU problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a PowerPack from Alan Renouf. If you’re not a command-line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run predefined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find virtual machines with high CPU ready time? How about virtual machines that have CPU reservations, shares, or limits configured? You can pull all of this information using Alan’s PowerPack.


Conclusion

If you’re using VMware vSphere, there are many different ways to monitor for CPU problems. Virtual Center is the first place you should start. Check your physical host CPU contention, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop: check physical CPU and console CPU, then the VM worlds running on the ESX host.

Look for the outliers in your environment. If something doesn’t look right, it probably isn’t. Scratch away at the surface and see if something pops up. Use all the tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes shed light on a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for technical problems that VMware support had never seen before.


VMware Data Recovery (vDR) Overview


Overview

I like to save my employer money when possible. I believe I would be doing them a disservice if I didn’t examine and evaluate a product we had paid for. Our company decided to take the plunge and upgrade all of our licensing to vSphere Enterprise Plus, and a new backup/data protection product was introduced with this release. Here is the definition of VMware Data Recovery from the administration guide:

VMware® Data Recovery creates backups of virtual machines without interrupting their use or the data and services they provide. Data Recovery manages existing backups, removing backups as they become older. It also supports deduplication to remove redundant data.

Data Recovery is built on the VMware vStorage API for Data Protection. It is integrated with VMware vCenter Server, allowing you to centralize the scheduling of backup jobs. Integration with vCenter Server also enables virtual machines to be backed up, even when they are moved using VMware VMotion™ or VMware Distributed Resource Scheduler (DRS).

Sounds pretty good, right? A backup application built specifically for VMware (with de-dup!) that could possibly displace your primary method of backups? I did a little digging in the community and was disappointed to learn that vDR is not exactly an enterprise product. A lot of the feedback from other VMware engineers was “it’s a 1.0 product and is designed for small installations.” The maximum supported backup configuration is 100 virtual machines.

I decided to check it out for myself and see whether it was a fit for our environment and might alleviate some of our backup problems. Our primary site is rather large, but we are now implementing vSphere at our smaller satellite locations, and vDR might be a fit for a smaller office configuration.

Installation

The installation was quite easy; VMware has provided another great virtual appliance that can be downloaded from their website. After you import the virtual appliance via Virtual Center and assign it a static IP address, you need to install the VC plug-in so you can manage your newly installed appliance.


After I ran through the installation and configuration, I was disappointed to discover that I couldn’t get vDR to launch. I kept getting prompted for authentication credentials, which was odd. Thinking I had set something up incorrectly, I went back and reviewed the administration guide. Upon closer examination (RTFM), I discovered that vDR doesn’t support Virtual Center running in linked mode. To my dismay, we are running VC in linked mode in anticipation of a Site Recovery Manager implementation, and our remote sites are managed by our primary site’s Virtual Center to save costs. I found a workaround by pointing the VC client at the ESX server hosting the vDR appliance. This only gave me access to back up virtual machines on that same ESX host, but it let me continue my testing. I hope future versions of the product will address this.

The Console

Once you launch the vDR console, you are immediately prompted by a configuration wizard to set up your environment. Here are the steps in the order they are presented:

  1. Select your Virtual Machines to backup.
  2. Select your destination (CIFS share, attached vmdk, or RDM).
  3. Select your backup window.
  4. Select your retention policy.

All of these are straightforward and don’t require much discussion. The only step I found a little confusing was the retention policy. Personally, I would have preferred something a little more technical than “Few/More/Many”.

The retention policy radio buttons are predefined settings that change the policy details below them. Toggle the buttons and you can see the variables change, along with what each setting means in terms of your destination data storage. Use caution here, as each vDR appliance can only support data stores up to 1 TB in size, with a maximum of two stores.
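The exact rules behind vDR’s Few/More/Many presets are not spelled out here, but retention policies of this kind generally keep the most recent N restore points plus one per week and per month going back a set number of periods. Here is a Python sketch of that generic grandfather-father-son style pruning; the parameter names and defaults are hypothetical and are not vDR’s settings. Every restore point it keeps consumes space on the destination data store, which is why these choices matter against the 1 TB limit.

```python
from datetime import datetime, timedelta


def backups_to_keep(times, recent=7, weekly=4, monthly=6):
    """Return the sorted restore points a generic retention policy would keep.

    recent:  the N newest restore points are always kept.
    weekly:  the newest point in each of the last `weekly` ISO weeks with backups.
    monthly: the newest point in each of the last `monthly` months with backups.
    Parameter names and defaults are hypothetical, not vDR's own settings.
    """
    ordered = sorted(times, reverse=True)  # newest first
    keep = set(ordered[:recent])

    def keep_newest_per_period(period_of, limit):
        periods_seen = []
        for t in ordered:
            period = period_of(t)
            if period not in periods_seen:
                periods_seen.append(period)
                if len(periods_seen) > limit:
                    break
                keep.add(t)  # first point seen in a period is its newest

    keep_newest_per_period(lambda t: tuple(t.isocalendar())[:2], weekly)
    keep_newest_per_period(lambda t: (t.year, t.month), monthly)
    return sorted(keep)


if __name__ == "__main__":
    # Sixty nightly backups, 1 Jan - 1 Mar 2010
    nightly = [datetime(2010, 1, 1) + timedelta(days=i) for i in range(60)]
    kept = backups_to_keep(nightly)
    print(len(kept), [t.strftime("%Y-%m-%d") for t in kept])
```

With these illustrative defaults, sixty nightly backups shrink to ten restore points: the last seven nights, two older weekly points and one point from January.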

The Backup

The underlying backup technology behind vDR is the new vStorage API (not VCB), and it takes advantage of a new feature called Changed Block Tracking (CBT). After the first full backup is performed, CBT identifies which blocks of the virtual disk have changed since the previous backup, and only those blocks are backed up. This means less backup traffic going across your network.
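The idea behind changed block tracking can be shown with a toy model in Python: the disk records which blocks are written, and each backup copies only those blocks and then clears the list. This is a simplified illustration, not how vDR or the VMkernel implement it; in vSphere the tracking happens in the VMkernel, and backup software asks for the changed areas through the vSphere API (a call named QueryChangedDiskAreas). The class and function names and the 64 KB block size below are hypothetical.

```python
BLOCK_SIZE = 64 * 1024  # hypothetical tracking granularity in bytes


class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        # Before the first (full) backup, every block counts as changed.
        self.changed = set(range(num_blocks))

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data[:BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        self.changed.add(index)  # the tracker notes the write

    def take_changed_blocks(self) -> list:
        """Return changed block indexes and start tracking afresh."""
        changed, self.changed = sorted(self.changed), set()
        return changed


def backup(disk: TrackedDisk, target: dict) -> list:
    """Copy only the changed blocks into target (block index -> bytes)."""
    copied = disk.take_changed_blocks()
    for i in copied:
        target[i] = disk.blocks[i]
    return copied


if __name__ == "__main__":
    disk, store = TrackedDisk(num_blocks=8), {}
    print("full:", backup(disk, store))         # all 8 blocks
    disk.write(3, b"new data")
    disk.write(5, b"more data")
    print("incremental:", backup(disk, store))  # only blocks 3 and 5
```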

I selected a CIFS/Windows share at our disaster recovery site for the backup testing. The test share was a ~600 GB, 5+1 (10K RPM) array of locally attached SCSI storage on an HP DL380. I selected a couple of Windows virtual machines to test with and kicked off the backup jobs. Below is a screenshot of the reporting window for vDR (sorry for all the censorship).

The jobs seemed to run pretty slowly in general, but completed successfully without errors (the error listed above is due to my linked Virtual Center configuration). In my opinion the reporting interface lacks some details. I would have liked to see the throughput I was getting during the backups; the only way I could see it was by monitoring the Windows host that housed the data store. I would also have liked a more detailed task status, so I could tell what was going on throughout the backup operation. The data de-duplication ratio would have been another great detail to see. Together, these could help determine the total backup time and estimated completion time of each virtual machine, another variable I found to be missing.

The Restore

There are two approaches to restoring data using vDR. The first is a full system restore, which recovers the entire virtual machine, system state and all corresponding data. When performing a full system restore, you can restore to an alternate ESX host or data store, and decide whether you want the network interface connected. I even found that you can change the virtual disk node and select an alternate SCSI node for the recovered disk.


The second option is a file-level restore (FLR), which most people would tend to use on a more regular basis. Unfortunately, the vDR console can’t recover individual files without some additional configuration. You need to install the “VMwareRestoreClient.exe” executable on a virtual machine, which then gives you the ability to browse your data store contents and select individual files to recover. I anticipate that the FLR components will be rolled into the vDR console in a future release.

Conclusion

VMware Data Recovery lacks a lot of critical pieces that an enterprise backup application needs to provide. The product is a great start for smaller VMware implementations, but even there I could see it quickly being outgrown. Here are the areas I would love to see improved in future releases of the product:

  • Need support for linked Virtual Center servers.  Personally, I could use this product at some of our smaller locations, but can’t leverage vDR since we are running in Linked Mode.
  • Need support for more virtual machines.  100 virtual machines is not enough; the product needs to scale to support a larger VMware implementation (not necessarily Enterprise).
  • Need support for larger data stores.  1TB is not a lot of space when you are backing up multiple virtual machines and retaining their data for longer periods of time.
  • Need support for more data stores per vDR appliance.  Again, this comes back to scale; storage growth is exponential in our current environment.
  • Support for a global vDR manager.  I would love to see VMware develop a central master or parent vDR console that would allow you to manage your child appliances and the data stores they manage.
  • Single console for both full system restores and file level restores.

VMware Data Recovery comes with all editions of VMware vSphere except vSphere Standard.  It is a great entry-level backup solution with de-duplication included.  I am excited to see the product mature into one that can scale to bigger environments.  I also feel that including vDR in vSphere Standard would help the SMB market adopt virtualization at a higher rate.


VMware vSphere Capacity IQ Overview – I’m Impressed!


With the launch of VMware vSphere came some new products that I hadn’t paid much attention to (busy upgrading, I guess).  One of the newer products is a Virtual Center reporting tool called Capacity IQ.  It gives an administrator the ability to analyze, forecast, and plan for future growth across an ESX environment.  I have had a lot of experience with monitoring/reporting tools in the past (I won’t bore you with the details), so I was quite skeptical of a 1.0 reporting tool for Virtual Center.  I must admit I was blown away by the immediately relevant reports the product was able to produce.

After pulling down the trial and obtaining the demo key, I took it for a spin.  I won’t document the installation steps, since Eric Gray has already done this for us.  It is by far the easiest reporting application I have ever installed.  If you’re interested in taking it for a trial run, download the virtual appliance (OVF format) from VMware’s website here.  Once you import the virtual appliance and give it a static IP address, it will need to collect data about your environment for a while.

Once you install the plug-in, CIQ gives you three basic views: Dashboard, Views, and Reports.

Dashboard

The Dashboard tab is designed to give you a quick overview of the item you have selected.  Capacity IQ uses the same approach as Virtual Center: whatever object you have selected is the one it reports on.  Here is a view of one of our clusters; notice January 11th on the Trend and Forecast graph at the top.


One of our clusters was running out of resources, so I added two more physical hosts to it.  You can see CIQ picks up the new host resources and reflects this by increasing the number of virtual machines it believes the cluster can accommodate.  Want to see something even more interesting?  Check out the pink graph on the 17th.  Capacity IQ is already using a prebuilt formula to forecast what capacity we will have (or won’t have) a week out.  Pretty impressive.
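CIQ’s own formula isn’t documented here, but the basic effect (more hosts means room for more VMs of a typical size) can be sketched.  This is a minimal Python sketch, assuming a simple model of my own: remaining VM slots are the smaller of the CPU-bound and memory-bound counts after holding back a reserve.  All figures, and the 10% reserve, are hypothetical.

```python
def remaining_vm_slots(total_mhz: float, used_mhz: float,
                       total_mem_gb: float, used_mem_gb: float,
                       avg_vm_mhz: float, avg_vm_mem_gb: float,
                       reserve: float = 0.10) -> int:
    """How many more 'average' VMs fit, limited by whichever resource runs out first."""
    free_mhz = total_mhz * (1 - reserve) - used_mhz
    free_mem_gb = total_mem_gb * (1 - reserve) - used_mem_gb
    by_cpu = int(free_mhz // avg_vm_mhz)
    by_mem = int(free_mem_gb // avg_vm_mem_gb)
    return max(0, min(by_cpu, by_mem))  # never report negative capacity


if __name__ == "__main__":
    # Hypothetical hosts: 40,000 MHz and 64 GB each; the current load stays the same.
    host_mhz, host_mem = 40_000, 64
    used_mhz, used_mem = 140_000, 220
    for hosts in (4, 6):
        slots = remaining_vm_slots(hosts * host_mhz, used_mhz,
                                   hosts * host_mem, used_mem,
                                   avg_vm_mhz=500, avg_vm_mem_gb=4)
        print(f"{hosts} hosts: room for {slots} more VMs")
```

In this made-up example memory is the constraint, which is why adding two hosts moves the number so sharply.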

Views

The Views tab is designed to give you a more detailed look at some of the specific data points.  Here is a screenshot of the various reports you can execute:


This is where you can get some great visual reports to present to upper management or a potential customer.  The interface is customizable, with data points you can tweak.  Check out the first report on this cluster:


This gives you a graphical, historical view of your cluster: how many virtual machines you have added over time.  Notice the horizontal sliding bar at the bottom of the chart, which lets you adjust the time/date window.  The lighter shaded line to the right is the projected growth of the cluster.  The Views tab is a great place to run ad-hoc reports: you can select the type of report, and even export the data.
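The projected line comes from CIQ’s own model, which I can’t see into.  For intuition only, here is a minimal Python sketch of the simplest stand-in: an ordinary least-squares straight line through historical VM counts, extended to a future day.  The sample data points are hypothetical.

```python
from typing import Sequence


def linear_forecast(days: Sequence[float], counts: Sequence[float],
                    future_day: float) -> float:
    """Fit counts = intercept + slope * day by least squares, then extrapolate."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * future_day


if __name__ == "__main__":
    # Hypothetical weekly VM counts for one cluster.
    history_days = [0, 7, 14, 21]
    vm_counts = [100, 104, 108, 112]
    print(round(linear_forecast(history_days, vm_counts, 28)))
```

A real capacity tool would account for seasonality and step changes (like the two hosts added above); a straight line is just the baseline to compare against.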

Reports

The Reports tab holds the “pre-canned” reports that the administrator can execute.  The one thing I was disappointed not to see here was the ability to schedule these reports to run at a particular interval (weekly/monthly).  I assume this will be introduced in a future release of the product.


After the report is executed and compiled, you are then provided with a .pdf or .csv version of your dataset to download and review.  The first report totaled 17 pages and provided some great technical information.  Here is the table of contents:


Conclusion

I am very impressed with Capacity IQ.  There are no agents to install on the virtual machines you wish to report against.  The installation was very straightforward; I had it up and running in about 15 minutes.  Once the virtual appliance was in place, all it needed was a little time to start crunching data.  The reports are well written and very relevant to what an administrator wants to see.  If you’re looking for a reporting tool to help you forecast, give this one a test to see if it fits your needs.


vSphere 4 Update 1 with Update Manager Shenanigans


Happy New Year!  I hope everyone enjoyed the holidays and got to spend some time with friends and family.  If you’re reading this, I suggest you pay tribute to the quality of Virtual Insanity and give the gift of voting.  Eric Siebert has launched a “best of 2009” blog contest.  If Virtual Insanity has helped you out in the past, please cast a vote for this great virtualization blog!  OK, on to the real reason for this post…

I ran into an oddity today while bringing a new host online in our vSphere environment, and thought it best to publish my findings.  Hopefully this saves someone a support call.  vSphere 4 Update 1 came with a couple of technical issues, which are detailed here and here.  We don’t use ESXi, so only the first one was a major issue for us.  We are an HP shop, so the issue around the HP agents and Update 1 was a major concern (it could render the host unbootable).  Luckily, VMware support is proactive about announcing issues like this to the community, and most people were aware of the problem right away.

The problem I hit today was strange, and at first I chalked it up to having been off work for a week.  I applied our Update 1 baseline to a new host I was bringing up, rescanned, and got this:


What the?  I know this host isn’t compliant; our base build is still at 4.0.  Check out the build number, that’s proof.  I have used the Update 1 baseline on 50+ hosts, so I know it’s not that.  Maybe Update Manager is still on holiday as well, so I restart the service and life is good?  Nope.  Same thing.

To make a long story longer, I poked around in the repository and checked the Update 1 patch: it’s valid, and 11/19/09 is the right release date.  Why was this thing not working?


I kept poking and prodding, wondering if they had released an update to the update.  Sure enough, one had slipped by me when I wasn’t looking (or the notice went to my spam folder).  Check the date: 12/9/09.


I created a new test baseline, dropped the 12/9/09 Update 1 patch into it, and applied it to my new host.  Lo and behold:


That’s much better.  Strange that the older Update 1 patch didn’t flag anything and showed the host as compliant.  As an end user, I would have liked to see some type of error message, or a reference to the newly released Update 1.  I ran the new update (still stopping the HP agents first, just in case), and now things look good again (check the build number):
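Since the compliance scan alone fooled me, it’s worth cross-checking the host’s build number against the build the baseline is supposed to deliver.  A minimal Python sketch, assuming the version string looks like the output of `vmware -v` on the ESX service console (e.g. "VMware ESX 4.0.0 build-164009"); the helper names are hypothetical, and the build numbers are example values to verify against VMware’s release notes.

```python
import re

BUILD_RE = re.compile(r"build-(\d+)")


def parse_build(version_string: str) -> int:
    """Pull the build number out of a string like 'VMware ESX 4.0.0 build-164009'."""
    match = BUILD_RE.search(version_string)
    if match is None:
        raise ValueError(f"no build number in {version_string!r}")
    return int(match.group(1))


def suspicious_compliance(reported_compliant: bool, version_string: str,
                          required_build: int) -> bool:
    """True when the scan says 'compliant' but the host is older than the baseline target."""
    return reported_compliant and parse_build(version_string) < required_build


if __name__ == "__main__":
    # Example values only: a 4.0 GA host scanned against an Update 1 baseline.
    print(suspicious_compliance(True, "VMware ESX 4.0.0 build-164009", 208167))
```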


Conclusion

Go vote for this site, and make sure you update your Update Manager Update 1 baseline.  That’s a lot of updates.  See you online!

Scott


More Bang for your Buck with PVSCSI (Part 2)

Part 2: Doing the Work

As you might have noticed, this blog post is a continuation of my first post about PVSCSI; you can access Part 1 here.

Hopefully you now have a better understanding of what the Paravirtual SCSI driver is all about, and can see there are tangible reasons to move in this direction.  Let’s get on with the important part, the implementation phase.


(I need to finish off this blog post, I am running out of pictures of SCSI cables)

There are some caveats I need to start out with.  In case you missed it, PVSCSI isn’t supported on operating system disks unless you are running vSphere 4 Update 1.  You can use the driver on a secondary data disk if you so desire, but for this post I am going to assume you are running vSphere 4 Update 1 (Virtual Center and ESX hosts) and want to know how to get the driver working on all disks.

In most cases, it’s easier to build new: you know you have a clean install, the drivers are updated, and the configuration is solid.  I would suggest updating your templates to include the new PVSCSI driver.  Your existing virtual machines run fine with their current configurations, and depending on your environment, it might be a lot of work to go back and touch all of them.  For an upgrade path, my personal opinion would be to target your heavy-I/O virtual machines first.  Upgrade the VMs that will make a difference, and you will see some immediate benefits.  Reducing the I/O on the disk subsystem will only benefit the other virtual machines that might share those same physical disk spindles.
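Picking the heavy-I/O virtual machines first is a simple filter-and-sort.  A minimal Python sketch, assuming you already have average IOPS per VM from your performance charts; the VM names, values, and 500 IOPS cut-off are hypothetical.

```python
from typing import Dict, List


def pvscsi_candidates(avg_iops_by_vm: Dict[str, float], min_iops: float) -> List[str]:
    """VMs at or above min_iops, heaviest first: the first ones to move to PVSCSI."""
    heavy = [(name, iops) for name, iops in avg_iops_by_vm.items() if iops >= min_iops]
    heavy.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in heavy]


if __name__ == "__main__":
    # Hypothetical averages, e.g. exported from your performance charts.
    sample = {"sql01": 1800, "web01": 120, "exch01": 950, "file01": 400}
    print(pvscsi_candidates(sample, min_iops=500))
```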

Clean install

This section walks you through installing the driver with a Microsoft Windows 2003/2008 operating system.  Currently these two operating systems are the only ones supported with PVSCSI on the boot disk.  Hopefully we will see added support for the various Linux operating systems down the road.

Walk through the “New virtual machine Wizard” as you normally would.  On step 9, ensure you select the “VMware Paravirtual” option as seen below.


Before powering up your new VM, locate the virtual floppy image file that has the driver for your desired guest operating system.  It is not on the VMware.com downloads page; it already exists on your ESX host, at the following location: [Datastores]\vmimages\floppies.  I would wait to connect the floppy disk image until after you boot off the Windows CD-ROM, so the VM doesn’t try to boot off the floppy drive.


When you power up your new virtual machine, press F6 to tell the operating system you need to use a third-party SCSI driver:


Now connect the floppy disk image to your virtual machine under “Edit Settings”.  You should now be able to point the operating system to the driver, as seen below:


Continue with your normal installation, and you are done.  Your new virtual machine is now using the Paravirtual SCSI driver.  I suggest converting this virtual machine to a template for future deployments with this configuration.

Upgrading an Existing Virtual Machine

To upgrade an existing virtual machine, the process is pretty straightforward.  Assuming you have already upgraded to the latest virtual hardware (version 7), make sure your VMware Tools have been upgraded to the Update 1 version.  Shut down the VM, open “Edit Settings”, and click “Change Type” on the SCSI controller, as shown below:


You will get another window that allows you to change the type of controller, as seen below:


Select “VMware Paravirtual” and click OK.  Boot up your virtual machine and you are all set.  Your system is now running the paravirtual driver, which provides better throughput and lower latency!
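Under the covers, “Change Type” changes the controller’s `virtualDev` entry in the VM’s .vmx file (`lsilogic` for LSI Logic, `pvscsi` for VMware Paravirtual).  The GUI steps above are the way to do it; this minimal Python sketch only illustrates the text change, assuming the VM is powered off and you are working on a copy of the .vmx file.  The function name is hypothetical.

```python
import re


def set_controller_type(vmx_text: str, controller: int = 0, dev: str = "pvscsi") -> str:
    """Return vmx_text with scsi<controller>.virtualDev set to dev (added if missing)."""
    key = f"scsi{controller}.virtualDev"
    new_line = f'{key} = "{dev}"'
    # Match the whole existing line for this controller only (scsi1 won't match scsi10).
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE | re.IGNORECASE)
    if pattern.search(vmx_text):
        # Use a function so the replacement text is inserted literally.
        return pattern.sub(lambda _match: new_line, vmx_text)
    return vmx_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    before = 'scsi0.present = "TRUE"\nscsi0.virtualDev = "lsilogic"\n'
    print(set_controller_type(before))
```

Remember that the guest still needs the PVSCSI driver present (via the updated VMware Tools) before the boot controller changes, or Windows won’t find its boot disk.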

Hope you found this post useful.  Good luck!

Scott


More Bang for Your Buck with PVSCSI (Part 1)

One of the new features added in the release of VMware vSphere 4.0 was a new SCSI subsystem driver that allows more I/O and lower latency per virtual machine.  What the heck is PVSCSI?  Here is the technical definition, lifted straight from the vSphere storage guide (RTFM):


“VMware Paravirtualized SCSI (PVSCSI) is a special purpose driver for high-performance storage adapters that offer greater throughput and lower CPU utilization for virtual machines. They are best suited for environments in which guest applications are very I/O intensive. VMware requires that you create a primary adapter for use with a disk that will host the system software (boot disk) and a separate PVSCSI adapter for the disk that will store user data, such as a database.  The primary adapter will be the default for the guest operating system on the virtual machine. For example, for a virtual machine with a Microsoft Windows 2008 guest operating system, LSI Logic is the default primary adapter. The PVSCSI driver is similar to vmxnet in that it is an enhanced and optimized special purpose driver for VM traffic and works with only certain Guest OS versions that currently include Windows Server 2003, 2008 and RHEL 5. It can also be shared by multiple VMs running on a single ESX, unlike the VMDirectPath I/O which will dedicate a single adaptor to a single VM.”

So what does all that mean for you?  Better disk performance and fewer CPU cycles spent processing disk requests.  I took some notes at VMworld 2009 during a few different sessions that discussed PVSCSI.  Here is my logical diagram of PVSCSI.  Download the PDF version here so you can print it out and frame it on your cube wall!

Sauer_PVSCSI.pdf

PVSCSI

With the release of vSphere 4.0, PVSCSI was only supported on disks other than the operating system disk (secondary data drives).  For more information on this, reference KB Article 1010398.

vSphere 4 Update 1 is now released, and it’s exciting news for those looking at using PVSCSI.  Support for boot disk devices attached to a Paravirtualized SCSI (PVSCSI) adapter has been added for Windows 2003 and 2008 guest operating systems.

So let’s first find out if it’s all that.  We need to do some testing to validate the hype.  I created two virtual machines, one with the traditional LSI Logic SCSI driver and one with the new PVSCSI driver.  Both VMs run on the same host: a 4-socket Intel Xeon system with 64 GB of RAM, connected to EMC Clariion CX3-80 storage.  The RAID configuration is a 4+1 RAID 5 set (10K spindles), with the default Clariion active/passive MRU setup (no PowerPath/VE).  Each VM has 2 vCPUs and 4 GB of RAM, and both are running 32-bit Microsoft Windows 2003 R2.  Both virtual machines’ data disks were partitioned with diskpart and correctly aligned.  Anti-virus real-time scanning was disabled on both systems.  This test is meant to get as close as possible to a standard configuration that we can benchmark from.
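Alignment comes down to a modulus check: a partition is aligned when its starting offset is a whole multiple of the array’s stripe element size.  A minimal Python sketch, assuming a 64 KB boundary (what diskpart’s `align=64` option targets); the legacy Windows 2003 default offset of 63 sectors (32,256 bytes) is the classic misaligned case.

```python
SECTOR_BYTES = 512
LEGACY_OFFSET = 63 * SECTOR_BYTES  # 32,256 bytes: Windows 2003's default starting offset


def is_aligned(offset_bytes: int, boundary_bytes: int = 64 * 1024) -> bool:
    """True if the partition's starting offset falls exactly on a boundary."""
    return offset_bytes % boundary_bytes == 0


if __name__ == "__main__":
    for offset in (LEGACY_OFFSET, 64 * 1024, 1024 * 1024):
        print(offset, "aligned" if is_aligned(offset) else "misaligned")
```

A misaligned partition makes some I/Os straddle two stripe elements, which would skew a comparison like this one regardless of the SCSI driver.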

I used Iometer as my testing engine.  I didn’t go too deep on the various settings.  The first run is 32K 50%R 0%W.

Non-PVSCSI


With-PVSCSI


Quite the difference, no?  To be honest, I was seeing a lot of fluctuation while doing my tests.  I probably should have isolated things a little more, but the screen captures show the averaged results.  I then decided to also run Iometer’s built-in random workload and use its combined results.  So here you go.

Non-PVSCSI


With-PVSCSI


I believe the results speak for themselves.  I still need to do a little more testing for my own purposes: I want more insight into the differences between reads and writes at the various transfer sizes.  I am certain cache has a lot to do with the results, although I think forcing a random access pattern in Iometer keeps the cache from helping as much.  I’m also curious about the sweet spot on block sizes and how it plays out with reads vs. writes.

Conclusion

PVSCSI is a technology worth moving toward.  There is no cost involved, and it can deliver better disk performance across your ESX environment.  It can also bring your host CPU utilization down, which can give you better consolidation ratios across your clusters.  Stay tuned for part 2, where I will cover the “how to do it” aspect so you can begin to leverage this technology you are already paying for!

Hope you found this information helpful.  Thanks go out to Aaron Sweemer (@asweemer) for letting me abuse his website instead of bringing up my own site.

Thanks!

Scott Sauer


Capacity Conundrum Part Deux

 

– The vTrooper Report –

 

This is a continuation of the Capacity Conundrum; if you missed the first part, start here.

$ per Compute VM

So let’s cut to the chase.  For the compute tiles of our Quad, we have a price per vCPU and a $ per GB of RAM to settle.  Keeping our example 2U server in play, we could expect to spend approximately $15,000 for a 2U fully loaded with 4GB DIMMs.  Unfortunately, part of that $15K is consumed by I/O cards and maintenance, which need to be pulled out to get the compute number.  For our argument we will use $10K for the compute system without the I/O cards and maintenance costs; this is the CAPEX we will offset in our $/per values.

vCOMPUTE – FIREPOWAH!

We know how a CPU works, right?  Move a process into memory, execute CPU cycles, churn, churn more, back to the I/O guys, rinse and repeat.  Basically, this is where the hardware container comes into play in our data centers.  I say container because it’s easy to show it as a box; it’s hard to define what it will always be in physical form.  1U, 2U, 4U, half blade, full blade, appliance, PC: you name it, it is probably in someone’s ‘datacenter’.  The lowest common denominator I have been able to settle on for a common form factor is cores per RAM, grouped per socket.  Grouping per socket fits because you are measuring the memory that is close to the CPU socket.  The NUMA architectures of AMD and Intel, with on-board memory controllers that reach the memory DIMMs without going through an I/O controller (e.g., the Northbridge), help define the grouping.

TECHNOTE:  Every core has associated memory banks it will use, and every container (physical server) has a series of sockets that it controls.  There is a limit to how well a hypervisor can keep a vCPU’s memory in the banks nearest to it.  Generally, the hypervisor schedules a VM’s vCPUs on the same socket and places the corresponding memory for those processes in that socket’s memory banks.  It does this for efficiency on x86 architectures.  It can move the VM to another socket and readdress the memory, but such a move has a ‘cost’ associated with it.  The path of least resistance is to stay in the same socket.

If you create a 4 vCPU VM and run it on a 1-core processor, it gets bogged down.  If you have the same VM on a two-socket quad-core host (8 cores), the four cores used by the VM are likely to be all on socket 1 or all on socket 2.  For the scheduler, splitting the vCPUs between the two physical sockets costs more than running them on the same socket.  AMD delivered this earlier than Intel and sustained higher consolidation levels “per host” than similar-class Intel systems could provide through the Northbridge.  Core i7 is a new game for Intel, and the Nehalem results show the improvements.
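A rough way to see whether a VM can live inside one socket is to compare its vCPU count and memory against a single socket’s cores and local memory.  This is a minimal Python sketch of that simplified model (not how the ESX scheduler actually decides), assuming the 4-core / 45 GB socket shape used in the cost example later in this post.

```python
import math


def sockets_spanned(vm_vcpus: int, vm_mem_gb: float,
                    cores_per_socket: int, mem_gb_per_socket: float) -> int:
    """Minimum sockets a VM touches if both its vCPUs and its memory must be local."""
    by_cpu = math.ceil(vm_vcpus / cores_per_socket)
    by_mem = math.ceil(vm_mem_gb / mem_gb_per_socket)
    return max(by_cpu, by_mem)


if __name__ == "__main__":
    for vcpus, mem in [(4, 8), (6, 8), (2, 60)]:
        print(f"{vcpus} vCPU / {mem} GB -> {sockets_spanned(vcpus, mem, 4, 45)} socket(s)")
```

Anything above 1 means the VM pays the cross-socket “cost” described in the technote.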

For more in-depth information, here is a good read:  CPU Scheduler in VMware ESX

We have a host with a $10K CapX charge that has two sockets at a 4 core/45GB socket ratio, with approximately $5K spent on each socket.  Looking at our hardware invoice, the CPU cores are about 25% of the cost of a socket, so we can assume that our per-socket cost breaks down into 25% core and 75% memory.  So our socket ratio yields a $1250 cost for 4 cores and $3750 for 45GB of memory:

Per Core CapX = $312.50;  Per GB RAM CapX = $83.33
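The per-core and per-GB figures follow directly from the host price, the socket count, and the 25/75 split.  Here is a minimal Python sketch of that arithmetic; the function name is mine, and the inputs are the example values above.

```python
from typing import Tuple


def socket_unit_costs(host_capex: float, sockets: int, cores_per_socket: int,
                      mem_gb_per_socket: float, core_share: float = 0.25) -> Tuple[float, float]:
    """Split host CapX per socket, then into a per-core and a per-GB-of-RAM cost."""
    per_socket = host_capex / sockets
    per_core = per_socket * core_share / cores_per_socket
    per_gb = per_socket * (1 - core_share) / mem_gb_per_socket
    return per_core, per_gb


if __name__ == "__main__":
    per_core, per_gb = socket_unit_costs(10_000, sockets=2, cores_per_socket=4,
                                         mem_gb_per_socket=45)
    print(f"Per Core CapX = ${per_core:.2f};  Per GB RAM CapX = ${per_gb:.2f}")
```

Change `core_share` to match your own invoice if CPUs are a bigger or smaller slice of the socket cost.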

That gives us a bare-metal cost without a hypervisor charge on top, but we need a hypervisor to get a VM running.  Adding in the per-socket license cost of ESX Enterprise Plus (worst case) adds $3500 per socket.

ESX Lic. Cost per socket CapX = $3500

The raw burn rate of the host would be $8500 per socket ($5000 hardware + $3500 license) even if we never loaded a VM on it.  Well, we bought it for a reason, so let’s get our money back.  If we target the standard allocation for this host (4 core/45GB socket ratio at 4 vCPUs per core), we get a target of 16 VMs per socket (1 vCPU/2.8 GB RAM each).  Also, keep in mind that we broke the socket cost down 25% to CPU and 75% to memory, so we will keep that same split here.  If we don’t do the split, then any VM deployed to the socket will bear the same cost regardless of its size.

ESX Lic. Cost per VM = $218.75   (3500 / 16)

-Or-

Split by the 25/75 % we did previously for the cost of the CPU and Memory and you get a little different calculation.

(3500 * .25) / 16 = 875 / 16 ≈ $55   AND   (3500 * .75) / 45 = 2625 / 45 ≈ $58

per vCPU=$55

per GB vMEM = $58
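The same split applied to the license can be sketched in a few lines of Python; the function name is mine and the inputs are the example values above.  It also shows a useful property of the split: a VM at the target shape (1 vCPU, 45/16 GB) pays exactly the unsplit $218.75, so the split only changes how the license is shared across different VM sizes.

```python
from typing import Tuple


def license_unit_costs(license_per_socket: float, vcpus_per_socket: int,
                       mem_gb_per_socket: float, core_share: float = 0.25) -> Tuple[float, float]:
    """Split a per-socket license into a per-vCPU and a per-GB-of-vMEM charge."""
    per_vcpu = license_per_socket * core_share / vcpus_per_socket
    per_gb = license_per_socket * (1 - core_share) / mem_gb_per_socket
    return per_vcpu, per_gb


if __name__ == "__main__":
    per_vcpu, per_gb = license_unit_costs(3500, vcpus_per_socket=16, mem_gb_per_socket=45)
    print(f"per vCPU = ${per_vcpu:.0f}, per GB vMEM = ${per_gb:.0f}")
    # A VM at the target shape (1 vCPU, 45/16 GB) pays the same as the unsplit 3500/16.
    print(f"target VM = ${per_vcpu * 1 + per_gb * 45 / 16:.2f}")
```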

Adding it up with our target ratios in tow we get the burn rate of the $ per Compute on a VM basis.

($312.50 / 4, for a 4:1 vCPU-to-core ratio) + ($83.33 * 2.8) + {($55 * 1) + ($58 * 2.8)} ≈ $530

Or Summarized: (vCPU = $78)+(vMEM = $233)+(Hyper$=219)=(vCOMPUTE = $530)

Spread over 8760 hours (1 year), this VM would cost about $.06/hr in vCOMPUTE.
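The whole chain can be wrapped in one function so any VM size can be priced.  A minimal Python sketch, assuming the unit costs worked out above (swap in your own invoice numbers) and the split license model; the function and constant names are mine.

```python
HOURS_PER_YEAR = 8760

# Unit costs from the example above (assumed inputs; replace with your own).
PER_CORE = 10_000 / 2 * 0.25 / 4      # $312.50 of hardware per physical core
PER_GB_RAM = 10_000 / 2 * 0.75 / 45   # ~$83.33 of hardware per GB of RAM
LIC_PER_VCPU = 3500 * 0.25 / 16       # ~$55 of ESX license per vCPU
LIC_PER_GB = 3500 * 0.75 / 45         # ~$58 of ESX license per GB of vMEM
VCPUS_PER_CORE = 4                    # 4:1 vCPU-to-core ratio


def vcompute_cost(vcpus: int, mem_gb: float) -> float:
    """Annual vCOMPUTE CapX for one VM: hardware CPU + hardware RAM + hypervisor share."""
    cpu = vcpus * PER_CORE / VCPUS_PER_CORE
    mem = mem_gb * PER_GB_RAM
    hypervisor = vcpus * LIC_PER_VCPU + mem_gb * LIC_PER_GB
    return cpu + mem + hypervisor


if __name__ == "__main__":
    for vcpus, mem_gb in [(1, 2.8), (1, 2), (2, 4)]:
        yearly = vcompute_cost(vcpus, mem_gb)
        print(f"{vcpus} vCPU / {mem_gb} GB: ${yearly:,.0f}/yr, "
              f"${yearly / HOURS_PER_YEAR:.3f}/hr")
```

The standard 1 vCPU / 2.8 GB VM lands within a dollar of the $530 above (the difference is rounding), and a smaller 1 vCPU / 2 GB VM comes in under $.05/hr.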

Let’s apply that to some other VM systems and see if it sticks.  Suppose we plan the following VM deployment on our socket:


The costs would spit out as such:


Or slice it up into a per hour number:


Based on this analysis, some of my VMs probably cost only $.05 per hour for vCompute.  Interesting.  What is more interesting is that the memory cost associated with a VM scales more accurately with consumption.  You can have as much memory as you like for your new 4 and 8 GB aspirations (e.g., memory leaks); you just need to pay for it accordingly.

Too bad that covers only the top part of my total cost model.  That said, the benefit here is that this model can span hypervisors: the cost of any hypervisor on the market can be split up the same way to show the cost of a VM consumed on a Xen, KVM, Virtual Iron, Parallels, or Hyper-V infrastructure.

I will be working on a few PowerShell scripts and Excel calculators that one can use to make this model more repeatable.  At the very least, it is a model I will use to evaluate CapacityIQ and third-party products like the offering from VKernel, along with the output they measure, especially if they add costs on a per-socket basis, which I can now calculate as overhead.

Alas, there is more to consider.  Stay tuned for Part III: “The I/O That Binds”
