Archive for the ‘Scott Sauer’ Category
It’s hard to believe that another year has flown by and Eric Siebert’s voting for the top virtualization blogs is upon us once again! If you enjoy the content that you read from Virtual Insanity, I encourage you to give back to the community and vote for us!
What other site discusses great technical VMware content ranging from core ESX to SpringSource to Linchpins, and even opens its doors to great guest bloggers? Thanks for reading.
A fellow VMware engineer recently recommended a book to me titled “Linchpin” by Seth Godin. The book has nothing to do with VMware or virtualization, but it hits home for me because it highlights a lot of topics that I find applicable to our industry. I could not ignore it as a relevant force that has somehow affected me, so I felt I had to write something up and share some thoughts. This post is a little off pace from what I normally write about, so bear with me. I think a lot of what Godin covers is present in the VMware community today, and many of you are already “Linchpins”. I reached out to Seth to get his permission to share some of his insights; if you’re interested in purchasing the book, just click the link above. Here is the book synopsis:
There used to be two teams in every workplace: management and labor. Now there’s a third team, the linchpins. These people invent, lead (regardless of title), connect others, make things happen, and create order out of chaos. They figure out what to do when there’s no rule book. They delight and challenge their customers and peers. They love their work, pour their best selves into it, and turn each day into a kind of art.
Linchpins are the essential building blocks of great organizations. Like the small piece of hardware that keeps a wheel from falling off its axle, they may not be famous but they’re indispensable. And in today’s world, they get the best jobs and the most freedom. Have you ever found a shortcut that others missed? Seen a new way to resolve a conflict? Made a connection with someone others couldn’t reach? Even once? Then you have what it takes to become indispensable, by overcoming the resistance that holds people back.
Godin believes that our society has changed and we (the U.S. in this example) are no longer living in the industrial era that our parents and grandparents grew up in. We are no longer the factory-driven, widget-producing society that we once were; in fact, most of these types of positions have been outsourced to cheap labor across the globe. Going through the schooling process and obtaining a piece of paper no longer guarantees you a job for the next 30 years of your life. Competition and technology have extinguished the promise of a secure job that pays well, offers health insurance, and provides a great retirement package when you exit.
Sound melodramatic and doom and gloom? It’s not, really. Godin goes on to explain that because our society is changing, we need to recognize this and change with it. We are not cogs in a giant industrial machine. You have a mind of your own, and you have more to contribute than you might think. Working off the same rule book is no longer going to apply if you want to be considered indispensable by forward-thinking companies.
The old school of thought: “Keep your head down, follow instructions, show up on time, work hard, suck it up”.
The new school of thought: “Be remarkable, be generous, create art, make judgment calls, connect people and ideas”
Become a VMware Linchpin
Here is how Wikipedia defines Art:
Art is the process or product of deliberately arranging elements in a way to affect the senses or emotions. It encompasses a diverse range of human activities, creations, and modes of expression, including music, literature, film, photography, sculpture, and paintings. The meaning of art is explored in a branch of philosophy known as aesthetics.
The new school of thought talks about becoming an artist, but don’t think of art as the class you avoided in high school. Art is creating something from nothing; it’s also about creating something that evokes an emotional response in yourself and others. Many virtualization evangelists are creating something from nothing, and the VMware blogosphere is one of the best examples of this today. The VMware community is alive with passionate people who are writing and creating new content daily. Have you ever stopped to examine the VMware Planet v12n blog aggregator? The amount of new content created around the topic of virtualization is really quite amazing. Customers stepping up and bringing their content to local VMUGs to share their personal experiences is another great example of creating this type of art.
Twitter is now inundated with VMware virtualization metadata. Not only can you find where this virtualization data resides, but you can now make connections with people that would have been impossible to make before. There are experts in every form and fashion who are now open to communicating about all things that touch virtualization. Storage experts, systems experts, networking experts, PowerShell experts, and Perl experts are just a few that jump out. Do you have a specific need that might also benefit others around you? Pose the question, and nine times out of ten someone will write the code and share it with the community at large.
Maybe this is something you are already doing today; maybe it’s something you’re not doing and will never do. That’s fine too; I’m just some guy who read a book and is sharing his two cents. I will tell you that as you start to consider some of these topics and look out at the industry in general (not just VMware), you will see this change come into play more and more.
Challenge yourself to get out of your comfort zone. Go out of your way to make new connections. Help someone out that might not be as skilled as you. Write a blog. Sign up for a Twitter account. Stand out, create art, be noticed. It will give you a sense of accomplishment, help define yourself as an expert in your field, and even open more opportunities down the road.
I come from a family of artists, and I thought some of that intrinsic genetic value flowed in the blood, but judging by my whiteboarding skills (illustrated above), apparently that isn’t the case. I promise I will work on my happy little trees as time permits (a Bob Ross reference) and try to move away from my chicken-scratch artwork. I only hope that I can make both my family and Bob proud.
There has been a lot of activity here at VMware with acquisitions and partnerships over the past few months. A fellow engineer at VMware summarized a lot of these acquisitions and how they are meaningful to VMware as an organization. I wanted to share this information because I think it provides people with a better understanding of where we are going as a company and the overall strategic vision of VMware (Thanks again Andy!).
Being only three weeks into my career at VMware, I haven’t had much time to do any technical blog posts due to the fact that I have been drinking from the fire hose and trying to ramp up as quickly as possible. I am writing this post from VMware’s annual tech summit, and I joked with a few people here that I am so new that I haven’t even gotten a paycheck yet. My wife called to reassure me that I actually did get paid, so no more jokes about a “virtual paycheck” I guess.
Everything has been great so far. The people I am working with are awesome, the job is going to be fun but challenging, and to be honest, I think I am working for the coolest software company in the world. VMware’s breadth of products is really quite amazing; they are covering the stack with many different applications and driving a change in the industry that many people are excited about: revolutionizing IT.
“Going green” has been a buzzword in the IT community for years, but the more I deal with this topic, the more I consider it a black-and-white issue. I never thought I would be covering an energy blog topic, but there are some real-world examples I wanted to write about. Datacenters are enormous consumers of energy, from the IT infrastructure itself all the way down to the HVAC that is needed to cool these power-thirsty systems. While I think green initiatives are much needed in our industry, large corporations typically don’t consider these initiatives unless there is some intrinsic value associated with them, i.e. money. Business drivers outweigh the political pressures of saving the environment, and in all fairness, isn’t that what a company should be about, its own salvation?
Maybe that sounds harsh to all of the eco-friendly readers out there, but don’t get me wrong: I am all about saving natural resources and respecting our environment. Understanding the underlying issue of the current state of our industry is critical if one is going to offer solutions to a problem. If corporations can save operational costs on power and cooling and say they are a “green company”, then we have just killed two birds with one stone.
Networking is the fourth I/O component that I will be covering in this series of performance write-ups. Networking is another important component in the stack that, if not well thought out, can lead to performance problems down the road. Security is an important design consideration when planning your network configuration. One might argue that in a virtual environment you are more prone to risks, since at times there is no longer a physical cabling restriction in place. If someone has the appropriate rights in Virtual Center, they could bridge two logical networks together or place a virtual machine into a DMZ. VMware introduced vShield Zones to mitigate some of these risks in your virtualized environment. By creating zones, you can enforce policies that bridge, firewall, or isolate virtual machines between network segments. When designing or upgrading your VMware environment, work closely with your network team to understand their design considerations. If possible, leverage VLAN tagging (802.1Q) to eliminate excessive physical cabling to different segments.
Personally, one of the most interesting components of the VMware architecture I/O stack is storage. There is a plethora of diverse storage solutions in the industry today that offer unique ways of addressing storage performance, as well as the increase in capacity demands. Storage problems are the most common misconfiguration affecting performance in VMware today. An oversaturated LUN will affect all virtual machines that share that same datastore. Take this concept up a level: a group of disks (a RAID group) that is saturated with I/O will negatively impact all LUNs that share those same physical spindles. Storage has traditionally been the “red-headed stepchild” in VMware and hasn’t gotten a lot of visibility. Storage I/O bottlenecks can create serious virtual machine problems, and yet it wasn’t until ESX 3.x that these metrics were even graphically displayed to VMware administrators; the 2.x MUI (Management User Interface, for those newer to VMware) only reflected CPU and memory.
I attended last month’s Cincinnati VMUG (VMware User Group) and was surprised to hear from the audience how many customers had not yet taken the plunge and upgraded to vSphere. I think there were only a handful of users that had completed the upgrade. Sometimes I forget to step out of my own personal space and consider what others have going on in their own environments. If you’re still wondering about the upgrade, Aaron has a post on some of the benefits of going from VI3 to vSphere.
Part of the process of upgrading your existing investment is the need to upgrade all of the virtual machines to the latest and greatest virtual machine hardware, version 7. Someone mentioned to me how much of a pain this was since you had to touch each virtual machine, and my response to them was, “It only takes a couple of minutes”. I wanted to prove this theory in a different way, so I mulled it over and came up with a timed video clip. The song I chose is 2 minutes and 39 seconds long, so I figured if I could knock this out within the amount of time it takes for the song to play, well then, mission accomplished.
A co-worker asked me yesterday if I knew of a way to find out who was watching your console session inside Virtual Center. I wasn’t quite sure what he meant by this at first, but after doing some digging I discovered that yes, you can find out who is watching your console session. Don’t forget that security permissions that are set up correctly will keep these snoopers from even getting to the console in Virtual Center in the first place.
As memory prices continue to drop and the 64-bit x86 architecture is embraced and adopted more in the industry, we continue to see a rise in memory demands. Only a few years ago, 1–2 GB virtual machines were the norm, with 95% of these being 32-bit operating systems. From my personal experience, I have seen this trend change to 2–4 GB as the norm, with the more high-performing virtual machines consuming anywhere from 4–16 GB of memory. VMware has answered this demand with vSphere, now delivering up to 1 TB of addressable memory per physical host and up to 255 GB per virtual machine.
With processors now more powerful than ever, the general shift in virtual machine limitations is changing from compute to memory. This is reflected in our industry today as we see an increase in the memory footprint of traditional servers (Intel Nehalem), and vendors such as Cisco introducing extended memory technology which can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and was impressed with what I saw. The extended memory technology is quite unique because it not only allows you to scale out your memory configuration, it uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage to having this many DIMM sockets is that you can use lower-capacity DIMMs (2 GB or 4 GB) to achieve the same memory configuration as a standard server where you would have to use 8 GB DIMMs.
Yes, a bold topic, I know, but I wanted to tackle this subject because it’s such an important aspect that everyone typically deals with at some point. I also find it personally useful to document some of my thoughts so I can solidify my own understanding of these processes and tools. I will admit, this was a challenge for me to write up. There is so much material and information that I had to really focus on keeping it simple and to the point. Performance problems can span such a wide array of possibilities that there is typically never one easy answer. Hopefully, by highlighting some of the tools that are available and offering some of my personal thoughts and experiences, I might be able to help when problems arise in your infrastructure.
There is so much useful information floating around in PDFs, blogs, websites, and PowerPoint decks that one could easily get consumed by this topic. Since this is such a broad topic, I wanted to try and set the stage. The focus of this series of blog posts is to highlight some key components to examine, and then provide tools that will give you insight into your own environment and/or situation. This page will be the launch point for the various categories. Each blog post will cover a different category relating to the possible points of I/O in a VMware ESX environment.
There is one last I/O component that I will not be covering, and that is the human factor. These posts will assume that your installation or upgrade is of sound mind and body. If there are underlying installation issues or post upgrade issues, I suggest engaging VMware support before examining conventional performance problems.
VMware vSphere 4 Performance Troubleshooting Guide – Hal Rosenberg
Performance Monitoring and Analysis – Scott Drummonds
VMworld 2009 TA2963 ESXtop for Advanced Users – Krishna Raj Raja
http://www.yellow-bricks.com/esxtop/ – Duncan Epping
Processors have come a long way in a very short time, and over the past few years we have seen the industry embrace the multi-core x86 architectures (Intel and AMD) which is allowing us to consolidate with even greater efficiencies than previous processor architectures. Ensuring available compute cycles to virtual machine workloads is critical, and should be monitored closely as you scale out your infrastructure.
What to look for
- Check for physical CPU utilization that is consistently above 80–90%. Getting high consolidation ratios is a wonderful thing, but don’t overtax the physical server. Maybe it’s time to purchase another host for your DRS cluster and let the software balance your workloads better.
- Watch pCPU0 on non-ESXi hosts. If pCPU0 is consistently saturated, this will negatively impact the performance of the overall system. If you are using third-party agents, ensure they are functioning properly. A couple of years ago we had issues with HP Systems Insight Manager agents (the Pegasus process) creating a heavy load on our COS. All of the virtual machines looked fine from a performance perspective, but once we dug a little deeper, we discovered this was our root cause.
- Watch for high CPU ready times. Ready time indicates that a virtual machine was ready to run but had to wait for physical CPU time to become available; consistently high values usually point to CPU oversubscription on the host. Correlate this with the other components (memory/network/storage) to rule out a bottleneck elsewhere in your infrastructure.
- Watch for virtual machines that are consistently at 80–100% utilization. This is not a typical pattern for a conventional server. Most likely, if you log in to the guest you will find a runaway process that is consuming all of the CPU cycles. I actually found an offshore contractor running Rosetta@home (a cancer research screen saver) inside one of our virtual machines! If something doesn’t look right, it’s worth checking out.
- Watch for virtual machines where the kernel or HAL is not set to use more than one CPU (SMP) but the VM is allocated multiple processors via Virtual Center. I was approached by a Linux administrator who told me he wasn’t seeing any performance improvements after he added a second processor. After I poked around a little, I discovered he was running a uniprocessor kernel and hadn’t recompiled his operating system for SMP. If the operating system doesn’t have the ability to recognize more than one processor, you won’t see any performance gains by throwing more vCPUs at a larger workload.
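A note on the ready-time bullet above: vCenter reports CPU ready as a raw summation value in milliseconds per sampling interval, so it needs converting to a percentage before you compare it against any rule-of-thumb threshold. A minimal sketch, assuming the real-time chart's default 20-second interval (verify the interval your own chart uses):

```python
def cpu_ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation value (ms) to a percentage of the
    sampling interval. Equivalent to ready_ms / (interval_s * 1000) * 100."""
    return ready_ms / (interval_s * 10)

# A VM reporting 1,000 ms of ready time in a 20 s sample spent 5% of
# that interval waiting for a physical CPU.
print(cpu_ready_percent(1000))  # 5.0
```

Anything consistently in the 5–10% range per vCPU is usually worth a closer look.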
Monitoring with Virtual Center
Virtual Center is a great place to start for CPU performance monitoring, both at the physical level and the virtual machine level. Before getting into too much detail, I wanted to explain Virtual Center statistics logging. There are various levels of logging that can be set for the VC database. Beware! You can easily overrun your database and fill up your existing disk space by setting all of these to the maximum setting. Think of this as a debug level: the higher you set it, the more information will be captured to the database for analysis (and the more disk space consumed). If you need to get to some of the more detailed performance statistics, VC performance counters and their corresponding levels can be found here. To change these settings, click Administration –> vCenter Server Settings –> Statistics.
Let’s take a look at a physical ESX host’s performance metrics through Virtual Center. vSphere now includes a nice graphical summary in the performance tab of the physical host. This gives you a quick dashboard-type view of the overall health of the system over a 24-hour period. Here is the CPU sample:
Selecting the Advanced tab gives you a much more granular way of viewing performance data. At first glance this might look like overkill, but with a little bit of fine tuning, you can make it report some great historical information. Here is a snapshot of physical CPU utilization across all processors:
By default, the Virtual Center performance statistics display the past hour and show a more detailed analysis of what’s currently happening on your host. Select “Chart Options” to change values such as the time/date range and which counters you would like to display.
Virtual Center alarms are an excellent tool that can sometimes be overlooked. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up CPU alerts so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host CPU utilization is at 75% or greater for 5 minutes.
Monitoring with ESXTOP
Esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux “top” command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and execute the command “esxtop”. The default screen that you should see is the CPU screen; if you ever need to get back to this screen, just hit the “c” key on your keyboard. Esxtop gives you great real-time information and can even be set to log data over a longer time period; try “esxtop -a -b > performance.csv”. Check your PCPU and CCPU (physical/console) utilization here. Examine what your virtual machines are doing; if you want to display just the virtual machine worlds, hit the “V” key.
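Batch-mode output from “esxtop -a -b” is plain CSV with one column per counter, so it can be post-processed with any scripting language. Here is a minimal sketch in Python that flags samples where a utilization column crosses a threshold. The counter name below is a made-up placeholder; real headers embed your host name and vary by ESX version, so check your own file's header row first.

```python
import csv

# Hypothetical counter name -- replace with a header from your own CSV.
PCPU_COL = "\\\\esxhost\\Physical Cpu(_Total)\\% Util Time"

def busy_samples(path, threshold=80.0, column=PCPU_COL):
    """Return (timestamp, utilization) pairs that exceed the threshold."""
    hits = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            util = float(row[column])
            if util > threshold:
                # esxtop batch output puts the sample time in column one.
                hits.append((row[reader.fieldnames[0]], util))
    return hits
```

Point it at the performance.csv captured above and you get a quick list of the intervals worth drilling into.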
Monitor inside the Virtual Machine
A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor (“perfmon”) tool. If you’re running vSphere 4 Update 1, make sure you read this post first, as there is a bug with the VMware Tools that will prevent the counters from showing up. Check your % Processor Time, which is the current load of the virtual processor.
Monitoring with PowerCLI
Another great place to go for finding potential CPU problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a PowerPack from Alan Renouf. If you’re not a command-line guru, don’t let this discourage you. PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find virtual machines with CPU ready time? How about virtual machines that have CPU reservations, shares, or limits configured? You can pull all of this information using Alan’s PowerPack.
If you’re using VMware vSphere, there are many different ways to monitor for CPU problems. The Virtual Center database is the first place you should start. Check your physical host for CPU contention, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop: check the physical CPU and console CPU, then the VM worlds that are running on the ESX host.
Look for the outliers in your environment. If something doesn’t look right, that’s probably the case. Scratch away at the surface and see if something pops up. Use all of the tools available to you, like PowerCLI. Approaching problems from a different perspective will sometimes bring light to a situation you weren’t aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for new technical problems that had never been discovered by VMware support.
Hopefully you have read my previous blog posts on pvSCSI. Part one describes what the driver is, how it works, and how it can positively impact your performance and workloads. Part two covers the process of installing the pvSCSI driver on an existing system and a new system. Both can be found here on the site, and you might find them useful:
VMware recently published a KB article that answers a question that has been floating around the community for a while. The pvSCSI driver sounds superior to the LSI driver, with more direct I/O access to the hypervisor, so why not use it in all cases? The article states that you should only use the newer driver when driving higher workloads, typically 2,000 IOPS or greater. For those that don’t know, 2,000 IOPS is a pretty big workload. Consider this: a standard Fibre Channel 10,000 RPM drive averages around 125 IOPS per disk.
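To put those two numbers together, here's the rough spindle math, assuming the ~125 IOPS-per-disk figure above (real arrays will vary with RAID write penalty, cache, and read/write mix):

```python
import math

def spindles_needed(workload_iops, iops_per_disk=125):
    """Minimum number of disks to back a workload, ignoring RAID overhead."""
    return math.ceil(workload_iops / iops_per_disk)

# A 2,000 IOPS virtual machine needs roughly sixteen 10K RPM spindles
# behind it -- which is why 2,000 IOPS qualifies as a big workload.
print(spindles_needed(2000))  # 16
```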
I didn’t really understand this, and the knowledge base article is lacking any detail on the rationale behind the statement. I reached out to VMware performance engineer Scott Drummonds to see if he had anything he could publish to help clarify the KB article. Scott was nice enough to research this and posted his findings here.
So it appears that the technical explanation is interrupt coalescing, or buffering. The paravirtual SCSI driver was designed to receive multiple requests at a high rate and then “batch” the requests together for better throughput efficiency. If you aren’t generating high enough workloads on the virtual machine, an I/O request could unnecessarily sit in the queue while the “batch” waits to be filled for the next transaction. This can cause storage performance problems, typically seen as higher latency, which would negatively impact the virtual machine.
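To see why low request rates hurt, consider a toy model (my own illustration, not VMware's actual coalescing algorithm): if completions are held until a batch fills, the first request queued has to wait for the rest of the batch to arrive, and that wait grows as the arrival rate drops.

```python
def first_request_wait(arrival_interval_ms, batch_size):
    """Extra time the first queued request waits for the batch to fill,
    given one new request arriving every arrival_interval_ms."""
    return arrival_interval_ms * (batch_size - 1)

# Heavy workload: a new request every 1 ms, batch of 8 -> 7 ms extra wait.
print(first_request_wait(1, 8))   # 7
# Light workload: a request every 10 ms -> the same batch adds 70 ms.
print(first_request_wait(10, 8))  # 70
```

Under heavy load the batch fills almost instantly, so coalescing is nearly free; under light load the same mechanism inflates latency, which matches the KB's "2,000 IOPS or greater" guidance.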
Now and Then
The great news is that the current release of the driver is optimized for heavy workloads. If you are starting to virtualize SQL/Oracle systems and need the performance, go for the pvSCSI driver and get better throughput. If you’re deploying standard virtual machines that are doing lower workloads, continue to embrace the existing LSI Logic driver.
If you are new to vSphere 4, or have just upgraded from 3.5 and are starting to rebuild your templates to embrace virtual hardware version 7, don’t use the pvSCSI driver as part of your standard template. VMware is working on the driver and will be introducing advanced coalescing functionality. When this is built into the driver stack, pvSCSI will be usable for all workloads, as it will understand when it needs to ramp up for higher workloads.
Thanks again to Scott Drummonds for taking the time out of his busy schedule to track this one down.
I am attempting to pull together a blog series around performance; it’s going to be a four-part segment on each I/O component of VMware: CPU, memory, storage, and networking. My goal is to try and cover the various tools that you can use to help troubleshoot performance problems that you might experience in your virtual environment.
While I was going through some of the methods, I wanted to illustrate how VMware now includes Windows performance counters inside a guest virtual machine to assist with performance monitoring and troubleshooting. I jumped on a test virtual machine and pulled up Windows perfmon. To my dismay, the VMware counters were missing! We are currently running VMware vSphere 4.0 Update 1, so I checked with a few other people online, like Rick Vanover (@RickVanover), which confirmed it seemed to be related to this specific release of vSphere.
I reached out to Scott Drummonds via Twitter (@drummonds), a performance systems engineer who works for VMware, and also opened a service request with support. Scott validated that he saw the same issue and was launching an investigation. Unfortunately the SR didn’t get very far as I was instructed that this was an “experimental feature and was removed from vSphere”. Uhhh ok, I knew that wasn’t right so I waited to hear back from Scott.
Scott has since written a blog post that discusses this issue. It looks like a complete uninstall of the VMware Tools on the client, followed by a re-install, resolves the issue. This does require a reboot, for those that are not familiar with the process. The problem appears to be related to mofcomp, which is a tool that Microsoft provides that registers WMI information (such as the VMware performance counters) with Windows.
Thanks to Scott for jumping on this so quickly and posting a fix to the issue, it’s great to see social media paying off in the real world. Thanks to Rick for helping me figure out what was going on and validating some of my assumptions. Rick has also written up an excellent blog post on this same issue. Hopefully a patch will be rolled into the next minor release of vSphere 4 that will resolve this bug going forward.