Download The Newsletter: VMware Newsletter March 2011
At one of our last internal central VMware meetings, a few of us had a similar idea: to pull together a newsletter for our customers. Some of us were already doing this to a degree, but collectively we agreed that one source of information would be better than many. Several VMware SEs and Specialists have helped pull this together, so I wanted to thank everyone for their hard work.
There is so much great content that is published to the web and sometimes passed around internally that we wanted to consolidate it in one common, distributable location. 99% of the content is not really specific to the U.S. Central region other than the local events, so I think that many people will be able to benefit from our efforts. The goal will be to test the waters and see if it’s something that people like and want to see continued.
As always, we are looking for feedback. If you think this is something that should continue, let us know! If you feel it’s lacking or could be improved in some aspect, we are also looking for your opinion to help shape it.
Enjoy!
-Scott
I spend a majority of my time talking with VMware customers, trying to understand their needs and how we can help them with some of their internal IT business challenges. I would say a majority of the problems and issues discussed are typically based around internal politics and the changing IT landscape, but their second largest concern is around performance and growth (capacity). VMware, and virtualization in general, has been such a powerful driver for many organizations over the past several years. It has allowed IT organizations to run more efficiently, save capital expenditure costs, and ease administrative overhead, all in the midst of an economic downturn.
Capital expenditure savings are great, and very visible to the organization from a high level, but VMware needs to help customers with the next step. Now that we are moving so much of our infrastructure to a more elastic and flexible solution (vSphere), we need to provide tools to help you manage this infrastructure, because the methodologies that applied in the physical world no longer apply. The more we can help automate and manage your virtual infrastructure, the more we can help with step two, which is saving your IT organization operational costs. A recent Gartner study determined that the average cost for a Windows server is $10,200 per year. Of that expense, roughly 70% is OPEX. Gartner also estimates that automation and management could save up to 80% of that OPEX.
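To put those Gartner figures into dollars, here is a quick worked example in Python. The variable names are illustrative, and since the 80% figure is an "up to" estimate, the result is a best case rather than a forecast.

```python
# Worked arithmetic for the Gartner figures quoted above.
# Percentages are kept as whole numbers so the math stays in integers.
ANNUAL_COST_PER_SERVER = 10_200   # USD per Windows server per year
OPEX_SHARE_PCT = 70               # roughly 70% of that cost is OPEX
MAX_OPEX_SAVINGS_PCT = 80         # "up to" 80% of OPEX, so a best case

opex_per_server = ANNUAL_COST_PER_SERVER * OPEX_SHARE_PCT // 100
best_case_savings = opex_per_server * MAX_OPEX_SAVINGS_PCT // 100

print(f"OPEX per server per year:     ${opex_per_server:,}")
print(f"Best-case savings per server: ${best_case_savings:,}")
```

On those numbers, about $7,140 of each server's yearly cost is OPEX, and at most about $5,712 of that is what automation and management could win back.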
VMware has made several acquisitions around management and automation, and I wanted to focus on one which was recently announced. VMware vCenter Operations is a “new” product that was released this past week. It’s actually not all that new a product but a re-branding of a key acquisition announced at VMworld 2010. Integrien was an analytics- and statistics-based software company with a focus on management software. Notice that their primary focus was not management but analytics, a completely different approach from that of several other software companies out there trying to get to the same end result.
Rather than simply creating metrics to monitor and then setting thresholds on those metrics, Integrien will actually analyze the information that it’s gathering and understand when there is an actual problem. One of the coolest features of the full-blown enterprise version is that you can feed multiple data sources into the analytics engine. The more data it gets, the more accurately it can predict when a problem is likely to occur.
This isn’t just your standard run of the mill monitoring software.
Those of you who have experience with enterprise monitoring software will know that it takes a lot of effort to get these systems up and “fine tuned”. It takes a tremendous effort to sift through all of the white-noise alerts that come in and then adjust the alert thresholds to something meaningful so that the output becomes usable data. vCenter Operations removes that manual effort by dropping in an intelligent analytical engine that can understand what’s really going on behind the scenes.
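To make the difference concrete, here is a minimal Python sketch contrasting a fixed threshold with a baseline learned from history. It is not the actual Integrien or vCenter Operations algorithm, which isn’t described here. The mean-plus-three-standard-deviations rule, the function names, and the sample CPU numbers are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def static_alerts(samples, threshold):
    """Classic monitoring: flag every sample above a hand-tuned threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]

def baseline_alerts(history, samples, k=3.0):
    """Illustrative baseline check: flag samples that sit more than
    k standard deviations away from what this host normally does."""
    mu = mean(history)
    sigma = stdev(history)
    return [i for i, value in enumerate(samples) if abs(value - mu) > k * sigma]

# Hypothetical CPU % for a host that normally runs hot (a busy batch server).
history = [88, 90, 92, 89, 91, 90]
today = [91, 89, 45, 90]

print(static_alerts(today, threshold=80))   # flags the normal, busy samples
print(baseline_alerts(history, today))      # flags only the unusual drop
```

With a fixed 80% threshold, three of the four samples raise alerts even though they are business as usual for this host, and the one odd reading (the drop to 45%) goes unnoticed. The baseline version flags only that drop. That is the white noise problem in miniature.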
Here are the different versions of the product, and how each version differs. I would suggest pulling down the virtual appliance and checking out how awesome this product is. If you don’t feel like going to the effort, check out this video. It gives a great walk-through of vCenter Operations and explains a lot of the same concepts I just wrote about.
If you’re a VMware fan, you have probably already seen the graphic above, or some variation thereof. And you’re also probably already pretty familiar with the blue layer, or the Infrastructure layer of the cloud computing “stack.” In addition, you’re probably well versed in the orange layer, or the End User Computing layer. But what about that green layer?
That green layer is commonly referred to as cloud middleware, or the vFabric Cloud Application Platform. It’s the ooey-gooey middle layer that leaves most of us in IT scratching our heads. It’s where software developers live and breathe, but for the rest of us, it’s the layer we have traditionally avoided like Charlie Sheen avoids sanity.
I’ll be talking more about vFabric in future posts, but today I’d like to focus on WaveMaker, because it’s an exciting piece for those of us who aren’t software developers. It may be just the tool that gets us to dip our toes into that ooey-gooey green layer.
OK, so what is WaveMaker and how will it fit into that graphic above? First, the official news blurb …
VMware closed its acquisition of WaveMaker on Friday March 4, 2011. WaveMaker is a widely used graphical tool that enables non-expert developers to build web applications quickly. This acquisition furthers VMware’s cloud application platform strategy by empowering additional developers to build and run modern applications that share information with underlying infrastructure to maximize performance, quality of service and infrastructure utilization.
Great, soooooo what does that mean for readers of this blog? WaveMaker is a tool built just for us! It is the tool that will enable us to build web applications very quickly and deploy them to the cloud (that ooey-gooey green layer of the cloud) with a single mouse click. WaveMaker claims it can eliminate 98% of code, cut the web development learning curve by 92% and reduce software maintenance by 75%. Here are a couple of other bullet points you’ll find interesting …
- WaveMaker eliminates Java coding for building Web 2.0 applications
- WaveMaker Studio generates standard Java apps
- One-click deployment eliminates the complexity of deploying web apps to enterprise or cloud based hosting.
For more information, be sure to check out Rod Johnson’s blog post VMware acquires WaveMaker. And of course make sure you visit the WaveMaker website. While you’re there, download the software and give it a test drive! After you do, be sure to let me know what you think.
–Aaron
I’ve been running some tests in the lab lately, and trying to solve a problem that I don’t think is solvable right now. I’m hoping some of our readers will point out a potential solution that I have missed. Kendrick Coleman posted a write-up of how VM performance can be impacted by VM placement within the cluster. This is almost exactly what I have been testing in my lab, with a few twists.
As Kendrick points out, VMs that need to communicate with one another regularly are better off on the same ESXi host. With VMXNET 3 NICs, one can achieve massive throughput between VMs on the same host. However, that is not always the case.
The issue I am running into as I design my production environment is a requirement to have everything segmented off into hundreds of VLANs. This means that there will be servers on the same host that are on different VLANs that will need to communicate, sometimes frequently. This completely negates the benefit of having the VMs on the same host, as the traffic will have to leave the box to be routed.
Here are some tests I did using iperf from the VM Advanced ISO v0.2 just to further expand on the idea:
2 VMs on same host / same VLAN
2 VMs on same host / different VLAN
2 VMs on different hosts / same VLAN
2 VMs on different hosts / different VLAN
As you can see, it makes almost no difference that VMs are on the same host when the VLANs / subnets are different. Just for fun, I bumped the TCP window size, and was able to achieve 3.5Gbps from VM to VM on the same host and the same VLAN. When the VLAN is changed, the ratio of slowdown is the same regardless of host affinity. This is because traffic is leaving the host, going all the way to my Cisco 6509, and coming back into the same host.
Just for reference, all the hosts in these examples are connected to the same Cisco 1Gb switch.
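The four test cases boil down to one rule, which can be sketched in Python with the standard ipaddress module. This is only a model of the behaviour described above, not a measurement. It assumes one IP subnet per VLAN and that both VMs sit on the same virtual switch, and the addresses and function names are made up for the example.

```python
import ipaddress

def traffic_path(src_interface: str, dst_ip: str) -> str:
    """Return 'switched' if the destination is in the sender's own subnet,
    otherwise 'routed' (the sender hands the packet to its default gateway)."""
    src = ipaddress.ip_interface(src_interface)
    dst = ipaddress.ip_address(dst_ip)
    return "switched" if dst in src.network else "routed"

def leaves_host(same_host: bool, src_interface: str, dst_ip: str) -> bool:
    """Traffic stays inside the ESXi host only when both VMs share a host
    AND a subnet; anything routed must go out to the physical router."""
    return not (same_host and traffic_path(src_interface, dst_ip) == "switched")

# Hypothetical addressing: VLAN 10 = 10.0.10.0/24, VLAN 20 = 10.0.20.0/24
print(leaves_host(True,  "10.0.10.21/24", "10.0.10.22"))  # same host, same VLAN
print(leaves_host(True,  "10.0.10.21/24", "10.0.20.22"))  # same host, different VLAN
print(leaves_host(False, "10.0.10.21/24", "10.0.10.22"))  # different hosts, same VLAN
```

Only the first case keeps the traffic inside the box. In the second, the packets have to make the round trip to the physical router even though both VMs share a host, which is why the same-host advantage disappears.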
I brought this up with Cisco when they were in talking about UCS. They were mentioning their roadmap, and virtual appliances, so I thought it was a good time to ask whether there would be a virtual layer 3 appliance in Cisco’s roadmap. The response was about what I expected.
Even if I go UCS, which brilliantly handles east / west traffic across multiple chassis, the top of rack 61xx Cisco devices don’t route, so I’ll still have to go all the way out to a 5000, or soon a 7000, to get routed back into the same host on the same wire.
Talking with a few friends who know more Cisco than I do, we discussed the idea of a virtual router. The inherent problem with a virtual router in this environment is that the VM is still bound by its default gateway. When DRS runs and moves VMs around, a VM can end up living on a host that does not have the particular virtual router with its gateway interface. That defeats the purpose.
We talked about how it might be possible to work around this using Cisco’s Gateway Load Balancing Protocol (GLBP), but even then, you’d have to set preferred active paths, and it wouldn’t always work the way we need it to work.
The only solution to this issue I can think of is a Distributed Virtual Router, which doesn’t exist. If someone could make a virtual router that operates like the Distributed Virtual Switch, it would help all of us out here in the financial world who are ever more constrained by tons of VLANs, and (virtual) firewalls in between those VLANs.
Is there a need for this in the marketplace? Or am I making a bigger issue out of this than I should?
As always, your comments are appreciated.
I have run the entire gamut of virtual SAN appliances so far in my VMware lab environment, and I always come back to the Celerra UBER VSA. The best thing about this is that it’s free, and there’s no expiration. The LeftHand is easier to use, and has some neat clustering features, but it’s a 60 day license.
I don’t know about you, but there are times when I get busy, and don’t get a chance to touch the lab for weeks. 60 days is just not enough. I got the NetApp ONTAP 8.0 appliance as well, but it’s a pain to use unless you’re running VMware Server (who still runs that?) or Workstation.
Anyway, I’ve been struggling with the performance of the UBER VSA, and trying hard to find a way to make it faster. Thanks to Clinton Kitson, I was able to dramatically improve throughput and latency on my Celerra VSA. Time to deploy a VM from a template went from 30 minutes down to 9.
Clinton says they may try to include this in the next UBER version, as long as there is consensus that it is beneficial and does no harm.
Here’s the tweak:
Log in to the VSA using the root account. The password is nasadmin.
Type in this command: /sbin/swapoff -a
Then, go in and edit the sysctl.conf file using this command:
nano /etc/sysctl.conf
Add the following lines to the end of the file:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Press Ctrl+X to exit, answer Y when nano asks whether to save, and write the file over the existing sysctl.conf.
Restart your VSA.
The screenshot below shows my VSA. I had already changed the caching settings, but here’s what happens by simply turning off the swap. You can clearly see the huge improvement in both latency and throughput.
Basically, what we’re doing with these commands is telling the VSA not to swap. We’re also changing the way the underlying Red Hat OS caches data before it writes to disk. Be aware that this does increase the risk of data loss, as we’re caching much more data in RAM before it’s written to disk. If data loss is a concern in your lab, you may want to stick with the standard settings. Also, my VSA has 6GB of RAM allocated, and still only a single data mover. Obviously more RAM = more performance when we’re turning up the caching.
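To get a feel for how much unwritten data those two settings allow, here is a rough Python calculation for a 6GB VSA. Treat the results as upper bounds: the kernel applies these percentages to the memory it considers available (free plus reclaimable pages), not to total RAM. The function names are illustrative.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines as they appear in /etc/sysctl.conf."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings

def dirty_limits_mib(ram_mib: int, settings: dict) -> tuple:
    """Upper bounds (MiB) on dirty data: where background writeback starts,
    and where writing processes are forced to flush themselves."""
    background = ram_mib * settings["vm.dirty_background_ratio"] // 100
    blocking = ram_mib * settings["vm.dirty_ratio"] // 100
    return background, blocking

TWEAK = """
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
"""

limits = dirty_limits_mib(6 * 1024, parse_sysctl(TWEAK))
print(limits)
```

In other words, up to roughly 3 GiB of writes can sit in RAM before the kernel even starts flushing in the background, and close to 4.8 GiB before writers are made to wait. That is where the speed comes from, and it is also exactly the data you stand to lose if the VSA dies unexpectedly.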
Thanks to Clinton for pointing out these settings. It’s hard to find performance information on the EMC VSA. I hope this post helps you get more work done in your lab environment.
Yesterday I had a marathon five-hour executive briefing with EMC, and I learned a lot about the VNX, and EMC’s strategy going forward. This was good information, and I want to give my take on what I learned, as well as maybe open up some discussion in the comments.
First off, I have to say one thing about EMC. Regardless of what you think about their storage technology and product line-up, their sales team is absolutely fantastic! I have yet to experience pre-sales as good as EMC’s. They are polished, professional, and knowledgeable about their product. They brought in a wide array of talent, including a vSpecialist, and there was not a single question they could not answer. This is impressive, considering some of our questions were a bit. . . creative.
Aside from all this, I am most impressed with their ability to put in an awful lot of work to analyze my current environment, and figure out exactly what my needs are. No one else has even come close in my recent experience. This alone goes a long way toward overcoming what could be perceived as product weaknesses versus their competition.
In my opinion, the VNX launch is an attempt at addressing the lessons learned by EMC over the past several years. I believe we can all agree that EMC was caught off guard by the ability of other vendors to integrate more tightly with VMware than VMware’s majority owner. Their recent moves have shown their willingness to correct that, and a strong desire to leapfrog the competition.
Considering the combination of VNX, recent tighter integration with VMware, and the intense growth of EMC’s battalion of vSpecialists, one can only presume that they are willing to use brute force to become the VMware platform of choice. I find it remarkable that a company with as many divisions and products can move with the degree of agility they have shown recently.
VNX introduces some interesting changes besides the concept of a truly unified platform. Fibre Channel is out, and SAS is in. This makes sense to me, considering the current pricing trends of SSD. It no longer makes as much sense to go with FC as a tier of storage. When SSD was 30x more expensive, not many considered it a viable alternative to FC, despite the huge performance advantage. These days, it’s maybe 3-5x more expensive, and that’s easy to justify with the performance. The price should continue to plummet as more manufacturers come online with more fab plants, so it won’t be long before there is price parity between FC and SSD. This was the right call, in my opinion.
A new version of FAST VP is another significant change introduced with VNX. When you lose an entire category of disks (FC), and only have SAS and SSD, you can eliminate the issues with FAST where you still could have hot spots on slower disks, and cool spots on your faster disks. Now there are only two tiers, so there should not be an issue with data finding itself on the wrong tier.
There is no way I can go through all the features and software changes introduced with VNX, and that is not really my intent. I do want to flesh out one area where I see a potential design flaw. Of course, this is all based on my opinion, and I am not in the same league as EMC, NetApp, HPar, or XioTech storage engiNerds, so take it for what it is worth.
The whole idea of FAST Cache bothers me. EMC is using SSDs for caching. While there are advantages to FAST Cache over a standard pool of SSDs, they do not seem to justify this design decision. FAST Cache uses 64k chunks, which is more flexible than normal FAST operation on an SSD pool, which uses 1GB chunks of data. I can see the advantage of tiering things in smaller increments. I cannot see any reason why EMC did not go with a PCI-based cache option.
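The granularity difference is easy to put a number on. The Python sketch below uses only the two chunk sizes quoted above, assumes binary units (64 KiB and 1 GiB), and uses a hypothetical 256 KiB hot region. It says nothing about how either feature actually decides what to promote.

```python
KIB = 1024
GIB = 1024 ** 3

FAST_CACHE_CHUNK = 64 * KIB   # FAST Cache granularity quoted above
FAST_POOL_CHUNK = 1 * GIB     # FAST tiering granularity quoted above

def bytes_promoted(hot_bytes: int, chunk: int) -> int:
    """Data that must move to SSD to cover a hot region, rounded up
    to whole chunks."""
    chunks_needed = -(-hot_bytes // chunk)   # ceiling division
    return chunks_needed * chunk

ratio = FAST_POOL_CHUNK // FAST_CACHE_CHUNK
hot_region = 256 * KIB        # hypothetical hot spot

print(ratio)
print(bytes_promoted(hot_region, FAST_CACHE_CHUNK) // KIB, "KiB via FAST Cache")
print(bytes_promoted(hot_region, FAST_POOL_CHUNK) // KIB, "KiB via 1GB chunks")
```

A 1GB chunk is 16,384 times coarser than a 64k one, so covering a 256 KiB hot spot costs 256 KiB of SSD in one case and a full gigabyte in the other.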
In my opinion, flash is so fast that wrapping it up in a traditional disk interface makes little sense for best performance. I guess at this point, I should disclose that we use a few Texas Memory Systems RamSAN devices here, so I know how fast PCI-based flash can be. Maybe this skews my opinion a bit, but EMC engineers who were here yesterday were touting the VNX’s use of the PCI 2.0 bus that was so much faster. I agree. So why not use it for cache?
Maybe I am missing something, and maybe in the real world, it won’t matter. But if EMC ever bothers to perform an SPC benchmark, I suspect we would see a bottleneck caused by this method of crippling SSDs with SAS interfaces. That said, no one is going to argue that replacing a hot swappable SSD is not 100x easier than shutting an array down to change out a bad PCI flash card. I am just not sure the performance penalty is worth the extra convenience, for me.
If someone who knows a lot more than me can help me understand why this decision was made, feel free to comment below. For now, I will hold out hope that maybe there will be a VNX-p at some point with some faster cache.
In the coming weeks, I will have similar briefings with a few other vendors, as we try and narrow down our choices for a new storage platform to replace our aged HP EVA’s. If I come across anything else interesting, I will certainly pass it along to our readers here.
I am one of the SE SME content contributors to the VCP certification exam testing and blueprint process at VMware, and as we are about to start the analysis portion for preparation of the next version of the exam (and no, I cannot share any timelines or version numbers since they are under NDA!), I thought it might be a good opportunity to get some direct feedback from the VCP certified readers of this blog to take back to my curriculum development colleagues. So this is YOUR opportunity to tell me what you like, dislike, would like to see changed, areas that need more or less content coverage, etc. Please be honest, but reasonable and constructive as well. I will take the feedback from the comments back to our upcoming curriculum development meetings. Let your voice be heard!
UPDATE: Thanks for all of the feedback, folks! I have submitted the input to others in the curriculum dev team as we start to formulate the next tests. I have had one informal conversation with one of the test developers, and he agreed that we will be looking to not totally eliminate the min/max, but reduce it to -5% of the overall test question complement… So mission accomplished on your feedback! Let me know if you have other comments or suggestions!
-TMac
OK good. . . the catchy title worked, and you’ve committed to reading this post. So let’s get to work on making your millions.
Lucky for us technology guys and gals, there are still plenty of C-level employees at companies everywhere who have yet to witness all the benefits of VMware first-hand. Steve Duplessie with ESG just published the following statistics:
• 58% of organizations have virtualized less than 1/3 of their servers.
• Thus far IT owned applications dominate what’s being virtualized. File/Print, etc. 59% haven’t virtualized ANY “mission-critical” applications.
So why are there still this many laggards? These execs have all read countless articles showing how much money they’ll save with virtualization, but often times they haven’t seen the benefits of virtualization beyond simple consolidation. I’m going to show you how to open their eyes in just 15 minutes.
I have done this demo multiple times for audiences from CEO, CIO level, all the way down to customer facing business executives. Each time it has literally been shock and awe, and although I can’t give all the detailed numbers, I will say that the purse strings were blown open and FINALLY our project has a green light. . . millions of them in fact.
This post assumes you have at least rudimentary presentation skills, and understand how to communicate well with people at the executive level.
Fitting vSphere’s “Greatest Hits” into only 15 minutes is not an easy task. It took weeks of careful planning, and test runs before I was able to get everything timed just right. Each time I have presented this, it has gone beyond 15 minutes, but only because there are lots of great questions coming from the audience. I expect you will see the same results if your audience is somewhat intelligent.
After going through a list of features I thought might be interesting to executives, I pared it down to just three in order to make my time limit of 15 minutes. If you have more time, that’s even better, but we all know these guys are very busy.
And that brings me to my first tip: Try and schedule your demo during the winter months so that tee times won’t conflict with your meeting.
Here’s how my 15 minute demo goes down.
Most of your audience will probably already have some idea of what virtualization is at this point, but it’s still a good idea to make sure you cover a few quick points on the basics. I do this while showing them the vCenter console. I show them the virtual machine view, and tell them about typical x86 workload resource usage, and how virtualization allows us to maximize our hardware investments, etc.
Important: Make sure you clarify for them what a “host” is, and whatever term you use for a guest, whether it’s “server”, “VM”, or “guest”. You need to emphasize this point up front, and multiple times during your demo, so people stay on track and understand exactly what they are seeing.
The first feature I show is a simple cloning operation. By now in your career, this may be old hat, but believe me your executive audience will be impressed by this.
Always make sure you give the benefits when showing any feature. I start my cloning operation while telling them what I’m doing, and then while it runs you’ll have time to explain the benefits. Try and tailor the benefits around how they will impact the customer. Whether the customer is an internal one, or an external one, your audience will appreciate this point of view.
Tip: Use something like BgInfo on your VMs to show the server name so that your audience can follow more easily. If you don’t use BgInfo, at least change the wallpaper to show the name.
With server cloning, I touch on how valuable it is to be able to clone a server that is having issues so that Development or QC will be able to reliably duplicate a bug or issue. Explain how tough this is in a traditional environment where you have to try and duplicate an issue on a server that is not 100% identical. Also explain how a server can be cloned to test patching or an application update without impacting the production environment, or the customer.
This section should run 3-4 minutes, depending on your SAN speed. If your SAN is slower, do it on local storage to get it done faster. When it’s done, make sure you change the VLAN so you don’t get a conflict, and boot it up. You can quickly just login to show them the server is identical to the one you cloned. You don’t want to spend more than 5 minutes on this one if your allotted time is only 15 minutes.
This is a perfect time to share the next tip: TEST ALL THESE STEPS several times. Time yourself each time, and even do a few dry runs with your team if possible. This will ensure your demo comes off flawless, and that’s important. Believe me when I say some of these people are looking for a reason NOT to virtualize. Most of the time, it has less to do with virtualization, and more to do with fear of change.
My next demonstration is vMotion / Maintenance Mode. This will be mind-blowing for your audience, especially if any of them have a technical background. I start off by telling a story about how some fans have stopped working on one of our hosts, and we need to get the new fans installed before it overheats. (Make sure you know which host has your intended vMotion candidate ahead of time.) We can’t wait for the maintenance window.
Normally in this situation, a second cluster node, or hot spare would have to be brought on line, which would mean a short outage for the customer. In this demo, we don’t have time to enter Maintenance Mode, so we’re going to vMotion a single server. I explain that this is how Maintenance Mode works, and how this will be transparent to our customers, and then I prove it.
Bring up a console session on the server to be vMotioned. I use an IIS server as an example, as it’s customer facing, and they understand that well. In my console session, I start a ping -t to another server. In this case, it’s an application server, which the IIS server needs to maintain contact with, or customers will be impacted. Then I execute my vMotion. You might need to explain what “ping” is, so that everyone is on board.
After the server vMotions, I show them that we didn’t drop any packets, and that the customer has not been impacted, and then show them that the VM is on another host. I always reduce my DRS automation level before the demo. I don’t want them to see other migrations happening while we’re demoing. That would spawn a discussion we don’t want to have right now. This takes us to the 10 minute point, barring any questions.
Inevitably, someone asks at this point what would happen if the server just failed with no warning. This plays right into your hands, as it’s the perfect segue into your next feature: Fault Tolerance. If they don’t ask, then you ask.
For FT, I set up an FTP server using FileZilla. You can set up whatever works best for your business, but make sure it is something that can clearly demonstrate that customers will not be impacted by an outage. I have preselected a “customer data file”, and set up a simple “FTP Client” VM with FileZilla Client.
I did have to adjust my incoming FTP speed for the server so that the file wouldn’t complete too quickly. You’ll want to make sure you have enough time to test a failover operation, and show the file transfer still going from both the client, and server perspective. So either select a huge file, or bump down your bit rate for the client in the FTP software.
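Picking the file size and speed cap is simple arithmetic, sketched here in Python. The 600 MB file and the two-minute target are hypothetical numbers, so plug in your own, and the function names are just for the example.

```python
def transfer_seconds(file_mb: float, rate_mb_per_s: float) -> float:
    """How long a transfer lasts at a given speed cap."""
    return file_mb / rate_mb_per_s

def rate_for_duration(file_mb: float, target_seconds: float) -> float:
    """Speed cap (MB/s) that stretches a file over the target duration."""
    return file_mb / target_seconds

file_mb = 600          # hypothetical "customer data file"
target_seconds = 120   # hypothetical: long enough to start, fail over, and watch it finish

cap = rate_for_duration(file_mb, target_seconds)
print(f"Cap the transfer at about {cap:.1f} MB/s")
print(f"Check: {transfer_seconds(file_mb, cap):.0f} seconds at that rate")
```

Set the resulting figure as the speed limit in your FTP software (mind whether it wants KB/s or MB/s), then time a dry run to confirm the transfer really does outlast the failover.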
Open up a console session on the FTP server, and then point out the “secondary” instance. Open a console session to the secondary. With both sessions side by side, poke around in the primary and open some windows, a browser, or whatever. You’ll want to demonstrate that the servers are in lockstep with one another. Open a console session to the FTP Client.
At this point, I explain how the customer is sending us this file, and start the transfer. I then explain that the particular host that the FTP server lives on is going to go down without warning. Show them the host name. You can simulate this with the Test Failover option.
It will take less than a minute, during which you will want to point out the bytes transferred on the server, and the client. Point out that the client has no errors, and then you’ll see the secondary come back online.
Again, show the host name so they can see that it has indeed changed servers. I found it helpful to time the file transfer so that it would complete right around this time. Then you can show them the server, the completed file, and the client, once again explaining how the customer has no idea that a server went down in our datacenter.
If you don’t get gasps or applause at this point, you did it wrong. Once again, PRACTICE this over, and over before taking it to the executives! You don’t want to be up there looking like Bill Gates demoing Win98. You want to look like John Chambers demoing the Cius.
Wrap up with an explanation that these are just three of hundreds of VMware features, and then answer the questions that follow. At the end, these people will be frantically searching for their checkbooks. Your millions should start flowing after the next budget committee meeting.
Many of the customers I cover in my region are really starting to see the value in VMware’s management tools. As virtual machines now outnumber physical machines, customers need tools to help report against their existing infrastructure as well as predict and prepare for future virtual machine workloads. One of my favorite VMware tools when I was on the customer side was a product called Capacity IQ. I wrote up a blog post, basically an overview of the benefits of the product, that I think people found useful. You can check that post out here. I tell most of my customers about it, because it’s simple to set up (virtual appliance) and it gives you loads of great information about your existing infrastructure.
One of my customers who is moving forward with a CapIQ implementation e-mailed me about what types of storage metrics are available from the product. I was happy to inform him that Capacity IQ 1.5 was just released and provides some great storage statistics that can now be reported against. Much to my dismay, he told me that he wasn’t seeing the storage report data; the metrics were all blank.
Here are the requirements to get the reports to produce storage related information:
You need the vCenter Management Webservices running for CapIQ to collect some of the storage metrics. The storage I/O metrics require ESX 4.1. When you use ESX 4.0 or earlier hosts, the following metrics appear with dashes (–) and affect the Dashboard, the Datastores – List view, the Virtual Machine Capacity – Summary view, and the Virtual Machine Capacity Usage – Trend view:
* Disk I/O read/write
* Disk I/O reads/writes per second
* Disk I/O read/write latency
* VM Disk I/O read/write latency
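If you have a mixed environment, it helps to know up front which hosts will show dashes. Here is a small Python sketch of the version check, assuming you have already exported host names and ESX versions from vCenter by whatever means you like. The host names below are made up, and the only rule encoded is the one stated above: anything older than 4.1 has no storage I/O metrics.

```python
def parse_version(version: str) -> tuple:
    """Turn '4.1.0' into (4, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def hosts_missing_io_metrics(hosts: dict, minimum: str = "4.1") -> list:
    """Return the hosts whose ESX version is older than the minimum,
    i.e. the ones whose storage I/O metrics will show as dashes."""
    required = parse_version(minimum)
    return [name for name, version in hosts.items()
            if parse_version(version) < required]

# Hypothetical inventory export: host name -> ESX version
inventory = {
    "esx01.lab.local": "4.1.0",
    "esx02.lab.local": "4.0.0",
    "esx03.lab.local": "3.5.0",
}

print(hosts_missing_io_metrics(inventory))
```

Anything the function returns is a host to upgrade before expecting the disk I/O and latency columns to fill in.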
-Scott
Scripting is a bit of an art. Recently I’ve done some scripting and used the vmware PowerCLI. I am not a Powershell scripter. I’ve done lots of bash, Perl, and vbs scripting, but not much Powershell. Fortunately, after messing around with Syntax, it’s not too hard to do some basic scripting. I’m nowhere near a powershell or PowerCLI expert, and fortunately in my role, all I need to do is give my customers “ideas” so that they can continue on with those “ideas”. With any sort of scripting or development, it is always good to review others techniques which are readily available on the vmware community site for the PowerCLI. General Powershell examples are also very readily available just by googling (is that a word?) for the specific example that you want to review. Obviously, any hit that you get may or may not be good examples of technique, but with a bit of discernment and looking at multiple examples you can begin to guess which are the better ways to attack a problem and write very readable and editable scripting code.
To write a basic PowerCLI script isn’t necessarily that difficult, as I mentioned. I find that the difficult part, is to write a resilient script that can handle any situation or state that might be thrown at it. I like to offer examples to my customers that can do good error checking and handling or be able to react to different scenerios. The second example script that I offer below, makes an attempt to do just that.
The first script, however, is a script that references the vSphere API, directly. Even though the PowerCLI has many cmdlets to handle many vSphere functions, it doesn’t expose the entire vSphere API as cmdlets. Thus, sometimes it is required to have Powershell work on the API objects, themselves. This first script, is thus a bit more cryptic than the next two, because there wasn’t a cmdlet available for the function I was trying to perform. Since it is not as common to operate on the vSphere API, there aren’t nearly as many examples and references. Lots of trial and error is required if you need to script at this level. The cmdlet “get-view” allows the Powershell programmer the ability to operated directly on the vSphere API object. Hopefully, its clear from this script how the “get-view” cmdlet can be very powerful to the Powershell programmer.
So what does it actually do? The first example script will churn through the VMs in inventory and modify each virtual machine’s settings (a checkbox under the Options tab of the VM settings) so that the VMs will check for a VMware Tools upgrade during a power cycle. This checkbox is not enabled by default, and making the change one VM at a time is very time consuming and potentially error prone in an environment with lots and lots of VMs. This same script could be modified for any mass VM setting changes that a customer would require. This script doesn’t actually do any error checking, so I am breaking my own rule about resilient scripts, but again, all I needed to provide was an “idea”.
The second and third scripts are related. They shut down and start up VMs based on a specific folder. This could be modified to work off of a resource pool or a naming convention, depending on your setup. This particular customer wanted to call a script after an HA event to shut down or suspend non-critical VMs so that they would not be capacity constrained after the event. The second script, in particular, tries to react to different VM power states and to whether VMware Tools is even installed. This is an example of trying to write a resilient script. Enjoy!
One has to have the PowerCLI toolkit installed to run these scripts.
PowerCLI main site (including the download link for the PowerCLI toolkit)
http://www.vmware.com/support/developer/PowerCLI/index.html
The link to the vSphere SDK is below (for those functions that are not available via cmdlets):
http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/index.html
PowerCLI Command Reference
http://www.vmware.com/support/developer/PowerCLI/PowerCLI41/html/index.html
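Script #1
Connect-VIServer <Ip or HostName> -User <user name> -Password <pw>
# Walk every VM in inventory
$vms = Get-VM
ForEach ($vm in $vms)
{
    # Get the vSphere API object behind the PowerCLI VM object
    $vm_view = $vm | Get-View
    # Build a config spec that changes only the Tools upgrade policy
    $vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $vmConfigSpec.Tools = New-Object VMware.Vim.ToolsConfigInfo
    $vmConfigSpec.Tools.ToolsUpgradePolicy = "upgradeAtPowerCycle"
    # Submit the reconfigure task (it runs asynchronously on the server)
    $vm_view.ReconfigVM_Task($vmConfigSpec)
}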
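Script #1 skips error checking on purpose. As a rough sketch of what a more defensive variant might look like, the function below leaves a VM alone if the policy is already set and catches failures per VM, so one bad VM does not stop the whole run. The function name Set-ToolsUpgradePolicy and its return strings are made up for illustration; the vSphere API calls are the same ones used in Script #1, and it assumes you have already connected with Connect-VIServer.

```powershell
# Hypothetical helper (not part of the original script): applies the Tools
# upgrade policy to one VM view and reports what happened.
function Set-ToolsUpgradePolicy {
    param(
        [Parameter(Mandatory = $true)] $VMView,      # object returned by Get-View
        [string] $Policy = 'upgradeAtPowerCycle'     # vSphere API value; the other value is 'manual'
    )
    # Skip VMs that are already configured the way we want.
    if ($VMView.Config.Tools.ToolsUpgradePolicy -eq $Policy) {
        return 'AlreadySet'
    }
    try {
        # -ErrorAction Stop turns a missing type or bad argument into a catchable error.
        $spec = New-Object VMware.Vim.VirtualMachineConfigSpec -ErrorAction Stop
        $spec.Tools = New-Object VMware.Vim.ToolsConfigInfo -ErrorAction Stop
        $spec.Tools.ToolsUpgradePolicy = $Policy
        $VMView.ReconfigVM_Task($spec) | Out-Null
        return 'Submitted'
    }
    catch {
        Write-Warning "Could not reconfigure $($VMView.Name): $($_.Exception.Message)"
        return 'Failed'
    }
}

# Usage, assuming an existing Connect-VIServer session:
# Get-VM | Get-View | ForEach-Object { "{0}: {1}" -f $_.Name, (Set-ToolsUpgradePolicy -VMView $_) }
```

Returning a status string per VM makes it easy to pipe the results into a report and see which VMs still need attention.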
Script #2 (my version of shutdown within a specific folder)
Connect-VIServer <Server IP or NAME> -User <userid> -Password <password>
$vms = Get-VM -Location Test-Folder
# Try a graceful shutdown if VMware Tools is running (Suspend-VMGuest is an alternative);
# otherwise just stop the VM
ForEach ($vm in $vms)
{
    # Nothing to do for VMs that are not powered on
    if ($vm.PowerState -ne "PoweredOn")
    {
        Write-Host "VM $vm is $($vm.PowerState), skipping"
        continue
    }
    $vm_view = $vm | Get-View
    $vmtoolsstatus = $vm_view.Summary.Guest.ToolsRunningStatus
    Write-Host "VM $vm says tools status is $vmtoolsstatus"
    if ($vmtoolsstatus -eq "guestToolsRunning")
    {
        Shutdown-VMGuest -VM $vm -Confirm:$false
    }
    else
    {
        Stop-VM -RunAsync -VM $vm -Confirm:$false
    }
}
Script #3 (my version of a startup script based on a specific folder)
Connect-VIServer <Server IP or NAME> -User <userid> -Password <password>
$vms = Get-VM -Location Test-Folder
# Start each VM in the folder
ForEach ($vm in $vms)
{
    Start-VM -RunAsync -VM $vm -Confirm:$false
}
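As mentioned earlier, the folder-based selection in Scripts #2 and #3 could be swapped for a resource pool or a naming convention. Here is a minimal sketch of both, assuming a resource pool called 'NonCritical' and VM names that start with 'test-'. Both names are placeholders, and Select-VMByName is a made-up helper that only looks at the Name property.

```powershell
# Hypothetical helper: keep only the objects whose Name matches a wildcard
# pattern. Works on Get-VM output or on anything else with a Name property.
function Select-VMByName {
    param(
        [Parameter(Mandatory = $true)] [object[]] $VM,
        [Parameter(Mandatory = $true)] [string] $Pattern    # e.g. 'test-*'
    )
    $VM | Where-Object { $_.Name -like $Pattern }
}

# Usage, assuming an existing Connect-VIServer session:
# $vms = Get-ResourcePool -Name 'NonCritical' | Get-VM      # by resource pool
# $vms = Select-VMByName -VM (Get-VM) -Pattern 'test-*'     # by naming convention
# $vms = Get-VM -Name 'test-*'                              # Get-VM also takes wildcards directly
```

Whichever way $vms gets populated, the ForEach loops in Scripts #2 and #3 can stay exactly as they are.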
Happy Thanksgiving!
Introduction
VMware recently announced the general availability of a Zimbra virtual appliance that VMware customers can simply import into their existing infrastructure to get “e-mail in a box”. This is a great concept for administrators because the operating system is pre-configured and purpose-built for the application that is packaged with it. For those who are new to virtual appliances, the appliance comes in the standard OVF (Open Virtualization Format) and will import into the vCenter management console.
I am no e-mail administrator, so I wanted to see how easy setting up the Zimbra virtual appliance would be and provide some instructions for those out there that are looking to test out Zimbra.
Get the Bits!
First things first, go out and grab the download of the Zimbra virtual appliance from the Zimbra website. Yes, you will need to register to download the bits…
Import the Virtual Appliance
There are two methods of importing a virtual appliance: you can enter the URL, which is supplied by the Zimbra website once you register, or you can download the appliance and import it locally. I grabbed the full download so that, in case I hosed something up, I would have a copy of the OVF locally and could start over from scratch. I guess a snapshot would work as well, so it’s up to you how you would like to proceed here.
Configure the Zimbra Virtual Appliance
The Zimbra virtual appliance is pre-configured to ask you for the basic configuration parameters you will need to get the appliance up and running. The questions you will need to answer are pretty common stuff if you are an IT administrator. Make sure you use the FQDN for the hostname.
Power it up
Now that you have configured your basic system information, you can power up your new virtual appliance. It will automatically configure itself based on the information you populated earlier. Very nice for a hands-off approach and a streamlined installation process.
Finish it off
Now that you are powered up and on the network, you can log in to the administration console to finish your configuration. Point your web browser to https://<hostname>:5480. The administration console is the place where you can create user accounts, configure licensing information, pull diagnostic data for troubleshooting, and update the virtual appliance itself.
DNS is a big component of e-mail. If you are doing split DNS or dynamic DNS, I suggest referencing this link to assist your efforts. I am using a dynamic DNS service at home along with split DNS, so I had to go and update my host entry with an MX record so the world knew where to route my e-mail traffic. Once that was done correctly, I was up and running and able to send and receive e-mail with no problem.
Licensing
The last thing you will want to do is license your installation. The nice folks over at Zimbra will give you a 10-user license free of charge. Click the link below to go license your configuration or view some sample pricing on what a fully licensed configuration would look like. Enjoy!
-Scott
While cruising the show floor at VMworld San Francisco, the Xiotech booth seemed to be abuzz every single time I walked by. Finally, I stopped in to see what all the commotion was about. It was not about Xiotech at all. It was about iPads. As soon as I stepped close to the booth, there was a rush to scan my badge so I could win an iPad, but not a real clear picture of what the heck Xiotech actually did. I moved on to the next booth, and I suspect most of the 17,000 attendees did as well.
The first time I heard about Xiotech was a year or so ago, and at the time, the only thing I understood was that they offered storage in a sealed box that was supposed to be more reliable than your average monolithic array. At the time, it sounded far-fetched to me. I didn’t see how they could make those claims while using the same disks we all use in some special locked box.
It wasn’t until I heard Xiotech’s CEO Alan Atkinson on an Infosmack podcast that I decided to investigate further. Alan didn’t really go into details, but after hearing him talk about some of the heavy hitters on his team, I was intrigued. Their own website doesn’t really spell out in detail what they are doing that makes them stand out. I had to make a call and talk to their engineers. What I found was so amazing, I have to share it here.
Since Seagate is one of Xiotech’s major shareholders, these guys have unfettered access to the inner workings of the disk drive, from the firmware on down. Inside a Xiotech “magic box”, the disks are the exact same model as one would find spinning inside an array from another manufacturer. The firmware is where the magic actually happens.
Apparently disk drives have all kinds of cool things they can do besides reporting your typical “OMG I’m going to fail soon!” messages. Seagate reports that nearly 80% of the “failed” disks they receive are actually fine. They cannot find any issues whatsoever. What does Seagate do with those disks? They remanufacture them and toss them back into the “refurbished” bin.
Although the disk has the capability to “remanufacture” itself in the firmware, it cannot be done inside your traditional array. The vibration is simply too high to do this reliably. A typical shelf full of rotating disks vibrates at over 40 rads (units of rotational vibration). When a dozen or more disks are all rotating at the same speed, in the same direction, it’s not hard to imagine the vibration in a tray of disks. Xiotech actually mounts its disks so that they are counter-rotating. One disk rotates clockwise, and the disk beside it is mounted so that the rotation is in the other direction. This reduces vibration inside the Xiotech box to 2 rads. This means they can reliably remanufacture a disk inside the box while the array is in operation.
Cool huh?
Here’s a breakdown of what a Xiotech array does when it detects a potential disk error:
- Data is migrated from the suspect disk to another disk elsewhere in the array
- Disk is power cycled
- Complete factory remanufacturing process
- Recalibrate heads
- Rewrite servo tracks
- Perform a low-level format
It gets better. The unit of storage inside the Xiotech is not actually the disk. It’s the head. This means that each individual platter is a unit of storage, and the data is striped as such. Why is that a big deal? For one, as disk drives get larger and larger, they take longer and longer to rebuild. This increases your exposure to another failure. Second, if the most common catastrophic disk failure is a head crash, wouldn’t it be nice just to be able to disable that particular head and move on? When all the above steps are complete, if a head is still not responding, it will be disabled, and that platter will not be usable. The rest of the disk is good to go.
The net result of all this firmware magic is that the Xiotech array comes with a 5-year warranty at no cost. Why not include it when you’re seeing 99.9983% uptime in the field? To prove the point, Xiotech took 200 disks that had been marked “failed” and returned to Seagate, and stuffed them into a rack of Xiotech arrays. They ran for TWO YEARS without a service event.
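To put that uptime figure in perspective, here is a quick back-of-envelope conversion from an availability percentage to downtime per year. It is only arithmetic on the number quoted above, assuming a 365-day year; the function name is made up for illustration.

```powershell
# Convert an availability percentage into expected downtime per year.
function Get-AnnualDowntimeMinutes {
    param([Parameter(Mandatory = $true)] [double] $UptimePercent)
    $minutesPerYear = 365 * 24 * 60      # 525,600 minutes in a non-leap year
    [math]::Round((100 - $UptimePercent) / 100 * $minutesPerYear, 1)
}

Get-AnnualDowntimeMinutes -UptimePercent 99.9983    # about 9 minutes per year
```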
So we’ve established reliability. What about performance? Disk drives aren’t really getting any faster. How can we squeeze more performance out of spinning disks? Xiotech has answered that with yet another firmware tweak. Oftentimes, the disk subsystem has to wait for data to be read because the head is busy reading another part of the platter. Even under optimal conditions, disks have a few ms of latency built in for the heads to move.
One of the storage geniuses over there actually came up with a plan to have the heads constantly move back and forth across the platter. When I heard this, I wanted to fly out to Minnesota and buy this guy a beer. If the heads never actually park, then statistically speaking, the head will always be closer to the data it needs. What does this mean for you and me? It means that a single 3U Xiotech ISE performs at 12,600 IOPS on the SPC-1 benchmark.
Since the cache and controllers are on board each ISE, these IOPS scale linearly, as opposed to a monolithic array, which could run out of gas if its controllers get saturated. So with 5 ISEs at 15U, one can expect to hit 63,000 IOPS. This is clearly not your father’s storage array.
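The scaling math is simple enough to sketch. The snippet below assumes perfectly linear scaling per ISE, as described above, and uses the 12,600 IOPS and 3U figures quoted in this post as defaults. Real results will depend on the workload, so treat it as a back-of-envelope estimate; the function name is made up.

```powershell
# Back-of-envelope estimate that assumes perfectly linear scaling per ISE.
function Get-IseEstimate {
    param(
        [Parameter(Mandatory = $true)] [int] $Count,
        [int] $IopsPerIse = 12600,       # SPC-1 result quoted for a single ISE
        [int] $RackUnitsPerIse = 3       # one ISE is 3U
    )
    New-Object PSObject -Property @{
        IseCount  = $Count
        RackUnits = $Count * $RackUnitsPerIse
        Iops      = $Count * $IopsPerIse
    }
}

# One to five ISEs; the last row works out to 15U and 63,000 IOPS.
1..5 | ForEach-Object { Get-IseEstimate -Count $_ } | Select-Object IseCount, RackUnits, Iops
```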
Xiotech has their own storage virtualization appliance with the ISE 9000 for larger enterprises, which works quite well with VMware, and is ICON capable for ease of integration with just about anything. This would also be absolutely amazing behind some virtualization from FalconStor, NetApp, Nexenta, or really any of the other storage virtualization products out there.
With all these patents and truly groundbreaking technology, one is left wondering why Xiotech has yet to secure a huge OEM deal. Chris Mellor did a nice write-up on this topic, and his argument is that manufacturers are worried about sourcing disks from only one manufacturer. If Seagate were to have a bad batch of disks go out, it could be crippling.
I can understand his point of view, but I was born a skeptic. Considering that storage array manufacturers generate huge chunks of highly profitable revenue from servicing arrays, I doubt we’ll be seeing a commercial with an EMC, NetApp, or HP guy out fishing with the Maytag repair man anytime soon. Regardless of whether they land a huge OEM deal or not, Xiotech has proven that spinning disks are far from the end of their useful life.
If you’re a follower of mine on Twitter (@eczerwin), you know that about 8 months ago I was transferred by my company from Chicago, USA to Zurich, Switzerland. This mission has been exciting and a lot of fun – but it’s not easy.
The mission – Move roughly 12TB of SAP data over the ocean into two new datacenters running on new iron, with one weekend of downtime. In the future keep your eyes open for another blog post on exactly how we completed this data move.
My personal goal – Virtualize every possible server that comes over (or at least what the app owners will allow without having heart attacks).
After having a chat with the SAP app owners and trying to soothe their concerns, we moved forward like this: all SAP application servers and test DBs will be virtualized, while production DBs stay physical (it’s only 2 servers). Oh well, a bit of a compromise, but we are cutting the physical server count by more than half. For all you hardware geeks out there, let me give you a quick rundown of what we are putting this environment on …
The Iron
- 2 EMC VMax – 1 for full production, the other for test/dev and SRDF/A target – roughly 70TB usable
- 4 Dell R910s for Datacenter 1 – 128 total cores and 1.2 TB total memory – Production cluster
- 4 Dell R910s for Datacenter 2 – 128 total cores and 1.2 TB total memory – Test/Dev/Failover cluster
The Hypervisor and utilities
- ESXi 4.1
- vCenter 4.1 running on a Windows 2008 VM
- Powerpath/VE (Just listed as supporting ESXi 4.1 about a week ago)
In addition to all this new gear, I still have not mentioned the infrastructure for the backups which will include a couple more SANs and more replication. I don’t want to be too longwinded about that here. After all, we are here to talk about virtualizing SAP.
While doing all the design for slot sizes and so on, I was researching and found there were some special requirements to virtualize SAP in a supported fashion. Our SAP environment is running on Windows 2003 x64, so most of these requirements will relate directly to running SAP on Windows. (Also big thanks to @BasRaayman for good insight into a lot of this.)
First and foremost, SAP absolutely requires memory reservations. Most of these servers have 48GB of memory, so we needed a very large cluster design to have enough slots of the proper size (in this scenario, with such strong hardware, I chose not to use resource pools). The way SAP works is that the app fully allocates its memory and doesn’t free it up as long as it is running. If these servers don’t have their memory guaranteed 100% of the time, they will most likely perform very poorly.
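To give a feel for why full reservations drive the cluster size, here is a rough capacity sketch. The inputs are assumptions for illustration only: four hosts splitting the 1.2 TB evenly (about 300 GB each), 48 GB reserved per VM, one host held back for failover, and no allowance for hypervisor or per-VM overhead. The function name is made up, and this is not the sizing method actually used for the project.

```powershell
# Hypothetical helper: how many fully reserved VMs fit in a cluster while
# still tolerating the loss of some hosts. Ignores memory overhead.
function Get-ReservedVMCapacity {
    param(
        [Parameter(Mandatory = $true)] [int] $HostCount,
        [Parameter(Mandatory = $true)] [double] $MemoryPerHostGB,
        [Parameter(Mandatory = $true)] [double] $ReservationPerVMGB,
        [int] $HostFailuresToTolerate = 1
    )
    $usableHosts = [math]::Max($HostCount - $HostFailuresToTolerate, 0)
    $vmsPerHost = [math]::Floor($MemoryPerHostGB / $ReservationPerVMGB)
    [int]($usableHosts * $vmsPerHost)
}

# Assumed numbers: 4 hosts at roughly 300 GB each, 48 GB reserved per VM.
Get-ReservedVMCapacity -HostCount 4 -MemoryPerHostGB 300 -ReservationPerVMGB 48    # 18

# Reserving all of a VM's configured memory with PowerCLI (needs a vCenter
# session; 'sap-app01' is a placeholder name):
# $vm = Get-VM -Name 'sap-app01'
# $vm | Get-VMResourceConfiguration | Set-VMResourceConfiguration -MemReservationMB $vm.MemoryMB
```

With these assumed inputs, each host holds 6 such VMs, so three usable hosts hold 18. That is why a handful of 48GB servers fills a cluster quickly once everything is reserved.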
Secondly, there is some cloudy information out there about CPU reservations in SAP. I spent a while reading through many SAPNotes (which led to some confusion) and chatting with colleagues about it. In the end I decided to trust in what I had designed, and let the Hypervisor deal with the CPU scheduling. The important part in this is that you plan for capacity and design your cluster(s) very carefully. In my opinion, CPU reservations are not necessary.
Now, next up is something you will need to prod your SAP Basis administrators about. There is a memory model setting inside the newer versions of SAP that is important to set when running in a virtual environment. If you are running Windows 2003, it should always be set to Classical. However, if you are running Windows 2008, there are differences to take into account. If you have a CPU-bound system with plenty of memory resources, go with the Flat memory model as recommended by SAP. If you are memory bound, go with Classical. This can be tricky and may require some monitoring and tuning after the system is live.
For the particularly high I/O VMs, I decided to use RDMs for the data volumes (these mostly house large SQL DBs). I know this is not required, as VMware has stated that the speed difference is next to nothing. However, if I later see that performance is fine, I can always convert them to VMDKs. Also, by using the PowerPath/VE plugin we have true load balancing and multipathing back to the VMax arrays. When I get around to testing the performance with and without PowerPath/VE, I may write another blog post about just that!
In my opinion, SAP is like any other tier 1 application, just with a few additional special requirements. It can absolutely be virtualized; don’t listen to all the naysayers. My biggest word of advice is to plan carefully and keep your capacity in check. Then any other bumps in the road should not be too bad.
– Ed Czerwin
I haven’t seen many reviews of this class, so I thought I would share my experience for those thinking of registering for the VMware vSphere Troubleshooting course.
I was originally registered for Fast Track, but after I took a long, hard look at the course outline, I realized I already knew most of the material they would cover. I’m glad I changed my registration to Troubleshooting, especially since the requirements changed to include it as a prerequisite for VCP4 certification.
When I sign up for a technical class, I always worry that I’ll get a professional instructor with not much real world experience. My trainer for this course was John Davis from New Age Technologies. He was definitely well versed in the material, and since he had also spent considerable time in the field, he was able to share a lot of his real troubleshooting experiences. This was the most beneficial part of the class, so if you register, make sure you’re taking it with someone who has been in the trenches.
My intent is not to disparage the excellent VMware official curriculum, as it was very well put together, and quite thorough. I just learn much better when I can discuss with folks who have been where I am, and seen things I haven’t seen. I can read the book and do the labs anytime.
Day One
The first day, we jumped right into some CLI troubleshooting. Since I had been using ESXi, and the vMA, I was on somewhat familiar turf. In the first few labs, we configured the vMA, vi-fastpass, session files, and started right in with some vicfg commands. These are mostly network related commands in the first few labs.
We played around with tech support mode a bit, although this was a couple of days before the 4.1 release, where it became officially supported. In class we got the official “don’t try this at home” line from the instructor. We enabled SSH on ESXi and played around with PuTTY a bit. One of the coolest things I saw on day one was the vsish command on ESXi. For those who haven’t played with it, I recommend checking it out. It’s a bit like the Windows registry for ESXi. One can view tons of info about the system setup using this command.
We had an in-depth look at ESX, ESXi, and vCenter log files in the console, and in vSphere Client. We covered viewing, exporting, bundling, and even setting up log collection with the vMA. There were 5 labs on logging, and setting up the vMA to host the logs.
Day Two
On day two, we jumped into network troubleshooting. We covered a lot of new info on the inner workings of the dvSwitch, including synchronization, and what happens when it breaks, which I found helpful. There were a few labs where the instructor broke our network setup, and we had to fix the problem. Some of these breaks surrounded dvSwitch timeouts, obsolete dvSwitch info, and uplink issues.
We played with CDP, port binding methods, and VLAN / PVLAN troubleshooting. I think the PVLAN stuff was most beneficial. Even understanding how those work, it’s great to get some hands-on troubleshooting with them, not having used them in my own environments.
Day Three
On day three we continued with networking, which is probably the most information-packed module of the course, and with good reason. There were labs on Wireshark, tcpdump, net-dvs, and viewing / changing network configs from the vMA and CLI. As a CCNA, I had a pretty solid networking background going in, but I did notice some in the class struggled a bit with this. If you don’t have much networking experience, or at least a solid understanding of VLANs, I recommend brushing up before this course.
We went into management troubleshooting after wrapping up the network break / fix labs. Firewall configuration errors were something I really didn’t care about, but not everyone has the luxury of using ESXi exclusively. There was some database connectivity and lots of vCenter troubleshooting.
Storage was up next, and there was a ton of information on PSA, NMP, MPP, SATP, CHAP, and PSP. After we got all the acronyms out of the way, we did some great iSCSI break / fix labs. These were fun, and our sneaky instructor gave us some pretty hairy issues that may never happen in the real world, but provided fantastic troubleshooting opportunities.
Day Four
VMotion, DRS, HA, and cluster troubleshooting were the topics of the morning on day four. Reservations and swap file space issues cropped up, as did VMotion failures and admission control policies.
My favorite part of the class was absolutely the break / fix labs. On day four, we got the chance to have as many of these as time would allow. Previously all the break / fix scenarios were related to the chapter we were on, so you’d have a good idea where to start troubleshooting. But on the last day, it could have been anything. I had a blast troubleshooting the breaks that our instructor did for us, and I probably learned the most during this exercise.
In summary, I found the Troubleshooting course much more valuable than the ICM or Fast Track would have been for me. For someone with zero to only a few months of vSphere experience, ICM or Fast Track may be a better choice. If you’ve been playing with it, at least in a lab, for 6 months or more, and have read the books and the VMware Documentation Roadmap, you’d probably find Troubleshooting a more beneficial course.
Will it prepare you for the VCP4 exam? No. But, in my opinion, neither will the others. The only thing that can adequately prepare you for the exam is the Blueprint. I know that’s the standard response everyone gives, but it really is true. If you go through the Blueprint step by step, you’ll be good to go for the exam.














