Archive for March, 2012
This week brought the 5.0.1 update of vShield App. I ran into some quirkiness installing it, so here’s my quick rundown on how to upgrade vShield App to 5.0.1.
1. Download the update from VMware’s site.
2. Launch the vShield Manager home page and log in.
3. Click Settings and Reports, then Updates.
4. Click Upload Upgrade Bundle.
5. Select the file you downloaded and click Upload File.
6. Click Update Status, then Install, and confirm the install.
VMware says you should see progress here; mine stayed at 0% the whole time. In vCenter you will see it start reconfiguring VMs. On my upgrade, this step took 15 minutes or so, and the vShield Manager was rebooted. Once I figured that out, I just refreshed the browser and logged back in. If you are prompted to reboot, go ahead and do that now, followed by Finish Install. This is likely browser dependent, as I didn’t see either prompt.
Once you log back in, you should see the upgrade reflected on the Update Status page. When you see that, you are clear to update your hosts.
Updating hosts is easy, but time consuming.
Click on a host in the vSphere Client, then open the vShield tab. You will see that an update is available; click Update.
There is a warning, but it bears repeating: DO NOT do this on a host where vCenter or the vShield Manager resides.
Your host is going to go into maintenance mode, and from this point forward it’s hands-off.
In the vSphere Client you can see it going in and reconfiguring, then redeploying your Service Virtual Machine on that host.
Then you’re all done. Lather, rinse, and repeat.
Why upgrade to vShield App 5.0.1? Check this article on Duncan’s site for my favorite feature.
One last note: when you’re configuring your exclusions in vShield App, the vShield Manager is automatically excluded, as are the service VMs.
With the right software, even a technology as old as the disk drive can overcome some of its own limitations. We can see many examples of this in the storage world these days. XIO is a great example of a company taking the same disk drives we’ve been struggling with for decades, and making them faster and more reliable.
Another up-and-coming company that believes in this approach is Pure Storage. I had the opportunity to visit their headquarters in Mountain View with the Virtualization Field Day crew, and got to see some of this software magic for myself. Chris Wahl, who had seen these guys before, commented that they are “SSD whisperers”. After my visit, I cannot disagree.
There’s no shortage of promises these days coming from the dozens of new startups centered around solid state disk technology. Before the Pure Storage visit, Mike Laverick was remarking how all these guys always say “we’re the only ones who actually GET solid state”. We all had a laugh, and wondered if Pure Storage would use that line. What we found was quite refreshing. Pure Storage didn’t feed us a lot of marketing or silly quotes. Instead, they actually made ME say “they’re the only ones who GET solid state”.
Pure Storage says they can sell you a solid-state array for less money than a refrigerator-sized box of spinning rust. We’re not talking about less money per IOP; we’re talking about less money, period. Like under $5/GB. So 10x faster for less than the big spinning arrays. Let that sink in for a second.
With today’s advanced, auto-tiering arrays from the big boys in the storage business, it’s natural to wonder why all this is necessary. Why would we need an all-flash array when we can pack a little bit of screaming-fast SSDs into a tray and let the array do the work to make sure your data is in the proper tier? With that methodology, can’t we come in even cheaper per GB by adding massive SATA disks for the cold data? The answer to that question depends solely on your tolerance for latency.
In the chart above, we can see that even on some very high-performing traditional arrays, with the best tiering algorithms, we’re still going to see IOs with very high latency. With the Pure Storage array, you don’t run that risk. In the demos Pure Storage did for us, VMware had a hard time even measuring the disk performance, since it doesn’t report latency in increments finer than 1 ms. As former VMware heavyweight Ravi Venkat pointed out, you can forget about SIOC, since you cannot set thresholds below 5 ms. If you see 5 ms from this array, it’s probably on fire.
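To make the tail-latency argument concrete, here’s a quick simulation in the spirit of that chart. The numbers are entirely made up for illustration: a tiered array where most IOs hit the fast tier but a few fall through to slow SATA disk, versus an all-flash array that is consistently sub-millisecond. The point is what happens at the 99th percentile.

```python
import random

random.seed(42)

def tiered_io_latency_ms():
    """Hypothetical tiered array: 95% of IOs hit the fast tier (~2 ms),
    5% miss and fall through to SATA (~20 ms). Numbers are invented."""
    if random.random() < 0.95:
        return abs(random.gauss(2.0, 0.3))
    return abs(random.gauss(20.0, 3.0))

def flash_io_latency_ms():
    """Hypothetical all-flash array: consistently sub-millisecond."""
    return abs(random.gauss(0.5, 0.1))

def p99(samples):
    """99th-percentile latency of a sample set."""
    return sorted(samples)[int(len(samples) * 0.99)]

tiered = [tiered_io_latency_ms() for _ in range(100_000)]
flash = [flash_io_latency_ms() for _ in range(100_000)]

print(f"tiered p99: {p99(tiered):.1f} ms")  # the slow-tier misses dominate the tail
print(f"flash  p99: {p99(flash):.2f} ms")   # tail stays well under a millisecond
```

Even though the tiered array’s *average* latency looks fine, the 99th percentile lands squarely in the slow tier. That tail is exactly the risk the all-flash design removes.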
As an EMC VMAX user, I can tell you that one of the underlying concerns in my mind every day is how FAST is working. I have very little visibility into what is actually being tiered, and when, and why. I have to just trust that EMC engineers are smarter than me, and that their tiering is going to prevent performance problems. While this may be easy for some, it’s very hard for me to just set it and forget it in my environment. There’s too much cost associated with latency for me to ignore the possibility that FAST is going to do the right thing at the right time. Plus, it’s reactionary. So even if it does do the right thing at the right time, the “right time” is still after the optimal time to have that data tiered higher.
This is one of the best things about the Pure Storage array, in my opinion. I don’t have to worry about whether the “magic” is working under the covers or not. All my data is on the fast stuff all the time, so I can relax. . . a little. ;-)
There are lots of features that make all this work reliably, and at much faster speeds than normal MLC. For an extremely detailed breakdown by Pure Storage’s co-founder and CTO John Colgrove, go here and watch the video.
For brevity’s sake, I’ll highlight a few features:
- Inline dedupe using 512-byte segments (better ratios overall)
- Compression
- Thin provisioning
- RAID 3D (varies RAID levels based on current system activity – see video link)
- High availability (no config stored on the controllers)
- VAAI support
- I/O optimization
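That first item is easy to illustrate: deduping on small fixed 512-byte segments finds duplicates that coarser block sizes would miss. Here’s a minimal hash-based sketch (purely illustrative; the dictionary chunk store and function names are mine, not Pure Storage’s implementation, which dedupes inline in the write path):

```python
import hashlib

SEGMENT = 512  # bytes; finer granularity finds more duplicate segments

def dedupe(data: bytes, store: dict) -> int:
    """Split data into fixed 512-byte segments and store each unique
    segment once, keyed by its hash. Returns bytes of new data stored."""
    new_bytes = 0
    for off in range(0, len(data), SEGMENT):
        seg = data[off:off + SEGMENT]
        key = hashlib.sha256(seg).hexdigest()
        if key not in store:
            store[key] = seg
            new_bytes += len(seg)
    return new_bytes

store = {}
block = b"A" * 4096                       # eight identical 512-byte segments
print(dedupe(block, store))               # -> 512 (one unique segment stored)
print(dedupe(block + b"B" * 512, store))  # -> 512 (only the new segment)
```

Eight identical segments cost only 512 bytes of storage, and re-writing the same data costs nothing; that is where the “better ratios overall” come from.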
That last one is the one I found most fascinating, and you can see John explain it further in the video. Every inbound write goes through a scheduling process that takes into account the current disk activity at a very granular level. Since writes are quite expensive on flash (in latency terms) compared to reads, they must be minimized and highly distributed. This is where the scheduler comes in: it looks at availability, workload, reliability, and many other characteristics of each piece of SSD, then determines where to write that data to give the best latency. If the system is loaded down, it can even pick a different RAID level dynamically to save on writes, thereby increasing performance.
This is where the magic is, in my opinion. It takes a lot of experience and know-how to take MLC and make it as fast and as reliable as SLC, and based on their zero failure rate to date, I think they’ve done it. Of the 35 deployments they’ve done to date, 35% are for VMware environments. The industry mix is pretty interesting too, as you can see in the graphs below. This is not some niche product targeted at specific high-performance applications.
I have heard some people question whether there is room for all these new storage startups. Since Pure Storage is a startup, I wanted to address the question, with regard to Pure Storage only.
First off, Pure Storage is an amazingly well-funded company (from an outsider’s perspective). They’ve got $55M reasons why there is room for their storage startup. That $55M came from people who are a lot smarter than me, and who can better answer the question of whether there is room. Check out the investor list.
Plus, they have attracted lots of top talent; I included a short list below, which doesn’t even include Ravi Venkat. Once again, I think the question of whether Pure Storage is a viable startup, or whether there’s a market for them, is just silly. EMC, which just recently announced MLC on its roadmap, used to be a startup too. Maybe they can get these guys to show them how to implement it.
I have recently completed one of the most difficult (yet rewarding) pieces of work I have ever been challenged with in my six-year tenure at VMware: serving as a Lab Captain for the VMworld Hands-On Labs at both the US and European conferences, as well as the recent Partner Exchange (PEX) event in Las Vegas. As many of you read in Aaron’s recent post, “The Layer between the Layers”, I, like him, was asked to captain and write a section of a vFabric lab for the HOL at both VMworld events (Las Vegas and Copenhagen) and at PEX. There are 27 Lab Captains for the US show and an equal number for EMEA, plus a larger number of Proctors for both. As a “generalist” SE (i.e., NOT a specialist in vFabric, or even a Subject Matter Expert), I was appropriately intimidated to captain a topic I was not an SME in, so I was looking for any vFabric specialist help I could get! Fortunately, I was paired with a great colleague, Chris Harris, a vFabric consultant in the UK.
Since this was such a big part of my life for the past 10 months, I wanted to give you all a taste of what the preparation process entailed, if for no other reason than to help me decompress from the massive amount of creative work that went into prepping for VMworld 2011, but also to give the reader a flavor for what it takes to stand up the HOL from a content perspective.
So let’s start with the content definition and pre-work. We had a plan to construct content around a “real-world” customer implementation of VMware technology, rather than product-centric demo names and examples. I personally thought this was a double-edged sword. We could communicate to customers the “scenario” of how and why VMware can supply a solution to a specific problem, but I thought many attendees might be confused by having NO PRODUCT NAMES in the scenarios. I agree that we need to avoid product selling in a technical lab environment, but we also need to inform our attendees in a bit more detail about the products they will be concentrating on in the individual labs. (BTW, we are changing this next year…)
That aside, the labs this year continued to make great advances, not only in the technical demo aspects but in business application examples as well. I am always amazed at the ability of the Core Team to adapt to the constantly changing, massively dynamic virtual workload demands of the lab (while using alpha and beta “dogfood” builds to equip it) in a “live-fire” environment. After working in the labs over the past several years, I think this is THE example environment that represents the most extreme cases of virtualization “stretch” in our customer base. By that I mean that the problems we face using cutting-edge technologies, the latest beta (and sometimes alpha) code, and the massive workloads being generated and managed are extraordinarily challenging (and really fun!…mostly…). Never let it be said that PCoIP does not work over the WAN: we ran an entire portion of the Copenhagen lab from Las Vegas, and everyone thought it was local! So overall, we are often breaking new ground and demonstrating what can be “virtually” achieved in a very intense and verifiable lab environment.
Again, that aside: the HOL environment is one we begin building months in advance of the events, and as most are aware, it is based on vCloud Director in a vPOD-based model. The Captains start with the lab manual build-out at the beginning of May. This is essentially a storyboard of the lab scenario that reflects the business problem, and the possible solutions and products required to solve it. We used a product called ScreenSteps to create the content and allow easy editing of the screenshots we needed to include in the manuals. We create a Lab Abstract Template, vPOD configuration docs, and Visio diagrams of exactly what needs to go into each lab pod from an infrastructure and product perspective, build the base vPODs in our own vCloud orgs, and then turn those designs and completed vPODs over to the HOL Core Team for build-out to the worldwide cloud. The overall idea is that once a vPOD is built and deployed into the various cloud DCs, it is called up from the catalog by each lab participant on demand, and we actually create and deploy the lab in real time (with some pre-population of the most popular labs). Once completed, the lab is destroyed and its compute resources are returned to the pools. We complete this process literally 100,000-150,000 times during the week of VMworld.
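The deploy-use-destroy lifecycle just described is worth a tiny sketch, because the resource accounting is the whole trick: every pod’s capacity goes back to the pool the moment a lab finishes, which is how one cloud serves six figures of lab deployments in a week. The class, method names, and numbers below are all hypothetical; this is not the vCloud Director API.

```python
class Cloud:
    """Toy model of a lab cloud's capacity pool. Not the vCloud Director API."""

    def __init__(self, capacity_vms: int):
        self.free = capacity_vms  # VMs worth of capacity left in the pool
        self.active_pods = 0

    def deploy_from_catalog(self, pod_vms: int) -> int:
        """Attendee starts a lab: carve a vPOD's worth of VMs from the pool."""
        if self.free < pod_vms:
            raise RuntimeError("out of capacity")
        self.free -= pod_vms
        self.active_pods += 1
        return pod_vms

    def destroy(self, pod_vms: int) -> None:
        """Lab finished: the pod's resources go back for the next attendee."""
        self.free += pod_vms
        self.active_pods -= 1

cloud = Cloud(capacity_vms=2000)
for _ in range(100_000):  # order of magnitude of one VMworld week
    vms = cloud.deploy_from_catalog(pod_vms=10)
    cloud.destroy(vms)

print(cloud.free)  # -> 2000: every pod's resources were returned to the pool
```

A fixed pool can absorb an essentially unbounded number of lab runs, as long as only the concurrent pods (not the cumulative total) have to fit.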
As you can see by the timeline below, we were under VERY tight time targets and each milestone counted!
Once the various drafts of the content for the manuals are completed and reviewed by the content leads, we lock them in for lab manual build-out and completion. Since we are often using alpha and beta versions of many new, or unannounced, products for kickoff at VMworld, it can be a bit dicey on build-out, since we are also logging bug reports against the early code and adjusting manuals to describe any workarounds needed to complete the lab. It is great that we have the chance to see these new products so early in development, but it adds to the workload, and we get NO breaks in the timelines for delivery of our finished labs. So that is where the pressure cooker starts! This is also true for the Core Team, as they are using early, often unreleased alpha or beta builds and can run into similar issues. The additional effort required to build out a lab environment while actively QA-ing new code drops at the same time is challenging. Days of 14-16 hours, nights, and weekends are considered the norm for Captains and Core Team, as well as the Product Engineering folks who are on-site with us, so this volunteer effort is not for the faint of heart!
Once everything gets fully documented and deployed, the Core Team works their magic on pushing out everything to the three cloud environments: Las Vegas (Switch), Amsterdam (Colt), and Miami (Terremark). These are the sites from which we will be pulling sessions through View 5 into the labs for the attendees.
Finally, after months of creation and testing, we arrive onsite in Las Vegas several days prior to the event to setup the physical lab and begin testing. Again, long days and little sleep are the highlights of these final testing sessions where we bring up the labs and stress test the environment.
Heroics abound in and around the lab from the Captains, Proctors, Core Team, and support staff. Every year we worry, “Can we really pull off such a trick of having 480 workstations all pulling virtual labs and manuals at a single event, and have it be smooth and without incident?” Well, generally SOMETHING happens (small config errors, losing a piece of hardware, etc.), but the teams band together to make sure things work, even if it requires brute force to do so! Somehow, we get through it (after 148,000 VMs) and then do it all again in Europe and at PEX! (Though admittedly on smaller scales: 250 seats in Copenhagen, and 120 seats at PEX, reflecting the difference in overall attendance of the events.)
We also got to watch all of the lab activity via vCOPs (vCenter Operations), and saw exactly how dynamic and massive the environment really was.
No other technology company I know of provides this level of lab automation and complexity while providing a high-value experience for our customer attendees. I am really looking forward to next year, when we offer these labs all year round and allow everyone to take advantage of the great work hundreds of people have contributed to provide such a unique offering (more on this soon). So next time you see any of the “red shirts” that say LabStaff on them, give a note of thanks for all of the hard work these folks have put in to give our attendees the best possible lab learning experience available anywhere! We will also have new labs and processes that we are already beginning to formulate for VMworld 2012 and beyond, so stay tuned later in the year to see what we have in store! Please ask questions in the comments about the HOL, and Aaron and I will share what we can…