Archive for the ‘Authors’ Category
Yesterday, I attended the Carolina VMware Users Summit in Charlotte. The morning keynote speaker was Satyam Vaghani, who is really the father of VMFS. He walked the VMUG attendees through the history of VMFS, and VMware storage as a whole.
In my opinion, this was one of the most valuable sessions at the VMUG, and although my recording is terrible (Notability on iPad), this session deserves to be shared. I was able to reduce some of the noise in post processing, but this is definitely not broadcast quality.
If you’re really interested in VMware storage, it’s worth a listen, despite the quality. Unfortunately the slides from this are not available, but if they are posted, I will definitely link to them here. I honestly could have sat in this keynote for another couple hours absorbing information.
Here’s an outline:
- Birth of VMFS
- Ins and Outs of Locking
- Optimistic Locking and Performance
- VAAI Intricacies
- The Future (vVOL, VM granular storage, I/O Demux)
Here’s a link to the recording. I will add slides as soon as I can get them.
Update: For those who exhibit lots of “Virtual Insanity” and want to know even more about VMFS, Satyam sent me a link to a white paper he wrote on VMFS.
VMworld 2012 is rapidly approaching, and believe it or not, it will be here before we know it! Call for papers is open now, and you have till May 18th to submit your idea to the VMworld team. Just in case you missed some of the details, VMworld US is to be hosted in San Francisco August 27-30 and VMworld Europe will be hosted in Barcelona, October 9-11. It will be an amazing event as always, with some really awesome technology announcements from VMware. Mark it in your calendar now, socialize the concept to your manager, inform the family, tell the neighbors, do what you need to do but get there!!
This year I wanted to participate in the creation process of the VMworld Labs, as I think it’s such a remarkable component of the event. In 2010 we delivered over 200,000 VMs to customers across 27 different labs, which is an amazing accomplishment. This year’s hands-on labs will only be bigger, better, and even “more epic”. I can’t reveal all the details yet, but stay tuned, as we have some very exciting things in flight right now.
One of the labs that caught my interest was the Socialcast lab. I wanted to do something outside my core knowledge base and pick up something that I haven’t had much exposure to. Socialcast is something I have had a lot of experience with as an end user, but none on the backend infrastructure side. VMware has a great internal implementation of Socialcast that we have been using extensively for some time now. I can’t overstate how important Socialcast has become for our company as a place where we can share technical product information, ideas and concepts, presentations, status polls, and basic collaboration. (There is even a Pets of VMware Photo Group.)
“Socialcast software unites people, information and applications across the enterprise in a collaborative community. Help employees focus on meaningful work, share knowledge and discover data in real-time. Behind the firewall or in the cloud, Socialcast enables secure enterprise collaboration in-context.”
Many people reading this blog post understand the power of collaboration and social media. It is an important component of our being to give back to the larger community to help foster ideas and innovation. Socialcast is a framework that allows this collaboration to exist within the confines of your own protected environment.
I will be working with the Socialcast team over the next several months to design an impactful lab that I am hoping many of you will take. I have reached out to our CMO and the VMworld team to see if we can integrate Socialcast into the VMworld.com website so attendees can actually use Socialcast during the convention. (The idea is still being considered.) I am also reaching out to one of my largest customers, which has one of the biggest implementations of Socialcast in production, to see if they are able to present at VMworld this year as an additional topic.
I’m looking forward to seeing you at VMworld. If you’re there, please sign up for the Socialcast lab and give it a test drive!
-Scott
This is a problem I have seen now in two different environments, at two different companies. Both happened to be using VMware Data Recovery for backups.
The problem starts like this. You lose a host from vCenter, and you cannot get it to reconnect. You do a /sbin/services.sh restart, and still you cannot get connected to vCenter.
You CAN connect to the host locally using the vSphere Client. Let’s look at the logs now.
This particular problem shows up in the hostd log. To see it, SSH into the host and type in: tail -f /var/log/hostd.log and then go into vCenter and right-click on the host to Connect.
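If the log is busy, a filter along these lines can cut the output down to snapshot-related entries. This is only a sketch: the search term "snapshot" is my assumption about how the messages are worded, and the function name is made up for illustration.

```shell
# Print only the lines of a log file that mention snapshots (case-insensitive).
# Defaults to the hostd log path named above; "snapshot" as the search term
# is an assumption about the message wording, not something hostd guarantees.
snapshot_lines() {
    grep -i 'snapshot' "${1:-/var/log/hostd.log}"
}

# Live variant to run while the Connect attempt is in progress:
#   tail -f /var/log/hostd.log | grep -i 'snapshot'
```

Calling snapshot_lines with no argument reads /var/log/hostd.log; pass a different path to check a copied or rotated log.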
Watching the hostd.log, if you see any messages about snapshots during the 5 minutes it takes to time out, here’s how to see if you have this issue.
In your SSH session on the affected host, type in the following:
find /vmfs/volumes/*/* -name '*delta*'
You’ll see a list of all snapshots for VMs running on this host. If you see a VM with a couple hundred snapshots, this is why your host won’t connect to vCenter. vCenter has a database limitation, and when a VM has more than the number of snapshots vCenter can catalog in the database, the host cannot be managed by vCenter. I haven’t figured out the exact limit for vCenter. A VM can have 496, according to this post by William Lam, but I think vCenter breaks before you get to that point. I had 235 on this suspect one.
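With a long listing it can be hard to spot the offender by eye. Here is a rough sketch that tallies delta files per VM directory, worst first. It assumes each VM keeps its delta files in its own directory under the datastore, and that find, dirname, sort and uniq are available in the host's shell; count_deltas is a name I made up.

```shell
# Count snapshot delta files per directory, highest count first.
# Assumption: every VM stores its *delta* files in its own directory.
# The default root is the datastore path used in the find command above.
count_deltas() {
    root="${1:-/vmfs/volumes}"
    find "$root" -type f -name '*delta*' |
        while IFS= read -r f; do dirname "$f"; done |
        sort | uniq -c | sort -rn
}
```

Running count_deltas with no argument scans /vmfs/volumes. Each output line is a count followed by a directory, so the VM at the top of the list is the first one to check.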
To fix this, just connect locally to the host with vSphere Client and Consolidate your snapshots.
Once you’ve consolidated, your directory should look like the following.
Now, you can connect back to vCenter with no problem and no downtime!
Since this is a development environment, we didn’t pay a lot of attention to VDR, and just assumed it was working. This particular VM happened to be out of hard drive space, so it could not be quiesced, and VDR just kept trying. The bottom line is, pay attention to VDR errors!!! After this, we’ll be checking it at least every few days.
Just a friendly reminder to pick up (virtually) your free copy of the VMware vNews for the month of April. Make sure to sign up for e-mail alerts, as we publish this customer-focused newsletter monthly!
-Happy reading.
Scott
Virtual Insanity has two vExperts on staff now! Scott Sauer and I were both chosen as vExperts for 2012.
What is a vExpert? The program was started by VMware in 2009 and is driven by VMware community mavens Alex Maier and John Troyer. The title is awarded to individuals who have significantly contributed to the VMware community over the past year. This is the first year employees are eligible, so I am glad to see Scott get recognized for his great writing here, and for putting together the vNews newsletter.
Here is more information on the vExpert program.
Here is the complete list for 2012, and a Twitter link to follow all the vExperts.
Congrats to everyone! I am proud to be on any list with these VMware heavyweights!!!
Thanks to all who voted! Thanks to Alex and John! I look forward to being able to contribute more to the community throughout the year.
If you’re using a Symmetrix array in your VMware environment, the Solutions Enabler virtual appliance is a must. It’s perfect for SRM, or the EMC VSI plugins in vCenter. Here’s how to get it set up and running.
1. Download the virtual appliance from Powerlink.
2. Deploy the template.
3. When you get to Properties, fill in the required info. Note here that if you have a subnet that doesn’t fit the classical TCP/IP class model, the OVA will complain about your netmask here. Follow the optional procedure below or the appliance won’t boot.
Also, put in the ESX server FQDN where the vAppliance is going to live initially. You’ll need to have that in there to mount your Gatekeeper devices. If you skip it, you can put it in later, but it does need to be ON the host you specify initially.
4. (Optional) If you have a non-traditional subnet where the vApp doesn’t accept your netmask, go into the Advanced vApp Options.
Double Click netmask.
In here, uncheck User Configurable, and put in your netmask.
Now the appliance should boot without complaining.
5. Open a web browser and go to https://<appliance address>:5480
6. Skip the security warning (unless you are @Texiwill) and go for the login.
User: seconfig
Password: seconfig
PROTIP: Change the password.
7. Once you’re in, click Gatekeeper Config and add your ESX host now if you haven’t yet. Again, ensure that the appliance resides on the host specified here for this initial step.
8. At this point, you should see Gatekeepers available to be mapped. Select the Gatekeepers you want and click the Map button all the way to the right.
Once mapped, the Gatekeepers will move to the bottom box. These are now RDMs attached to the appliance.
9. Now go to Command Execution at the top, and click the Discover Symmetrix button. You should see your Symms in the Output Window.
Once you see your Symmetrices (sp?) there, you should be able to run any commands against them by inputting them in the top window and get output.
Now you can go connect your VSI or SRM servers to this, and it should be fully functional.
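Back on the netmask complaint from steps 3 and 4: if you want to know ahead of time whether you’ll need the optional procedure, a check like this may help. It’s a sketch under an assumption. I’m reading “classical TCP/IP class model” as the default class A/B/C masks, and I don’t know exactly what validation the OVA performs. The function names are made up.

```shell
# Print the default classful netmask for a dotted-quad IPv4 address.
# Class A: first octet 0-127, class B: 128-191, class C: 192-223.
classful_mask() {
    first_octet="${1%%.*}"
    if [ "$first_octet" -le 127 ]; then
        echo 255.0.0.0
    elif [ "$first_octet" -le 191 ]; then
        echo 255.255.0.0
    elif [ "$first_octet" -le 223 ]; then
        echo 255.255.255.0
    else
        echo none   # class D/E addresses have no default host netmask
    fi
}

# Succeed (exit status 0) if the given netmask is the classful default
# for the given address. Assumption: this mirrors what the OVA accepts.
is_classful() {
    [ "$(classful_mask "$1")" = "$2" ]
}
```

For example, is_classful 10.1.2.3 255.255.252.0 fails, which under this assumption is the case where you would uncheck User Configurable and enter the netmask by hand.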
This week was the 5.0.1 update of vShield App. I did run into some quirkiness installing, so here’s my quick rundown on how to upgrade vShield App to 5.0.1.
Step 1
Download the update from VMware’s site.
Step 2
Launch the vShield Manager home page and login.
Step 3
Click on Settings and Reports, then Updates.
Step 4
Click Upload Upgrade Bundle
Step 5
Select the file you downloaded and click Upload File
Step 6
Click Update Status and then Install and Confirm the install
VMware says you should see progress here, but mine stayed at 0% the whole time. You will see in vCenter that it starts reconfiguring VMs. On my upgrade, this step took 15 minutes or so, and the vShield Manager was rebooted. I just refreshed the browser and logged back in after I figured that out. If you are prompted to reboot, go ahead and do that now, followed by Finish Install. This is likely browser dependent, as I didn’t see either prompt.
Once you log back in, you should see the following on Update Status
When you see that, you are clear to update your hosts.
Updating hosts is easy, but time consuming.
Step 1
Click on a host in the vSphere Client, and click the vShield tab
Step 2
You will see an update is available. Click Update.
There is a warning, but it bears repeating. DO NOT do this on a host where vCenter or the vShield Manager reside.
Your host is going to go into maintenance mode, and from this point forward it’s hands-off.
In the vSphere Client you can see it going in and reconfiguring, then redeploying your Service Virtual Machine on that host.
Then you’re all done. Lather, rinse, and repeat.
Why upgrade to vShield App 5.0.1 ? Check this article on Duncan’s site for my favorite feature.
One last note. When you’re configuring your exclusions in vShield App, the vShield Manager is automatically excluded, as are the service VMs.
With the right software, even a technology as old as the disk drive can overcome some of its own limitations. We can see many examples of this in the storage world these days. XIO is a great example of a company taking the same disk drives we’ve been struggling with for decades, and making them faster and more reliable.
Another up and coming company that believes in this approach is Pure Storage. I had the opportunity to visit their headquarters in Mountain View with the Virtualization Field Day crew, and got to see some of this software magic for myself. Chris Wahl, who had seen these guys before, made a comment that they are “SSD whisperers”. After my visit, I cannot disagree.
There’s no shortage of promises these days coming from the dozens of new startups centered around solid state disk technology. Before the Pure Storage visit, Mike Laverick was remarking how all these guys always say “we’re the only ones who actually GET solid state”. We all had a laugh, and wondered if Pure Storage would use that line. What we found was quite refreshing. Pure Storage didn’t feed us a lot of marketing or silly quotes. Instead, they actually made ME say “they’re the only ones who GET solid state”.
Pure Storage says they can sell you a solid state array for less money than a refrigerator-sized box of spinning rust. We’re not talking about less $ per IOP. We’re talking about less $ period. Like under $5 / GB. So 10x faster for less than the big spinning arrays. Let that sink in for a second.
With today’s advanced, auto-tiering arrays from the big boys in the storage business, it’s natural to wonder why all this is necessary. Why would we need an all-flash array when we can pack a few screaming fast SSDs into a tray and let the array do the work to make sure your data is in the proper tier? With that methodology, can’t we come in even cheaper per GB by adding massive SATA disks for the cold data? The answer to that question depends solely on your tolerance for latency.
In the chart above, we can see that even on some very high performing traditional arrays, even with the best tiering algorithms, we’re still going to see IO’s with very high latency. When you use the Pure Storage array, you don’t run that risk. In the demos that Pure Storage did for us, VMware had a hard time even measuring the disk performance, since it doesn’t offer anything less than 1 ms increments. As former VMware heavyweight Ravi Venkat pointed out, you can forget about SIOC, since you cannot set thresholds below 5 ms. If you see 5 ms from this array, it’s probably on fire.
As an EMC VMAX user, I can tell you that one of the underlying concerns in my mind every day is how FAST is working. I have very little visibility into what is actually being tiered, and when, and why. I have to just trust that EMC engineers are smarter than me, and that their tiering is going to prevent performance problems. While this may be easy for some, it’s very hard for me to just set it and forget it in my environment. There’s too much cost associated with latency for me to ignore the possibility that FAST won’t do the right thing at the right time. Plus, it’s reactive. So even if it does do the right thing at the right time, the “right time” is still after the optimal time to have that data tiered higher.
This is one of the best things about the Pure Storage array, in my opinion. I don’t have to worry about whether the “magic” is working under the covers or not. All my data is on the fast stuff all the time, so I can relax. . . a little. ;-)
There are lots of features that make all this work reliably, and at much faster speeds than normal MLC. For an extremely detailed breakdown by Pure Storage’s co-founder and CTO John Colgrove, go here and watch the video.
For brevity’s sake, I’ll highlight a few features:
- Inline dedupe using 512 byte segments (better ratios overall)
- Compression
- Thin Provisioning
- Raid 3D (varies RAID levels based on current system activity – see video link)
- High availability (no config stored on the controllers)
- VAAI support
- I/O optimization
That last one is the one I found most fascinating, and you can see John explain it more in the video. Every inbound write goes through a scheduling process that takes into account the current disk activity at a very granular level. Since writes are quite expensive on flash (in latency terms) versus reads, writes must be minimized, and highly distributed. This is where the scheduler comes in and looks at availability, workload, reliability, and lots of other characteristics of each piece of SSD. Then it makes a determination where to write that data to give the best latency. Also, if the system is loaded down, it can even pick a different RAID level dynamically to save on writes, thereby increasing performance.
This is where the magic is, in my opinion. It takes a lot of experience and know-how to take MLC and make it as fast and as reliable as SLC, and based on their zero failure rate to date, I think they’ve done it. Of the 35 deployments they’ve done to date, 35% are for VMware environments. The industry mix is pretty interesting too, as you can see in the graphs below. This is not some niche product targeted at specific high performance applications.
I have heard some people question whether there is room for all these new storage startups. Since Pure Storage is a startup, I wanted to address the question, with regard to Pure Storage only.
First off, Pure Storage is an amazingly well funded company (from an outsider’s perspective). They’ve got $55M reasons why there is room for their storage startup. That $55M came from people who are a lot smarter than me, and can better answer the question of whether there is room. Check out the investor list.
Plus, they have attracted lots of top talent. I included a short list below, which doesn’t even include Ravi Venkat. Once again, I think the question of whether Pure Storage is a valid startup, or there’s a market for them is just silly. EMC, who just recently announced their roadmap’s inclusion of MLC, used to be a startup. Maybe they can get these guys to show them how to implement it.
Guess I better wrap this up before it becomes a rant. Check out Pure Storage, and check out the video link. Also there are some more articles from fellow delegates Chris Wahl, and Dwayne Lessner.
I have recently completed one of the most difficult (yet rewarding) pieces of work that I have ever been challenged with in my 6-year tenure at VMware: serving as a Lab Captain for the VMworld Hands-On Labs at both the US and European conferences, as well as the recent Partner Exchange (PEX) event in Las Vegas. As many of you have read in Aaron’s recent post “The Layer between the Layers”, like him, I was asked to captain and write a section of a vFabric Lab for the HOL at both VMworld events in Las Vegas and Copenhagen and at PEX. There are 27 Lab Captains for the US and an equal number for the EMEA show, plus a larger number of Proctors for both. As a “generalist” SE (i.e. NOT a specialist in vFabric or even an SME – Subject Matter Expert), I was appropriately intimidated to captain a topic I was not an SME in, so I was looking for any vFabric Specialist help I could get! Fortunately, I was paired with a great colleague, Chris Harris, who was a vFabric Consultant in the UK.
Since this was so much a part of my life for the past 10 months, I wanted to give you all a taste of what this preparation process entailed, if for no other reason than to help me decompress from the massive amount of creative work that we went through to prep for VMworld 2011, but also to give the reader a flavor of what it takes to stand up the HOL from a content perspective.
So let’s start with the content definition and pre-work. We had a plan to construct content around a “real-world” customer implementation of VMware technology, rather than product-centric demo names and examples. I personally thought this to be a double-edged sword. We could communicate to customers the “scenario” of how and why VMware can supply a solution to a specific problem, but I thought many attendees might be confused with “NO PRODUCT NAMES” in the scenarios. I agree that we need to avoid product sell in a technical lab environment, but we also need to inform our attendees in a bit more detail on the products that they will be concentrating on in the individual labs. (BTW, we are changing this next year…)
That aside, the labs this year continued to make great advances in not only the technical demo aspects, but also business application illustration examples as well. I am always amazed at the ability of the Core Team to adapt to the constantly changing, massively dynamic virtual workload demands of the lab (while using alpha and beta “dogfood” builds to equip the lab) in a “live-fire” environment. After working in the labs over the past several years, I think this is THE example environment that represents the “most extreme” examples of virtualization “stretch” in our customer base. By that I mean that the problems we face using cutting edge technologies, the latest beta (and sometimes alpha) code, and the massive workloads being generated and managed, are extraordinarily challenging (and really fun!…mostly…) :-) Never let it be said that PCoIP does not work over the WAN…we ran an entire portion of the lab from Las Vegas in Copenhagen, and everyone thought it was local! So overall, we are often breaking new ground and demonstrating what can be “virtually” achieved in a very intense and verifiable lab environment.
Again, that aside, the HOL environment is one that we begin building months in advance of the events, and as most are aware, it is based on vCloud Director in a vPOD based model. The Captains start with the Lab Manual Build out back at the beginning of May. This is essentially a storyboard of the lab scenario that reflects the business problem and possible solutions and products required to solve that problem. We used a product called Screensteps to create the content and allow easy editing of the screenshots we needed to include into the manuals. We create a Lab Abstract Template, vPOD Configuration docs, and Visio diagrams of exactly what we need to include into each Lab Pod from an infrastructure and product perspective, build the base vPODs in our own vCloud orgs, and then turn those designs and completed vPODs over to the HOL Core Team for virtual build out to the WW Cloud. The overall idea is that once the vPOD is built and deployed into the various cloud DCs, it will be called up from the catalog by each lab participant “on-demand” and we actually create and deploy the lab in real-time (with some pre-population of the most popular labs). Once completed, the lab is “destroyed” and compute resources are returned to the pools. We complete this process literally 100,000-150,000 times during the week of VMworld.
As you can see by the timeline below, we were under VERY tight time targets and each milestone counted!
Once the various drafts of the content for the manuals are completed and reviewed by the content leads, we then lock them in for lab manual build out and completion. Since we are often using alpha and beta versions of many new, or unannounced products for kickoff at VMworld, it can be a bit dicey on build out, since we are also logging bug reports in the early code development and adjusting manuals to describe any workarounds needed to complete the lab. It is great that we have the chance to see these new products so early in development, but it adds to the workload and we get NO breaks in the timelines for delivery of our finished labs. So that is where the pressure-cooker starts! This is also true for the Core Team as they are also using early, often unreleased alpha or beta builds and can run into similar issues. The additional effort that is required to be building out a lab environment while actively QA-ing new code drops at the same time is challenging…Days of 14-16 hours, nights, and weekends are considered the norm for Captains and Core Team as well as the Product Engineering folks who are on-site with us, so this volunteer effort is not for the faint of heart!
Once everything gets fully documented and deployed, the Core Team works their magic on pushing out everything to the three cloud environments: Las Vegas (Switch), Amsterdam (Colt), and Miami (Terremark). These are the sites from which we will be pulling sessions through View 5 into the labs for the attendees.
Finally, after months of creation and testing, we arrive onsite in Las Vegas several days prior to the event to setup the physical lab and begin testing. Again, long days and little sleep are the highlights of these final testing sessions where we bring up the labs and stress test the environment.
Heroics abound in and around the lab from the Captains, Proctors, Core Team, and support staff. Every year we worry, “Can we really pull off such a trick of having 480 workstations all pulling virtual labs and manuals to a single event, and have it be smooth and without incident?” Well, generally SOMETHING happens (small config errors, lose a piece of HW, etc.) but the teams band together to make sure things work, even if it requires brute force to do so! Somehow, we get through it (after 148,000 VMs) and then do it all again in Europe and PEX! (Though admittedly on smaller scales…250 seats in Copenhagen, and 120 seats at PEX to reflect the difference in overall attendance of the events.)
We also got to watch all of the lab activity via vCOPs (vCenter Operations), and saw exactly how dynamic and massive the environment really was.
No other technology company I know of provides this level of lab automation and complexity while providing a high value experience for our customer attendees. I am really looking forward to next year when we offer these labs all year round and allow everyone to take advantage of the great work hundreds of people have contributed to provide such a unique offering (more on this soon). So next time you see any of the “red shirts” that say LabStaff on them, give a note of thanks for all of the hard work these folks have put in to give our attendees the best possible lab learning experience available anywhere! We will also have new labs and processes that we are already beginning to formulate for VMworld 2012 and beyond, so stay tuned later in the year to see what we have in store! Please ask questions in the comments about the HOL, and Aaron and I will share what we can…
Close your eyes for a moment and . . . . Wait. . don’t do that. . . But imagine for a moment your CEO calls your desk directly and is in a huge panic because one of his reports is taking way too long to run, and he needs it for the board meeting in 15 minutes. Instantly your life flashes before your eyes:
- All those arguments you had with the DBAs and the application owners, and even your boss about how “we can’t possibly virtualize this application”.
- The meetings where the vendor said they would support it but they don’t “recommend” it.
- Conference calls where you told them they were just out of touch and that you could virtualize anything, and they wouldn’t even notice a performance hit.
- The look on their faces when they first tested the virtualized app and realized you were right.
And look at you now. This is all on you. It’s do or die time now.
So you bring up your preferred virtualization performance software to have a look. For your sake, I hope it’s Xangati VI Dashboard.
Having seen Xangati’s pitch before, and having tried the free version a couple years ago, I didn’t feel it was something I needed in my environment. However, last week at Virtualization Field Day 2 in Silicon Valley, the company’s founder, Jagan Jagannathan said one thing that really struck a chord.
“Liveness is what you need to do triage. If you want to do post-mortem, you don’t have to be live.”
He makes the point that in medical analysis, if you delay the analysis, even for a few minutes, the patient is dead. “Not all patients die. But some do.” It was at this point that the Xangati story clicked with me. It’s a tough product to get your head around in a quick demo, or marketing slide. But after hearing directly from the man who invented it, everything makes sense.
Jagan talks about other virtualization performance applications being largely database driven. They essentially suck in data at intervals, store it in a database, crunch it, and then pipe it out to a GUI for display. Some even require input from you on what interactions you might want to see before they even crunch the data.
Xangati sucks in the data and crunches it, with every interaction, all in RAM. This means the data you see is an order of magnitude more current from Xangati’s interface, than from the other guys’.
The other products are showing you a snapshot of data, followed by another snapshot, and so on. This is sufficient for the type of predictive trending coming out of vCenter Operations for example. Xangati can crunch 1 million metrics per second and pipe them right to your display.
Which data would you rather have when your CEO is standing over your shoulder? Which data would you rather have if you’re running thousands of VDI sessions like at the VMworld Labs? Xangati was VMware’s choice for the Labs environment. And since we have all taken a sort of Virtualization Hippocratic Oath by talking companies into virtualizing, we cannot afford to let our patients die on the table because we didn’t have the data to save them.
I had the good fortune of sitting with Jagan at dinner after their presentation, and we got into a conversation about a huge paradigm shift in our industry that’s happened over the past decade. A couple years ago, SAP founder Hasso Plattner was asked by his own employees why he felt the need to deliver an in-memory appliance. His response nails what I feel this paradigm shift is all about.
“People at SAP ask me, ‘Why do you insist on running a dunning program in seconds instead of two minutes? No one is asking for that type of speed for a dunning program,’ ” Plattner said.
“And I tell them, ‘You are asking the wrong question: the right question is, how long will someone with an iPhone wait for an answer? And the answer is that 15 seconds is the absolute maximum amount of time people will wait before they go and start doing something else: check voicemail, send text messages, check email, send text messages to themselves . . . . This is the new reality!’ ”
In most enterprises a decade ago, the world did not come to an end if an application was down for a few hours. People took a long lunch, and moved on. In this new world, people go absolutely insane over the slightest performance degradation of any application.
Downtime is unthinkable, even for the most mundane and “insignificant” application. Can we blame all this on the iPhone? I’m not sure, but one thing I do know is that we had better have the tools to enable us to deliver on these expectations. Xangati is a huge step in the right direction.
There’s a lot more to Xangati, like industry leading awareness and visibility for VDI environments, and the ability for users to initiate recordings of metrics while a problem is occurring. Cool features abound. You can read about some of them over on Rodney Haywood, Dwayne Lessner and Chris Wahl’s blogs. For me, the one feature that stands out most is the live data. The life you save could be your own.
I’ll be heading to Virtualization Field Day 2 Feb 22-24 in Silicon Valley! What is Virtualization Field Day? It’s a 2 day event packed with in-depth and interactive Q&A between vendors in the virtualization space, and independent bloggers / writers / thought leaders in the industry.
Vendors get to showcase products that are real, or on the drawing board, and they get solid, candid feedback from independent IT pros that helps them make their products better for all of us.
Delegates get a first look at some of the coolest new technologies everyone will be talking about in the coming months, as well as an opportunity to get hands-on with them and ask the tough questions that would never be allowed in a webcast full of random people. Some things may be covered by NDA, or an embargo date, but the majority of the event can be viewed live right here as it happens!
If you can’t catch the stream live, follow us on Twitter with hashtag #VFD2, and tweet us your questions for the vendors. The videos will be posted after the event concludes so you can go back and catch anything you might miss.
This is my second Tech Field Day event, and based on the presenter and delegates list, it’s going to be fantastic!
What makes these events so valuable is the expectation of independence and the objective nature of the delegates. Combine this with the hard work and dedication of Stephen Foskett and Matt Simmons, who plan everything to the last detail and coach the vendors ahead of time so they don’t bring lame marketing presentations to real technical guys and gals. The stream will definitely be worth your time!
Are there vendors making something awesome you’d like to see present at an event down the road? Nominate them here!
Do you love technology, and work for a non-IT vendor? If you’d like to become a delegate, find out how here!
In the interest of full disclosure, delegates’ travel expenses to and from the event, as well as accommodations during the event are covered by sponsors. As with any tech event, delegates may receive swag from vendors, but delegates are not under any obligation to blog, tweet, or even like the products. Of course if there are cool products that interest delegates, they may be discussed on various social media sites, but there is no compensation, or expectation from either side after the event concludes.
Introduction
VMware made an exciting announcement at VMworld 2011 that didn’t get much press or attention. The VMworld labs were slated to be released in early 2012 for customers interested in doing technology previews of our software solutions. Notice I didn’t use the term “Proof of Concept”, as this implies different things to different people. A proof of concept could have business requirements, technical requirements, or users that are associated with your specific environment. I am happy to report that the “VMware Virtual Customer Labs” (vCL) are now available for **selected customers. I wanted to do a write-up about the vCL, what it is, and how it works, as I think this is a unique offering that VMware is providing its customers.
What is the vCL?
The vCL is based on VMware vSphere 5 and VMware vCloud Director 1.5, along with vCenter Orchestrator for automation. This is something that VMware has been using internally for years, called the “vSEL” or the VMware SE Labs. vCL is designed to be a fully automated cloud solution where users can check out VMware software solutions for 14 days of testing and training/education. The vCL was built around the concepts of saving customers time (manual installs, deployments, infrastructure configuration) and hardware costs, as VMware hosts the environment on behalf of our customers.
The Workflow Automation
Automation is part of any cloud solution. If you stop to think about it, you’re really getting a demonstration of vCloud Director along with any of the other labs you check out! Let’s walk through the backend automation that kicks off once a customer requests access to a lab environment. In this example I am the customer, and I am interested in selecting the SRM 5 environment to test out. As a VMware systems engineer, I log in (approval phase) and submit the request to the vCL system.
Below are the vCL options that I am going to configure for the customer. These include the customer name, which lab they are interested in, and basic information like an e-mail address. In this example I am using myself as the customer name to show some of this functionality.
Once I submit my request, I get an automated e-mail (below) indicating that my request has been accepted and the build process has been initiated. As you can see this might take slightly longer than normal as we are delivering full cloned vApps to ensure performance and a great user experience.
Once my environment has completed its provisioning process, the customer and the VMware engineer both get an e-mail confirming the build is complete. The e-mail contains the URL for accessing the environment, along with the custom username and password for authentication purposes.
Here comes the exciting part, let’s log in! Here is the main splash screen where I authenticate with the credentials I received in the previous step. Note that you need to accept the VMware EULA to access the environment, or you will not be able to log in.
I now have complete access to my personalized demo SRM environment where I can now begin testing SRM 5.0! As I mentioned earlier, I get 2 weeks to walk-through the lab and complete any testing I would like to perform. The lab manuals will be provided by the systems engineer that you work with when you request your access to the environment.
A Special Thanks!
I wanted to give special thanks and some recognition to the vCL team for all of their hard work and efforts that went into this project. It is still a work in progress, but the team is in the process of adding more labs to the service catalog. They are also planning on adding more back-end storage to accommodate supporting more customers and ensuring scalability from a performance perspective. Great work guys!
Note:
** Selected Customer indicates those that are supported by a pre-sales systems engineer. The SE is the owner of the customer experience and is responsible for coordinating the customer requests and ensuring they are getting the desired results from the vCL.
If you have HP BL 460 G7s with the on-board 10Gb CNA, you’re going to want to read this post regarding a problem with the latest firmware.
I first noticed this issue when updating firmware to troubleshoot a problem where the storage doesn’t come back up after rebooting an upstream Nexus switch.
The symptoms are: the NIC comes back up, and the vfc is up, but all storage paths on that side of the fabric are still dead in ESXi 5.0. To fix this, you must shut / no shut the vfc or port channel.
I also saw an issue where the storage paths were dead, and the NIC never came back up. A reset of the Ethernet port will not fix this. A reboot of the ESXi host is required. Pay attention to the NIC state if you lose storage paths in this configuration with FCoE.
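If you want a quick way to spot this condition from the ESXi shell, here is a minimal sketch. It assumes the output of `esxcli storage core path list` contains one `State:` line per path (for example `State: active` or `State: dead`); the helper name `count_dead_paths` is my own and not part of any VMware tooling.

```shell
# Count storage paths reported as dead.
# Reads `esxcli storage core path list` style output on stdin.
# Assumption: each path block contains a line such as "   State: dead".
count_dead_paths() {
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so "|| true" keeps the function from reporting a failure.
  grep -c '^[[:space:]]*State: dead' || true
}

# Assumed usage on an ESXi 5.0 host:
#   esxcli storage core path list | count_dead_paths
```

Running this before and after a switch reload in a maintenance window gives you a simple number to compare, alongside checking the NIC state.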
As part of my troubleshooting, I went to update the firmware on the CNA. The latest version of the firmware from HP is 4.0.360.15a. When updating using the Emulex utility, on about 20% of my blades, I got a CRC error during the upgrade process. Below is a screenshot of this error.
After retrying the firmware update, as stated in the utility, the same error occurred.
This is where you need to pay attention!!
During the POST process, the blade WILL report the correct firmware.

Since the firmware version is correct, one might assume the update was indeed successful. That’s a bad assumption. Upon further testing, we found the blades that failed the firmware update were the ones failing during the switch reloads.
There were only 2 blades that did NOT fail the firmware update, but still failed the switch reload process. They were replaced, and now I have no blades failing to reacquire storage paths after an upstream switch failure.
I must point out that HP has been unusually proactive with this issue, which is a nice change! I still have several blades in another datacenter that are not taking the firmware update. When I scheduled to have those all replaced, HP got some of their top people on it and scheduled a call. I tested their proposed fix this morning, which didn’t work.
They are actively working on a fix, so you won’t have to replace your blades. I will update this post as soon as I get word back from them on that fix. Meanwhile, if you’ve seen this, you might want to schedule some switch reloads during a maintenance window to make sure you are good to go.
Update 2/6:
As of today, there is no fix that I’m aware of. HP replaced the remaining blades after we tried a couple more proposed fixes. If I get word of a fix, I will post it here.
2012 ushers in some great new changes from the field technical team at VMware. I am merging the Ohio Valley Newsletter with the Wisconsin-based field newsletter (aka vNews) in an effort to make it more all-encompassing. This content is designed to inform our customers of important updates from VMware from a technical perspective. It also highlights some great public blog posts that might have snuck by you while you weren’t looking. We will be moving away from the older legacy .pdf-based version of the newsletter to a modernized delivery method, “SlideRocket”. Here is the link to the first edition!
Please make sure you subscribe to the newsletter if you wish to receive these monthly newsletters in your inbox. As always feedback is welcome and will help shape the content for future issues of the vNews! Special thanks to Ben Sier, Vitaly Tsipris and Jeff Whitman for their contributions and driving to pull this off. Let us know your thoughts!
-Scott
Introduction
Virtualizing and running Java workloads on vSphere is absolutely a reality, but when I talk to customers I emphasize the same best practices as virtualizing Tier 1 workloads. The rules are not the same as basic consolidation and containment and you need to understand, plan, and architect your virtualization platform if you want to be successful.
I spend much of my time working with customer infrastructure engineers and architects, and when topics of Java come up, the conversation takes a turn. The infrastructure teams typically don’t want to get into the application stack and I can’t say that I blame them. Java and programming are a completely different skillset and the infrastructure engineers already have enough full time jobs keeping the datacenter running. The purpose of the blog post is to help shed some light on a new technology in vSphere 5 called “Elastic Memory for Java” or EM4J and hopefully some other simple Java best practices and information as well. The end state of this blog is to help you bring up an EM4J configuration of your own so you can begin to see the value and test your own JVM configurations. I am also writing this to help educate some of the infrastructure engineers and help explain why this feature matters (Disclaimer = I am not a Java programming guy).
What is EM4J?
Hopefully you are somewhat familiar with the intelligent memory management features that come with the vSphere platform, such as memory ballooning. Ballooning is a great technique that allows you to reclaim memory from virtual machines if it’s not in use by the VM. When dealing with Java workloads, a VMware best practice has always been to set reservations for the virtual machine. This means we are always guaranteeing (or backing) that the memory will be available to the VM when it needs it. When a memory reservation is set for a VM, the hypervisor won’t reclaim memory from this VM if memory is tight on the host (which means the VM’s memory won’t be ballooned, compressed or swapped to persistent storage).
If you consider the definition of JVM (Java Virtual Machine), the last two words are important when talking VMware virtualization. Running a VM on a VM creates somewhat of a problem for the hypervisor. The JVM is essentially a black box to the hypervisor, which has no visibility into what’s going on inside its environment. EM4J, on the other hand, allows one to reclaim memory through a much cheaper mechanism, and induces GCs at the moments when the VM is handling relatively low load. It does not eliminate long pauses, as VMs without full reservations can end up swapping, but it significantly reduces pause time and provides a more graceful performance degradation when running overcommitted, making the workload’s performance more predictable. Now that I have described some of the characteristics, here is the actual definition according to the VMware documentation:
“Elastic Memory for Java (EM4J) manages a memory balloon that sits directly in the Java heap and works with new memory reclamation capabilities introduced in ESXi 5.0. EM4J works with the hypervisor to communicate system-wide memory pressure directly into the Java heap, forcing Java to clean up proactively and return memory at the most appropriate times—when it is least active. You no longer have to be so conservative with your heap sizing because unused heap memory is no longer wasted on uncollected garbage objects. And you no longer have to give Java 100% of the memory that it needs; EM4J ensures that memory is used more efficiently, without risking sudden and unpredictable performance problems.”
As you can see VMware is taking the same underlying technology that has been used for years across our customer base and applying it to Java workloads to gain more/better efficiencies at scale. The same performance characteristics apply to EM4J as they do to the ballooning in the VMware ESX hypervisor. Ballooning will only be invoked if the system is over committing memory, and has to begin utilizing its advanced memory management techniques. The benefit of EM4J is when the host is under memory pressure, the end user experience will be the same as if the VM was hard backed with physical RAM as we discussed earlier.
Getting started
EM4J is a product that works in conjunction with vSphere 5 and vFabric tc Server, which is bundled with vFabric Standard and Advanced. EM4J can also work directly with Apache Tomcat. You might be asking yourself what vFabric tc Server is at this point, and why the hell do I care about that? vFabric tc Server is a Java application server based on Apache Tomcat that VMware maintains and supports. It competes with IBM WebSphere and Oracle WebLogic, but is a much lighter-weight Java container that allows faster deployments in development as well as production environments. As a systems infrastructure engineer, it is imperative that you understand these types of Java workloads from a high level. Your success in moving these workloads into a virtual infrastructure depends on it, regardless of whether you use EM4J. Before I jump in and show you how to set this up, there are a few things we need to get out of the way first. Here is what you’re going to need to begin utilizing EM4J for your own testing, so grab it now:
- VMware vSphere 5
- VMware vFabric tc Server 2.6
- VMware vSphere 5 Web Console (for reporting visibility)
- VMware vFabric EM4J Documentation
- VMware vFabric EM4J web console UI plug-in
- Red Hat Enterprise Linux (RHEL) 5 operating system (Officially supported OS today)
- JVM Hotspot 1.6
Making it work in vSphere
As noted in my disclaimer above, I am not a Java guy, so it took me some time to get my lab environment up and running with the right components since I am new to vFabric. RHEL is the officially supported operating system today, but Linux is Linux, so I chose to grab the latest Ubuntu 11 distribution for my testing. Work with your internal Java guru to get vFabric tc Server set up and running on your Linux VM for testing. Once you get through installing your operating system and vFabric tc Server, there are some technical prerequisites you need to complete in order to enable the EM4J balloon driver and gain visibility into the JVM itself.
The first step is to enable an advanced parameter on the Linux VM you are testing with. The virtual machine will need to be powered down to perform this action. Right-click on the virtual machine, select “Edit Settings”, and then select the “Options” tab. Go down to the advanced section, select “General”, and then select the “Configuration Parameters” button that is now visible:
Once you select the “Configuration Parameters” button you are going to select the “Add Row” button and add the following configuration parameter to the VM:
sched.mem.pshare.guestHintsSyncEnable and set the value to “true” as shown below:
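If you would rather double-check the setting from a shell than from the GUI, here is a minimal sketch. It assumes the parameter ends up in the VM’s .vmx file in the usual `key = "value"` form; the function name `em4j_hint_enabled` and the example path are hypothetical.

```shell
# Succeeds (exit 0) if the given .vmx file sets the EM4J guest-hints
# parameter to "true" (case-insensitive); fails otherwise.
# Assumption: the .vmx stores the setting as  key = "value".
em4j_hint_enabled() {
  grep -Eiq '^[[:space:]]*sched\.mem\.pshare\.guestHintsSyncEnable[[:space:]]*=[[:space:]]*"true"' "$1"
}

# Example with a hypothetical datastore path:
#   em4j_hint_enabled /vmfs/volumes/datastore1/myvm/myvm.vmx && echo "EM4J hint enabled"
```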
Making it work in tc Server
Once you have enabled the virtual machine for EM4J, you also need to ensure your instance of tc Server utilizes the EM4J balloon driver. Execute the command listed below to create a new instance. In this example my instance name is “scott”, and the “elastic memory” option is what enables the EM4J balloon driver. Once you have created the instance, go ahead and start it up!
Next we will configure a few parameters within our instance so we can monitor them via the VMware vSphere web console interface, which I will show you next. Add the following parameters to the setenv.sh file of your new instance:
JVM_OPTS="-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.port=6969
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false"
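Keep in mind that these options turn off JMX authentication and SSL, which is fine for a lab but not something to carry into production. To confirm which port the instance will expose, here is a small sketch that reads it back out of setenv.sh. The function name `jmx_port_from` and the instance path in the usage comment are my own assumptions.

```shell
# Print the JMX remote port configured in a setenv.sh-style file
# (first match only). Prints nothing if no port is configured.
jmx_port_from() {
  sed -n 's/.*-Dcom\.sun\.management\.jmxremote\.port=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Assumed usage, with a hypothetical instance directory:
#   port="$(jmx_port_from /path/to/scott/bin/setenv.sh)"
#   netstat -tln | grep ":$port "
```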
Next we need to set up what is called the Console Guest Collector (CGC). The CGC is a process that allows the vSphere web console to pull data from the EM4J balloon driver and place it with each VM so the web client can then display performance data about the current workloads. This needs to be set up as a cron job so we can continuously pull near real-time data into vSphere. The cgc.sh script can be found in the /opt/vmware/vfabric-tc-server-standard-2.6.0.RELEASE/templates/elastic-memory/bin/ directory (adjust the version in the path to match your installation). Here is a crontab entry that runs it every 5 minutes:
*/5 * * * * /opt/vmware/vfabric-tc-server-standard-2.6.1.RELEASE/templates/elastic-memory/bin/cgc.sh > /dev/null 2>&1
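A crontab entry has to sit on a single line, and it is easy to add it twice while experimenting. Here is a minimal sketch of an idempotent way to install it. The helper `add_cron_entry` is hypothetical; it only merges text, and the actual install step through `crontab -` is shown as a comment.

```shell
# Merge one crontab line into an existing crontab.
# Reads the current crontab on stdin, writes the merged result to stdout.
# The entry is appended only if an identical line is not already present.
add_cron_entry() {
  entry="$1"
  existing="$(cat)"
  if printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
    printf '%s\n' "$existing"
  else
    [ -n "$existing" ] && printf '%s\n' "$existing"
    printf '%s\n' "$entry"
  fi
}

# Assumed usage (the path must match your tc Server installation):
#   CGC='*/5 * * * * /opt/vmware/<your-tc-server-dir>/templates/elastic-memory/bin/cgc.sh > /dev/null 2>&1'
#   crontab -l 2>/dev/null | add_cron_entry "$CGC" | crontab -
```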
Making it work in the vSphere Web Client
You downloaded the EM4J UI plug-in earlier, and now we need to extract it and set it up on your vSphere 5 Virtual Center server. Extract the contents to the following directory, then restart the vSphere Web Client Service:
C:\Program Files\VMware\Infrastructure\vSphere Web Client\plugin-packages\em4j-client
The data!
Now that we are through the tedious stuff, we can actually see some of the more interesting performance data, which is frankly the reason you are probably reading this blog post! Log in to your Virtual Center’s web interface and navigate to the virtual machine you are using to test with. Select the fourth tab at the top of the options section, which is titled “Workloads”. You should now see something similar to this, and “EM4J Agent Enabled” should be selected if you set up everything correctly:
Selecting the “Alerts” tab will give you any relevant data and tell you if any issues are occurring. This will also display some Java Best Practices and instruct you on how to fine tune your JVM. Selecting the “Resource Management” tab will display much more performance centric detailed information which gives you full visibility into the JVM itself. Excellent performance visibility into that problematic Java workload:
Conclusion
From the documentation, “EM4J helps the system behave gracefully and predictably when memory becomes scarce. It helps you to more easily determine the over-commit ratio that provides acceptable performance at peak loads.” Hopefully you learned a little bit about what Elastic Memory for Java is and how it works within vFabric and VMware vSphere 5. As with most technology features and functionality, I suggest understanding the best use cases for EM4J and how it fits into your own environment. The documentation that I linked to gives plenty of examples of when EM4J can be utilized effectively. Look for more performance benchmarks around optimal overcommit ratios as our vFabric team completes some great performance testing on this exciting new technology. The EM4J architecture will not only allow you to run your JVMs more efficiently, but will also provide you some great performance visibility and give insight into your Java workloads.



































