Archive for the ‘Authors’ Category

If you have a VMAX running 5876, beware when deleting hypers on disks containing VAULT devices.  One of my mainframe guys deleted hypers from his disk group in order to make them larger.  Once they are deleted, the disks without VAULT devices are all free.  Disks that do have VAULT devices do not return their free space.

Here’s an example:

If I run symdisk -v -hypers -gaps -sid xxx -disk_group <disk group number> list | more, I can see the hypers on the disks in that group. As we can see, disk DF-10D D5 does not contain a VAULT device.  My free space on this disk consists of a single 8MB GAP at the end.

If I go and delete the 15 devices from this disk, here’s what I get:


A nice clean disk with 279GB of free space ready to go.

Here’s a disk with a VAULT device:

Note the 5200MB VAULT, and the 13417MB GAP at the end, and note the free space number.  The GAP is the free space.  Now I’ll delete the 14 hypers.

Now, with all the hypers gone, my free space is still 13417MB.  So I’m missing the space that should have been freed by deleting those 14 devices of 18GB each.
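
For a rough idea of how much space should have come back, the arithmetic can be sketched in shell. This assumes each hyper is exactly 18GB (18 × 1024 MB), which is a rounding assumption, and expected_free_mb is a made-up helper name:

```sh
#!/bin/sh
# Expected free space in MB after deleting hypers: existing gap + reclaimed hypers.
# Assumes every hyper is exactly hyper_gb * 1024 MB (a rounding assumption).
expected_free_mb() {
    gap_mb=$1
    hypers=$2
    hyper_gb=$3
    echo $(( gap_mb + hypers * hyper_gb * 1024 ))
}

# Numbers from the VAULT disk above: 13417MB gap, 14 hypers, 18GB each.
expected_free_mb 13417 14 18
```

With those inputs the result is 271465MB, or roughly 265GB that should be reported as free on the VAULT disk.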

I have a ticket open on this, and the engineering team is looking at it.  I wanted to warn you guys, so you don’t get stuck with the same issue before they get it fixed.

 

vcops_perf

Introduction

A couple of weeks ago I was presenting at the regional Columbus, OH VMUG on “Troubleshooting Storage Performance in vSphere”.  The content was put together by our internal storage guru Joseph Dieckhans and I modified some of the content along the way.  If you are interested in seeing the presentation, you can view it here.

The presentation covers a lot of great information on troubleshooting with ESXTOP and identifying the various subcomponents of the storage stack that are important to monitor.  When I deliver this presentation, it typically brings up some great questions and conversations.  One of the questions that was asked was around VMware vCenter Operations Manager and its ability to monitor storage.  My answer was yes, vCops will do a great job monitoring your storage infrastructure, as it uses analytics to understand your storage performance and will send smart alerts when there are anomalies.  But the customer wanted to know if we were specifically monitoring all of the components in ESXTOP that we were covering in the session.  Good question!

vCops Metrics

I decided to dig into this one to see if there were any gaps between good old ESXTOP and vCops, so let’s take a look.  Below is a screenshot of the vCops disk statistics that are being monitored for the various LUNs.  In this example I am showing you an iSCSI device being presented to the ESX host.

vcops-disk

As you can see vCops is monitoring latency, Kbps, and SCSI reservation conflicts.  That’s a pretty good list of metrics that you would want to know about if you suspected a problem with the storage infrastructure.  I think even CTU’s very own technical specialist, Chloe O’Brian, would be happy with those metrics.

chloe24

Get more Detail

If you think you’re better than Chloe, and need more detail than what’s provided out of the box with vCops, have no fear.  VMware vCops is very flexible and you can customize the data feeds in a lot of different ways.  You might have recently seen Clint Kitson’s posts around injecting metrics into vCops.  This was the first phase of EMC integrating their storage-specific metrics into vCops for analysis and reporting (unsupported).  EMC is working on an official adapter that their customers will be able to leverage if they are a VMware vCops customer.  I expect we will see more and more storage vendors offering up a supported adapter for vCops in the future.

PowerShell is a great way to pull VMware performance data. You can utilize the “get-esxtop” or “get-stat” commands to get the same visibility as what is covered in the troubleshooting storage presentation.  Let’s see if we can add more details to vCops than what is given to us out of the box.

PowerCLI commands

Let’s start with an important metric we covered in the presentation.  Let’s get the metric “KAVG” from PowerCLI and have it display data back for a system we are interested in monitoring.  Here I am utilizing the PowerCLI command “get-stat” to pull some statistics on the VMkernel and its associated latency.  (It should be close to 0 ms; if it is above 2 ms you should investigate!)

get_disk_stat

Connect-VIServer -Server [YOUR HOST] -User root -Password [Your Password]
get-stat -instance [YOUR DEVICE] -Stat disk.kernellatency.average

Here are the returned values I get back from the above query:

getstat_value

Let’s format the results for vCops.  Just append the following to the end of the above command so it looks like this:

Connect-VIServer -Server 192.168.1.101 -User root -Password REDACTED
get-stat -instance naa.5000144f05346019 -Stat disk.kernellatency.average | sort timestamp -desc | select -first 1 | select @{n="name";e={$_.instance}},value
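
The 2 ms rule of thumb is easy to automate once the name and value are in plain text. Here is a minimal shell sketch, assuming the pairs have been written out as "name,value" lines (an export step this post does not show). flag_kavg is a hypothetical helper, not part of PowerCLI or vCops:

```sh
#!/bin/sh
# Read "name,value" lines on stdin and print the devices whose kernel
# latency (ms) exceeds the limit. The default limit is the 2 ms rule of thumb.
flag_kavg() {
    limit="${1:-2}"
    awk -F, -v limit="$limit" '$2 + 0 > limit + 0 { print $1 " " $2 }'
}

# Example: printf 'naa.5000144f05346019,0.1\n' | flag_kavg
```

Anything it prints is a device worth a closer look in ESXTOP. No output means every sample was at or under the limit.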

 

Ok great, now we have the data points I am interested in, so let’s take them into vCops with the work Clint Kitson and Matt Cowger put together.  The following PowerShell script takes the output and passes it off to vCops via an HTTP POST.

http_post

C:\Program Files (x86)\VMware\Infrastructure\vSphere PowerCLI> C:\Users\ssauer\Desktop\kavg.ps1 | C:\Users\ssauer\Desktop\ps_vcops_httpost.ps1 -vcopsip 192.168.1.220 -devicename iSCSI -resourcedescription "iSCSI KAVG" -devicetype ps-vmware-esxtop -protocol https -vcopsuser admin -vcopspass *REDACTED* -post;sleep 60

 

Let’s log in to the vCops custom UI and check out our data to see if it’s posting correctly (https://<VCOPS-IP>/vcops-custom).  Navigate to the environment tab at the top of the screen, then select the option “environment overview” to find the new HTTP post.  It will most likely show a blue icon, as vCops hasn’t had enough time to baseline the data and work out the dynamic thresholds.

vcops_data

The above data graph isn’t really that sexy, since my home ESX lab host isn’t being worked hard enough to show meaningful values.  You can now set up a task to run the PowerShell script every x minutes to automate the data pull.  From here you can create a customized dashboard for the specific data metrics you would like to present back to your operations team or possibly your manager to show him why you deserve a raise.

Conclusion

The question about getting ESXTOP data into vCops has now been answered.  With the example above you can pull specific ESXTOP statistics into the product.  This is obviously not an approved or supported method, and certainly not a method I would recommend implementing at large scale.  It is a helpful utility that you can leverage for troubleshooting performance problems in your storage stack.  Not only do you have a visual representation of these data metrics, but you are now leveraging the vCops patented analytics to start getting smart alerts on data anomalies.

-Scott

Yesterday, I attended the Carolina VMware Users Summit in Charlotte.  The morning keynote speaker was Satyam Vaghani, who is really the father of VMFS.  He walked the VMUG attendees through the history of VMFS, and VMware storage as a whole.

In my opinion, this was one of the most valuable sessions at the VMUG, and although my recording is terrible (Notability on iPad), this session deserves to be shared.  I was able to reduce some of the noise in post processing, but this is definitely not broadcast quality.

If you’re really interested in VMware storage, it’s worth a listen, despite the quality.  I honestly could have sat in this keynote for another couple hours absorbing information.

 

Here’s an outline:

- Birth of VMFS

- Ins and Outs of Locking

- Optimistic Locking and Performance

- VAAI Intricacies

- The Future (vVOL, VM granular storage, I/O Demux)

 

Here’s a link to the recording.  I will add slides as soon as I can get them.

Slides are here!!

Update: For those who exhibit lots of “Virtual Insanity” and want to know even more about VMFS, Satyam sent me a link to a white paper he wrote on VMFS. 

 

vmwld2012

 

VMworld 2012 is rapidly approaching, and believe it or not, it will be here before we know it!  Call for papers is open now, and you have till May 18th to submit your idea to the VMworld team.  Just in case you missed some of the details, VMworld US is to be hosted in San Francisco August 27-30 and VMworld Europe will be hosted in Barcelona, October 9-11.  It will be an amazing event as always, with some really awesome technology announcements from VMware.  Mark it in your calendar now, socialize the concept to your manager, inform the family, tell the neighbors, do what you need to do but get there!!

 

socialcast

This year I wanted to participate in the creation process of the VMworld Labs, as I think it’s such a remarkable component of the event.  In 2010 we delivered over 200 thousand VMs to customers across 27 different labs, which is an amazing accomplishment.  This year’s hands-on labs will only be bigger, better, and even “more epic”.  I can’t reveal all the details yet, but stay tuned, as we have some very exciting things in flight right now.

One of the labs that caught my interest was the Socialcast lab.  I wanted to do something outside my core knowledge base and pick up something that I haven’t had much exposure to.  Socialcast is something I have had a lot of experience with as an end user, but none on the backend infrastructure side.  VMware has a great internal implementation of Socialcast that we have been using extensively for some time now.  I can’t overstate how important Socialcast has become for our company as a place where we can share technical product information, ideas and concepts, presentations, status polls, and basic collaboration. (There is even a Pets of VMware Photo Group)  :-)

 

gumdrops

 

“Socialcast software unites people, information and applications across the enterprise in a collaborative community. Help employees focus on meaningful work, share knowledge and discover data in real-time. Behind the firewall or in the cloud, Socialcast enables secure enterprise collaboration in-context.”

Many people reading this blog post understand the power of collaboration and social media.  It is an important component of our being to give back to the larger community to help foster ideas and innovation.  Socialcast is a framework that allows this collaboration to exist within the confines of your own protected environment.

I will be working with the Socialcast team over the next several months to design an impactful lab that I am hoping many of you will take.  I have reached out to our CMO, and the VMworld team, to see if we can integrate Socialcast into the VMworld.com website so attendees of the event can actually utilize Socialcast during the convention.  (Idea is still being considered).  I am reaching out to one of my largest customers that has one of the biggest implementations of Socialcast in production to see if they are able to present at VMworld this year as an additional topic.

I’m looking forward to seeing you at VMworld. If you’re there, please sign up for the Socialcast lab and give it a test drive!

-Scott

This is a problem I have seen now in two different environments, at two different companies.  Both happened to be using VMware Data Recovery for backups.

 

The problem starts like this. You lose a host from vCenter, and you cannot get it to reconnect.  You do a /sbin/services.sh restart, and still you cannot get connected to vCenter.

You CAN connect to the host locally using the vSphere Client.  Let’s look at the logs now.

 

This particular problem shows up in the hostd log.  To see it, go ahead and SSH into the host and type in: tail -f /var/log/hostd.log  and then go into vCenter and right click on the host to Connect.

 

 

Watch hostd.log during the 5 minutes it takes the connection attempt to time out.  If you see any messages about snapshots, here’s how to check whether you have this issue.
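
If the log scrolls by too quickly to read, counting the snapshot-related lines does the same job. This is a minimal sketch: snapshot_mentions is a made-up name, and it simply matches the word "snapshot" rather than any specific hostd message.

```sh
#!/bin/sh
# Count the lines in a hostd log that mention snapshots (case-insensitive).
# Defaults to the log path tailed above; pass another file to override.
snapshot_mentions() {
    log="${1:-/var/log/hostd.log}"
    grep -ic 'snapshot' "$log"
}
```

Run it before and after a Connect attempt. A count that climbs while the connection is timing out is the cue to go look for delta files.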

 

In your SSH session on the affected host, type in the following:

find /vmfs/volumes/*/* -name '*delta*'

You’ll see a list of all snapshots for VMs running on this host.  If you see a VM with a couple hundred snapshots, this is why your host won’t connect to vCenter.  vCenter has a database limitation, and when a VM has more snapshots than vCenter can catalog in the database, the host cannot be managed by vCenter.  I haven’t figured out the exact limit for vCenter.  A VM can have 496, according to this post by William Lam, but I think vCenter breaks before you get to that point.  I had 235 on the suspect VM.
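
With a few hundred delta files the raw listing is hard to scan, so a per-VM tally helps. Here is a minimal sketch, assuming the usual datastore/VM directory layout and that sed, sort and uniq are available in the host's shell (worth confirming on your build). count_deltas is a made-up name:

```sh
#!/bin/sh
# Tally snapshot delta files per VM directory, largest count first.
# Assumes the layout <root>/<datastore>/<vm>/<files>; root defaults to /vmfs/volumes.
count_deltas() {
    root="${1:-/vmfs/volumes}"
    find "$root"/*/* -name '*delta*' 2>/dev/null |
        sed 's|/[^/]*$||' |     # drop the file name, keep the VM directory
        sort | uniq -c | sort -rn
}
```

Calling count_deltas with no arguments prints one line per VM directory with its delta count in front, so the VM with hundreds of snapshots lands at the top.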

To fix this, just connect locally to the host with vSphere Client and Consolidate your snapshots.

Once you’ve consolidated, your directory should look like the following.

Now, you can connect back to vCenter with no problem and no downtime!

 

Since this is a development environment, we didn’t pay a lot of attention to VDR, and just assumed it was working.  This particular VM happened to be out of hard drive space, so it could not be quiesced, and VDR just kept trying.  The bottom line is, pay attention to VDR errors!!!  After this, we’ll be checking it at least every few days.

 

 

Just a friendly reminder to pick up (virtually) your free copy of the VMware vNews for the month of April.  Make sure to sign up for e-mail alerts as we publish this monthly customer-focused newsletter!

 

vnews

-Happy reading.

Scott

Virtual Insanity has two vExperts on staff now! Scott Sauer and I were both chosen as vExperts for 2012.

 

What is a vExpert? The program was started by VMware in 2009 and is driven by VMware community mavens Alex Maier and John Troyer. The title is awarded to individuals who have significantly contributed to the VMware community over the past year. This is the first year employees are eligible, so I am glad to see Scott get recognized for his great writing here, and for putting together the vNews newsletter.

 

Here is more information on the vExpert program.

 

Here is the complete list for 2012, and a Twitter link to follow all the vExperts.

 

Congrats to everyone! I am proud to be on any list with these VMware heavyweights!!!

 

Thanks to all who voted! Thanks to Alex and John! I look forward to being able to contribute more to the community throughout the year.

 

 

If you’re using a Symmetrix array in your VMware environment, the Solutions Enabler virtual appliance is a must.  It’s perfect for SRM, or the EMC VSI plugins in vCenter.  Here’s how to get it set up and running.

 

1. Download the virtual appliance from Powerlink.

 

 

 

2. Deploy the template.

 

 

 

3. When you get to Properties, fill in the required info.  Note here that if you have a subnet that doesn’t fit the classical TCP/IP class model, the OVA will complain about your netmask here. Follow the optional procedure below or the appliance won’t boot.

Also, put in the ESX server FQDN where the vAppliance is going to live initially.  You’ll need to have that in there to mount your Gatekeeper devices.  If you skip it, you can put it in later, but it does need to be ON the host you specify initially.

 

 

 

4. (Optional) If you have a non-traditional subnet where the vApp doesn’t accept your netmask, go into the Advanced vApp Options.

 

 

Double Click netmask.

 

 

In here, uncheck User Configurable, and put in your netmask.

 

 

 

Now the appliance should boot without complaining.
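
To know ahead of time whether you will hit the netmask complaint, you can compare your netmask with the classful default for your address. The sketch below assumes that "classical" means the old class A/B/C defaults. The appliance's actual validation is not documented here, so treat that as an assumption. Both function names are made up:

```sh
#!/bin/sh
# Print the classful default netmask for an IPv4 address, based on its first octet.
classful_mask() {
    first_octet=${1%%.*}
    if [ "$first_octet" -lt 128 ]; then
        echo 255.0.0.0          # class A
    elif [ "$first_octet" -lt 192 ]; then
        echo 255.255.0.0        # class B
    elif [ "$first_octet" -lt 224 ]; then
        echo 255.255.255.0      # class C
    else
        echo none               # multicast/reserved ranges have no default mask
    fi
}

# Succeeds when the given netmask equals the classful default for the address.
is_classful() {
    [ "$(classful_mask "$1")" = "$2" ]
}
```

For example, is_classful 10.1.2.3 255.255.255.0 fails, because a 10.x address defaults to 255.0.0.0. That is the kind of subnet where the optional step above would apply.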

 

5. Open a web browser and go to https://<appliance address>:5480

 

 

6. Skip the security warning (unless you are @Texiwill) and go for the login.

User: seconfig

Password: seconfig

PROTIP: Change the password.

 

 

7. Once you’re in, click Gatekeeper Config and add your ESX host now if you haven’t yet.  Again, ensure that the appliance resides on the host specified here for this initial step.

 

 

8. At this point, you should see Gatekeepers available to be mapped.  Select the Gatekeepers you want and click the Map button all the way to the right.

 

Once mapped, the Gatekeepers will move to the bottom box.  These are now RDM’s attached to the appliance.

 

9. Now go to Command Execution at the top, and click the Discover Symmetrix button.  You should see your Symms in the Output Window.

 

 

Once you see your Symmetrices (sp?) there, you should be able to run any commands against them by inputting them in the top window and get output.

Now you can go connect your VSI, or SRM servers to this, and it should be fully functional.

 

This week was the 5.0.1 update of vShield App.  I did run into some quirkiness installing, so here’s my quick rundown on how to upgrade vShield App to 5.0.1.

 

Step 1

Download the update from VMware’s site.

 

Step 2

Launch the vShield Manager home page and login.

 

Step 3

Click on Settings and Reports, then Updates.

 

Step 4

Click Upload Upgrade Bundle

 

Step 5

Select the file you downloaded and click Upload File

 

Step 6

Click Update Status and then Install and Confirm the install

 

VMware says you should see progress here, and mine stayed at 0% the whole time.  You will see in vCenter it starts reconfiguring VM’s.  On my upgrade, this step took 15 minutes or so, and the vShield Manager was rebooted.  I just refreshed the browser and logged back in after I figured that out.  If you are prompted to reboot, go ahead and do that now, followed by Finish Install.  This is likely browser dependent, as I didn’t see either prompt.

 

Once you log back in, you should see the following on Update Status

When you see that, you are clear to update your hosts.

Updating hosts is easy, but time consuming.

 

Step 1

Click on a host in the vSphere Client, and click the vShield tab

 

Step 2

You will see an update is available.  Click Update.

There is a warning, but it bears repeating.  DO NOT do this on a host where vCenter or the vShield Manager reside.

Your host is going to go into maintenance mode, and from this point forward it’s hands-off.

In the vSphere Client you can see it going in and reconfiguring, then redeploying your Service Virtual Machine on that host.

 

Then you’re all done.  Lather, rinse, and repeat.

 

Why upgrade to vShield App 5.0.1 ? Check this article on Duncan’s site for my favorite feature.

One last note.  When you’re configuring your exclusions in vShield App, the vShield Manager is automatically excluded, as are the service VM’s.

With the right software, even a technology as old as the disk drive can overcome some of its own limitations.  We can see many examples of this in the storage world these days.  XIO is a great example of a company taking the same disk drives we’ve been struggling with for decades, and making them faster and more reliable.

Another up and coming company that believes in this approach is Pure Storage.  I had the opportunity to visit their headquarters in Mountain View with the Virtualization Field Day crew, and got to see some of this software magic for myself.  Chris Wahl, who had seen these guys before, made a comment that they are “SSD whisperers”.  After my visit, I cannot disagree.

There’s no shortage of promises these days coming from the dozens of new startups centered around solid state disk technology.  Before the Pure Storage visit, Mike Laverick was remarking how all these guys always say “we’re the only ones who actually GET solid state”.  We all had a laugh, and wondered if Pure Storage would use that line.  What we found was quite refreshing.  Pure Storage didn’t feed us a lot of marketing or silly quotes.  Instead, they actually made ME say “they’re the only ones who GET solid state”.

Pure Storage says they can sell you a solid state array for less money than a refrigerator-sized box of spinning rust.  We’re not talking about less $ per IOP.  We’re talking about less $ period. Like under $5 / GB.  So 10x faster for less than the big spinning arrays.  Let that sink in for a second.

With today’s advanced, auto-tiering arrays from the big boys in the storage business, it’s natural to wonder why all this is necessary.  Why would we need an all flash array when we can pack a little bit of screaming fast SSDs into a tray and let the array do the work to make sure your data is in the proper tier?  With that methodology, can’t we come in even cheaper per GB by adding massive SATA disks for the cold data?  The answer to that question depends solely on your tolerance for latency.

 

In the chart above, we can see that even on some very high performing traditional arrays, even with the best tiering algorithms, we’re still going to see IO’s with very high latency.  When you use the Pure Storage array, you don’t run that risk.  In the demos that Pure Storage did for us, VMware had a hard time even measuring the disk performance, since it doesn’t offer anything less than 1 ms increments.  As former VMware heavyweight Ravi Venkat pointed out, you can forget about SIOC, since you cannot set thresholds  below 5 ms.  If you see 5 ms from this array, it’s probably on fire.

As an EMC VMAX user, I can tell you that one of the underlying concerns in my mind every day is how FAST is working.  I have very little visibility into what is actually being tiered, and when, and why.  I have to just trust that EMC engineers are smarter than me, and that their tiering is going to prevent performance problems.  While this may be easy for some, it’s very hard for me to just set it and forget it in my environment.  There’s too much cost associated with latency for me to ignore the possibility that FAST won’t do the right thing at the right time.  Plus, it’s reactive.  So even if it does do the right thing at the right time, the “right time” is still after the optimal time to have that data tiered higher.

This is one of the best things about the Pure Storage array, in my opinion.  I don’t have to worry about whether the “magic” is working under the covers or not.  All my data is on the fast stuff all the time, so I can relax. . . a little.  ;-)

There are lots of features that make all this work reliably, and at much faster speeds than normal MLC.  For an extremely detailed breakdown by Pure Storage’s co-founder and CTO John Colgrove, go here and watch the video.

For brevity’s sake, I’ll highlight a few features:

  • Inline dedupe using 512 byte segments (better ratios overall)
  • Compression
  • Thin Provisioning
  • Raid 3D (varies RAID levels based on current system activity – see video link)
  • High availability (no config stored on the controllers)
  • VAAI support
  • I/O optimization

That last one is the one I found most fascinating, and you can see John explain it more in the video.  Every inbound write goes through a scheduling process that takes into account the current disk activity at a very granular level.  Since writes are quite expensive on flash (in latency terms) versus reads, writes must be minimized, and highly distributed.  This is where the scheduler comes in and looks at availability, workload, reliability, and lots of other characteristics of each piece of SSD.  Then it makes a determination where to write that data to give the best latency.  Also, if the system is loaded down, it can even pick a different RAID level dynamically to save on writes, thereby increasing performance.

This is where the magic is, in my opinion.  It takes a lot of experience and know-how to take MLC and make it as fast and as reliable as SLC, and based on their zero failure rate to date, I think they’ve done it.  Of the 35 deployments they’ve done to date, 35% are for VMware environments.  The industry mix is pretty interesting too, as you can see in the graphs below.  This is not some niche product targeted at specific high performance applications.

 

 

I have heard some people question whether there is room for all these new storage startups.  Since Pure Storage is a startup, I wanted to address the question, with regard to Pure Storage only.

First off, Pure Storage is an amazingly well funded company (from an outsider’s perspective).  They’ve got 55 million reasons why there is room for their storage startup.  That $55M came from people who are a lot smarter than me, and can better answer the question of whether there is room.  Check out the investor list.

 

 

Plus, they have attracted lots of top talent.  I included a short list below, which doesn’t even include Ravi Venkat.  Once again, I think the question of whether Pure Storage is a valid startup, or there’s a market for them is just silly.  EMC, who just recently announced their roadmap’s inclusion of MLC, used to be a startup.  Maybe they can get these guys to show them how to implement it.

 

 

Guess I better wrap this up before it becomes a rant.  Check out Pure Storage, and check out the video link.  Also there are some more articles from fellow delegates Chris Wahl, and Dwayne Lessner.

 

 


I have recently completed one of the most difficult (yet rewarding) pieces of work that I have been challenged with in my 6-year tenure at VMware: serving as a Lab Captain for the VMworld Hands-On-Labs at both the US and European conferences, as well as the recent Partner Exchange (PEX) event in Las Vegas. As many of you have read in Aaron’s recent post “The Layer between the Layers”, like him, I was asked to captain and write a section of a vFabric Lab for the HOL at both VMworld events in Las Vegas and Copenhagen and at PEX. There are 27 Lab Captains for the US and an equal number for the EMEA show, plus a larger number of Proctors for both. As a “generalist” SE (i.e. NOT a vFabric specialist or even an SME, Subject Matter Expert), I was appropriately intimidated to captain a topic I was not an SME in, so I was looking for any vFabric specialist help I could get! Fortunately, I was paired with a great colleague, Chris Harris, who was a vFabric Consultant in the UK.

Since this was so much a part of my life for the past 10 months, I wanted to give you all a taste of what this preparation process entailed, if for no other reason than to help me decompress from the massive amount of creative work that we went through to prep for VMworld 2011, but also to give the reader a flavor of what it takes to stand up the HOL from a content perspective.


So let’s start with the content definition and pre-work. We had a plan to construct content around a “real-world” customer implementation of VMware technology, rather than product centric demo names and examples. I personally thought this to be a double-edged sword. We could communicate to customers the “scenario” of how and why VMware can supply a solution to a specific problem, but I thought many attendees might be confused with “NO PRODUCT NAMES” in the scenarios. I agree that we need to avoid product sell in a technical lab environment, but we also need to inform our attendees in a bit more detail on the products that they will be concentrating on in the individual labs. (BTW, we are changing this next year…)

That aside, the labs this year continued to make great advances in not only the technical demo aspects, but also business application illustration examples. I am always amazed at the ability of the Core Team to adapt to the constantly changing, massively dynamic virtual workload demands of the lab (while using alpha and beta “dogfood” builds to equip the lab) in a “live-fire” environment. After working in the labs over the past several years, I think this is THE example environment that represents the “most extreme” examples of virtualization “stretch” in our customer base.  By that I mean that the problems we face using cutting edge technologies, the latest beta (and sometimes alpha) code, and the massive workloads being generated and managed, are extraordinarily challenging (and really fun!…mostly…) :-)  Never let it be said that PCoIP does not work over the WAN…we ran an entire portion of the lab from Las Vegas in Copenhagen, and everyone thought it was local! So overall, we are often breaking new ground and demonstrating what can be “virtually” achieved in a very intense and verifiable lab environment.

Again, that aside, the HOL environment, is one that we begin building months in advance of the events, and as most are aware, it is based on vCloud Director in a vPOD based model. The Captains start with the Lab Manual Build out back at the beginning of May. This is essentially a storyboard of the lab scenario that reflects the business problem and possible solutions and products required to solve that problem. We used a product called Screensteps to create the content and allow easy editing of the screenshots we needed to include into the manuals. We create a Lab Abstract Template, vPOD Configuration docs, and Visio diagrams of exactly what we need to include into each Lab Pod from an infrastructure and product perspective, build the base vPODs in our own vCloud orgs, and then turn those designs and completed vPODs over to the HOL Core Team for virtual build out to the WW Cloud. The overall idea is that once the vPOD is built and deployed into the various cloud DCs, it will be called up from the catalog by each lab participant “on-demand” and we actually create and deploy the lab in real-time (with some pre-population of the most popular labs). Once completed, the lab is “destroyed” and compute resources are returned to the pools. We complete this process literally 100,000-150,000 times during the week of VMWorld.


As you can see by the timeline below, we were under VERY tight time targets and each milestone counted!


Once the various drafts of the content for the manuals are completed and reviewed by the content leads, we then lock them in for lab manual build out and completion. Since we are often using alpha and beta versions of many new, or unannounced products for kickoff at VMWorld, it can be a bit dicey on build out, since we are also logging bug reports in the early code development and adjusting manuals to describe any workarounds needed to complete the lab. It is great that we have the chance to see these new products so early in development, but it adds to the workload and we get NO breaks in the timelines for delivery of our finished labs. SO that is where the pressure-cooker starts! This is also true for the Core Team as they are also using early, often unreleased alpha or beta builds and can run into similar issues. The additional effort that is required to be building out a lab environment while actively QA-ing new code drops at the same time is challenging…Days of 14-16 hours, nights, and weekends are considered the norm for Captains and Core Team as well as the Product Engineering folks who are on-site with us, so this volunteer effort is not for the faint of heart!

Once everything gets fully documented and deployed, the Core Team works their magic on pushing out everything to the three cloud environments: Las Vegas (Switch), Amsterdam (Colt), and Miami (Terremark). These are the sites from which we will be pulling sessions through View 5 into the labs for the attendees.

Finally, after months of creation and testing, we arrive onsite in Las Vegas several days prior to the event to setup the physical lab and begin testing.  Again, long days and little sleep are the highlights of these final testing sessions where we bring up the labs and stress test the environment.


Heroics abound in and around the lab from the Captains, Proctors, Core Team, and support staff. Every year we worry, “Can we really pull off such a trick of having 480 workstations all pulling virtual labs and manuals to a single event, and have it be smooth and without incident?” Well, generally SOMETHING happens (small config errors, lose a piece of HW, etc.) but the teams band together to make sure things work, even if it requires brute force to do so! Somehow, we get through it (after 148,000 VMs) and then do it all again in Europe and PEX! (Though admittedly on smaller scales…250 seats in Copenhagen, and 120 seats at PEX to reflect the difference in overall attendance of the events.)


We also got to watch all of the lab activity via vCOPs (vCenter Operations), and saw exactly how dynamic and massive the environment really was.


No other technology company I know of provides this level of lab automation and complexity while providing a high-value experience for our customer attendees. I am really looking forward to next year, when we offer these labs all year round and allow everyone to take advantage of the great work hundreds of people have contributed to provide such a unique offering (more on this soon). So next time you see any of the “red shirts” that say LabStaff on them, give a note of thanks for all of the hard work these folks have put in to give our attendees the best possible lab learning experience available anywhere! We are also already beginning to formulate new labs and processes for VMworld 2012 and beyond, so stay tuned later in the year to see what we have in store! Please ask questions in the comments about the HOL, and Aaron and I will share what we can…

Close your eyes for a moment and . . . . Wait. . don’t do that. . . But imagine for a moment your CEO calls your desk directly and is in a huge panic because one of his reports is taking way too long to run, and he needs it for the board meeting in 15 minutes.  Instantly your life flashes before your eyes: 

  • All those arguments you had with the DBAs, the application owners, and even your boss about how “we can’t possibly virtualize this application”. 
  • The meetings where the vendor said they would support it but they don’t “recommend” it. 
  • Conference calls where you told them they were just out of touch and that you could virtualize anything, and they wouldn’t even notice a performance hit. 
  • The look on their faces when they first tested the virtualized app and realized you were right.

 

And look at you now.  This is all on you.  It’s do or die time now.

 

So you bring up your preferred virtualization performance software to have a look.  For your sake, I hope it’s Xangati VI Dashboard.  

 

Having seen Xangati’s pitch before, and having tried the free version a couple of years ago, I didn’t feel it was something I needed in my environment.  However, last week at Virtualization Field Day 2 in Silicon Valley, the company’s founder, Jagan Jagannathan, said one thing that really struck a chord.

 

“Liveness is what you need to do triage.  If you want to do post-mortem, you don’t have to be live.”

 

He makes the point that in medical analysis, if you delay the analysis, even for a few minutes, the patient is dead.  “Not all patients die.  But some do.”  It was at this point that the Xangati story clicked with me.  It’s a tough product to get your head around in a quick demo, or marketing slide.  But after hearing directly from the man who invented it, everything makes sense. 

 

Jagan talks about other virtualization performance applications being largely database driven.  They essentially suck in data at intervals, store it in a database, crunch it, and then pipe it out to a GUI for display.  Some even require input from you on what interactions you might want to see before they even crunch the data.  

Xangati sucks in the data and crunches it, along with every interaction, all in RAM.  This means the data you see in Xangati’s interface is an order of magnitude more current than what you get from the other guys’. 

 

 

The other products are showing you a snapshot of data, followed by another snapshot, and so on.  This is sufficient for the type of predictive trending coming out of vCenter Operations for example.  Xangati can crunch 1 million metrics per second and pipe them right to your display. 
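To make the snapshot-versus-live distinction concrete, here is a minimal sketch. It is purely illustrative and says nothing about how Xangati or any other product is actually built: the class names, the 60-second polling interval, and the latency feed are all made up for the example. One collector stores a sample only once per interval, the other keeps every recent sample in a bounded in-memory window.

```python
from collections import deque


class SnapshotCollector:
    """Keeps one sample per polling interval, like a store-then-display tool."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.last = None  # (timestamp, value) of the most recent stored snapshot

    def offer(self, t, value):
        # Only store a sample when a full interval has passed since the last one.
        if self.last is None or t - self.last[0] >= self.interval_s:
            self.last = (t, value)


class LiveWindow:
    """Keeps every recent sample in memory, bounded by a fixed window size."""

    def __init__(self, size):
        self.samples = deque(maxlen=size)

    def offer(self, t, value):
        self.samples.append((t, value))

    @property
    def last(self):
        return self.samples[-1]


def replay(collector, samples):
    """Feed a list of (timestamp, value) samples into a collector."""
    for t, value in samples:
        collector.offer(t, value)
    return collector


# Hypothetical latency feed: 5 ms for 70 seconds, then a spike to 50 ms.
FEED = [(t, 5 if t < 70 else 50) for t in range(90)]

snapshot = replay(SnapshotCollector(interval_s=60), FEED)
live = replay(LiveWindow(size=300), FEED)

print("snapshot view:", snapshot.last)
print("live view:    ", live.last)
```

With this made-up feed, the snapshot collector is still holding the 5 ms value it stored at second 60, while the live window already shows the 50 ms spike at second 89. That gap is the difference between post-mortem data and triage data.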

 

Which data would you rather have when your CEO is standing over your shoulder?  Which data would you rather have if you’re running thousands of VDI sessions like at the VMworld Labs?  Xangati was VMware’s choice for the Labs environment.  And since we have all taken a sort of Virtualization Hippocratic Oath by talking companies into virtualizing, we cannot afford to let our patients die on the table because we didn’t have the data to save them. 

 

I had the good fortune of sitting with Jagan at dinner after their presentation, and we got into a conversation about a huge paradigm shift in our industry that’s happened over the past decade.  A couple years ago, SAP founder Hasso Plattner was asked by his own employees why he felt the need to deliver an in-memory appliance.  His response nails what I feel this paradigm shift is all about.

 

“People at SAP ask me, ‘Why do you insist on running a dunning program in seconds instead of two minutes? No one is asking for that type of speed for a dunning program,’ ” Plattner said.

“And I tell them, ‘You are asking the wrong question: the right question is, how long will someone with an iPhone wait for an answer? And the answer is that 15 seconds is the absolute maximum amount of time people will wait before they go and start doing something else: check voicemail, send text messages, check email, send text messages to themselves . . . . This is the new reality!’ ”

 

In most enterprises a decade ago, the world did not come to an end if an application was down for a few hours.  People took a long lunch, and moved on.  In this new world, people go absolutely insane over the slightest performance degradation of any application. 

 

Downtime is unthinkable, even for the most mundane and “insignificant” application.  Can we blame all this on the iPhone?  I’m not sure, but one thing I do know is that we had better have the tools to enable us to deliver on these expectations.  Xangati is a huge step in the right direction. 

 

There’s a lot more to Xangati, like industry-leading awareness and visibility for VDI environments, and the ability for users to initiate recordings of metrics while a problem is occurring.  Cool features abound.  You can read about some of them over on Rodney Haywood’s, Dwayne Lessner’s, and Chris Wahl’s blogs.  For me, the one feature that stands out most is the live data.  The life you save could be your own.

 

 

I’ll be heading to Virtualization Field Day 2, Feb 22-24, in Silicon Valley!  What is Virtualization Field Day?  It’s a two-day event packed with in-depth and interactive Q&A between vendors in the virtualization space and independent bloggers / writers / thought leaders in the industry.

Vendors get to showcase products that are real, or on the drawing board, and they get solid, candid feedback from independent IT pros that helps them make their products better for all of us.

Delegates get a first look at some of the coolest new technologies everyone will be talking about in the coming months, as well as an opportunity to get hands-on with them and ask the tough questions that would never be allowed in a webcast full of random people.  Some things may be covered by NDA, or an embargo date, but the majority of the event can be viewed live right here as it happens!

If you can’t catch the stream live, follow us on Twitter with hashtag #VFD2, and tweet us your questions for the vendors.  The videos will be posted after the event concludes so you can go back and catch anything you might miss.

This is my second Tech Field Day event, and based on the presenter and delegates list, it’s going to be fantastic!

What makes these events so valuable is the expectation of independence and the objective nature of the delegates.  Combine this with the hard work and dedication of Stephen Foskett and Matt Simmons, who plan everything to the last detail and coach the vendors ahead of time so they don’t bring lame marketing presentations to real technical guys and gals.  The stream will definitely be worth your time!

Are there vendors making something awesome you’d like to see present at an event down the road?  Nominate them here!

Do you love technology, and work for a non-IT vendor?  If you’d like to become a delegate, find out how here!

 

 

In the interest of full disclosure, delegates’ travel expenses to and from the event, as well as accommodations during the event are covered by sponsors.  As with any tech event, delegates may receive swag from vendors, but delegates are not under any obligation to blog, tweet, or even like the products.  Of course if there are cool products that interest delegates, they may be discussed on various social media sites, but there is no compensation, or expectation from either side after the event concludes.

labs

Introduction

VMware made an exciting announcement at VMworld 2011 that didn’t get much press or attention.  The VMworld labs were slated to be released in early 2012 for customers interested in doing technology previews of our software solutions.  Notice I didn’t use the term “Proof of Concept”, as this implies different things to different people.  A proof of concept could have business requirements, technical requirements, or users that are associated with your specific environment.  I am happy to report that the “VMware Virtual Customer Labs” (vCL) are now available for **selected customers.  I wanted to do a write-up about the vCL, what it is, and how it works, as I think this is a unique offering that VMware is providing its customers.

 

vcllogo

What is the vCL?

The vCL is based on VMware vSphere 5 and VMware vCloud Director 1.5, along with vCenter Orchestrator for automation.  It grew out of something VMware has been using internally for years called the “vSEL”, or the VMware SE Labs.  vCL is designed to be a fully automated cloud solution where users can check out VMware software solutions for 14 days of testing and training/education.  The vCL was built around the concepts of saving customers time (manual installs, deployments, infrastructure configuration) and hardware costs, as VMware hosts the environment on behalf of our customers.

 

The Workflow Automation

Automation is part of any cloud solution. If you stop to think about it, you’re really getting a demonstration of vCloud Director along with any of the other labs you check out!  Let’s walk through the backend automation that kicks off once a customer requests access to a lab environment.  In this example I am the customer, and I am interested in selecting the SRM 5 environment to test out.  As a VMware systems engineer, I log in (approval phase) and submit the request to the vCL system.

 

vSEL

Below are the vCL options that I am going to configure for the customer. These include the customer name, which lab they are interested in, and basic information like an e-mail address.  In this example I am using myself as the customer name to show some of this functionality.

 

vCL Deploy

Once I submit my request, I get an automated e-mail (below) indicating that my request has been accepted and the build process has been initiated.  As you can see this might take slightly longer than normal as we are delivering full cloned vApps to ensure performance and a great user experience.

 

email1

Once my environment has completed its provisioning process, both the customer and the VMware engineer get an e-mail confirming the build is complete.  The e-mail contains the URL for accessing the environment, along with the custom username and password for authentication purposes.

email02

Here comes the exciting part: let’s log in!  Here is the main splash screen, where I authenticate with the credentials I received in the previous step.  Note that you need to accept the VMware EULA to access the environment, or you will not be able to log in.

 

vcl_login

I now have complete access to my personalized demo SRM environment, where I can begin testing SRM 5.0!  As I mentioned earlier, I get 2 weeks to walk through the lab and complete any testing I would like to perform.  The lab manuals will be provided by the systems engineer that you work with when you request your access to the environment.

 

vcl_vcenter

 

A Special Thanks!

I wanted to give special thanks and some recognition to the vCL team for all of the hard work and effort that went into this project.  It is still a work in progress, but the team is in the process of adding more labs to the service catalog.  They are also planning on adding more back-end storage to support more customers and to make sure performance scales.  Great work guys!

 

Note:

** Selected Customer indicates those that are supported by a pre-sales systems engineer.  The SE is the owner of the customer experience and is responsible for coordinating the customer requests and ensuring they are getting the desired results from the vCL.

If you have HP BL460c G7 blades with the on-board 10Gb CNA, you’re going to want to read this post regarding a problem with the latest firmware.

I first noticed this issue when updating firmware to troubleshoot a problem where the storage doesn’t come back up after rebooting an upstream Nexus switch.

The symptoms are: the NIC comes back up, and the vfc is up, but all storage paths on that side of the fabric are still dead in ESXi 5.0.  To fix this issue, the vfc or port channel must be bounced (shut / no shut).

I also saw an issue where the storage paths were dead, and the NIC never came back up.  A reset of the Ethernet port will not fix this.  A reboot of the ESXi host is required.  Pay attention to the NIC state if you lose storage paths in this configuration with FCoE.
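If you want a quick way to see which side of the fabric has lost its paths after a switch reload, counting dead paths per adapter is one option. Below is a minimal sketch, not a tested tool: it assumes you have captured the text output of `esxcli storage core path list` from the host, and that each path appears as an unindented identifier line followed by indented `Key: value` fields that include `Adapter` and `State`. The sample text is made up for illustration, and the function names are hypothetical.

```python
from collections import Counter


def parse_paths(text):
    """Split path-list text into one dict per path record.

    Assumes an unindented line starts a record and indented
    'Key: value' lines belong to the most recent record.
    """
    records = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = {"Path": line.strip()}
            records.append(current)
        elif current is not None:
            # Split on the first colon only, so values like
            # 'vmhba1:C0:T0:L0' stay intact.
            key, sep, value = line.strip().partition(":")
            if sep:
                current[key.strip()] = value.strip()
    return records


def dead_paths_by_adapter(text):
    """Count records whose State is 'dead', grouped by Adapter."""
    return Counter(
        record.get("Adapter", "unknown")
        for record in parse_paths(text)
        if record.get("State") == "dead"
    )


# Made-up sample in the assumed layout, not real host output.
SAMPLE = """\
fc.example-path-a
   Runtime Name: vmhba1:C0:T0:L0
   Adapter: vmhba1
   State: active

fc.example-path-b
   Runtime Name: vmhba2:C0:T0:L0
   Adapter: vmhba2
   State: dead

fc.example-path-c
   Runtime Name: vmhba2:C0:T1:L0
   Adapter: vmhba2
   State: dead
"""

for adapter, count in dead_paths_by_adapter(SAMPLE).items():
    print(f"{adapter}: {count} dead path(s)")
```

If every dead path sits on a single adapter, that points at one side of the fabric, which is the pattern described above. It does not tell you the NIC state, so you still need to check that separately.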

As part of my troubleshooting, I went to update the firmware on the CNA.  The latest version of the firmware from HP is 4.0.360.15a.  When updating using the Emulex utility, on about 20% of my blades, I got a CRC error during the upgrade process.  Below is a screenshot of this error.

 
After retrying the firmware update, as stated in the utility, the same error occurred.
This is where you need to pay attention!! 

During the POST process, the blade WILL report the correct firmware.

 

Since the firmware version is correct, one might assume the update was indeed successful.  That’s a bad assumption.  Upon further testing, we found the blades that failed the firmware update were the ones failing during the switch reloads.

There were only 2 blades that did NOT fail the firmware update, but still failed the switch reload process.  They were replaced, and now I have no blades failing to reacquire storage paths after an upstream switch failure.

I must point out that HP has been unusually proactive with this issue, which is a nice change!  I still have several blades in another datacenter that are not taking the firmware update.  When I scheduled to have those all replaced, HP got some of their top people on it and scheduled a call.  I tested their proposed fix this morning, which didn’t work.

They are actively working on a fix, so you won’t have to replace your blades.  I will update this post as soon as I get word back from them on that fix.  Meanwhile, if you’ve seen this, you might want to schedule some switch reloads during a maintenance window to make sure you are good to go.

Update 2/6:

As of today, there is no fix that I’m aware of. . . HP replaced the remaining blades after we tried a couple more proposed fixes.  If I get word of a fix, I will post it here.