** I have modified this post to include the updated licensing changes announced 8-3-2011, as well as the link to Alan’s updated PowerShell script. The video clip has not been updated; the concept is the same, only the numbers have changed. Enjoy!

 

***Disclaimer – I am a VMware employee and receive paychecks that have the word VMware stamped on them.  My thoughts are my own but if you are afraid of an employee’s opinion, run away now!

 

Introduction

VMware has made a lot of big announcements over the past week. We held our largest launch event in history and announced to the world that our flagship product, “vSphere 5”, is being released. We also announced many other product updates and releases, such as SRM 5, vShield 5, vCD 1.5, and an updated release of vCenter Server Heartbeat. There are loads of exciting new features and functionality to let you run your mission-critical workloads on the world’s #1 hypervisor.

I wanted to do a factual write-up on what has changed, hopefully explain it better since there seem to be a lot of misconceptions, and also give you my take on what I am seeing from my customer base. For those of you who don’t know my background, I came from a customer environment where I designed and implemented VMware in a large-scale deployment. I then decided to convert to the dark side and go work for a vendor (VMware), so I actually have some perspective to offer from both sides of the fence. Now that we are one week out from the new licensing change, I thought I would share some thoughts.

 

What is vRAM?

One of the changes that came along with the many exciting new features was the new vRAM licensing model. VMware has decided to move away from the core-based model to a “consumed virtual RAM” (vRAM) model across the entire environment. For those of you who are unsure how the current vSphere 4.x licensing model works, I have embedded a chart here to help you understand the core limitations as well as the features and functionality of each version.

 

[Chart: vSphere 4.x editions, core limits, and features]

 

Below is the new vSphere 5 pricing comparison, which also includes a feature/functionality breakout. As you can see, it lists the new vRAM entitlements and how much pooled vRAM you are allotted per socket. Notice that we have removed the core limitations on the physical processors and lifted the physical memory limitation.

 

[Chart: vSphere 5 editions, vRAM entitlements, and features]

Your current licenses will be converted to the corresponding vRAM allocation, depending on the edition of vSphere you are paying maintenance on. To determine how much vRAM you will have available when you upgrade your environment, my suggestion is to download the PowerShell script from VMware’s Alan Renouf, which calculates this information for you automatically.
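To make the pooled model concrete, here is a minimal Python sketch of the arithmetic; it is not a replacement for Alan’s script. It assumes per-license entitlements of 32 GB (Standard), 64 GB (Enterprise), and 96 GB (Enterprise Plus), a 96 GB cap on the vRAM counted for any single VM, and that only powered-on VMs count. Confirm all of these against VMware’s current licensing table. It is also a single point-in-time check, not whatever averaging VMware applies for compliance, and the host and VM names are hypothetical.

```python
from dataclasses import dataclass

# Assumed vRAM entitlement per vSphere 5 license (GB), per the August 2011 revision.
# Verify against VMware's official licensing table before relying on these numbers.
VRAM_PER_LICENSE_GB = {
    "standard": 32,
    "enterprise": 64,
    "enterprise_plus": 96,
}
PER_VM_CAP_GB = 96  # assumed cap on the vRAM counted for any single VM


@dataclass
class Host:
    name: str
    sockets: int  # one vSphere 5 license per physical CPU


@dataclass
class VM:
    name: str
    configured_gb: int
    powered_on: bool = True


def pooled_entitlement_gb(hosts: list[Host], edition: str) -> int:
    """Pool = number of licenses (sockets) x per-license entitlement."""
    return sum(h.sockets for h in hosts) * VRAM_PER_LICENSE_GB[edition]


def consumed_vram_gb(vms: list[VM]) -> int:
    """Configured memory of powered-on VMs, each capped at PER_VM_CAP_GB."""
    return sum(min(vm.configured_gb, PER_VM_CAP_GB) for vm in vms if vm.powered_on)


if __name__ == "__main__":
    cluster = [Host("esx01", 2), Host("esx02", 2), Host("esx03", 2)]  # hypothetical
    vms = [VM(f"app{i:02d}", 8) for i in range(40)]                    # 40 x 8 GB
    vms += [VM("sql01", 128), VM("old01", 16, powered_on=False)]
    pool = pooled_entitlement_gb(cluster, "enterprise_plus")
    used = consumed_vram_gb(vms)
    print(f"pool={pool} GB used={used} GB headroom={pool - used} GB")
```

For this hypothetical 3-host, 2-socket Enterprise Plus cluster, the script reports a 576 GB pool and 416 GB consumed (the 128 GB VM counts as 96 GB and the powered-off VM not at all), leaving 160 GB of headroom shared across all three hosts.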

I put together a quick video that discusses vRAM in a little more detail and gives an example of what this pooled model looks like across a 3-node cluster. Watch the video below:

 

 

My thoughts

There have been a lot of emotional responses to this topic over the past week, which is understandable. VMware has the best user community of any software company I have ever seen, and in my opinion there are several reasons for this. VMware has had a great impact on our customers’ environments from a datacenter consolidation perspective. We have allowed our customers to run more efficiently, do more with less, and become heroes to their internal customers by offering them agility. We have also given them a portion of their lives back from an administrative perspective.

I was one of these end users who came in to fix systems at all hours of the night when the hardware went south. I was responsible for deploying hundreds and hundreds of physical servers that eventually consumed all of my time from a maintenance perspective. VMware technology gave me my personal life back and allowed me to start doing mundane maintenance tasks during the day! Storage vMotion allowed me to retire 3 older storage arrays (30+ TB) that were at end of life and move their data to newer technology with no downtime, in the middle of the day! I believe there is a personal component to all of this, and that is part of what makes the community such a strong force.

The goal of this article is to educate people and give you my perspective. It is not intended to defend VMware’s position or to convince you why vRAM should make you sleep easier at night. VMware has spent the past two years working internally and externally with customers to determine a fair licensing model that works for everyone. The current model will not scale with a quickly changing hardware landscape in which core counts are climbing exponentially. A handful of my customers on vSphere Enterprise are already having to double up on licenses because they are restricted on the number of cores. I think the model is a fair one, and as you walk through it, hopefully you will see the logic behind why VMware had to make this change to better support our users and the community.


Over the past few VMworld events, I have sent my customers a list of speaker recommendations outlining my personal picks for the sessions I think are the best to attend, based on my direct experience with the speaker, a personal relationship, or presenting with them at another event. I have combed through all of the speaker sessions and chosen only speakers who are (IMHO) “golden” in technical knowledge or presentation skills! Try not to miss the ones marked “Highly Recommended”, as they have an extraordinary speaker or topic.

Remember, the standard disclaimer applies: These are my own picks (not those of VMware, and not meant to be exclusive of the many other talented presenters and contributors) so your mileage may vary! BTW, stop by the Hands-On Lab at both Vegas and Copenhagen this year to say hi, since that is where I will be living with my other Lab Captains!

http://www.virtualinsanity.com/wp-content/uploads/2011/07/VMworld-2011-Speaker-Recommendations-–-Toms-Picks.pdf

 

Recently I have been researching HP C7000 chassis connectivity options extensively. Before I dove deep into it, Virtual Connect FlexFabric seemed like a no-brainer. On the surface, it has many advantages.

The cabling / port reduction is an obvious win, as is the ability to have some control over WWID and MAC assignment to blades. Moving East / West traffic between chassis without having to go Northbound to a ToR or EoR switch is attractive as well.  Of course these are all things that are just standard with UCS, but I digress.

After many meetings with HP, I still had some questions that were unanswered.  I turned to the many thousands of pages of HP documentation on the subject.  Sifting through all the “cookbooks” and the in-depth guides to Virtual Connect, and talking with some current users of FlexFabric, I came to the conclusion that it is missing some key features that are needed in a VMware environment.  In fact, I would say that for Cisco shops running VMware, HP FlexFabric makes little sense.

The biggest problem I have with Virtual Connect FlexFabric is the lack of any real QoS. Once traffic enters the Virtual Connect module, it’s anarchy. There are no mechanisms for prioritizing traffic or managing bandwidth. In a VMware environment, where there will be multiple types of traffic, each capable of generating significant load, the only control you have in VC is egress rate limiting.

It’s akin to limiting the number of people one can put in a single car, right before driving through the middle of Rome.

For those who haven’t had that experience, trust me, it’s the same type of anarchy that occurs inside VC.  The only rule is try not to die.

 

 

Here’s a nice diagram showing Virtual Connect and VMware traffic flow design from M. Sean McGee’s blog:

When you have a Cisco Nexus 1000V on the ESXi host and a Nexus 5K on the other end, it makes little sense, in my opinion, to completely break awesome features like Priority Flow Control and bandwidth management. HP states that they support FCoE and DCB (CEE), which should include those features, but their own people cannot really say how one would configure or troubleshoot it. That’s part of the problem: VC is a black box that hides what is going on inside.

One of my other complaints about VC FlexFabric is that I have no choice but to split my 10GbE pipe into smaller pipes if I want to run an HBA off the adapter. If I use the exact same onboard CNA without FlexFabric, I don’t have to do that. This can be solved with separate HBAs or 10Gb NICs, but that negates the alleged cost savings. So now I’m forced to guess how much bandwidth I need for each traffic class, when I already own switching infrastructure that is smart enough to do that for me.

In my opinion, this is akin to disabling DRS. DRS is smarter than you, and faster, so why would anyone disable it? Cisco QoS is certainly smarter than me, as is VMware NetIOC. So why would I want to impose arbitrary limits on my huge pipe? VMware admins understand that shares are better than reservations or limits, and the reasoning is the same on the networking side.
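The shares-versus-partitions argument can be illustrated with a toy model. The Python sketch below compares a fixed split of a 10 Gb link (the FlexNIC-style approach) against a work-conserving, share-weighted allocation (weighted max-min fairness, computed by water-filling), which is the general idea behind shares. It is a simplified model, not how NetIOC, DCB/ETS, or Virtual Connect is actually implemented, and the traffic classes, partition sizes, shares, and demands are all hypothetical.

```python
LINK_GBPS = 10.0

# Hypothetical FlexNIC-style fixed partitions (Gb/s), summing to the 10 Gb link
PARTITIONS = {"vm": 4.0, "vmotion": 2.0, "storage": 3.5, "mgmt": 0.5}

# Hypothetical relative shares for the same traffic classes
SHARES = {"vm": 40, "vmotion": 20, "storage": 35, "mgmt": 5}

# Hypothetical instantaneous demand (Gb/s): a vMotion burst while other classes are quiet
DEMAND = {"vm": 2.0, "vmotion": 8.0, "storage": 1.0, "mgmt": 0.1}


def static_split(partitions, demand):
    """Each class gets at most its fixed slice; capacity idle in one class is stranded."""
    return {c: min(demand.get(c, 0.0), cap) for c, cap in partitions.items()}


def share_based(link_gbps, shares, demand):
    """Work-conserving weighted max-min fair allocation (water-filling)."""
    alloc = {c: 0.0 for c in shares}
    active = {c for c in shares if demand.get(c, 0.0) > 0}
    remaining = link_gbps
    while active and remaining > 1e-9:
        total = sum(shares[c] for c in active)
        # Classes whose unmet demand fits inside their proportional slice of what is left
        satisfied = {c for c in active
                     if demand[c] - alloc[c] <= remaining * shares[c] / total}
        if satisfied:
            # Fully serve them, then redistribute the leftover among the rest
            for c in satisfied:
                remaining -= demand[c] - alloc[c]
                alloc[c] = demand[c]
            active -= satisfied
        else:
            # Everyone still wants more than their slice: split what is left by shares
            for c in active:
                alloc[c] += remaining * shares[c] / total
            remaining = 0.0
    return alloc


if __name__ == "__main__":
    for label, result in (("static", static_split(PARTITIONS, DEMAND)),
                          ("shares", share_based(LINK_GBPS, SHARES, DEMAND))):
        print(label, {c: round(v, 2) for c, v in result.items()},
              "total", round(sum(result.values()), 2))
```

With this made-up vMotion burst, the static split caps vMotion at its 2 Gb partition and leaves about 4.9 Gb of the link idle, while the share-based allocation lets vMotion borrow the unused capacity (about 6.9 Gb) because every other class is already satisfied. When all classes are busy, shares still guarantee each class its proportional slice, which is the point of preferring shares over hard limits.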

There are other problems I see with this solution, but I don’t want to bore you. One complaint I have heard from close associates is that HP’s recommended method of “stacking” VC modules is problematic. Not only do you have to give up 3 of the 8 ports per module for stacking, but it can create bandwidth issues as well. Recently, a friend of mine had to completely revamp his setup to uplink everything instead of stacking, which was allegedly causing bandwidth problems in his environment. Oh, and in addition to all this, the FlexFabric module will take FCoE and pass it north as standard Ethernet, so you lose any of the FCoE features provided by your Nexus switch.

Companies that are not virtualizing certain applications but will run them on blades may find that the advantages of moving MACs and WWIDs around outweigh the potential disadvantages of FlexFabric. Everything on my blades will be ESXi, so I don’t really need quick physical ID recovery.

As of right now, I plan to use passthrough modules on the C7000s, at least until a better alternative comes out. Passthrough is slightly more expensive on the uplink port side, but it doesn’t prevent my networking team from having end-to-end visibility and management. That takes some of the guesswork and administration off my team, which is a good thing! I would be interested to hear your experiences in the comments below.

 

I came across this tip from a colleague today and wanted to share it with everyone: you can run Zimbra Desktop in your default web browser. My default web browser is Chrome, and I have found Zimbra Desktop to be very responsive running in Chrome.

 

The first step is to open the native Zimbra Desktop client and click Setup in the upper right corner.


 

This opens the setup screen for the Zimbra Desktop client. In the bottom right, you will see an option to open it in a web browser; click this link.

 


 

This opens Zimbra Desktop in your default web browser. Once it opens, click Launch Desktop.

 


 

And there you go, the Zimbra Desktop Client running in a web browser, in this case Chrome.


 

A couple of notes:

The native Zimbra Desktop client must remain open, so I just minimize it.

You have to repeat the “open in web browser” step each time you close your web browser or the native Zimbra Desktop client.

Charge-back, Show-Back, Shmargeback, call it what you will but get off your duff and do it.  As I travel around and work with customers on building out their Private/Hybrid cloud strategies I’m amazed at how few organizations actually have a clue about what it costs to deliver IT services.  Sure, the CFO could look at his budget and say, “Mmmm, yup you guys cost me X.”  Ok, but what are you actually delivering for X??

Can you clearly articulate, “I deliver this, this, and that at these service levels that backstop and deliver this revenue for the business”?

Increasingly, IT is being questioned: “What do you actually provide me?” and “Is there another way I could do it more efficiently (cheaper, in CFO speak)?” VMware’s Paul Maritz likes to point out that for the first time in IT’s recent history, corporate IT has an external (competitive) rate card against which its services can be compared. It’s easy today for a line of business to go to Amazon, Rackspace, or you-name-it and simply procure IT services. Sure, it’s fraught with issues and considerations (that the business user won’t consider), but at the end of the day it’s cheap, easy, and moves at their pace: NOW!

I firmly believe corporate IT still can provide HUGE value to the business, but to maintain relevance it needs to dramatically change.  You MUST be able to articulate what it costs to deliver a given service.  You MUST be able to differentiate your services and the value you provide to the business in terms they understand and care about.  The fundamental basis for doing so is measurement.  So what should you do?

  • Examine yourselves.
  • Procure the tools needed to start metering your services.
  • Deploy those tools
    • It’s amusing how often people buy capabilities yet never deploy them, usually because they are too busy or lack the skills in-house.
  • Differentiate your services, for example:
    • Compute and Storage Performance tiers
    • RTO/RPO tiers
  • Start providing your LOBs with metrics on what they are actually consuming and the associated costs for those services (showback, baby!)
    • If you don’t do chargeback today, this will start to condition the business to see your services defined in these terms, all the while setting the stage for moving to a chargeback model.
    • This will also give you an internal scorecard by which you can measure yourself against those external providers. Believe me, if you are not already doing this, your internal customers are.
  • For new services/requests, begin engaging in meaningful business-based conversations about what is actually needed for a given service. Show the consumer the costs associated with the various tiers of service (“No, DR isn’t free; it costs $20/month/VM,” for example).
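At its core, a showback report is metered consumption multiplied by a rate card and rolled up by line of business. The Python sketch below illustrates that idea. Only the $20/month/VM DR figure comes from the list above; the tiers, rates, LOB names, and usage records are hypothetical placeholders for whatever your metering tools actually collect.

```python
from dataclasses import dataclass

# Hypothetical monthly rate card (USD). Only the DR rate ($20/month/VM) comes from the text.
COMPUTE_PER_VM = {"gold": 150.0, "silver": 90.0, "bronze": 50.0}
STORAGE_PER_GB = {"tier1": 0.90, "tier2": 0.40, "tier3": 0.15}
DR_PER_VM = 20.0


@dataclass
class VMUsage:
    lob: str               # line of business that owns the VM
    compute_tier: str      # compute performance tier
    storage_tier: str      # storage performance tier
    storage_gb: float
    dr_protected: bool     # RTO/RPO tiers collapsed to "protected or not" for brevity


def monthly_cost(u: VMUsage) -> float:
    cost = COMPUTE_PER_VM[u.compute_tier] + u.storage_gb * STORAGE_PER_GB[u.storage_tier]
    if u.dr_protected:
        cost += DR_PER_VM  # "No, DR isn't free"
    return cost


def showback(usages):
    """Roll metered per-VM costs up by line of business."""
    report = {}
    for u in usages:
        report[u.lob] = report.get(u.lob, 0.0) + monthly_cost(u)
    return report


USAGE = [  # hypothetical metering data
    VMUsage("finance", "gold", "tier1", 200, True),
    VMUsage("finance", "silver", "tier2", 100, True),
    VMUsage("marketing", "bronze", "tier3", 100, False),
]

if __name__ == "__main__":
    for lob, cost in sorted(showback(USAGE).items()):
        print(f"{lob}: ${cost:,.2f}/month")
```

With these made-up numbers, finance would see $500.00/month and marketing $65.00/month. The same per-service costs can be laid next to an external provider’s price list as the internal scorecard described above.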

At the end of the day it’s economics.  When we as IT service providers can define various levels of service at graduating degrees of cost, the business will decide what they are willing to pay for based upon their requirements.  Furthermore, you will be able to truly measure yourself against external providers and clearly articulate your value-add.

Without it, your days are numbered.

This Ain’t Your Daddy’s Oldsmobile

The CCNA exam is much more difficult than the VMware VCP certification, for a number of reasons. Mainly, the format of the exam goes against the way I’ve studied for other tests. The content is also very broad for an ‘entry-level’ certification. I would compare the difficulty of the content and interaction to the EMC Proven Professional Specialist exams, or possibly to the VCAP-DCD. Treat this exam as a mid-level certification (AKA don’t underestimate its difficulty) and you will start off in a better place in your preparation.

 

Studying for this exam will take over some part of your life. The way I got the ‘speed and accuracy’ required to get through the questions without making mistakes was to learn the content first and then learn how the content is applied on the test. I felt like I knew the content, but when I went to the practice exams I found I missed one of the 2-3 correct choices more often than I expected. Effectively, I was aware of the content but hadn’t studied it. I made more than two attempts at the exam and came out with my tail between my legs. I was confident, but wrong, and I was discouraged.

Red Pill or Blue Pill: Pick Your Path

There are two ways to get to the CCNA certification. You can take the ICND1 exam (640-822) followed by the ICND2 exam (640-816), or you can combine the two in a single exam, the CCNA (640-802).

I selected the combined exam because it matched my training class content, but I made the decision blind. The 640-802 exam is much more intense: it packs the material of both exams into fewer questions, which have to be completed in 90 minutes. No rest for the wicked. This is the path we will discuss in more detail; let’s see how deep the rabbit hole goes.

GET AWARE!

Read and review the material, and then study for 1-2 hours a day. I had the luxury of a week-long bootcamp class with an instructor, but work caused a few interruptions during the IP portion of the training. I thought I could miss that part because I already knew IP, but I was wrong.

Subnetting is very important: relearn IP subnetting. I doubt that you are a wizard who can see the subnets instantly in a scenario; this is where I underestimated the material. See http://subnettingquestions.com for practice exercises and to gauge where you fall in your understanding. Then watch this video, which covers an important technique for learning IP subnetting:

PITStop – Mental Subnet Calculator

http://www.cisco.com/web/learning/le31/le46/cln/clp/fastlane/Subnet_Calculator/index2.htm

It’s a little quirky but important for the exam portion below.
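Outside the exam, you can check your mental math with a short script. The Python sketch below uses the common “block size” (or “magic number”) shortcut, 256 minus the mask value in the octet where the prefix ends, which may or may not be exactly the method the PIT Stop video teaches. It cross-checks each answer against Python’s standard `ipaddress` module. The sample addresses are arbitrary.

```python
import ipaddress


def block_size(prefix_len: int) -> int:
    """'Magic number': 256 minus the subnet-mask value in the octet where the prefix ends."""
    if not 1 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 1 and 32")
    bits_in_octet = prefix_len % 8 or 8          # mask bits that fall in the interesting octet
    mask_value = (0xFF << (8 - bits_in_octet)) & 0xFF
    return 256 - mask_value


def mental_subnet(ip: str, prefix_len: int) -> tuple[str, str]:
    """Network and broadcast addresses using only the block-size shortcut."""
    octets = [int(o) for o in ip.split(".")]
    idx = (prefix_len - 1) // 8                  # index of the "interesting" octet
    size = block_size(prefix_len)
    base = octets[idx] // size * size            # largest multiple of the block size <= octet
    network = octets[:idx] + [base] + [0] * (3 - idx)
    broadcast = octets[:idx] + [base + size - 1] + [255] * (3 - idx)
    return ".".join(map(str, network)), ".".join(map(str, broadcast))


if __name__ == "__main__":
    for cidr in ("192.168.10.77/26", "172.16.45.14/20", "10.1.200.9/13"):
        ip, prefix = cidr.split("/")
        net, bcast = mental_subnet(ip, int(prefix))
        ref = ipaddress.ip_interface(cidr).network   # standard-library answer
        ok = (net, bcast) == (str(ref.network_address), str(ref.broadcast_address))
        # Usable hosts = addresses minus network and broadcast (valid for /30 and shorter)
        print(f"{cidr} -> network {net}, broadcast {bcast}, "
              f"usable hosts {ref.num_addresses - 2}, matches: {ok}")
```

Each line should report `matches: True`. The point is to do the block-size step in your head and use the script only to check yourself.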

That aside, I found that I couldn’t take the exam right after the week of training, nor would I suggest that you do. The simulator questions require running the content through lab scenarios, which the week-long training didn’t cover well. I needed good content to review beyond the classroom books I took home, which were mostly in slide-deck format.

Use the Cisco Press CCNA Official Exam Certification Library, which contains the ICND1 and ICND2 books by Wendell Odom. These can be found at Amazon.com (http://www.amazon.com/Official-Certification-Library-640-802) or your local bookstore. These books are great because the content comes in all the formats and question types you will see on the exams. They don’t have the exact questions, but they have the types of questions you need to wire your brain for in the study phase below.

I found the following topic lists from the Cisco Learning Network useful for understanding which ICND1 and ICND2 topics to expect on the exam:

ICND1
ICND2

 

I found that I could pick up a specific chapter every night after work and family time calmed down and get a solid hour of focus on the topics. Mileage will vary, but doing this for a few weeks, in small chunks, helped when I moved on to the study portion below. I didn’t have to wonder whether a trick question was being presented; I knew it was, and moved on to the correct answers.

NOW STUDY!

Practice extensively on simulators and practice tests for 3-4 days just before taking the exam. If you have a Cisco device under a support contract, you can use its software image in GNS3 to build part of the environment and follow the guide at http://freeccnaworkbook.com.

http://packetlife.net/lab/ and Cisco Packet Tracer (available to Cisco Networking Academy members) are two other good tools for practicing the lab questions. The difficulty here is building an environment you can practice on without already knowing what is in the scenario. On the exam, a question will present a lab you have never seen, and you will have to work through the unknown to find the right answers.

A huge find from the ICND1/ICND2 Cisco Press books was the Boson NetSim, found on the last CD of the ICND2 book. It lets you run a full simulation of the composite exam with a time limit, with the option to watch your progress and get answers on each question. You can also run it without any hints and see how you do.

This sim made all the difference for me. It turned the content I learned in the first part into test scores. I hadn’t put myself in the right frame of mind to take the exam and succeed until I tried this simulator. TRY THIS SIM ABOVE ALL ELSE.

 

EXAM TIME

Before the test begins, you are provided with a sheet of paper. Do a “brain dump” of items like the PIT Stop calculator learned above. You can do this while the exam environment runs its demo at the beginning of your session; don’t worry, Cisco’s quick survey and interface tutorial at the start of the test do not count toward your exam time.

Example of Exam Environment

Exam Interface Tutorial

Other things to remember:

  • Every exam center will generate the questions in a random order from a different pool
  • Simulators may be at the beginning or the end.
  • Once you answer a question you cannot go back to it.
  • Do not spend a lot of time (5+ minutes) on a single question; take the hit and move on. ‘Tis better to guess and miss one question than to run out of time for the rest, and a simulator may turn out to be the last question just when you are short on time.
  • Some CLI help commands are available on some of the simulator questions.
  • Hitting Tab twice will display the available commands.
  • Hover over the hosts and switches on any simulator topology map; some equipment can be accessed to test traffic patterns.
  • Sometimes a question might trigger your memory – write down any of these “triggers” on the paper provided for future questions

Success

Remember, it is more of a marathon than a sprint. Cisco has done a good job of creating a challenging certification exam; they are good at it. Do not get discouraged. I think passing on the first attempt is the exception to the rule on this exam, and you will probably need two attempts. Get prepared and do not underestimate the questions. Use this path and these tools first, and I’m sure you will come out of the experience with better knowledge of the content than if you had breezed through it on the first try. I know I did.

 

 

 

Anyone wanting to talk about VLSM subnetting, the perils caused by distance vector routing protocols, and the present IPv4 versus upcoming IPv6 orientation of dual-stack firewalls and Teredo tunnels over Frame Relay: I will be on Twitter on any given #Beerfriday.

Cisco decided to shut down Flip last month. Why? Because it’s a low-margin business that Cisco has no business owning. There is talk about killing Linksys or spinning it out. Why? Low margins, and it doesn’t jibe with Cisco’s core competency. UCS (Unified Computing System, Cisco’s datacenter server line) is another product with very low margins, and it really should be sold if Cisco is to remain as strong as it has been over the past two decades.

 

I find it interesting that only a year ago, all the industry pundits were talking up Cisco and their stock was riding high. How quickly the sands have shifted under their feet. Shareholders and industry experts are calling for Chambers to resign, and some have even suggested they get rid of UCS. Last week’s Infosmack featured some interesting commentary on Cisco selling UCS. GigaOm thinks Cisco has lost that lovin’ feeling for VCE. They seem to be investing as heavily as EMC, but they get a much, much smaller piece of the pie on all those sales. And let’s face it, VCE sales are expensive. Maybe Cisco should have bought EMC when they had the chance?

 

I thought Robin Harris’ comment over on Storagemojo was profound:

UCS lowers Cisco’s margins; enrages large resellers; and has no sustainable competitive advantage. Cisco can’t wish those facts away, and the stock market won’t forget them either.

 

The sustainable competitive advantage thing is a big one.

 

Even with the latest IDC report showing that UCS has overtaken Dell to become the #3 blade player, there is still plenty of uncertainty in the market. I can say from my own experience that executives, who admittedly know very little about UCS and what it brings to the table, are shying away from it out of fear that Cisco could exit the server business.

 

From the very beginning, there was talk of Cisco not being “serious” about becoming a server vendor. Add the recent stock troubles, and decision makers are less willing to stick their necks out on millions of dollars’ worth of UCS. After all, nobody has ever been fired for buying IBM.

 

Companies often take a bath when they get into areas that go against their long-standing value propositions. BMW lost about $4 billion on Rover before handing it to the Phoenix Consortium for a token £10. Cisco spent $600 million on Flip only 2 years ago. The fact that Cisco first approached IBM and HP with the UCS idea and was rejected only proves that Cisco knew it didn’t want to be in the server business before it . . . got into the server business. Perhaps now that they have made their point, one of the server vendors will be interested in a UCS purchase.

 

With HP getting amazingly aggressive on pricing of their network offerings, and Juniper introducing QFabric, Cisco’s attention needs to be focused on their core competency if they wish to maintain those luxuriously high margins into the future.


 

Download the VMware Newsletter, May 2011

Hello! Judging from the downloads we are tracking, it looks like the newsletter is here to stay for a while. If you have any suggestions or would like to see the format change, let me know!

Ever since the acquisition of SpringSource nearly two years ago, VMware has been generating a lot of excitement in the application development space. That excitement kicked into high gear a few weeks ago when VMware announced CLOUD FOUNDRY, the industry’s first implementation of open PaaS.

But I have a feeling much of that excitement is not felt or even understood by the average reader of this blog. The reason largely has to do with the fact that most of us have an IT infrastructure/operations background. We are really good at troubleshooting low-level infrastructure stuff, we can rattle off the differences between RAID5 and RAID10, and we can debate iSCSI vs NFS until we are blue in the face. However, while we may be able to go crazy, Einstein-deep into infrastructure technologies, very few of us would have a single clue about things like MVC software architecture, Object/Relational Mapping, or Dependency Injection.

 

Sure, some of us (probably not many) may be able to create useful automation scripts in PowerShell or Perl, but that’s a far cry from being able to create a full-blown application for end-user consumption. And I’m here to tell you that the application development world is, now more than ever, something we all need to embrace, because worlds are colliding and CLOUD FOUNDRY is a glimpse of things to come.

 

What is CLOUD FOUNDRY?

Well, you already know that Cloud Foundry is a PaaS, which means that at a very high level, you can think of Cloud Foundry as something on par with Microsoft’s Azure, Google App Engine, Salesforce’s Force.com, or Engine Yard. Not familiar with those services? Or not 100% clear on what a PaaS is? OK, then for now, let’s think of Cloud Foundry as a Hypervisor for cloud-based applications. To be clear, I am NOT saying Cloud Foundry is a Hypervisor (because it is not); but let’s just start there.

 

So today, what do we do when we want to deploy an application in our virtual datacenters? First, we start with a VM or a collection of VMs, and we either deploy them from a template or start from scratch and install an Operating System. Then, after some routine IT processes (patching, updating, configuration management, etc.), we either install and configure the application or hand it off to an application team to do the rest. The key point I want to make here is that you start with an Operating System and build up from there. Meaning, the primary point of abstraction, the place upon which we begin to build, is the Hypervisor.

 

How does this translate to Cloud Foundry?  Well, Cloud Foundry allows us to start building applications directly on Cloud Foundry.  There is no need to install an Operating System, nor is there a need to patch it, apply configurations, and install application components.  That’s all taken care of behind the scenes.  So Cloud Foundry becomes the main point of abstraction, the place upon which we directly build our new cloudy applications.
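To make “start building directly on the platform” concrete: on a PaaS, the deliverable is just the application code, and the platform supplies the OS, runtime, and network plumbing. The minimal Python WSGI app below is a sketch under stated assumptions. It assumes the platform tells the app which port to listen on through an environment variable (assumed to be `PORT` on current Cloud Foundry releases and `VCAP_APP_PORT` on early ones), and that a Python runtime is available on your target. Python is used here only for consistency with the other examples on this page, so check which runtimes your Cloud Foundry target actually supports.

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """The application itself: the only piece the developer writes and pushes."""
    body = b"Hello from an app with no OS to patch\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def listen_port(env=os.environ) -> int:
    """Port assigned by the platform (assumed: PORT, or VCAP_APP_PORT on early Cloud Foundry)."""
    return int(env.get("PORT") or env.get("VCAP_APP_PORT") or 8080)


def serve() -> None:
    """Run the app with the standard-library WSGI server (fine for a demo, not for production)."""
    with make_server("0.0.0.0", listen_port(), app) as httpd:
        httpd.serve_forever()
```

Locally, calling `serve()` starts the app on port 8080 unless one of those variables is set. On the platform, you push only the code (plus whatever start command or manifest your platform expects), with no VM build, OS install, or patching involved.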

 

Another way to look at it would be, the Hypervisor switches our focus from managing hardware to managing VMs.  Similarly, Cloud Foundry switches our focus from managing VMs to managing applications.  In the former case, the hardware doesn’t go away and in the latter case, the VMs won’t go away either.  But the way we interact with, manage and even think about hardware has fundamentally changed … and so it will be with VMs and Cloud Foundry.

 

Again, as a point of clarification, is Cloud Foundry the textbook definition of a Hypervisor?  Nope.  But if we allow ourselves to loosely define a Hypervisor as the point of abstraction between layers of the compute stack (Hardware – Hypervisor – Operating System – Hypervisor – Application), then Cloud Foundry certainly fits the bill.

 

How is CLOUD FOUNDRY different from other PaaS offerings?

Now that we understand a bit about what Cloud Foundry is, I’m sure you’re wondering what makes Cloud Foundry any different than the other PaaS offerings out there.  The biggest differentiator can be summed up with one word:  choice.

Prior to Cloud Foundry, PaaS meant limited choices, and ultimately PaaS meant vendor lock-in. Writing an application for Microsoft’s Azure, for example, means you will only be able to run your application on Azure. I suppose that’s not a big deal if you’re 100% committed to Azure and you’re OK with an off-premises-only option (Azure is not available for on-premises consumption). So there is definitely an element of vendor lock-in there, and this is true for any PaaS offering out there today. Whichever PaaS you go with, you are limited in the developer frameworks and application services it makes available to you, in your deployment options (i.e., public vs. behind-my-firewall), or both. For the customers I talk to, this is a very big deal.

But the good news is that Cloud Foundry changes all of this. Cloud Foundry has been designed to eliminate vendor lock-in by offering:

  • choice of developer frameworks
  • choice of application services, and
  • choice of deployment (internal vs. external cloud).

Of course you might be thinking, “that sounds great, but ultimately we’ll have to run Cloud Foundry on top of vSphere, so we’ll still be locked in to VMware.”  Well, you would be wrong.  Yes, Cloud Foundry does run on vSphere, but it can run on non-VMware Infrastructure clouds as well.

 

Choice.  It’s super attractive, and it makes Cloud Foundry unique.

 

Why should you care?

But for you, the reader of this blog – someone who is probably focused on IT operations, not software development – why should you care?  Here’s one great reason …

If your IT shop does not like dumping the company Web app on Engine Yard but the dev team is threatening mutiny over working in a stone-age traditional Java production lifecycle (“that’s so 2005, man”), Cloud Foundry can basically become the in-house option.

– Carl Brooks, Senior Technology Writer for SearchCloudComputing.com

Another reason?  Whether we like it or not, PaaS is coming like a freight train and we need to get in front of it now. We need to embrace it.  We need to be the first to understand the Hypervisor 2.0, and all the moving parts around it.   We need to figure out how to offer it to our internal developers before they go consume it externally on their own.  Because ultimately, we will either add value to our employer and serve our users, or someone else will.  I know I’m not going to get run over by the train … are you?

 

What should you do next?

Here is my recommendation … be the first person in your organization to embrace and understand Cloud Foundry.  Believe me when I tell you this will pay handsome dividends for you far into the future.  To start you on your journey, here is some recommended reading …

 


 

VMware published KB Article 1037959 (http://kb.vmware.com/kb/1037959) on April 18, 2011 in an effort to clarify VMware’s position on running Microsoft Clustering technologies on vSphere. Below is a snapshot of the support matrix published by VMware in the KB (always refer to KB 1037959 for the most current information).

[Figure: VMware support matrix for Microsoft Clustering on vSphere, from KB 1037959]

 

For those familiar with VMware’s previous position on Microsoft Clustering, you will notice a couple of changes. First, VMware now distinguishes between Microsoft Clustering technologies by segmenting them into Shared Disk and Non-shared Disk:

  • Shared Disk – solution in which the data resides on the same disks and the VMs share those disks (think MSCS)
  • Non-shared Disk – solution in which the data resides on different disks and uses a replication technology to keep the data in sync (think Exchange 2007 CCR / 2010 DAG).

Next, VMware has extended support for Microsoft Clustering to include In-Guest iSCSI for MSCS.

For those interested in leveraging Microsoft SQL Mirroring, the KB states that VMware does not consider Microsoft SQL Mirroring a clustering solution and will fully support Microsoft SQL Mirroring on vSphere.

The Disk Configurations section of the KB states that when VMFS is used, the virtual disks serving as shared storage for clustered virtual machines must reside on VMFS datastores and must be created using the eagerzeroedthick option. The KB explains how to create eagerzeroedthick disks on both ESX and ESXi, from the command line or the GUI. Additional information about eagerzeroedthick can be found in KB article 1011170 (http://kb.vmware.com/kb/1011170). One thing to note in KB 1011170: the bottom of the article states that the vmkfstools -k command can convert a preallocated (zeroedthick) virtual disk to eagerzeroedthick while maintaining any existing data. Note that the VM must be powered off for this action.
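If you script these disk operations rather than use the GUI, they look roughly like the following. This is a minimal Python sketch that only builds and prints the vmkfstools command lines instead of running them. It assumes the standard vmkfstools options (-c to create a disk, -d to pick its format, -k to eager-zero an existing disk); the datastore path and the 10g size are hypothetical, so confirm the syntax against the KB and your ESX/ESXi version before running anything.

```python
import shlex

# Hypothetical datastore path, used only for illustration.
DISK = "/vmfs/volumes/datastore1/node1/shared01.vmdk"


def create_eagerzeroedthick_cmd(size: str, path: str) -> list[str]:
    """Build the argv that creates a new eagerzeroedthick virtual disk."""
    # -c <size> creates a virtual disk; -d selects the disk format.
    return ["vmkfstools", "-c", size, "-d", "eagerzeroedthick", path]


def convert_to_eagerzeroedthick_cmd(path: str) -> list[str]:
    """Build the argv that eager-zeroes an existing zeroedthick disk in place."""
    # -k keeps existing data; the VM must be powered off first.
    return ["vmkfstools", "-k", path]


if __name__ == "__main__":
    # Print rather than execute: this machine may not be an ESX/ESXi host.
    print(shlex.join(create_eagerzeroedthick_cmd("10g", DISK)))
    print(shlex.join(convert_to_eagerzeroedthick_cmd(DISK)))
```

Printing the commands keeps the sketch safe to run anywhere; on a host you would paste them into the console only after checking them against the KB.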

In closing, the VMware support statement exists to explicitly define what VMware will and will not support. It is very important for you to remember these support statements do not make any determination (either directly or indirectly) about what the software ISV (Independent Software Vendor) will and will not support.  So be sure to review the official support statements from your ISV and carefully choose the configuration that makes sense for your organization and will be supported by each vendor.

 

I am sure it comes as no surprise to any of our readers that virtualization is not the exclusive full-time focus for most of us.  Most of us have a breadth of responsibility spanning gobs of infrastructure layers in our respective organizations.  One common pain point that most of us have is backups.

For many companies, backup is an afterthought. It doesn’t contribute to the profitability of the company. It doesn’t help you make more widgets in the same amount of time. The result is often a backup system that gets neglected when it comes to budgets and spending. Most of the time, even though we know how important backups are, we’re okay with them taking a back seat. After all, who wants to goof around with tape drives when there are cool new blades and SSD storage to play with?

It was this frame of mind that I found myself in on Tuesday of this week.  I had signed up for W. Curtis Preston’s Backup Central Live a while back on Stephen Foskett’s recommendation.  I knew it would be decent, as I had used Backupcentral.com for a long time as a valuable resource to help deal with those dreaded backup problems.  But when Monday came, I found myself wondering why the heck I signed up for this seminar.  I had so much work to do this week, and most of it was fun SAN and VMware planning and design stuff.  I didn’t have time for baaaaackups. . . Grrrr.

My boss, however, was pumped about the seminar. I knew I couldn’t back out without getting grief, so I reluctantly made the 1.5-hour drive to Cary, NC, for a full day of backups. I knew Curtis would be a great speaker with good insight; I have heard him many times on Infosmack, and I know from his blog posts that he knows his stuff. I just wasn’t looking forward to a full day of vendor pitches between the valuable information.

Ultimately, I was impressed with the event, and it was far from a waste of time.  Even the vendor presentations were decent, and they kept to a reasonable time limit, so the pace was perfect.  I’ll give you a quick rundown of what I learned at this event.

We often feel alone in our backup struggles. At the seminar, there was wireless polling during the presentations, so we had real-time answers to our questions. That alone was a fantastic change, and I much prefer it to raising my hand 48 times during a session. From the polling data, I learned that I am not alone. Many share in my misery.

  • 49% of attendees still do backups DIRECT TO TAPE.

So while those of us in that 49% think no one hears our screams, at least now we know we’re not the only ones screaming. I think we all know that tape is not a suitable direct target for server backups, and the problem only gets worse as tape drives get faster. Disk, at least as a staging area, is now a necessity for reliable backup to tape.
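Part of why direct-to-tape gets worse with faster drives is that a tape drive performs well only while data arrives at least as fast as its minimum streaming rate. When the source falls behind, the drive has to stop, rewind, and reposition (often called shoe-shining), which slows the job and wears out drives and media. A disk staging area can usually feed the drive fast enough. The minimal Python sketch below just makes that comparison explicit; every throughput figure in it is a hypothetical placeholder, so substitute your own measured source rate and your drive’s published minimum.

```python
def keeps_drive_streaming(source_mb_s: float, drive_min_mb_s: float) -> bool:
    """True if the backup source can feed the drive at or above its minimum streaming rate."""
    return source_mb_s >= drive_min_mb_s


# Hypothetical figures for illustration only.
DRIVE_MIN_MB_S = 50.0       # slowest rate the drive can write without stopping
DIRECT_FROM_SERVER = 40.0   # a single server streaming straight to tape
FROM_DISK_STAGING = 200.0   # a disk staging area feeding the same drive

for label, rate in [("direct to tape", DIRECT_FROM_SERVER),
                    ("disk staging", FROM_DISK_STAGING)]:
    ok = keeps_drive_streaming(rate, DRIVE_MIN_MB_S)
    status = "keeps streaming" if ok else "stops and repositions"
    print(f"{label}: {rate:.0f} MB/s, drive {status}")
```

The point of the comparison is that a faster drive raises the minimum rate, so a source that was barely adequate for an older drive can fall below the line for a newer one.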

That said, Preston points out that tape is a long way from being displaced from the datacenter.  Tape is still 50x cheaper than disk, and more reliable for long-term data storage.  One fact I found enlightening was that hard disks are not designed or tested to store data long term while powered off.  This is something I had never thought about, and only a couple of companies, like ProStor, are trying to solve this problem.  Even if we solve for the reliability difference, it will likely be decades before we see a significant degree of cost parity (if ever).

A speaker from Cambridge Computer Services talked about new cool ways people are using tape as part of a tiered strategy for primary data.  Some are even using tape as a mirror for their primary storage.  Of course this requires a gateway appliance with plenty of cache, and good software, but the savings are real.

Another crucial area we touched on was archival, especially as it relates to electronic discovery (ED). Almost NO ONE is doing this. The vast majority are using their primary backup software and methodology for archival, which is an expensive mistake if you are ever called upon to do discovery. In addition to my own experience with ED, Preston told a story of a client who spent millions to satisfy a single discovery request.

Apparently, the request was for a single user’s e-mail over the past three years. Because the company was only doing normal Exchange backups, that meant restoring 156 different weekly Exchange backups and then fishing for this guy’s messages. It took an army of consultants working three shifts MONTHS to do it. Since we live in a litigious world these days, it might be a good idea to get your ED and archival in order. One product recommended at the seminar was Index Engines. I haven’t had time to look at it yet, but it sounds brilliant!
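The restore count in that story follows directly from the backup schedule: three years of weekly backups is 3 × 52 = 156 backup sets, while monthly backups over the same period would be 36. A tiny Python sketch of the arithmetic, assuming one restore per backup set that has to be searched:

```python
def backup_sets(years: int, backups_per_year: int) -> int:
    """Number of backup sets in a retention window (one restore per set)."""
    return years * backups_per_year


WEEKLY = 52
MONTHLY = 12

print(backup_sets(3, WEEKLY))   # 156
print(backup_sets(3, MONTHLY))  # 36
```

Each of those sets is a full restore followed by a manual search, which is why an archive that indexes mail by user scales so much better for discovery.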

One interesting statistic we saw in the polling data was that the majority of attendees had an overblown opinion of themselves when it comes to their own backup environments.  The majority said their backups ran well.  Preston’s experience tells quite a different story.  The scary part of this is that people don’t know that their backups suck.  They find out when it’s too late.

The most valuable part of the seminar was the discussion time at the end. There were many interesting discussions about cloud backups, the AWS outage, and snapshots, which tied together everything we had learned during the day.

There isn’t space in a single blog post to cover all the material from a full day seminar, but I hope I’ve given you enough to help make a decision to check this event out when it comes your way.  I have to give it to the Backup Central Live crew for taking a topic that most people hate, and turning it into a valuable day of learning.


Download the newsletter: VMware Newsletter April 2011

Welcome back! I hope you found our first newsletter helpful in some way, shape, or form. The newsletter keeps getting larger, which is a great thing; it might soon qualify as a magazine rather than a newsletter.

We got some good feedback so we are going to keep going with this for a while.  Please let us know via the comments section if you are enjoying it, would like to see different content,  or just want to say hello.

-Scott

When I ask customers about their VMware virtualization bottlenecks, 9 out of 10 say their number one bottleneck is memory. Notice I said bottleneck, not problem. This relates to capacity planning: understanding and right-sizing the environment so you can gauge when you need to order more physical infrastructure. Their number one problem is storage, which is a different story altogether, and I won’t be covering storage in this article (this time). Since memory is such a common point of discussion with my customers, I thought I would dig a little deeper into this topic and share some information about utilization and what it all means.

 

My customers typically track their utilization where most people would expect to find this information in vSphere: the DRS Resource Distribution chart at the cluster level.

 

[Screenshot: DRS Resource Distribution chart at the cluster level, based on memory consumption]

From the image above, one might think that I am close to memory capacity and should look at ordering more hardware for my cluster. While it is never a bad idea to start planning for growth, let’s take a closer look at what we are seeing. Notice the blue informational icon telling us that the displayed information is based on memory consumption. Let’s mouse over the chart to get more granular information and see what it means.

 

[Screenshot: mouse-over detail for the Virtual Center VM on the DRS Resource Distribution chart]

 

You can see in the above image that my Virtual Center VM is “consuming” ~4 GB of memory, but in reality the active memory being used is sitting at ~700 MB. DRS entitlement is a measurement that calculates the load or demand on the vSphere host/cluster over time and then projects an average entitlement number for planning purposes. You can use DRS entitlement as a general planning and forecasting figure, but keep in mind that when active memory is well below consumed memory, as it is here, the cluster still has more headroom than the chart suggests.
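To put numbers on the gap between consumed and active memory, here is a minimal Python sketch. The VmMemory class and the helper functions are hypothetical, not part of any vSphere API; the inputs are the approximate figures from the screenshot (about 4 GB consumed, about 700 MB active), which in practice you would read from the vSphere Client or export from your monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class VmMemory:
    """Hypothetical container for per-VM memory figures, in MB."""
    name: str
    consumed_mb: int  # host memory currently backing the VM (what the chart shows)
    active_mb: int    # memory the guest has touched recently


def idle_consumed_mb(vm: VmMemory) -> int:
    """Consumed memory that is not currently active (never negative)."""
    return max(vm.consumed_mb - vm.active_mb, 0)


def active_ratio(vm: VmMemory) -> float:
    """Share of consumed memory that is actually active."""
    return vm.active_mb / vm.consumed_mb if vm.consumed_mb else 0.0


# Approximate values from the screenshot: ~4 GB consumed, ~700 MB active.
vms = [VmMemory("Virtual Center", consumed_mb=4096, active_mb=700)]
for vm in vms:
    print(f"{vm.name}: {idle_consumed_mb(vm)} MB idle, {active_ratio(vm):.0%} active")
```

Run across every VM in a cluster, a list like this highlights the machines whose allocations could be trimmed before you order more hardware.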

Now I wouldn’t be doing my job if I didn’t make you aware of an easier way to track this information by using software rather than brain power. For those of you that haven’t seen Capacity IQ yet, I would highly encourage you to evaluate the product. Capacity IQ was built for this specific reason, to help you understand when you will need to start thinking about more hardware. It can also help you run your environment more efficiently. There are some great reports that help you identify which virtual machines are not using the resources that were allocated to them.  Take them back!

Having worked as a VMware systems engineer on the customer side, I can tell you that as your environment grows, capacity management and planning become critical. I evaluated Capacity IQ when I was still a customer and wrote up my thoughts on the product, if you are interested.

As customers transition from phase 1 into phase 2 of their virtualization journey, they begin virtualizing business-critical applications. As they move into this phase, they often perform a POC to understand how their application performs on a physical versus a virtual platform. Customers often ask for guidance on conducting a POC, and we talk to them about the importance of an apples-to-apples analysis. By this I mean making sure the physical server and the virtual machine are configured identically (or as close to identically as possible). One area where we often find differences is the number of processors a physical server has versus the number of virtual CPUs (vCPUs) you can assign to a virtual machine. Using the Microsoft System Configuration utility, we can bring these two into alignment.

In our example, we will look at how to take a server that has 8 processors and 32 GB of RAM and configure this server to access 4 processors and 16 GB of RAM.

Below is a screen shot of the System Properties screen, which you can open by clicking START > right-click COMPUTER > Properties.

From this screen shot we see the system has 8 processors and 32 GB of RAM.

[Screenshot: System Properties showing 8 processors and 32 GB of RAM]

Windows Task Manager, accessed by right clicking the Task bar, displays the same information.

[Screenshot: Windows Task Manager showing 8 processors and 32 GB of RAM]

Since we have determined the baseline for this analysis will be 4 processors and 16 GB of RAM, we will move onto configuring this server using the System Configuration utility.

First, click START > RUN > and type msconfig

[Screenshot: Run dialog with msconfig entered]

This will open the System Configuration dialog box, and in this box click the Boot tab. On the Boot tab, click the Advanced options… button.

[Screenshot: System Configuration dialog, Boot tab, Advanced options… button]

On the BOOT Advanced Options dialog box, check the box next to Number of processors and then use the drop-down to select the number of processors you want Windows to be able to access. In this case we selected 4.

Next, check the box next to Maximum memory and enter the amount of memory, in MB, that you want Windows to access. In this case we entered 17,408 (17 × 1024) because we want the OS to have 16 GB of usable memory.

[Screenshot: BOOT Advanced Options with Number of processors set to 4 and Maximum memory set to 17408]
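The Maximum memory field takes a value in megabytes, so the arithmetic is simply (target GB + headroom GB) × 1024. Below is a minimal Python sketch; the 1 GB of headroom is an assumption that mirrors the 17,408 MB this example uses for a 16 GB target, not a documented Windows constant, so check what System Properties reports after the reboot and adjust if needed.

```python
def msconfig_max_memory_mb(target_gb: int, headroom_gb: int = 1) -> int:
    """Value to enter in msconfig's Maximum memory box (the field is in MB).

    headroom_gb is an assumption copied from this example (17 GB entered for a
    16 GB target); it is not a documented Windows constant.
    """
    return (target_gb + headroom_gb) * 1024


print(msconfig_max_memory_mb(16))  # 17408, the value used in this example
print(msconfig_max_memory_mb(8))   # 9216
```

Keeping the headroom as a parameter makes it easy to match a different POC baseline without redoing the multiplication by hand.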

Once you are satisfied with the configuration, click OK to close the BOOT Advanced Options dialog box, click OK to close the System Configuration dialog box, and then click Restart to apply the changes you just made.

[Screenshot: System Configuration prompt to restart]

After the system restarts, log in and open Computer Properties by clicking START > right click COMPUTER > Properties

As you can see from the screen shot below, the system has 4 processors and 16 GB of usable RAM

[Screenshot: System Properties showing 4 processors and 16 GB of usable RAM]

Windows Task Manager, accessed by right clicking the Task Bar, displays the same information.

[Screenshot: Windows Task Manager showing 4 processors and 16 GB of usable RAM]

To remove these settings, open the System Configuration dialog box by clicking START > RUN and typing msconfig.

Next, click the Boot tab, and then click the Advanced options… button.

Then uncheck the boxes next to Number of processors and Maximum memory.

[Screenshot: BOOT Advanced Options with Number of processors and Maximum memory unchecked]

Click OK to close the BOOT Advanced Options dialog box, click OK on the System Configuration dialog box, and then click Restart so your changes take effect. When the system reboots, it will return to its original configuration.

A couple of months ago, after the introduction of the new EMC VNX arrays, I posted my thoughts on them here. One of the engineering choices I questioned was the use of SSDs for extending cache instead of a PCI card. It was always obvious why SSDs would be better when cache is being added or replaced, but I questioned the throughput potential of a SAS interface versus a PCI one.

 

I got some interesting feedback on that from several people, and I appreciate it. But it wasn’t until the other day that I realized my argument really did not have that much merit. In a moment of blinding brilliance, I realized that the only time this might make a difference is when warming the cache.

 

How did I come to this realization? I was in a VNX deep-dive session presented by Chad Sakac, and I had every intention of asking him about PCI versus SAS for cache. Luckily, he brought it up during the session before I could ask, and before he finished the presentation, I realized the error in my earlier thinking.

 

Chad pointed out that the time it takes for an IO to go through the controller and back-end loops and hit the flash is measured in nanoseconds (10^-9 seconds). Once it’s there, the flash itself has latencies measured in microseconds (10^-6 seconds), a unit a thousand times larger. So there is not likely to be a significant latency difference between SSD and PCI when it comes to cache.
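A quick worked example shows why the transport path matters so little: if a cache hit spends a few hundred nanoseconds in transit and tens of microseconds on the flash itself, the transit time is a small fraction of the total, and that fraction is the most a faster interconnect could save. The Python sketch below computes it; the 500 ns and 50 µs inputs are hypothetical round numbers, not VNX measurements.

```python
def transport_share(transport_ns: float, flash_us: float) -> float:
    """Fraction of a cache hit's total latency spent in transport rather than on flash."""
    flash_ns = flash_us * 1_000  # 1 microsecond = 1,000 nanoseconds
    return transport_ns / (transport_ns + flash_ns)


# Hypothetical round numbers: 500 ns through controller and loops, 50 us on flash.
share = transport_share(500, 50)
print(f"transport is {share:.1%} of the total latency")
```

Even if a PCI card eliminated the transport time entirely, the hit would only get faster by that share, which is why the latency argument for PCI is weak.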

 

PCI obviously has greater throughput potential, which is why I asked the question in the first place. But a realization jumped up and bit me while I was sitting through this presentation. Cache IOs are usually small chunks of data that benefit from the reduced latency of flash/DRAM. They aren’t giant read/write operations that require extremely wide bandwidth. Will the increased bandwidth of PCI make a difference? I doubt it will be noticeable on the vast majority of workloads. But this is just my opinion, as an outsider without a storage engineering background.

 

I am looking forward to seeing the SPC-1 benchmarks for the VNX; I believe they will objectively tell the whole truth. A slight difference on an anomalous workload is not significant enough to outweigh the benefits of SSD over PCI cache. SSD is easily swapped, and it’s non-volatile, so it only needs to be warmed once. If a controller fails, the cache doesn’t die with it: replace the controller, and there’s no need to rewarm the cache.

 

As I alluded to in my last post on this, every design decision, whether in storage engineering, vSphere design, or automobile design, is one of compromise. The SPC-1 results will tell the whole story, but I think we’ll see that this particular compromise was, overall, a good one. What do you think? Let me know in the comments.