With the right software, even a technology as old as the disk drive can overcome some of its own limitations. We can see many examples of this in the storage world these days. XIO is a great example of a company taking the same disk drives we’ve been struggling with for decades, and making them faster and more reliable.
Another up-and-coming company that believes in this approach is Pure Storage. I had the opportunity to visit their headquarters in Mountain View with the Virtualization Field Day crew, and got to see some of this software magic for myself. Chris Wahl, who had seen these guys before, commented that they are “SSD whisperers”. After my visit, I cannot disagree.
There’s no shortage of promises these days coming from the dozens of new startups centered around solid state disk technology. Before the Pure Storage visit, Mike Laverick was remarking how all these guys always say “we’re the only ones who actually GET solid state”. We all had a laugh, and wondered if Pure Storage would use that line. What we found was quite refreshing. Pure Storage didn’t feed us a lot of marketing or silly quotes. Instead, they actually made ME say “they’re the only ones who GET solid state”.
Pure Storage says they can sell you a solid state array for less money than a refrigerator-sized box of spinning rust. We’re not talking about less $ per IOP. We’re talking about less $ period. Like under $5 / GB. So 10x faster for less than the big spinning arrays. Let that sink in for a second.
With today’s advanced, auto-tiering arrays from the big boys in the storage business, it’s natural to wonder why all this is necessary. Why would we need an all-flash array when we can pack a little bit of screaming fast SSDs into a tray and let the array do the work to make sure your data is in the proper tier? With that methodology, can’t we come in even cheaper per GB by adding massive SATA disks for the cold data? The answer to that question depends solely on your tolerance for latency.
In the chart above, we can see that even on some very high performing traditional arrays, even with the best tiering algorithms, we’re still going to see IO’s with very high latency. When you use the Pure Storage array, you don’t run that risk. In the demos that Pure Storage did for us, VMware had a hard time even measuring the disk performance, since it doesn’t offer anything less than 1 ms increments. As former VMware heavyweight Ravi Venkat pointed out, you can forget about SIOC, since you cannot set thresholds below 5 ms. If you see 5 ms from this array, it’s probably on fire.
As an EMC VMAX user, I can tell you that one of the underlying concerns in my mind every day is how FAST is working. I have very little visibility into what is actually being tiered, and when, and why. I have to just trust that EMC engineers are smarter than me, and that their tiering is going to prevent performance problems. While this may be easy for some, it’s very hard for me to just set it and forget it in my environment. There’s too much cost associated with latency for me to ignore the possibility that FAST is going to do the right thing at the right time. Plus, it’s reactionary. So even if it does do the right thing at the right time, the “right time” is still after the optimal time to have that data tiered higher.
This is one of the best things about the Pure Storage array, in my opinion. I don’t have to worry about whether the “magic” is working under the covers or not. All my data is on the fast stuff all the time, so I can relax. . . a little. ;-)
There are lots of features that make all this work reliably, and at much faster speeds than normal MLC. For an extremely detailed breakdown by Pure Storage’s co-founder and CTO John Colgrove, go here and watch the video.
For brevity’s sake, I’ll highlight a few features:
- Inline dedupe using 512-byte segments (better ratios overall)
- Compression
- Thin provisioning
- RAID 3D (varies RAID levels based on current system activity – see video link)
- High availability (no config stored on the controllers)
- VAAI support
- I/O optimization
That last one is the one I found most fascinating, and you can see John explain it more in the video. Every inbound write goes through a scheduling process that takes into account the current disk activity at a very granular level. Since writes are quite expensive on flash (in latency terms) versus reads, writes must be minimized, and highly distributed. This is where the scheduler comes in and looks at availability, workload, reliability, and lots of other characteristics of each piece of SSD. Then it makes a determination where to write that data to give the best latency. Also, if the system is loaded down, it can even pick a different RAID level dynamically to save on writes, thereby increasing performance.
This is where the magic is, in my opinion. It takes a lot of experience and know-how to take MLC and make it as fast and as reliable as SLC, and based on their zero failure rate to date, I think they’ve done it. Of the 35 deployments they’ve done so far, 35% are for VMware environments. The industry mix is pretty interesting too, as you can see in the graphs below. This is not some niche product targeted at specific high performance applications.
I have heard some people question whether there is room for all these new storage startups. Since Pure Storage is a startup, I wanted to address the question, with regard to Pure Storage only.
First off, Pure Storage is an amazingly well funded company (from an outsider’s perspective). They’ve got $55M reasons why there is room for their storage startup. That $55M came from people who are a lot smarter than me, and can better answer the question of whether there is room. Check out the investor list.
Plus, they have attracted lots of top talent. I included a short list below, which doesn’t even include Ravi Venkat. Once again, I think the question of whether Pure Storage is a valid startup, or whether there’s a market for them, is just silly. EMC, which just recently announced the inclusion of MLC in its roadmap, used to be a startup too. Maybe they can get these guys to show them how to implement it.
Close your eyes for a moment and . . . wait, don’t do that. But imagine for a moment your CEO calls your desk directly and is in a huge panic because one of his reports is taking way too long to run, and he needs it for the board meeting in 15 minutes. Instantly your life flashes before your eyes:
- All those arguments you had with the DBA’s and the application owners, and even your boss about how “we can’t possibly virtualize this application”.
- The meetings where the vendor said they would support it but they don’t “recommend” it.
- Conference calls where you told them they were just out of touch and that you could virtualize anything, and they wouldn’t even notice a performance hit.
- The look on their faces when they first tested the virtualized app and realized you were right.
And look at you now. This is all on you. It’s do or die time now.
So you bring up your preferred virtualization performance software to have a look. For your sake, I hope it’s Xangati VI Dashboard.
Having seen Xangati’s pitch before, and having tried the free version a couple years ago, I didn’t feel it was something I needed in my environment. However, last week at Virtualization Field Day 2 in Silicon Valley, the company’s founder, Jagan Jagannathan, said one thing that really struck a chord.
“Liveness is what you need to do triage. If you want to do post-mortem, you don’t have to be live.”
He makes the point that in medical analysis, if you delay the analysis, even for a few minutes, the patient is dead. “Not all patients die. But some do.” It was at this point that the Xangati story clicked with me. It’s a tough product to get your head around in a quick demo, or marketing slide. But after hearing directly from the man who invented it, everything makes sense.
Jagan talks about other virtualization performance applications being largely database driven. They essentially suck in data at intervals, store it in a database, crunch it, and then pipe it out to a GUI for display. Some even require input from you on what interactions you might want to see before they even crunch the data.
Xangati sucks in the data and crunches it, with every interaction, all in RAM. This means the data you see from Xangati’s interface is an order of magnitude more current than what you get from the other guys’.
The other products are showing you a snapshot of data, followed by another snapshot, and so on. This is sufficient for the type of predictive trending coming out of vCenter Operations for example. Xangati can crunch 1 million metrics per second and pipe them right to your display.
Which data would you rather have when your CEO is standing over your shoulder? Which data would you rather have if you’re running thousands of VDI sessions like at the VMworld Labs? Xangati was VMware’s choice for the Labs environment. And since we have all taken a sort of Virtualization Hippocratic Oath by talking companies into virtualizing, we cannot afford to let our patients die on the table because we didn’t have the data to save them.
I had the good fortune of sitting with Jagan at dinner after their presentation, and we got into a conversation about a huge paradigm shift in our industry that’s happened over the past decade. A couple years ago, SAP founder Hasso Plattner was asked by his own employees why he felt the need to deliver an in-memory appliance. His response nails what I feel this paradigm shift is all about.
“People at SAP ask me, ‘Why do you insist on running a dunning program in seconds instead of two minutes? No one is asking for that type of speed for a dunning program,’ ” Plattner said.
“And I tell them, “You are asking the wrong question: the right question is, how long will someone with an iPhone wait for an answer? And the answer is that 15 seconds is the absolute maximum amount of time people will wait before they go and start doing something else: check voicemail, send text messages, check email, send text messages to themselves . . . . This is the new reality!”
In most enterprises a decade ago, the world did not come to an end if an application was down for a few hours. People took a long lunch, and moved on. In this new world, people go absolutely insane over the slightest performance degradation of any application.
Downtime is unthinkable, even for the most mundane and “insignificant” application. Can we blame all this on the iPhone? I’m not sure, but one thing I do know is that we had better have the tools to enable us to deliver on these expectations. Xangati is a huge step in the right direction.
There’s a lot more to Xangati, like industry leading awareness and visibility for VDI environments, and the ability for users to initiate recordings of metrics while a problem is occurring. Cool features abound. You can read about some of them over on Rodney Haywood’s, Dwayne Lessner’s, and Chris Wahl’s blogs. For me, the one feature that stands out most is the live data. The life you save could be your own.
I’ll be heading to Virtualization Field Day 2 Feb 22-24 in Silicon Valley! What is Virtualization Field Day? It’s a 2 day event packed with in-depth and interactive Q&A between vendors in the virtualization space, and independent bloggers / writers / thought leaders in the industry.
Vendors get to showcase products that are real, or on the drawing board, and they get solid, candid feedback from independent IT pros that helps them make their products better for all of us.
Delegates get a first look at some of the coolest new technologies everyone will be talking about in the coming months, as well as an opportunity to get hands-on with them and ask the tough questions that would never be allowed in a webcast full of random people. Some things may be covered by NDA, or an embargo date, but the majority of the event can be viewed live right here as it happens!
If you can’t catch the stream live, follow us on Twitter with hashtag #VFD2, and tweet us your questions for the vendors. The videos will be posted after the event concludes so you can go back and catch anything you might miss.
This is my second Tech Field Day event, and based on the presenter and delegates list, it’s going to be fantastic!
What makes these events so valuable is the delegates’ independence and objectivity. Combine this with the hard work and dedication of Stephen Foskett and Matt Simmons, who plan everything to the last detail and coach the vendors ahead of time so they don’t bring lame marketing presentations to real technical guys and gals. The stream will definitely be worth your time!
Are there vendors making something awesome you’d like to see present at an event down the road? Nominate them here!
Do you love technology, and work for a non-IT vendor? If you’d like to become a delegate, find out how here!
In the interest of full disclosure, delegates’ travel expenses to and from the event, as well as accommodations during the event are covered by sponsors. As with any tech event, delegates may receive swag from vendors, but delegates are not under any obligation to blog, tweet, or even like the products. Of course if there are cool products that interest delegates, they may be discussed on various social media sites, but there is no compensation, or expectation from either side after the event concludes.
If you have HP BL460 G7s with the on-board 10Gb CNA, you’re going to want to read this post regarding a problem with the latest firmware.
I first noticed this issue while updating firmware to troubleshoot a problem where the storage doesn’t come back up after rebooting an upstream Nexus switch.
The symptoms: the NIC comes back up, and the vfc is up, but all storage paths on that side of the fabric are still dead in ESXi 5.0. To fix this, the vfc or port channel must be shut / no shut.
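If you want to script the workaround ahead of a maintenance window, the bounce looks something like this on the Nexus side. This is only a sketch — the vfc and port-channel numbers here are made-up examples, not from my config:

```
! Bounce the virtual Fibre Channel interface (vfc 101 is an example name)
configure terminal
interface vfc 101
  shutdown
  no shutdown
! or, if the whole port channel needs the bounce (po 101 is also an example):
interface port-channel 101
  shutdown
  no shutdown
end
```

After the bounce, check that the paths actually came back on the ESXi side before calling it good.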
I also saw an issue where the storage paths were dead, and the NIC never came back up. A reset of the Ethernet port will not fix this. A reboot of the ESXi host is required. Pay attention to the NIC state if you lose storage paths in this configuration with FCoE.
As part of my troubleshooting, I went to update the firmware on the CNA. The latest version of the firmware from HP is 4.0.360.15a. When updating using the Emulex utility, on about 20% of my blades, I got a CRC error during the upgrade process. Below is a screenshot of this error.
After retrying the firmware update, as stated in the utility, the same error occurred.
This is where you need to pay attention!!
Even after the CRC error, the utility reports the new firmware version, so one might assume the update was indeed successful. That’s a bad assumption. Upon further testing, we found the blades that failed the firmware update were the ones failing during the switch reloads.
There were only 2 blades that did NOT fail the firmware update, but still failed the switch reload process. They were replaced, and now I have no blades failing to reacquire storage paths after an upstream switch failure.
I must point out that HP has been unusually proactive with this issue, which is a nice change! I still have several blades in another datacenter that are not taking the firmware update. When I scheduled to have those all replaced, HP got some of their top people on it and scheduled a call. I tested their proposed fix this morning, which didn’t work.
They are actively working on a fix, so you won’t have to replace your blades. I will update this post as soon as I get word back from them on that fix. Meanwhile, if you’ve seen this, you might want to schedule some switch reloads during a maintenance window to make sure you are good to go.
As of today, there is no fix that I’m aware of. . . HP replaced the remaining blades after we tried a couple more proposed fixes. If I get word of a fix, I will post it here.
Over the past week, I have been reflecting on just how amazing 2011 was for me, with lots of help from the entire VMware community. I won’t bore all my readers with EVERY detail, but what good is a blog if you can’t boast once in a while?
In 2011, after a couple of years of planning, evaluating, and trying to get funding, I started implementing VMware on a large scale at the company I work for. We had used it in development, and for certain niche apps, but now it’s coming in wholesale. Thanks to VMware, and their amazing development staff, I was able to create some MONSTER clusters without worrying about too many HA Primaries on each blade chassis. Thank you VMware!
Also in 2011, there was much deliberation and evaluation of many different storage arrays from several vendors. I needed something to replace some old HP EVA’s. Yes. . .I have been critical of EMC in the past, and honestly, they still deserve some criticism. However, in the end, we bought VMAX’s.
One of the main reasons VMAX was the only one left standing was its support for mainframe. Also, Chad’s army of vSpecialists shows EMC’s commitment to tightly integrating VMware into their products, which is comforting. Was VMAX extraordinarily expensive? Yes. Has VMAX been a bit of a pain in the rear to get up and running right? Indeed. But as of the end of the year, the things are absolutely screaming, and I am very pleased with the performance, and the integration points.
All the work I have done this year to get this new environment up and running, and to begin migrating environments over to the new VMware platform, would not have been possible without the help of many people in the community. I have thoroughly enjoyed reading everyone’s blogs. Also, reading both of Scott Lowe’s 2011 books (Forbes & Maish too), as well as Frank and Duncan’s second amazing ESXi clustering book, was extremely helpful. I have Mike Laverick’s SRM book, as well as a few other recent ones, on my desk for 2012 reading. Never before have we had so much access to so much in-depth knowledge on every aspect of VMware. This speaks very highly of VMware’s care and feeding of the community.
The most time saved this year for me has been via the use of William Lam’s scripts, and Luc Dekens et. al’s PowerCLI Reference. These guys are amazing, and I urge you to both buy the book, and support Lam’s virtuallyGhetto site and script repository.
With the help of Jason Nash and J Michel Metz, I got my 1000V nailed down, and FCoE smoking on the rest of the Nexus stack. As Metz says, if FCoE were a video game, he would be the boss fight at the end! Thanks!
I didn’t make it to VMworld this year with all the work going on here. I did get to attend Backup Central Live with W. Curtis Preston. What a super cool seminar. Definitely not your typical one day BS event. I came away with real knowledge that I could put to use right away. Here’s my review of the event.
I was part of a VMware focus group for the portal redesign this year. That was fun, but my NDA won’t allow me to mention details. I think this was worthwhile, and I took many of your comments on Twitter to the guys doing the redesign. We will see a much more efficient VMware site really soon that will save us all time!
The coolest thing I got to do in 2011 was join Gestalt IT and attend Tech Field Day 7 in Austin. That was an amazing experience. I got to interact with amazingly smart, independent thinkers in the industry. I also saw some cool new products and ideas from Dell, SolarWinds, Symantec, and Veeam. I haven’t had much time to blog about these, but I do plan on evaluating a few of the products I saw, and posting my opinions as soon as time allows in 2012. I’m definitely looking forward to my next TFD event! I would encourage any of my readers who are not employed by a vendor to contact me or Stephen Foskett if you’d like to attend yourself! Stephen and Matt Simmons work very hard to make these events quite valuable for both presenters and participants.
I’m sure I forgot to thank plenty of folks. Sorry.
Ohh yea. . . I nearly forgot one other thing. I also got to enjoy the birth of my second son in 2011. Amazing!
Happy New Year!
For a while, I’ve been looking for a way to pick which “slots” our VEM’s go into on the 1000V VSM. It would make troubleshooting much easier, and it just makes more sense to the networking guys who are used to working with physical line cards and supervisors.
A network escalation engineer over at VMware came through with a process for renumbering the VEM’s. It’s simple, but it never really occurred to me that it was this simple.
All you need to do is grab the host id of the VMware host from the VSM config, shut down the host to take the VEM offline, and then renumber it in the VSM config.
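As a sketch, the process above looks something like this on the VSM. The module numbers and host UUID here are hypothetical — substitute your own from `show module vem mapping`:

```
! 1. Find the host UUID currently mapped to the VEM
show module vem mapping
! 2. With the host shut down (VEM offline), free the old slot
configure terminal
no vem 5
! 3. Re-create the VEM in the slot you want, bound to the same host UUID
vem 3
  host id 422A1C2B-1234-5678-9ABC-DEF012345678
end
copy running-config startup-config
```

When the host comes back up, its VEM should land in the new slot.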
Here’s a screenshot @benperove sent over detailing the process. I’m definitely doing this ASAP on my 1000V’s! Thanks Ben!
Based on the comments, and the other posts that said there was no point in setting IOPS to 1 on Round Robin, I decided I was going to have to get more aggressive and test a wide variety of workloads on multiple hosts and datastores. My goal is to see if there would be any significant difference between Round Robin and PowerPath VE in a larger environment than I was testing with previously.
For Round 3 of my tests, I used 3 hosts, 9 Win2008 R2 VM’s, and 3 datastores. My hosts are HP BL460 G7 blades with HP CNA’s. All hosts are running ESXi 5 and are connected via passthrough modules to Cisco Nexus switches. FCoE is being used to the Nexus, and then FC from there to Cisco MDS’s, then to the VMAX. No Storage IO Control, DRS, or FAST is active on these hosts / LUN’s.
Here are the test VM’s, and their respective IOMeter setup:
The first test is Round Robin with the IOPS=1 setting. We’re seeing 20,673 IOPS with an average read latency of 7.69ms. Write latency is 7.5ms on this test. When we change all LUN’s back to the default of IOPS=1000, we see a significant drop in IOPS, and a 40% increase in latency. Since the bulk of my IOMeter profiles are sequential, this makes sense. EMC tests, as well as my own, show that there is little difference between IOPS=1 and IOPS=1000 when dealing with small block 100% random I/O.
When switching to PowerPath hosts, we see the IOPS increase around 6%. This is probably not statistically significant or anything, but what I did find interesting is the 15% better read latency. My guess is that PowerPath is dynamically tuning based on the workload profile from each host, where Round Robin is stuck at whatever I set as the IOPS= number.
Here’s the scorecard for Round 3:
To sum up our last round of comparisons, it was nice to see results using more hosts, datastores, and VM’s with varying I/O profiles. While this was helpful, no one can truly simulate real production workloads with IOMeter.
PowerPath for physical servers is a no-brainer. Based on my results, I am recommending the purchase of PowerPath VE for my VMware environment as well. In my opinion, it comes down to predictability, and peace of mind. I cannot predict what all workloads are going to look like in my environment for the future, and I am not willing to test and tune individual LUN’s with different Round Robin settings. I’d much rather leave that up to a piece of software.
Thanks for all the comments and ideas for these tests and posts.
Apparently there is a bug feature in the 5548 / 5596 switches where the default QoS policies were left out. Those have to be in place for FCoE to work. So they’re shipping the switch with FCoE enabled, but this QoS policy is missing.
What results is that once you get everything set up, you’ll see some FLOGI logins, but it’s very sporadic. The logins come in and out of the fabric, and no FCoE happens. Your FCoE adapters will report link down, even though the vfc’s are up and the ethernet interfaces are up.
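If you want to confirm you’re seeing the same symptoms, these are the spots I’d check on the switch. The interface numbers below are examples — use your own:

```
show flogi database                    ! logins appear, then drop back out
show interface vfc 101                 ! vfc reports up
show interface ethernet 1/17           ! ethernet reports up
show queuing interface ethernet 1/17   ! no FCoE no-drop class in sight
```

If the queuing output has no class-fcoe entries, you’re almost certainly hitting this missing-policy problem.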
What I suspect is happening – and take this for what it’s worth from an expired CCNA – is that the MTU isn’t set properly for FCoE because the system QoS policies aren’t letting the switch know that there is FCoE. It wasn’t until I mentioned that I changed the default MTU that the Cisco TAC level 2 guy finally remembered this little QoS problem with the big switches.
But he sent me the article, so I’ll save you some time.
If you copy the code in blue and paste it, your links will come up instantly and you’ll be ready to roll. Here’s the link to the Cisco article.
The FCoE class-fcoe system class is not enabled in the QoS configuration.
For a Cisco Nexus 5548 switch, the FCoE class-fcoe system class is not enabled by default in the QoS configuration. Before enabling FCoE, you must include class-fcoe in each of the following policy types: qos, queuing, and network-qos.
The following is an example of the service policy that needs to be configured:

```
F340.24.10-5548-1
class-map type qos class-fcoe
class-map type queuing class-fcoe
  match qos-group 1
class-map type queuing class-all-flood
  match qos-group 2
class-map type queuing class-ip-multicast
  match qos-group 2
class-map type network-qos class-fcoe
  match qos-group 1
class-map type network-qos class-all-flood
  match qos-group 2
class-map type network-qos class-ip-multicast
  match qos-group 2
system qos
  service-policy type qos input fcoe-default-in-policy
  service-policy type queuing input fcoe-default-in-policy
  service-policy type queuing output fcoe-default-out-policy
  service-policy type network-qos fcoe-default-nq-policy
```
As promised in the first post, here is round 2 of my testing with PowerPath VE and vSphere 5 NMP Round Robin on VMAX. For this round of testing, I changed the Round Robin iooperationslimit to 1, from the default of 1000.
I understand that this is not recommended, and I also understand that further testing is needed with multiple hosts, multiple VM’s and multiple LUN’s. As soon as I get the time, I’ll do that and report back.
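For anyone wanting to repeat this, the IOPS=1 change can be made per device with esxcli on ESXi 5. The naa identifier below is a placeholder — substitute your own LUN’s device ID:

```
# Show the current Round Robin config for a device
esxcli storage nmp psp roundrobin deviceconfig get \
  --device=naa.60000970000192600000533030334142

# Set the IO operation limit to 1 (the default is 1000)
esxcli storage nmp psp roundrobin deviceconfig set \
  --type=iops --iops=1 \
  --device=naa.60000970000192600000533030334142
```

Note this is per device, so you’d loop it over every LUN you want changed, and it only applies to devices already claimed by VMW_PSP_RR.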
For the background, and methodology, click the link above to read the first post. For now, I’ll skip right to the scorecard.
As we can see here, setting Round Robin IOPS to 1 definitely evens the score with PowerPath. I expected to see more CPU activity than PP, but that wasn’t the case. I also expect to see more overhead on the array once I add more hosts, VM’s and LUN’s to the mix. It might be a few weeks before I can pull that off.
Thanks for reading, and commenting. Round 3 to come.
This past year, I did an exhaustive analysis of potential candidates to replace an aging HP EVA infrastructure for storage. After narrowing the choices down, based on several factors, the one that had the best VMware integration, along with mainframe support was the EMC Symmetrix VMAX.
One of the best things about choosing VMAX in my mind was PowerPath. It can be argued whether PowerPath provides benefits, but most people I have talked to in the real world swear that PowerPath is brilliant. But let’s face it, it HAS to be brilliant to justify the cost per socket. Before tallying up all my sockets and asking someone to write a check, I needed to do my own due diligence. There aren’t many comprehensive PowerPath VE vs. Round Robin papers out there, so I needed to create my own.
My assumption was that I’d see a slight performance edge on PowerPath VE, but not enough to justify the cost. Part of this prejudice comes from hearing the other storage guys out there say there’s no need for vendor specific SATP / PSP’s since VMware NMP is so great these days. Here’s hoping there’s no massive check to write! By the way, if you prefer to skip the beautiful full color screen shots, go ahead and scroll down to the scorecard for the results.
Tale of the Tape
My test setup was as follows:
Test setup for PowerPath vs. Round Robin:

- 2 HP DL380 G6 dual-socket servers
- 2 HP-branded QLogic 4 Gbps HBAs in each server
- 2 FC connections to a Cisco MDS 9148, then direct to the VMAX
- VMware ESXi 5 loaded on both servers
- All tests run on 15K FC disks – no other activity on the array or hosts
Let’s Get It On!
(I’m sure there’s a royalty I will have to pay for saying that)
Host 1 has PowerPath VE 5.7 b173, and host 2 has Round Robin with the defaults. Each HBA has paths to 2 directors on 2 engines. I used IOmeter from a Windows 2008 VM with fairly standard testing setups. Results are from ESXTOP captures at 2 second intervals.
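The ESXTOP captures were batch-mode CSVs. A capture like mine can be grabbed with something along these lines — the sample count and filename are arbitrary:

```
# Batch mode (-b), 2-second delay between samples (-d 2), 150 samples (-n 150)
esxtop -b -d 2 -n 150 > iometer-run.csv
```

The resulting CSV opens in Excel or perfmon for charting the disk adapter and device counters.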
The first test I ran was 4k 100% read 0% random. All these are with 32 outstanding IO’s, unless otherwise specified.
Here is Round Robin
And PowerPath VE
First thing I noticed was that Round Robin looks exactly like my mind thought it would look. Not that that means anything. I do realize that this test could have been faster on RR with the IOPS set to 1, and maybe I’ll do that in Round 2. As for round 1, with more than twice the number of IOPS, PowerPath is earning its license fee here for sure.
How about writes? Here’s 4k 100% write 0% random.
Once again, PowerPath VE shows near 2x the IOPS and data transfer speeds. I’m starting to see a pattern emerge.
How about larger blocks? 32K 100% read 0% random.
PowerPath is really pulling ahead here with over 2x the IOPS yet again.
32K 100% write 0% random
Wow! PowerPath is killing it on writes! Maybe PP has some super-secret password to unlock some extra oomph from VMAX’s cache.
Nevertheless, it’s obvious that PP is beating up on the default Round Robin here, so let’s throw something tougher at them.
Here’s 4K 50% read 25% random with 4 outstanding IO’s.
The gap between the contenders closes a bit with this latest workload at only a 24% improvement for PP. But as we all know, IOPS doesn’t tell the entire story. What about latency?
4k 100% write 0% random
Write latency is 138% higher with Round Robin! That’s a pretty big gap. Is it meaningful? Depends on your workload I guess.
Scorecard after Round 1
So far, PowerPath looks like a necessity for folks running EMC arrays. I’m not sure how it would work on other arrays, but it really shines on the VMAX. In some of my tests the IOPS with PowerPath were three times greater than with the standard Round Robin configuration! I do believe that the gap will shrink if I drop the IOPS setting to 1, but I doubt it will shrink to anywhere near even. We will see.
In addition to the throughput and latency testing, I also did some failover tests. I’m going to save that for a later round. I don’t want this post to get too long.
Several months ago, a small firm I consult for ordered a Drobo Elite (recently replaced by the B800i). These guys had run ESXi for a while in one of their environments, and wanted to explore some of the features requiring shared storage. Like most small businesses, they wanted to get there without breaking the bank. There aren’t a ton of options in the $6-7k range for iSCSI arrays on the VCG, so it was an easy choice.
Their CIO called up Drobo and placed the order. He explained what they were going to use it for, and the guy configured it right over the phone and shipped it out. A few days later, the Drobo Elite arrived configured with 8 x 2TB Western Digital (WD20EARS) disks at a cost of just under $6k.
Setup in ESXi was straightforward. I followed the documentation from Drobo, set the PSP to VMW_PSP_MRU and the SATP to VMW_SATP_DEFAULT_AA, and started throwing VM’s on for testing.
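If you’re following Drobo’s documentation by hand, the same claim settings can be applied with esxcli on ESXi 5 — a sketch, with a placeholder device ID:

```
# Make MRU the default PSP for the default active/active SATP
esxcli storage nmp satp set --satp=VMW_SATP_DEFAULT_AA --default-psp=VMW_PSP_MRU

# Or set the PSP on a single device (naa ID is an example)
esxcli storage nmp device set --device=naa.5000000000000001 --psp=VMW_PSP_MRU

# Verify what the device ended up with
esxcli storage nmp device list --device=naa.5000000000000001
```

Either way, double-check the output of the list command before trusting that the pathing policy took.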
The initial tests were okay. I wasn’t really bouncing around the room yet, but I am used to larger FC array speeds. Once I saw that IOmeter was pushing the expected number of IOPS, we were ready to throw on a few VM’s. For some context, we’re talking about a 100-person company with about 20 servers in total. They’re running 50% of those on ESXi right now on two hosts. Once normal daily production started with 3-4 VM’s hitting the Drobo, everything ground to a halt.
Latency, as reported in ESXTOP, was showing 4,000-5,000 ms, and there wasn’t any single workload that was giving it a tough time. I went back in and double-checked the iSCSI config. All the bindings were correct, as were the PSP and SATP. Nothing had changed except adding a couple more VM’s to the Drobo.
I began to suspect the switch was misconfigured, so I pulled it out, and went direct to the Drobo. That didn’t really yield a noticeable improvement. After troubleshooting this forever, and deliberating on the phone with Drobo, they announced their verdict. Apparently the WD “Green” drives are not supported with VMware. They said we’d need to buy the Black drives.
Their site quickly confirmed it. But again, since Drobo configured the unit, knowing it was for a two host VMware environment, we both assumed the Green drives were sufficient and that the extra cost of the Black wasn’t warranted for this environment. I could understand if the customer had gone out and bought some random drives, but these came with the unit directly from Drobo.
They had us run some of their own IOmeter tests directly connected from a Windows box using the MS iSCSI initiator. We then went ahead and swapped the disks for the recommended WD Black disks, and below are a few charts showing the results.
The Black drives are faster in every way, but the most noticeable difference is write latency. I suspect this is due to their increased processing power and faster cache. Nevertheless, the results speak for themselves.
Bottom line is, if you’re going to run ESXi on a Drobo, don’t go green!
BREAKING – PALO ALTO (VP)
The VMware licensing debate was killed this afternoon while trying to rescue the #vTax hashtag from the inside lane of the Ridiculous Interstate. Witnesses say a bearded, balding “smart-looking” man was driving north at a very high rate of speed in a truck with the license plate VMW when the debate was struck. The truck backed up and struck the debate again and again before authorities arrived and pronounced the debate dead at 5 PM PDT today.
I am writing this as a blogger at Virtual Insanity, and a customer of VMware. I don’t sell VMware, and I’ve never worked for VMware. I don’t even work for a partner. I barely get to chat with my fellow bloggers who work for VMware, and am certainly not privy to inside information, despite my company’s NDA.
With that out of the way, VMware has done the right thing here. The fact that they can take customer feedback and mold it into a dramatic licensing change, just a few weeks before a product GA’s, is astounding. That speaks not only to the agility of the company, but their willingness to please their customers.
They even went out of their way to please NON-PAYING customers with this change. The change to the free version was causing more drama than the change to customers who spend millions with VMware.
Should VMware have focus-grouped the licensing change more than they did? Yes. It would have preempted the customer perception wildfire they have had to fight for the past couple of weeks. I am sure they ran the numbers and knew that only a small percentage would be impacted. But the fact is an even smaller percentage actually ran the scripts to see how it would affect them. Once the fervor got started by a few, it wasn’t going to stop.
A price increase was inevitable. VMware has given us HUNDREDS of new features in the past several years for free. I think not raising prices with 4.0 was the right move, but they couldn’t hold out forever. The new vRAM allotments and policies are spot on, and are going to put a lot of customers’ fears to bed.
Now we can get on with discussing the amazing new features of vSphere 5.0 without that licensing cloud hanging over our heads.
Recently I have been researching HP C7000 chassis connectivity options extensively. Prior to diving deep into it, Virtual Connect FlexFabric seemed like a no brainer. On the surface, it has many advantages.
The cabling / port reduction is an obvious win, as is the ability to have some control over WWID and MAC assignment to blades. Moving East / West traffic between chassis without having to go Northbound to a ToR or EoR switch is attractive as well. Of course these are all things that are just standard with UCS, but I digress.
After many meetings with HP, I still had some questions that were unanswered. I turned to the many thousands of pages of HP documentation on the subject. Sifting through all the “cookbooks” and the in-depth guides to Virtual Connect, and talking with some current users of FlexFabric, I came to the conclusion that it is missing some key features that are needed in a VMware environment. In fact, I would say that for Cisco shops running VMware, HP FlexFabric makes little sense.
The biggest problem I have with Virtual Connect FlexFabric is the lack of any real QoS. Once traffic enters the Virtual Connect module, it’s anarchy. There are no controls in there for prioritization or control of bandwidth. In a VMware environment, where there will be multiple types of traffic, each capable of generating significant load, the only control you have on VC is egress rate limiting.
It’s akin to limiting the number of people one can put in a single car, right before driving through the middle of Rome.
For those who haven’t had that experience, trust me, it’s the same type of anarchy that occurs inside VC. The only rule is try not to die.
Here’s a nice diagram showing Virtual Connect and VMware traffic flow design from M. Sean McGee’s blog:
When you have a Cisco 1000V on the ESXi host and a Nexus 5K on the other end, it makes little sense, in my opinion, to completely break awesome features like Priority Flow Control and Bandwidth Management. HP states that they do support FCoE and DCB (CEE), which should include the above features, but their own guys cannot really say how one would configure or troubleshoot them. That’s part of the problem. VC is a black box that obscures your view of what is going on inside.
One of my other negatives for VC FlexFabric is that I have no choice but to split my 10GbE pipe into smaller pipes if I want to run an HBA off the adapter. If I use the exact same onboard CNA without FlexFabric, I don’t have to do that. This can be solved with separate HBA’s, or 10Gb NIC’s, but that negates the alleged cost savings. So now I’m forced to try and guess how much bandwidth I need for each traffic class, when I already own switching infrastructure that is smart enough to do that for me.
In my opinion, this is akin to disabling DRS. DRS is smarter than you, and faster. Why would anyone disable it? Cisco QoS is certainly smarter than me, as is VMware NetIOC. So why would I want to throw some arbitrary limits on my huge pipe? VMware admins understand that shares are better than reservations or limits. The reasoning is the same on the networking side.
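To put some hypothetical numbers on the shares-versus-limits argument (nothing here is measured; it’s napkin math in plain bash): give three traffic classes shares of 100/50/50 on a 10,000 Mbps pipe. Under contention they split it 5,000 / 2,500 / 2,500, but any class can burst to the full pipe when the others go quiet. A hard egress limit of the same size caps the class even when the rest of the pipe is empty.

```shell
#!/bin/bash
# Shares model: under contention, each class gets pipe * my_shares / total_shares
pipe_mbps=10000
vm_shares=100; vmotion_shares=50; iscsi_shares=50
total_shares=$(( vm_shares + vmotion_shares + iscsi_shares ))

vm_bw=$(( pipe_mbps * vm_shares / total_shares ))
echo "VM traffic under contention: ${vm_bw} Mbps"        # 5000 Mbps

# Shares are a floor under contention, not a ceiling:
# when vMotion and iSCSI are idle, VM traffic can take the whole pipe
echo "VM traffic when others are idle: ${pipe_mbps} Mbps"

# A hard egress rate limit is a ceiling no matter what:
vm_limit_mbps=5000
echo "VM traffic with a hard limit, others idle: ${vm_limit_mbps} Mbps"
```

That wasted headroom is exactly what a shares-based scheme like NetIOC gives back to you for free.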
There are other problems I see with this solution, but I don’t want to bore you. One complaint I have heard from close associates is that the HP recommended method of “stacking” VC modules is problematic. Not only do you have to give up 3 of the 8 ports per module for stacking, but it can create bandwidth issues as well. Recently, a friend of mine had to completely revamp his setup to uplink everything, as opposed to stacking, which was allegedly causing bandwidth problems in his environment. Oh, and in addition to all this, the FlexFabric module will take FCoE and pass it North as standard Ethernet, so you lose any of the FCoE features provided by your Nexus switch.
Companies that are not virtualizing certain applications, but will run them on blades, may find that the advantages of moving around MAC and WWID’s outweigh the potential disadvantages of FlexFabric. Everything on my blades will be ESXi, so I don’t really have a need for quick physical ID recovery.
As of right now, I plan to use passthrough modules on the C7000’s. At least until a better alternative comes out. Passthrough is slightly more expensive on the uplink port side, but it doesn’t prevent my networking team from having end to end visibility and management. And that takes some of the guesswork, and the administration off of my team, which is a good thing! I would be interested to hear your experiences in the comments below.
Cisco decided to shut down Flip last month. Why? Because it’s a low margin business that Cisco has no business owning. There is talk about killing Linksys, or spinning it out. Why? Low margins, and it doesn’t jibe with Cisco’s core competency. UCS (the Unified Computing System) is another product that has very low margins, and really should be sold if Cisco is to remain as strong as it has been over the past two decades.
I find it interesting that only a year ago, all the industry pundits were talking up Cisco and their stock was riding high. How quickly the sands have shifted under their feet. Shareholders and industry experts are calling for Chambers to resign, and some have even suggested they get rid of UCS. Last week’s Infosmack featured some interesting commentary on Cisco selling UCS. GigaOm thinks Cisco has lost that lovin’ feeling for VCE. They seem to be investing as heavily as EMC, but they get a much, much smaller piece of the pie on all those sales. And let’s face it, VCE sales are expensive. Maybe Cisco should have bought EMC when they had the chance?
I thought Robin Harris’ comment over on Storagemojo was profound:
UCS lowers Cisco’s margins; enrages large resellers; and has no sustainable competitive advantage. Cisco can’t wish those facts away, and the stock market won’t forget them either.
The sustainable competitive advantage thing is a big one.
Even with the latest IDC report showing that UCS has overtaken Dell to become the #3 blade player, there is still plenty of uncertainty in the market. I can say from my own experience that executives, who admittedly know very little about UCS and what it brings to the table, are shying away from it out of fear that Cisco could exit the server business.
From the very beginning, there was talk of Cisco not being “serious” about becoming a server vendor. Add the recent stock troubles, and decision makers are less willing to stick their necks out on millions of dollars’ worth of UCS. After all, nobody has ever been fired for buying IBM.
Companies often take a bath when they get into areas that go against their long standing value propositions. BMW lost billions on Rover before unloading it for a symbolic £10. Cisco spent $600 Million on Flip only 2 years ago. The fact that Cisco first approached IBM and HP with the UCS idea, and was rejected, only proves that Cisco knew it didn’t want to be in the server business before it . . . got into the server business. Perhaps now that they have made their point, one of the server vendors will be interested in a UCS purchase.
With HP getting amazingly aggressive on pricing of their network offerings, and Juniper introducing QFabric, Cisco’s attention needs to be focused on their core competency if they wish to maintain those luxuriously high margins into the future.
I am sure it comes as no surprise to any of our readers that virtualization is not the exclusive full-time focus for most of us. Most of us have a breadth of responsibility spanning gobs of infrastructure layers in our respective organizations. One common pain point that most of us have is backups.
For many companies, backup is an afterthought. It doesn’t contribute to the profitability of the company. It doesn’t help you make more widgets in the same amount of time. The result is often a backup system that gets neglected when it comes to budgets and spending. Most of the time, even though we know the importance of backups, we’re okay with them taking a back seat. After all, who wants to goof around with tape drives when there are cool new blades and SSD storage to play with?
It was this frame of mind that I found myself in on Tuesday of this week. I had signed up for W. Curtis Preston’s Backup Central Live a while back on Stephen Foskett’s recommendation. I knew it would be decent, as I had used Backupcentral.com for a long time as a valuable resource to help deal with those dreaded backup problems. But when Monday came, I found myself wondering why the heck I signed up for this seminar. I had so much work to do this week, and most of it was fun SAN and VMware planning and design stuff. I didn’t have time for baaaaackups. . . Grrrr.
In the end, my boss was pumped about the seminar. I knew I couldn’t back out without getting grief, so reluctantly, I made the 1.5 hour drive to Cary, NC for a full day of backups. I knew Curtis would be a great speaker, and have good insight. I have heard him many times on Infosmack, and I know from his blog posts that he knows his stuff. I just wasn’t looking forward to a full day of vendor pitches between the valuable information.
Ultimately, I was impressed with the event, and it was far from a waste of time. Even the vendor presentations were decent, and they kept to a reasonable time limit, so the pace was perfect. I’ll give you a quick rundown of what I learned at this event.
Often times we feel alone in our backup struggles. At the seminar, there was wireless polling during the presentations, so we had real-time answers to our questions. That alone was a fantastic change, and I prefer it to raising my hand 48 times during a session. From this polling data, I learned that I am not alone. Many share in my misery.
- 49% of attendees still do backups DIRECT TO TAPE.
So while those of us in that 49% may think that no one hears our screams, at least now we know that we’re not the only ones screaming. I think we all know that tape is not a suitable direct target for server backups. The problem only gets worse as tape drives get faster. Disk, at least as a staging area, is now a necessity for reliable backup to tape.
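The “faster drives make it worse” point is easy to sketch with numbers. These figures are assumptions, not measurements: suppose a drive needs roughly 40 MB/s to keep streaming, and the clients can only push 15 MB/s over the LAN. The drive constantly stops and repositions (shoe-shining), while a disk staging tier that feeds the copy-to-tape job at disk speed keeps it streaming.

```shell
#!/bin/bash
# Shoe-shine check -- all rates are assumed, illustrative numbers
min_stream_mbs=40    # rate this drive needs to keep the tape moving
client_rate_mbs=15   # what the backup clients actually push over the LAN
staged_rate_mbs=120  # what a local disk staging tier can feed the drive

if [ "$client_rate_mbs" -lt "$min_stream_mbs" ]; then
  echo "Direct-to-tape: drive shoe-shines (${client_rate_mbs} < ${min_stream_mbs} MB/s)"
fi

if [ "$staged_rate_mbs" -ge "$min_stream_mbs" ]; then
  echo "Disk staging: drive keeps streaming (${staged_rate_mbs} >= ${min_stream_mbs} MB/s)"
fi
```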
That said, Preston points out that tape is a long way from being displaced from the datacenter. Tape is still 50x cheaper than disk, and more reliable for long-term data storage. One fact I found enlightening was that hard disks are not designed or tested to store data long term while powered off. This is something I had never thought about, and only a couple of companies, like ProStor, are trying to solve this problem. Even if we solve for the reliability difference, it will likely be decades before we see a significant degree of cost parity (if ever).
A speaker from Cambridge Computer Services talked about new cool ways people are using tape as part of a tiered strategy for primary data. Some are even using tape as a mirror for their primary storage. Of course this requires a gateway appliance with plenty of cache, and good software, but the savings are real.
Another crucial area we touched on was archival, especially as it relates to electronic discovery (ED). Almost NO ONE is doing this. The vast majority are using their primary backup software and methodology for archival. This is an expensive mistake if you are ever called upon to do discovery. In addition to my own experience with ED, Preston tells a story of a client who spent millions to satisfy a single discovery request.
Apparently a single user’s e-mail for the past three years was requested. As they were only doing normal Exchange backups, that meant restoring 156 different weekly Exchange backups, and then fishing for this guy’s mails. It took an army of consultants working three shifts MONTHS to do this. Since we live in a litigious world these days, it might be a good idea to get your ED and archival in order. One product that was recommended at the seminar was Index Engines. I haven’t had time to look at it yet, but it sounds brilliant!
One interesting statistic we saw in the polling data was that the majority of attendees had an overblown opinion of themselves when it comes to their own backup environments. The majority said their backups ran well. Preston’s experience tells quite a different story. The scary part of this is that people don’t know that their backups suck. They find out when it’s too late.
The most valuable part of this seminar was the discussion time at the end. There were many interesting discussions around cloud backups, the AWS outage, and snapshots. It brought together everything we had learned during the day.
There isn’t space in a single blog post to cover all the material from a full day seminar, but I hope I’ve given you enough to help you decide to check this event out when it comes your way. I have to hand it to the Backup Central Live crew for taking a topic that most people hate and turning it into a valuable day of learning.