If you have HP BL460 G7s with the on-board 10Gb CNA, you're going to want to read this post regarding a problem with the latest firmware.
I first noticed this issue while updating firmware to troubleshoot a problem where storage doesn't come back up after rebooting an upstream Nexus switch.
The symptoms are: the NIC comes back up, and the vfc is up, but all storage paths on that side of the fabric are still dead in ESXi 5.0. To fix this issue, the vfc or port channel must be shut /no shut.
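For reference, the workaround from the Nexus CLI looks roughly like this (the vfc number here is just an example; use your own interface):

conf t
interface vfc101
  shutdown
  no shutdown

The same shut / no shut sequence applies if you're bouncing the port channel instead.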
I also saw an issue where the storage paths were dead, and the NIC never came back up. A reset of the Ethernet port will not fix this. A reboot of the ESXi host is required. Pay attention to the NIC state if you lose storage paths in this configuration with FCoE.
As part of my troubleshooting, I went to update the firmware on the CNA. The latest version of the firmware from HP is 4.0.360.15a. When updating using the Emulex utility, on about 20% of my blades, I got a CRC error during the upgrade process. Below is a screenshot of this error.
After retrying the firmware update, as stated in the utility, the same error occurred.
This is where you need to pay attention!!
Despite the CRC error, the utility then reports the new firmware version. Since the firmware version is correct, one might assume the update was indeed successful. That's a bad assumption. Upon further testing, we found the blades that failed the firmware update were the ones failing during the switch reloads.
There were only 2 blades that did NOT fail the firmware update, but still failed the switch reload process. They were replaced, and now I have no blades failing to reacquire storage paths after an upstream switch failure.
I must point out that HP has been unusually proactive with this issue, which is a nice change! I still have several blades in another datacenter that are not taking the firmware update. When I moved to have those all replaced, HP got some of their top people on it and scheduled a call. I tested their proposed fix this morning; it didn't work.
They are actively working on a fix, so you won’t have to replace your blades. I will update this post as soon as I get word back from them on that fix. Meanwhile, if you’ve seen this, you might want to schedule some switch reloads during a maintenance window to make sure you are good to go.
As of today, there is no fix that I'm aware of... HP replaced the remaining blades after we tried a couple more proposed fixes. If I get word of a fix, I will post it here.
Over the past week, I have been reflecting on just how amazing 2011 was for me, with lots of help from the entire VMware community. I won’t bore all my readers with EVERY detail, but what good is a blog if you can’t boast once in a while?
In 2011, after a couple of years of planning, evaluating, and trying to get funding, I started implementing VMware on a large scale at the company I work for. We had used it in development, and for certain niche apps, but now it’s coming in wholesale. Thanks to VMware, and their amazing development staff, I was able to create some MONSTER clusters without worrying about too many HA Primaries on each blade chassis. Thank you VMware!
Also in 2011, there was much deliberation and evaluation of many different storage arrays from several vendors. I needed something to replace some old HP EVA's. Yes... I have been critical of EMC in the past, and honestly, they still deserve some criticism. However, in the end, we bought VMAX's.
One of the main reasons VMAX was the only one left standing was its support for mainframe. Also, Chad's army of vSpecialists shows EMC's commitment to tightly integrating VMware into their products, which is comforting. Was VMAX extraordinarily expensive? Yes. Has VMAX been a bit of a pain in the rear to get up and running right? Indeed. But as of the end of the year, these things are absolutely screaming, and I am very pleased with the performance and the integration points.
All the work I have done this year to get this new environment up and running, and to begin migrating environments over to the new VMware platform, would not have been possible without the help of many people in the community. I have thoroughly enjoyed reading everyone's blogs. Reading both of Scott Lowe's 2011 books (Forbes & Maish too) and Frank and Duncan's amazing second ESXi clustering book was extremely helpful. I have Mike Laverick's SRM book, as well as a few other recent ones, on my desk for 2012 reading. Never before have we had so much access to so much in-depth knowledge on every aspect of VMware. This speaks very highly of VMware's care and feeding of the community.
The biggest time-saver for me this year has been William Lam's scripts and Luc Dekens et al.'s PowerCLI Reference. These guys are amazing, and I urge you to buy the book and support Lam's virtuallyGhetto site and script repository.
With the help of Jason Nash and J Michel Metz, I got my 1000V nailed down, and FCoE smoking on the rest of the Nexus stack. As Metz says, if FCoE were a video game, he would be the boss fight at the end! Thanks!
I didn’t make it to VMworld this year with all the work going on here. I did get to attend Backup Central Live with W. Curtis Preston. What a super cool seminar. Definitely not your typical one day BS event. I came away with real knowledge that I could put to use right away. Here’s my review of the event.
I was part of a VMware focus group for the portal redesign this year. That was fun, but my NDA won’t allow me to mention details. I think this was worthwhile, and I took many of your comments on Twitter to the guys doing the redesign. We will see a much more efficient VMware site really soon that will save us all time!
The coolest thing I got to do in 2011 was join Gestalt IT and attend Tech Field Day 7 in Austin. That was an amazing experience. I got to interact with amazingly smart, independent thinkers in the industry. I also saw some cool new products and ideas from Dell, SolarWinds, Symantec, and Veeam. I haven't had much time to blog about these, but I do plan on evaluating a few of the products I saw and posting my opinions as soon as time allows in 2012. I'm definitely looking forward to my next TFD event! I would encourage any of my readers who are not employed by a vendor to contact me or Stephen Foskett if you'd like to attend. Stephen and Matt Simmons work very hard to make these events quite valuable for both presenters and participants.
I’m sure I forgot to thank plenty of folks. Sorry.
Ohh yea... I nearly forgot one other thing. I also got to enjoy the birth of my second son in 2011. Amazing!
Happy New Year!
For a while, I’ve been looking for a way to pick which “slots” our VEM’s go into on the 1000V VSM. It would make troubleshooting much easier, and it just makes more sense to the networking guys who are used to working with physical line cards and supervisors.
A network escalation engineer over at VMware came through with a process for renumbering the VEM’s. It’s simple, but it never really occurred to me that it was this simple.
All you need to do is grab the host id of the VMware host from the VSM config, shut down the host to take the VEM offline, and then renumber it in the VSM config.
Here’s a screenshot @benperove sent over detailing the process. I’m definitely doing this ASAP on my 1000V’s! Thanks Ben!
Based on the comments, and the other posts that said there was no point in setting IOPS to 1 on Round Robin, I decided I was going to have to get more aggressive and test a wide variety of workloads on multiple hosts and datastores. My goal is to see if there would be any significant difference between Round Robin and PowerPath VE in a larger environment than I was testing with previously.
For Round 3 of my tests, I used 3 hosts, 9 Win2008 R2 VM's, and 3 datastores. My hosts are HP BL460 G7 blades with HP CNA's. All hosts are running ESXi 5 and are connected via passthrough modules to Cisco Nexus switches. FCoE is being used to the Nexus, and then FC from there to Cisco MDS's, then to the VMAX. No Storage IO Control, DRS, or FAST is active on these hosts / LUN's.
Here are the test VM’s, and their respective IOMeter setup:
The first test is Round Robin with the IOPS=1 setting. We’re seeing 20,673 IOPS with an average read latency of 7.69ms. Write latency is 7.5ms on this test. When we change all LUN’s back to the default of IOPS=1000, we see a significant drop in IOPS, and a 40% increase in latency. Since the bulk of my IOMeter profiles are sequential, this makes sense. EMC tests, as well as my own, show that there is little difference between IOPS=1 and IOPS=1000 when dealing with small block 100% random I/O.
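For anyone who wants to reproduce this, here's the ESXi 5 command I'm referring to for the IOPS setting (the naa ID is a placeholder for your own device ID):

esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --type iops --iops 1

Setting --iops 1000 puts a LUN back to the default behavior.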
When switching to PowerPath hosts, we see the IOPS increase around 6%. This is probably not statistically significant or anything, but what I did find interesting is the 15% better read latency. My guess is that PowerPath is dynamically tuning based on the workload profile from each host, where Round Robin is stuck at whatever I set as the IOPS= number.
Here’s the scorecard for Round 3:
To sum up our last round of comparisons, it was nice to see results using more hosts, datastores, and VM's with varying I/O profiles. Helpful as that was, though, no IOMeter run can truly simulate what real workloads will do in production.
PowerPath for physical servers is a no-brainer. Based on my results, I am recommending the purchase of PowerPath VE for my VMware environment as well. In my opinion, it comes down to predictability, and peace of mind. I cannot predict what all workloads are going to look like in my environment for the future, and I am not willing to test and tune individual LUN’s with different Round Robin settings. I’d much rather leave that up to a piece of software.
Thanks for all the comments and ideas for these tests and posts.
Apparently there is a bug (er, "feature") in the 5548 / 5596 switches where the default QoS policies were left out. Those have to be in place for FCoE to work. So they're shipping the switch with FCoE enabled, but this QoS policy is missing.
The result is that once you get everything set up, you'll see some FLOGI logins, but they're very sporadic. The logins will come in and out of the fabric, and no FCoE will happen. Your FCoE adapters will report link down, even though the vfc's are up and the Ethernet interfaces are up.
What I suspect is happening (and take this for what it's worth from an expired CCNA) is that the MTU isn't set properly for FCoE, because the system QoS policies aren't letting the switch know that there is FCoE. It wasn't until I mentioned that I had changed the default MTU that the Cisco TAC level 2 guy finally remembered this little QoS problem with the big switches.
But he sent me the article, so I’ll save you some time.
If you copy the config below and paste it, your links will come up instantly and you'll be ready to roll. Here's the link to the Cisco article.
The FCoE class-fcoe system class is not enabled in the QoS configuration.
For a Cisco Nexus 5548 switch, the FCoE class-fcoe system class is not enabled by default in the QoS configuration. Before enabling FCoE, you must include class-fcoe in each of the following policy types: qos, queuing, and network-qos.
The following is an example of a service policy that needs to be configured:

F340.24.10-5548-1
class-map type qos class-fcoe
class-map type queuing class-fcoe
  match qos-group 1
class-map type queuing class-all-flood
  match qos-group 2
class-map type queuing class-ip-multicast
  match qos-group 2
class-map type network-qos class-fcoe
  match qos-group 1
class-map type network-qos class-all-flood
  match qos-group 2
class-map type network-qos class-ip-multicast
  match qos-group 2
system qos
  service-policy type qos input fcoe-default-in-policy
  service-policy type queuing input fcoe-default-in-policy
  service-policy type queuing output fcoe-default-out-policy
  service-policy type network-qos fcoe-default-nq-policy
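Once it's pasted in, a couple of quick checks should confirm the switch is now FCoE-aware (commands from memory, so verify against your NX-OS version; the vfc number is an example):

show policy-map system
show interface vfc 101

With the system QoS policies in place, the vfc's should stop flapping and the FLOGIs should stick.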
As promised in the first post, here is Round 2 of my testing with PowerPath VE and vSphere 5 NMP Round Robin on VMAX. For this round of testing, I changed the Round Robin IO operation limit (IOPS) to 1 from the default of 1000.
I understand that this is not recommended, and I also understand that further testing is needed with multiple hosts, multiple VM’s and multiple LUN’s. As soon as I get the time, I’ll do that and report back.
For the background, and methodology, click the link above to read the first post. For now, I’ll skip right to the scorecard.
As we can see here, setting Round Robin IOPS to 1 definitely evens the score with PowerPath. I expected to see more CPU activity than PP, but that wasn’t the case. I also expect to see more overhead on the array once I add more hosts, VM’s and LUN’s to the mix. It might be a few weeks before I can pull that off.
Thanks for reading, and commenting. Round 3 to come.
This past year, I did an exhaustive analysis of potential candidates to replace an aging HP EVA storage infrastructure. After narrowing down the choices based on several factors, the one with the best VMware integration, along with mainframe support, was the EMC Symmetrix VMAX.
One of the best things about choosing VMAX in my mind was PowerPath. It can be argued whether PowerPath provides benefits, but most people I have talked to in the real world swear that PowerPath is brilliant. But let’s face it, it HAS to be brilliant to justify the cost per socket. Before tallying up all my sockets and asking someone to write a check, I needed to do my own due diligence. There aren’t many comprehensive PowerPath VE vs. Round Robin papers out there, so I needed to create my own.
My assumption was that I’d see a slight performance edge on PowerPath VE, but not enough to justify the cost. Part of this prejudice comes from hearing the other storage guys out there say there’s no need for vendor specific SATP / PSP’s since VMware NMP is so great these days. Here’s hoping there’s no massive check to write! By the way, if you prefer to skip the beautiful full color screen shots, go ahead and scroll down to the scorecard for the results.
Tale of the Tape
My test setup was as follows:
Test Setup for PowerPath vs. Round Robin:
- 2 HP DL380 G6 dual-socket servers
- 2 HP-branded QLogic 4Gbps HBA's in each server
- 2 FC connections to a Cisco MDS 9148, then direct to the VMAX
- VMware ESXi 5 loaded on both servers
- All tests were run on 15K FC disks; no other activity on the array or hosts
Let’s Get It On!
(I'm sure there's a royalty I will have to pay for saying that)
Host 1 has PowerPath VE 5.7 b173, and host 2 has Round Robin with the defaults. Each HBA has paths to 2 directors on 2 engines. I used IOmeter from a Windows 2008 VM with fairly standard testing setups. Results are from ESXTOP captures at 2 second intervals.
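If you want to double-check how each host has claimed the test LUN before running anything, ESXi 5 will show it from the shell (a quick sketch; the naa ID is a placeholder):

esxcli storage core device list -d naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    # "Multipath Plugin" shows PowerPath vs. NMP ownership
esxcli storage nmp device list -d naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     # for NMP devices, shows the PSP (VMW_PSP_RR) and its config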
The first test I ran was 4k 100% read 0% random. All these are with 32 outstanding IO’s, unless otherwise specified.
Here is Round Robin
And PowerPath VE
First thing I noticed was that Round Robin looks exactly like I thought it would. Not that that means anything. I do realize this test could have been faster on RR with the IOPS setting at 1, and maybe I'll do that in Round 2. As for Round 1, with more than twice the number of IOPS, PowerPath is earning its license fee here for sure.
How about writes? Here’s 4k 100% write 0% random.
Once again, PowerPath VE shows near 2x the IOPS and data transfer speeds. I’m starting to see a pattern emerge.
How about larger blocks? 32K 100% read 0% random.
PowerPath is really pulling ahead here with over 2x the IOPS yet again.
32K 100% write 0% random
Wow! PowerPath is killing it on writes! Maybe PP has some super-secret password to unlock some extra oomph from VMAX’s cache.
Nevertheless, it’s obvious that PP is beating up on the default Round Robin here, so let’s throw something tougher at them.
Here’s 4K 50% read 25% random with 4 outstanding IO’s.
The gap between the contenders closes a bit with this latest workload at only a 24% improvement for PP. But as we all know, IOPS doesn’t tell the entire story. What about latency?
4k 100% write 0% random
Write latency is 138% higher with Round Robin! That’s a pretty big gap. Is it meaningful? Depends on your workload I guess.
Scorecard after Round 1
So far, PowerPath looks like a necessity for folks running EMC arrays. I’m not sure how it would work on other arrays, but it really shines on the VMAX. In some of my tests the IOPS with PowerPath were three times greater than with the standard Round Robin configuration! I do believe that the gap will shrink if I drop the IOPS setting to 1, but I doubt it will shrink to anywhere near even. We will see.
In addition to the throughput and latency testing, I also did some failover tests. I’m going to save that for a later round. I don’t want this post to get too long.
Several months ago, a small firm I consult for ordered a Drobo Elite (recently replaced by the B800i). These guys had run ESXi for a while in one of their environments and wanted to explore some of the features requiring shared storage. Like most small businesses, they wanted to get there without breaking the bank. There aren't a ton of options in the $6-7k range for iSCSI arrays on the VCG, so it was an easy choice.
Their CIO called up Drobo and placed the order. He explained what they were going to use it for, and the guy configured it right over the phone and shipped it out. A few days later, the Drobo Elite arrived configured with 8 x 2TB Western Digital (WD20EARS) disks at a cost of just under $6k.
Setup in ESXi was straightforward. I followed the documentation from Drobo, set the PSP to VMW_PSP_MRU and the SATP to VMW_SATP_DEFAULT_AA, and started throwing VM's on for testing.
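For anyone following along, that pairing can be applied from the ESXi 5 shell with something like the following (a sketch based on my setup; note the first command changes the default for everything claimed by VMW_SATP_DEFAULT_AA on that host):

esxcli storage nmp satp set --satp VMW_SATP_DEFAULT_AA --default-psp VMW_PSP_MRU

Or per device, with a placeholder naa ID:

esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --psp VMW_PSP_MRU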
The initial tests were okay. I wasn’t really bouncing around the room yet, but I am used to larger FC array speeds. Once I saw that IOmeter was pushing the expected number of IOPS, we were ready to throw on a few VM’s. For some context, we’re talking about a 100 person company with about 20 servers in total. They’re running 50% of those on ESXi right now on two hosts. Once normal daily production started with 3-4 VM’s hitting the Drobo, everything screeched to a grinding halt.
Latency, as reported in ESXTOP, was showing 4,000-5,000ms, and there wasn't any single workload that was giving it a tough time. I went back in and double-checked the iSCSI config. All the bindings were correct, as were the PSP and SATP. Nothing had changed except adding a couple more VM's to the Drobo.
I began to suspect the switch was misconfigured, so I pulled it out and went direct to the Drobo. That didn't yield a noticeable improvement. After we troubleshot this forever and deliberated on the phone, Drobo announced their verdict: apparently the WD "Green" drives are not supported with VMware. They said we'd need to buy the Black drives.
Their site quickly confirmed it. But again, since Drobo configured the unit knowing it was for a 2-host VMware environment, we both assumed the Green drives were sufficient. The extra cost of the Black wasn't warranted for this environment. I could understand if the customer had gone out and bought some random drives, but these came with the unit directly from Drobo.
They had us run some of their own IOmeter tests directly connected from a Windows box using the MS iSCSI initiator. We then went ahead and swapped the disks for the recommended WD Black disks, and below are a few charts showing the results.
The Black is faster in every way, but the most noticeable aspect is write latency. I suspect this is due to the increased processing power and faster cache. Nevertheless, the results speak for themselves.
Bottom line is, if you’re going to run ESXi on a Drobo, don’t go green!
BREAKING – PALO ALTO (VP)
The VMware licensing debate was killed this afternoon while trying to rescue the #vTax hashtag from the inside lane of the Ridiculous Interstate. Witnesses say a bearded, balding “smart-looking” man was driving north at a very high rate of speed in a truck with the license plate VMW when the debate was struck. The truck backed up and struck the debate again and again before authorities arrived and pronounced the debate dead at 5 PM PDT today.
I am writing this as a blogger at Virtual Insanity, and a customer of VMware. I don’t sell VMware, and I’ve never worked for VMware. I don’t even work for a partner. I barely get to chat with my fellow bloggers who work for VMware, and am certainly not privy to inside information, despite my company’s NDA.
With that out of the way, VMware has done the right thing here. The fact that they can take customer feedback and mold it into a dramatic licensing change, just a few weeks before a product GA’s, is astounding. That speaks not only to the agility of the company, but their willingness to please their customers.
They even went out of their way to please NON-PAYING customers with this change. The change to the free version was causing more drama than the change to customers who spend millions with VMware.
Should VMware have focus-grouped the licensing change more than they did? Yes. It would have preempted the customer perception wildfire they have had to fight for the past couple of weeks. I am sure they ran the numbers and knew that only a small percentage would be impacted. But the fact is an even smaller percentage actually ran the scripts to see how it would affect them. Once the fervor got started by a few, it wasn’t going to stop.
A price increase was inevitable. VMware has given us HUNDREDS of new features in the past several years for free. I think not increasing it with 4.0 was the right move, but they couldn’t hold out forever. The new vRAM allotments and policies are spot on, and are going to put a lot of customers’ fears to bed.
Now we can get on with discussing the amazing new features of vSphere 5.0 without that licensing cloud hanging over our heads.
Recently I have been researching HP C7000 chassis connectivity options extensively. Before diving deep into it, Virtual Connect FlexFabric seemed like a no-brainer. On the surface, it has many advantages.
The cabling / port reduction is an obvious win, as is the ability to have some control over WWID and MAC assignment to blades. Moving East / West traffic between chassis without having to go Northbound to a ToR or EoR switch is attractive as well. Of course these are all things that are just standard with UCS, but I digress.
After many meetings with HP, I still had some questions that were unanswered. I turned to the many thousands of pages of HP documentation on the subject. Sifting through all the “cookbooks” and the in-depth guides to Virtual Connect, and talking with some current users of FlexFabric, I came to the conclusion that it is missing some key features that are needed in a VMware environment. In fact, I would say that for Cisco shops running VMware, HP FlexFabric makes little sense.
The biggest problem I have with Virtual Connect FlexFabric is the lack of any real QoS. Once traffic enters the Virtual Connect module, it’s anarchy. There are no controls in there for prioritization or control of bandwidth. In a VMware environment, where there will be multiple types of traffic, each capable of generating significant load, the only control you have on VC is egress rate limiting.
It’s akin to limiting the number of people one can put in a single car, right before driving through the middle of Rome.
For those who haven’t had that experience, trust me, it’s the same type of anarchy that occurs inside VC. The only rule is try not to die.
Here’s a nice diagram showing Virtual Connect and VMware traffic flow design from M. Sean McGee’s blog:
When you have a Cisco 1000V on the ESXi host and a Nexus 5K on the other end, it makes little sense, in my opinion, to completely break awesome features like Priority Flow Control and Bandwidth Management. HP states that they do support FCoE and DCB (CEE), which should include the above features, but their own guys cannot really say how one would configure, or troubleshoot it. That’s part of the problem. VC is a black box that abstracts your ability to see what is going on inside.
One of my other negatives for VC FlexFabric is that I have no choice but to split my 10GbE pipe into smaller pipes if I want to run an HBA off the adapter. If I use the exact same onboard CNA without FlexFabric, I don’t have to do that. This can be solved with separate HBA’s, or 10Gb NIC’s, but that negates the alleged cost savings. So now I’m forced to try and guess how much bandwidth I need for each traffic class, when I already own switching infrastructure that is smart enough to do that for me.
In my opinion, this is akin to disabling DRS. DRS is smarter than you, and faster. Why would anyone disable it? Cisco QoS is certainly smarter than me, as is VMware NetIOC. So why would I want to throw some arbitrary limits on my huge pipe? VMware admins understand that shares are better than reservations or limits. The reasoning is the same on the networking side.
There are other problems I see with this solution, but I don’t want to bore you. One complaint I have heard from close associates is the HP recommended method of “stacking” VC modules is problematic. Not only do you have to give up 3 of the 8 ports per module for stacking, but it can create bandwidth issues as well. Recently, a friend of mine had to completely revamp his setup to uplink everything, as opposed to stacking, which was allegedly causing bandwidth problems in his environment. Ohh, and in addition to all this, the FlexFabric module will take FCoE and pass it North as standard Ethernet. So you lose any of the FCoE features provided by your Nexus switch.
Companies that are not virtualizing certain applications, but will run them on blades, may find that the advantages of moving around MAC and WWID’s outweigh the potential disadvantages of FlexFabric. Everything on my blades will be ESXi, so I don’t really have a need for quick physical ID recovery.
As of right now, I plan to use passthrough modules on the C7000’s. At least until a better alternative comes out. Passthrough is slightly more expensive on the uplink port side, but it doesn’t prevent my networking team from having end to end visibility and management. And that takes some of the guesswork, and the administration off of my team, which is a good thing! I would be interested to hear your experiences in the comments below.
Cisco decided to shut down Flip last month. Why? Because it's a low-margin business that Cisco has no business owning. There is talk of killing Linksys, or spinning it out. Why? Low margins, and it doesn't jibe with Cisco's core competency. UCS (Cisco's Unified Computing System) is another product that has very low margins, and it really should be sold if Cisco is to remain as strong as it has been over the past two decades.
I find it interesting that only a year ago, all the industry pundits were talking up Cisco and their stock was riding high. How quickly the sands have shifted under their feet. Shareholders and industry experts are calling for Chambers to resign, and some have even suggested they get rid of UCS. Last week’s Infosmack featured some interesting commentary on Cisco selling UCS. GigaOm thinks Cisco has lost that lovin’ feeling for VCE. They seem to be investing as heavily as EMC, but they get a much, much smaller piece of the pie on all those sales. And let’s face it, VCE sales are expensive. Maybe Cisco should have bought EMC when they had the chance?
I thought Robin Harris’ comment over on Storagemojo was profound:
UCS lowers Cisco’s margins; enrages large resellers; and has no sustainable competitive advantage. Cisco can’t wish those facts away, and the stock market won’t forget them either.
The sustainable competitive advantage thing is a big one.
Even with the latest IDC report showing that UCS has overtaken Dell to become the #3 blade player, there is still plenty of uncertainty in the market. I can say from my own experience that executives, who admittedly know very little about UCS and what it brings to the table, are shying away from it out of fear that Cisco could exit the server business.
From the very beginning, there was talk of Cisco not being “serious” about becoming a server vendor. Add the recent stock troubles, and decision makers are less willing to stick their necks out on millions worth of UCS. After all, nobody has ever been fired for buying IBM.
Companies often take a bath when they get into areas that go against their long-standing value propositions. BMW lost roughly $4 billion on Rover before offloading it for a token sum. Cisco spent $600 million on Flip only 2 years ago. The fact that Cisco first approached IBM and HP with the UCS idea, and was rejected, only proves that Cisco knew it didn't want to be in the server business before it... got into the server business. Perhaps now that they have made their point, one of the server vendors will be interested in a UCS purchase.
With HP getting amazingly aggressive on pricing of their network offerings, and Juniper introducing QFabric, Cisco’s attention needs to be focused on their core competency if they wish to maintain those luxuriously high margins into the future.
I am sure it comes as no surprise to any of our readers that virtualization is not the exclusive full-time focus for most of us. Most of us have a breadth of responsibility spanning gobs of infrastructure layers in our respective organizations. One common pain point that most of us have is backups.
For many companies, backup is an afterthought. It doesn't contribute to the profitability of the company. It doesn't help you make more widgets in the same amount of time. The result, oftentimes, is a backup system that gets neglected when it comes to budgets and spending. Most of the time, even though we know the importance of backups, we're okay with them taking a back seat. After all, who wants to goof around with tape drives when there are cool new blades and SSD storage to play with?
It was this frame of mind that I found myself in on Tuesday of this week. I had signed up for W. Curtis Preston's Backup Central Live a while back on Stephen Foskett's recommendation. I knew it would be decent, as I had used Backupcentral.com for a long time as a valuable resource to help deal with those dreaded backup problems. But when Monday came, I found myself wondering why the heck I signed up for this seminar. I had so much work to do this week, and most of it was fun SAN and VMware planning and design stuff. I didn't have time for baaaaackups... Grrrr.
In the end, my boss was pumped about the seminar. I knew I couldn’t back out without getting grief, so reluctantly, I made the 1.5 hour drive to Cary, NC for a full day of backups. I knew Curtis would be a great speaker, and have good insight. I have heard him many times on Infosmack, and I know from his blog posts that he knows his stuff. I just wasn’t looking forward to a full day of vendor pitches between the valuable information.
Ultimately, I was impressed with the event, and it was far from a waste of time. Even the vendor presentations were decent, and they kept to a reasonable time limit, so the pace was perfect. I’ll give you a quick rundown of what I learned at this event.
Oftentimes we feel alone in our backup struggles. At the seminar, there was wireless polling during the presentations, so we had real-time answers to our questions. That alone was a fantastic change, and I prefer it to raising my hand 48 times during a session. From this polling data, I learned that I am not alone. Many share in my misery.
- 49% of attendees still do backups DIRECT TO TAPE.
So while we in the 49% think that no one hears our screams, at least now we know that we're not the only ones screaming. I think we all know that tape is not a suitable target for server backups, and the problem only gets worse as tape drives get faster. Disk, at least as a staging area, is now a necessity for reliable backup to tape.
That said, Preston points out that tape is a long way from being displaced from the datacenter. Tape is still 50x cheaper than disk, and more reliable for long-term data storage. One fact I found enlightening was that hard disks are not designed or tested to store data long term while powered off. This is something I had never thought about, and only a couple of companies, like ProStor, are trying to solve this problem. Even if we solve for the reliability difference, it will likely be decades before we see a significant degree of cost parity (if ever).
A speaker from Cambridge Computer Services talked about new cool ways people are using tape as part of a tiered strategy for primary data. Some are even using tape as a mirror for their primary storage. Of course this requires a gateway appliance with plenty of cache, and good software, but the savings are real.
Another crucial area we touched on was archival, especially as it relates to electronic discovery (ED). Almost NO ONE is doing this. The vast majority are using their primary backup software and methodology for archival, which is an expensive mistake if you are ever called upon to do discovery. In addition to my own experience with ED, Preston tells a story of a client who spent millions to satisfy a single discovery request.
Apparently a single user's e-mail for the past three years was requested. As they were only doing normal Exchange backups, that meant restoring 156 different Exchange backups, and then fishing for this guy's mails. It took an army of consultants working three shifts MONTHS to do this. Since we live in a litigious world these days, it might be a good idea to get your ED and archival in order. One product recommended at the seminar was Index Engines. I haven't had time to look at it yet, but it sounds brilliant!
One interesting statistic we saw in the polling data was that the majority of attendees had an overblown opinion of themselves when it comes to their own backup environments. The majority said their backups ran well. Preston’s experience tells quite a different story. The scary part of this is that people don’t know that their backups suck. They find out when it’s too late.
The most valuable part of this seminar was the discussion time at the end. There was much interesting discussion around cloud backups, the AWS outage, and snapshots, and it brought together everything we had learned during the day.
There isn’t space in a single blog post to cover all the material from a full day seminar, but I hope I’ve given you enough to help make a decision to check this event out when it comes your way. I have to give it to the Backup Central Live crew for taking a topic that most people hate, and turning it into a valuable day of learning.
A couple of months ago, after the introduction of the new EMC VNX arrays, I posted my thoughts on them here. One of the engineering choices I questioned was the use of SSD's for extending cache versus a PCI card. It was always obvious why SSD's would be better when cache was being added or replaced, but I questioned the throughput potential of a SAS interface versus a PCI one.
I got some interesting feedback on that from several people, and I appreciate it. It wasn’t until the other day that I realized that the argument really did not have that much merit. In a moment of blinding brilliance, I realized that the only time this might make a difference is when warming the cache.
How did I come to this realization? I was in a VNX deep dive session presented by Chad Sakac, and I had every intention of asking him the question of PCI versus SAS when it comes to cache. Lucky for me, he brings it up during the session, before I could ask. Before he was done with the rest of the presentation, I realized an error in my prior way of thinking.
Chad pointed out that the time it takes for an IO to go through the controller and loops and hit the flash is measured in nanoseconds (10^-9 s). Once it's there, the flash has latencies in the microseconds (10^-6 s). So there is not likely to be a significant difference in latency between SSD and PCI when it comes to cache.
PCI obviously has greater throughput potential, which is why I previously asked the question. But a realization jumped up and bit me while I was sitting through this presentation. Cache IO’s are usually small chunks of data that benefit from the reduced latency of flash / DRAM. They aren’t giant read / write operations that generally require extremely wide bandwidth. Will the increased bandwidth of PCI make a difference? I have my doubts that it will be noticeable on the vast majority of workloads. But this is just my opinion, as an outsider without the benefit of a storage engineering background.
I am looking forward to seeing the SPC-1 benchmarks from the VNX. I believe they will objectively tell the whole truth. A slight difference on an anomalous workload is not significant enough to outweigh the benefits of SSD over PCI cache: it's easily swapped, it's non-volatile, and it only needs to be warmed once. If a controller fails, the cache doesn't die with it; replace the controller, and there's no need to rewarm the cache.
Like I was alluding to in my last post on this... every design decision, whether in storage engineering, vSphere design, or automobile design, is one of compromise. The SPC-1 will tell the whole story, but I think what we'll see here is that this particular compromise was, overall, a good one. What do you think? Let me know in the comments.
I’ve been running some tests in the lab lately, and trying to solve a problem that I don’t think is solvable right now. I’m hoping some of our readers will point out a potential solution that I have missed. Kendrick Coleman posted a write-up of how VM performance can be impacted by VM placement within the cluster. This is almost exactly what I have been testing in my lab, with a few twists.
As Kendrick points out, VM's that need to communicate with one another regularly are better off on the same ESXi host. With VMXNET 3 NIC's, one can achieve massive throughput between VM's on the same host. However, co-locating VM's doesn't always deliver that benefit.
The issue I am running into as I design my production environment is a requirement to have everything segmented off into hundreds of VLAN’s. This means that there will be servers on the same host that are on different VLAN’s that will need to communicate, sometimes frequently. This completely negates the benefit of having the VM’s on the same host, as the traffic will have to leave the box to be routed.
Here are some tests I did using iperf from the VM Advanced ISO v0.2 just to further expand on the idea:
2 VM’s on same host / same VLAN
2 VM’s on same host / different VLAN
2VM’s on different hosts / same VLAN
2 VM’s on different hosts / different VLAN
As you can see, it makes almost no difference that VM’s are on the same host, when the VLAN’s / subnets are different. Just for fun, I bumped the TCP window size, and was able to achieve 3.5Gbps from VM to VM on the same host, and the same VLAN. When the VLAN is changed, the ratio of slowdown is the same regardless of host affinity. This is because traffic is leaving the host, going all the way to my Cisco 6509, and coming back into the same host.
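For reference, these numbers came from plain iperf runs along these lines (the IP, duration, and window size are examples; everything was defaults except where I mention bumping the window):

iperf -s                            # on the receiving VM
iperf -c 10.0.50.21 -t 60           # on the sending VM, default TCP window
iperf -c 10.0.50.21 -t 60 -w 512k   # the bumped-window run mentioned above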
Just for reference, all the hosts in these examples are connected to the same Cisco 1Gb switch.
I brought this up with Cisco when they were in talking about UCS. They were mentioning their roadmap and virtual appliances, so I thought it was a good time to ask whether a virtual layer 3 appliance was in Cisco's roadmap. The response was about what I expected.
Even if I go UCS, which brilliantly handles east / west traffic across multiple chassis, the top of rack 61xx Cisco devices don’t route, so I’ll still have to go all the way out to a 5000, or soon a 7000, to get routed back into the same host on the same wire.
Talking with a few friends who know more Cisco than I do, we discussed the idea of a virtual router. The inherent problem with a virtual router in this environment is that a VM is still bound to its default gateway. When DRS runs and moves VM's around, a VM can end up on a host that doesn't have the virtual router holding its gateway interface. That defeats the purpose.
We talked about how it might be possible to work around this using Cisco’s Gateway Load Balancing Protocol (GLBP), but even then, you’d have to set preferred active paths, and it wouldn’t always work the way we need it to work.
The only solution to this issue I can think of is a Distributed Virtual Router, which doesn’t exist. If someone could make a virtual router that operates like the Distributed Virtual Switch, it would help all us people out here in the financial world who are ever more constrained by tons of VLAN’s, and (virtual) firewalls in between those VLAN’s.
Is there a need for this in the marketplace? Or am I making a bigger issue out of this than I should?
As always, your comments are appreciated.
I have run the entire gamut of virtual SAN appliances so far in my VMware lab environment, and I always come back to the Celerra UBER VSA. The best thing about this is that it’s free, and there’s no expiration. The LeftHand is easier to use, and has some neat clustering features, but it’s a 60 day license.
I don’t know about you, but there are times when I get busy, and don’t get a chance to touch the lab for weeks. 60 days is just not enough. I got the NetApp ONTAP 8.0 appliance as well, but it’s a pain to use unless you’re running VMware Server (who still runs that?) or Workstation.
Anyway, I've been struggling with the performance of the UBER VSA and trying hard to find a way to make it faster. Thanks to Clinton Kitson, I was able to dramatically improve throughput and latency on my Celerra VSA. Time to deploy a VM from a template went from 30 minutes down to 9.
Clinton says they may try to include this in the next UBER version, as long as there is consensus that it is beneficial and does no harm.
Here’s the tweak:
Log in to the VSA using the root account. The password is nasadmin.
Type in this command: /sbin/swapoff -a
Then, go in and edit the sysctl.conf file using this command (the Ctrl+X step below means we're in nano):
nano /etc/sysctl.conf
Add the following lines to the end of the file:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Ctrl+X to exit out. Save the file over the existing sysctl.conf.
Restart your VSA.
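If you'd rather paste than edit files by hand, the same steps condense to something like this (my shorthand, not part of the original tip; run as root, and keep the data-loss caveat below in mind):

/sbin/swapoff -a                                            # immediate, but does not survive a reboot
echo "vm.dirty_background_ratio = 50" >> /etc/sysctl.conf
echo "vm.dirty_ratio = 80" >> /etc/sysctl.conf
reboot                                                      # picks up the sysctl changes; re-run swapoff afterward
                                                            # (or remove the swap entry from /etc/fstab to make it stick)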
The screenshot below shows my VSA. I had already changed the caching settings, but here’s what happens by simply turning off the swap. You can clearly see the huge improvement in both latency and throughput.
Basically what we’re doing with these commands is telling the VSA not to swap. We’re also changing the way the underlying RedHat OS caches data before it writes to disk. Be aware that this does increase the risk of data loss, as we’re caching much more data in RAM before it’s written to disk. If data loss is a concern in your lab, you may want to stick with the standard settings. Also, my VSA has 6GB of RAM allocated. Still only a single data mover. Obviously more RAM = more performance when we’re turning up the caching.
Thanks to Clinton for pointing out these settings. It’s hard to find performance information on the EMC VSA. I hope this post helps you get more work done in your lab environment.