Recently I have been researching HP C7000 chassis connectivity options extensively. Before I dug into it, Virtual Connect FlexFabric seemed like a no-brainer. On the surface, it has many advantages.
The cabling and port reduction is an obvious win, as is the ability to have some control over WWID and MAC assignment to blades. Moving east/west traffic between chassis without having to go northbound to a ToR or EoR switch is attractive as well. Of course, these are all things that are just standard with UCS, but I digress.
After many meetings with HP, I still had unanswered questions, so I turned to the many thousands of pages of HP documentation on the subject. After sifting through all the "cookbooks" and in-depth guides to Virtual Connect, and talking with some current FlexFabric users, I came to the conclusion that it is missing some key features needed in a VMware environment. In fact, I would say that for Cisco shops running VMware, HP FlexFabric makes little sense.
The biggest problem I have with Virtual Connect FlexFabric is the lack of any real QoS. Once traffic enters the Virtual Connect module, it's anarchy. There are no mechanisms in there for prioritization or bandwidth management. In a VMware environment, where there will be multiple types of traffic, each capable of generating significant load, the only control you have in VC is egress rate limiting.
It’s akin to limiting the number of people one can put in a single car, right before driving through the middle of Rome.
For those who haven’t had that experience, trust me, it’s the same type of anarchy that occurs inside VC. The only rule is try not to die.
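To make that concrete, here's a rough back-of-the-napkin sketch in Python, with made-up numbers. It's a toy model, not how VC actually schedules packets, but it shows the shape of the problem: with no prioritization the loudest traffic wins, and a static egress rate limit only helps by capping that traffic all the time, even when the link is sitting empty.

```python
# Toy model of a congested 10GbE uplink (made-up numbers, not an HP tool).

LINK_GBPS = 10.0

def no_qos(demands):
    """With no prioritization, each flow gets bandwidth roughly in
    proportion to what it offers -- the loudest traffic wins."""
    total = sum(demands.values())
    scale = min(1.0, LINK_GBPS / total)
    return {name: gbps * scale for name, gbps in demands.items()}

def with_rate_limit(demands, caps):
    """Egress rate limiting caps a flow all the time, even when the link
    is idle; it still doesn't prioritize anything under contention."""
    capped = {n: min(g, caps.get(n, LINK_GBPS)) for n, g in demands.items()}
    return no_qos(capped)

# A vMotion burst alongside NFS storage and VM traffic:
demands = {"vMotion": 8.0, "NFS": 4.0, "VM traffic": 3.0}
print(no_qos(demands))                              # storage and VMs get squeezed
print(with_rate_limit(demands, {"vMotion": 3.0}))   # better, but that cap also
                                                    # applies at 3 a.m. when the link is empty
```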
Here's a nice diagram from M. Sean McGee's blog showing a Virtual Connect and VMware traffic flow design:
When you have a Cisco Nexus 1000V on the ESXi host and a Nexus 5K on the other end, it makes little sense, in my opinion, to completely break awesome features like Priority Flow Control and Bandwidth Management. HP states that they support FCoE and DCB (CEE), which should include those features, but their own engineers cannot really say how one would configure or troubleshoot them. That's part of the problem: VC is a black box that takes away your ability to see what is going on inside.
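For anyone who hasn't played with DCB, here's a tiny, purely illustrative sketch of why Priority Flow Control matters. It isn't a protocol implementation, just the concept: plain link-level PAUSE stalls the entire link when one class of traffic backs up, while PFC (802.1Qbb) pauses only the congested priority, which is what keeps FCoE lossless without punishing everything else.

```python
# Illustrative only -- not a protocol implementation.

def link_pause(queues, congested):
    """802.3x-style PAUSE: one congested queue stops every priority on the link."""
    return {prio: "paused" for prio in queues} if congested else dict(queues)

def priority_flow_control(queues, congested_prio):
    """802.1Qbb-style PFC: pause only the priority that is backing up."""
    return {prio: ("paused" if prio == congested_prio else state)
            for prio, state in queues.items()}

queues = {"FCoE (prio 3)": "sending", "vMotion": "sending", "VM traffic": "sending"}
print(link_pause(queues, congested=True))              # everything stops
print(priority_flow_control(queues, "FCoE (prio 3)"))  # only FCoE pauses; the rest keeps flowing
```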
Another negative for me with VC FlexFabric is that I have no choice but to split my 10GbE pipe into smaller pipes if I want to run an HBA off the adapter. If I use the exact same onboard CNA without FlexFabric, I don't have to do that. This could be solved with separate HBAs or 10Gb NICs, but that negates the alleged cost savings. So now I'm forced to guess how much bandwidth I need for each traffic class, when I already own switching infrastructure that is smart enough to do that for me.
In my opinion, this is akin to disabling DRS. DRS is smarter than you, and faster. Why would anyone disable it? Cisco QoS is certainly smarter than me, as is VMware NetIOC. So why would I want to throw some arbitrary limits on my huge pipe? VMware admins understand that shares are better than reservations or limits. The reasoning is the same on the networking side.
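If the shares-versus-limits argument sounds abstract, here's a rough sketch of the idea. It's not NetIOC's actual scheduler, and the numbers are invented, but it shows why I'd rather hand out weights than carve the pipe into fixed slices: with static FlexNIC carving, vMotion sits pinned at its slice while part of the link goes unused, and with shares it soaks up whatever nobody else is asking for.

```python
# Shares vs. static carving, sketched (illustrative only; not NetIOC's algorithm).

LINK_GBPS = 10.0

def flexnic_carve(demands, slices):
    """Static partitioning: each traffic class is stuck inside its pre-carved slice."""
    return {n: min(demands[n], slices[n]) for n in demands}

def share_based(demands, shares):
    """Weighted, max-min style allocation: classes asking for less than their
    weighted fair share get what they asked for, and the leftover is
    re-divided among the classes that are still hungry."""
    alloc, remaining, hungry = {}, LINK_GBPS, dict(demands)
    while hungry:
        total = sum(shares[n] for n in hungry)
        fair = {n: remaining * shares[n] / total for n in hungry}
        done = [n for n in hungry if hungry[n] <= fair[n]]
        if not done:                    # everyone wants more than their fair share
            alloc.update(fair)
            break
        for n in done:                  # satisfied classes release their headroom
            alloc[n] = hungry.pop(n)
            remaining -= alloc[n]
    return alloc

demands = {"vMotion": 8.0, "NFS": 1.0, "VM traffic": 3.0}
print(flexnic_carve(demands, {"vMotion": 2.5, "NFS": 2.5, "VM traffic": 2.5}))
# vMotion is pinned at 2.5 Gb/s while roughly 4 Gb/s of the link sits idle
print(share_based(demands, {"vMotion": 1, "NFS": 2, "VM traffic": 2}))
# NFS and VM traffic get everything they asked for; vMotion soaks up the remaining 6 Gb/s
```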
There are other problems I see with this solution, but I don't want to bore you. One complaint I have heard from close associates is that the HP-recommended method of "stacking" VC modules is problematic. Not only do you have to give up 3 of the 8 ports per module for stacking, but it can create bandwidth issues as well. A friend of mine recently had to completely revamp his setup to uplink everything instead of stacking, which was allegedly causing bandwidth problems in his environment. Oh, and in addition to all this, the FlexFabric module will take FCoE and pass it north as standard Ethernet, so you lose any of the FCoE features provided by your Nexus switch.
Companies that are not virtualizing certain applications but will still run them on blades may find that the advantages of moving MACs and WWIDs around outweigh the potential disadvantages of FlexFabric. Everything on my blades will be ESXi, so I don't really have a need for quick physical ID recovery.
As of right now, I plan to use passthrough modules on the C7000s, at least until a better alternative comes out. Passthrough is slightly more expensive on the uplink port side, but it doesn't prevent my networking team from having end-to-end visibility and management. And that takes some of the guesswork and administration off of my team, which is a good thing! I would be interested to hear your experiences in the comments below.