HP Emulex CNA Firmware Problems on BL460G7

If you have HP BL460 G7s with the on-board 10Gb CNA, you’re going to want to read this post about a problem with the latest firmware.

I first noticed this issue when updating firmware to troubleshoot a problem where the storage doesn’t come back up after rebooting an upstream Nexus switch.

The symptoms are: the NIC comes back up, and the vfc is up, but all storage paths on that side of the fabric are still dead in ESXi 5.0.  To fix this, you must shut / no shut the vfc or the port channel.

I also saw an issue where the storage paths were dead, and the NIC never came back up.  A reset of the Ethernet port will not fix this.  A reboot of the ESXi host is required.  Pay attention to the NIC state if you lose storage paths in this configuration with FCoE.

As part of my troubleshooting, I went to update the firmware on the CNA.  The latest version of the firmware from HP is 4.0.360.15a.  When updating using the Emulex utility, on about 20% of my blades, I got a CRC error during the upgrade process.  Below is a screenshot of this error.

 
After retrying the firmware update, as the utility instructs, the same error occurred.
This is where you need to pay attention!

During the POST process, the blade WILL report the correct firmware.

 

Since the firmware version is correct, one might assume the update was indeed successful.  That’s a bad assumption.  Upon further testing, we found the blades that failed the firmware update were the ones failing during the switch reloads.

There were only 2 blades that did NOT fail the firmware update, but still failed the switch reload process.  They were replaced, and now I have no blades failing to reacquire storage paths after an upstream switch failure.

I must point out that HP has been unusually proactive with this issue, which is a nice change!  I still have several blades in another datacenter that are not taking the firmware update.  When I scheduled to have those all replaced, HP got some of their top people on it and scheduled a call.  I tested their proposed fix this morning, which didn’t work.

They are actively working on a fix, so you won’t have to replace your blades.  I will update this post as soon as I get word back from them on that fix.  Meanwhile, if you’ve seen this, you might want to schedule some switch reloads during a maintenance window to make sure you are good to go.

Update 2/6:

As of today, there is no fix that I’m aware of.  HP replaced the remaining blades after we tried a couple more proposed fixes.  If I get word of a fix, I will post it here.

Sign up for the VMware vNews Newsletter!

 

2012 ushers in some great new changes from the field technical team at VMware.  I am merging the Ohio Valley Newsletter with the Wisconsin-based field newsletter (aka vNews) in an effort to make it more all-encompassing.  This content is designed to inform our customers of important technical updates from VMware.  It also highlights some great public blog posts that might have snuck by you while you weren’t looking.  We will be moving away from the older, legacy PDF-based version of the newsletter to a modernized delivery method, “SlideRocket”.  Here is the link to the first edition!

 

vNews

Please make sure you subscribe to the newsletter if you wish to receive these monthly issues in your inbox.  As always, feedback is welcome and will help shape the content for future issues of vNews!  Special thanks to Ben Sier, Vitaly Tsipris and Jeff Whitman for their contributions and their drive to pull this off.  Let us know your thoughts!

-Scott

Infrastructure Deep Dive on EM4J with VMware vSphere

 


Introduction

Virtualizing and running Java workloads on vSphere is absolutely a reality, but when I talk to customers I emphasize the same best practices as for virtualizing Tier 1 workloads.  The rules are not the same as for basic consolidation and containment, and you need to understand, plan, and architect your virtualization platform if you want to be successful.

I spend much of my time working with customer infrastructure engineers and architects, and when topics of Java come up, the conversation takes a turn.  The infrastructure teams typically don’t want to get into the application stack, and I can’t say that I blame them.  Java and programming are a completely different skillset, and the infrastructure engineers already have enough full-time jobs keeping the datacenter running.  The purpose of this blog post is to shed some light on a new technology in vSphere 5 called “Elastic Memory for Java”, or EM4J, and hopefully on some other simple Java best practices as well.  The goal of this post is to help you bring up an EM4J configuration of your own so you can begin to see the value and test your own JVM configurations.  I am also writing this to help educate infrastructure engineers and explain why this feature matters (disclaimer: I am not a Java programming guy).

 

What is EM4J?

Hopefully you are somewhat familiar with the intelligent memory management features that come with the vSphere platform, such as memory ballooning.  Ballooning is a great technique that allows you to reclaim memory from virtual machines if it’s not in use by the VM.  When dealing with Java workloads, a VMware best practice has always been to set reservations for the virtual machine.  This means we are always guaranteeing (or backing) that the memory will be available to the VM when it needs it.  When a memory reservation is set for a VM, the hypervisor won’t reclaim memory from that VM if memory is tight on the host (which means the VM’s memory won’t be ballooned, compressed or swapped to persistent storage).

If you consider the definition of JVM (Java Virtual Machine), the last two words matter when talking about VMware virtualization.  Running a VM on a VM creates somewhat of a problem for the hypervisor.  The JVM is essentially a black box to the hypervisor, which has no visibility into what’s going on inside its environment.  EM4J, on the other hand, allows memory to be reclaimed through a much cheaper mechanism, and induces garbage collections (GCs) at moments when the VM is handling relatively low load.  It does not eliminate long pauses, as VMs without full reservations can still end up swapping, but it significantly reduces pause time and provides more graceful performance degradation when running overcommitted, making the workload’s performance more predictable.  Now that I have described some of the characteristics, here is the actual definition according to the VMware documentation:

 

“Elastic Memory for Java (EM4J) manages a memory balloon that sits directly in the Java heap and works with new memory reclamation capabilities introduced in ESXi 5.0. EM4J works with the hypervisor to communicate system-wide memory pressure directly into the Java heap, forcing Java to clean up proactively and return memory at the most appropriate times—when it is least active. You no longer have to be so conservative with your heap sizing because unused heap memory is no longer wasted on uncollected garbage objects. And you no longer have to give Java 100% of the memory that it needs; EM4J ensures that memory is used more efficiently, without risking sudden and unpredictable performance problems.”

 

As you can see, VMware is taking the same underlying technology that has been used for years across our customer base and applying it to Java workloads to gain better efficiencies at scale.  The same performance characteristics apply to EM4J as they do to ballooning in the VMware ESX hypervisor.  Ballooning will only be invoked if the system is overcommitting memory and has to begin utilizing its advanced memory management techniques.  The benefit of EM4J is that when the host is under memory pressure, the end-user experience will be the same as if the VM were hard backed with physical RAM, as we discussed earlier.
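To make “overcommitting memory” concrete, here is a minimal Python sketch of the arithmetic.  It assumes the common simplified definition (total memory configured across the VMs on a host, divided by the host’s physical RAM) and ignores hypervisor overhead.  The function name and the sample sizes are hypothetical and do not come from any VMware tooling.

```python
def overcommit_ratio(vm_memory_gb, host_memory_gb):
    """Return configured VM memory divided by physical host memory.

    Simplified on purpose: per-VM and hypervisor overhead are ignored.
    """
    if host_memory_gb <= 0:
        raise ValueError("host_memory_gb must be positive")
    return sum(vm_memory_gb) / host_memory_gb


# Hypothetical host: 32 GB of RAM running four VMs sized 8, 8, 16 and 16 GB.
ratio = overcommit_ratio([8, 8, 16, 16], 32)
is_overcommitted = ratio > 1.0
```

A ratio above 1.0 means the host has promised more memory than it physically has, which is the situation where ballooning (and EM4J) can come into play.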

 


Getting started

EM4J works in conjunction with vSphere 5 and vFabric tc Server, which is bundled with vFabric Standard and Advanced.  EM4J can also work directly with Apache Tomcat.  You might be asking yourself what vFabric tc Server is at this point, and why the hell you should care about it.  vFabric tc Server is a Java application server based on Apache Tomcat that VMware maintains and supports.  It competes with products such as IBM WebSphere and Oracle WebLogic, but is a much lighter-weight Java container that allows faster deployments in development as well as production environments.  As a systems infrastructure engineer, it is imperative that you understand these types of Java workloads at a high level.  Your success in moving these workloads into a virtual infrastructure depends on it, whether or not you use EM4J.  Before I jump in and show you how to set this up, there are a few things we need to get out of the way first.  Here is what you’re going to need to begin utilizing EM4J for your own testing, grab it now:

Making it work in vSphere

As noted in my disclaimer above, I am not a Java guy, so it took me some time to get my lab environment up and running with the right components since I am new to vFabric.  RHEL is the officially supported operating system today, but Linux is Linux, so I chose to grab the latest Ubuntu 11 distribution for my testing.  Work with your internal Java guru to get vFabric tc Server set up and running on your Linux VM for testing.  Once you get through installing your operating system and vFabric tc Server, there are some technical prerequisites you need to complete in order to enable the EM4J balloon driver and gain visibility into the JVM itself.

The first step is to enable an advanced parameter on the Linux VM you are testing with.  The virtual machine will need to be powered down to perform this action.  Right-click the virtual machine, select Edit Settings, and then select the Options tab.  Go down to the Advanced section, select “General”, and then click the “Configuration Parameters” button that is now visible:

 

[Image: advanced]

Once you click the “Configuration Parameters” button, click “Add Row” and add the following configuration parameter to the VM, setting its value to “true” as shown below:

sched.mem.pshare.guestHintsSyncEnable

 

[Image: schedmem]
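If you would rather confirm the setting from a script than from the dialog, something like the following Python sketch could do it.  It assumes you have a copy of the VM’s .vmx file to read and that settings appear in it as key = "value" lines.  The helper names are made up for illustration, and this is a deliberately simplified parser rather than anything VMware ships.

```python
import re

# Parameter name taken from the step above.
PARAM = "sched.mem.pshare.guestHintsSyncEnable"

# Matches lines of the form: some.key = "some value"
VMX_LINE = re.compile(r'^([^=\s]+)\s*=\s*"(.*)"$')


def parse_vmx(text):
    """Parse key = "value" lines into a dict, skipping blanks and # lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def em4j_hint_enabled(text):
    """Return True if the guest hints parameter is present and set to true."""
    return parse_vmx(text).get(PARAM, "").lower() == "true"
```

With the file contents loaded into a string (for example via open(path).read()), em4j_hint_enabled(contents) tells you whether the row you added actually landed in the VM’s configuration.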

Making it work in tc Server

Once you have enabled the virtual machine for EM4J, you also need to ensure your instance of tc Server utilizes the EM4J balloon driver.  Execute the command shown below to create a new instance; in this example my instance name is “scott”, and the “elastic memory” option is what enables the EM4J balloon driver.  Once you have created the instance, go ahead and start it up!

 

[Image: new_em4j_instance]

[Image: start-scott]

Next we will configure a few parameters within our instance so we can monitor them via the VMware vSphere web console interface, which I will show you next.  Add the following parameters to the setenv.sh file of your new instance:

 

JVM_OPTS="-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.port=6969
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false"

 

[Image: modify_params]
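Before moving on, it can be worth a quick sanity check that the JMX options took effect.  The Python sketch below is only an illustration: it pulls the -D properties out of a JVM_OPTS string like the one above and tests whether anything is accepting TCP connections on the configured port.  The function names are hypothetical, and an open port does not prove that it is JMX answering.  Also keep in mind that authenticate=false and ssl=false leave JMX wide open, which is fine in a lab but probably not what you want anywhere else.

```python
import socket

# Same options as in setenv.sh above.
JVM_OPTS = """-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.port=6969
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false"""


def system_properties(jvm_opts):
    """Return the -Dkey=value options in a JVM options string as a dict."""
    props = {}
    for token in jvm_opts.split():
        if token.startswith("-D"):
            key, _, value = token[2:].partition("=")
            props[key] = value
    return props


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


props = system_properties(JVM_OPTS)
jmx_port = int(props["com.sun.management.jmxremote.port"])
```

After restarting the instance, calling port_open("localhost", jmx_port) on the VM should return True if the JVM picked up the options.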

Next, we need to set up what is called the Console Guest Collector (CGC).  The CGC is a process that allows the vSphere web console to pull data from the EM4J balloon driver and associate it with each VM so the web client can then display performance data about the current workloads.  This needs to be set up via cron so we can continuously pull real-time data into vSphere.  The cgc.sh script can be found in the /opt/vmware/vfabric-tc-server-standard-2.6.0.RELEASE/templates/elastic-memory/bin/ directory (the version in the path will vary with your tc Server release).  Here is a crontab entry that runs it every 5 minutes:

*/5 * * * * /opt/vmware/vfabric-tc-server-standard-2.6.1.RELEASE/templates/elastic-memory/bin/cgc.sh > /dev/null 2>&1
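If you are scripting the setup, you probably want to avoid adding the same cron line twice.  Here is a minimal Python sketch of that idea, assuming the standard crontab -l and crontab - commands are available on the VM.  The function names are hypothetical, and the cgc.sh path is simply the one from the entry above, so adjust it to match your install.

```python
import subprocess

# Entry copied from the crontab line above; the version in the path is
# whatever your tc Server install uses.
CGC_ENTRY = (
    "*/5 * * * * /opt/vmware/vfabric-tc-server-standard-2.6.1.RELEASE"
    "/templates/elastic-memory/bin/cgc.sh > /dev/null 2>&1"
)


def ensure_entry(crontab_text, entry):
    """Return crontab_text with entry appended unless it is already there."""
    lines = crontab_text.splitlines()
    if entry not in [line.strip() for line in lines]:
        lines.append(entry)
    return "\n".join(lines) + "\n"


def install(entry=CGC_ENTRY):
    """Read the current user's crontab, add entry if missing, write it back."""
    current = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    # crontab -l exits non-zero when the user has no crontab yet.
    existing = current.stdout if current.returncode == 0 else ""
    subprocess.run(
        ["crontab", "-"],
        input=ensure_entry(existing, entry),
        text=True,
        check=True,
    )
```

Calling install() as the user that runs tc Server adds the entry once; running it again leaves the crontab unchanged.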

 

Making it work in the vSphere Web Client

You downloaded the EM4J UI plug-in earlier, and now we need to extract it and set it up on your vSphere 5 Virtual Center server.  Extract the contents to the following directory, then restart the vSphere Web Client service:

C:\Program Files\VMware\Infrastructure\vSphere Web Client\plugin-packages\em4j-client

 

[Image: em4j-dir]

 

The data!

Now that we are through the tedious stuff, we can actually see some of the more interesting performance data, which frankly is probably the reason you are reading this blog post!  Log in to your Virtual Center’s web interface and navigate to the virtual machine you are testing with.  Select the fourth tab at the top of the options section, which is titled “Workloads”.  You should now see something similar to this, and “EM4J Agent Enabled” should be selected if you set everything up correctly:

 

[Image: web_em4j1]

 

Selecting the “Alerts” tab will give you any relevant data and tell you if any issues are occurring.  It will also display some Java best practices and instruct you on how to fine-tune your JVM.  Selecting the “Resource Management” tab will display much more detailed, performance-centric information, which gives you full visibility into the JVM itself.  Excellent performance visibility into that problematic Java workload:

[Image: web_em4j2]

[Image: web_em4j3]

[Image: web_em4j4]

Conclusion

From the documentation, “EM4J helps the system behave gracefully and predictably when memory becomes scarce. It helps you to more easily determine the over-commit ratio that provides acceptable performance at peak loads.”  Hopefully you learned a little bit about what Elastic Memory for Java is and how it works within vFabric and VMware vSphere 5.  As with most technology features and functionality, I suggest understanding the best use cases for EM4J and how it fits into your own environment.  The documentation that I linked to gives plenty of examples of when EM4J can be utilized effectively.  Look for more performance benchmarks around optimal overcommit ratios as our vFabric team completes some great performance testing on this exciting new technology.  The EM4J architecture will not only allow you to run your JVMs more efficiently, but will also give you some great performance visibility and insight into your Java workloads.