ESX

UPDATE: Connections and Ports in ESX and ESXi

Mr. Dudley Smith has updated his PDF diagram with some minor corrections and additions.  Get the latest, most up-to-date version here (click the graphic) …

connections-ports-esx-v3

He also updated “the brain,” which can be found at its new home http://webbrain.com/brainpage/brain/89EFA582-2C35-F6A2-9ED1-7AD4810266C2/.  Make sure you update your bookmark accordingly.


Notes from VMware (aka, Mr. Michael White’s Newsletter)

I wish I could take credit for the following work, but everything below is brought to you by Michael White.  Michael is a co-worker of mine, an SE out of Canada who we often refer to as the “SRM King.”  He continually impresses me with his ability to crank out a weekly newsletter loaded full of great content.  Well, last night he happened to mention I could republish his work on my blog.  Shoot, you don’t have to ask me twice!

Keep in mind as you’re reading, everything is a direct cut and paste.  So anything written in the first person (e.g. “I have found …” or “I have decided”) would be referring to him, not me.  I certainly don’t want to take credit for all his hard work! :)  

If you have any questions or comments for Michael, feel free to leave a message for him.

 

Notes from VMware:

Cluster BP, FT and Issue, HA Issue, vDS Cheat Sheet, vDR Issue, YAPOTAV, vSphere Reference Card, View Design BP, SRM FAQ, and really a LOT more!

 

vSphere Cluster – ESX or ESXi or Mixed – suggestion / recommended best practice

We say that one day ESX will not exist, and that ESX and ESXi are the same.  Or almost the same.  However, I have found in Host Profiles and FT there is very good reason not to mix ESX and ESXi in the same cluster.  As soon as VMworld is over, I am redoing my mixed cluster as all ESXi.  First, we all know of the problem I reported some time ago that the 8/6/09 patches for vSphere would break FT in a mixed ESX / ESXi cluster.  There is no short term solution for that. The workaround is to have a cluster that is all ESX or all ESXi.  Second, host profiles have a problem dealing with service console / management network ports.  In theory you can manage that by using a reference server that is ESX and it will translate as necessary for ESXi.  It doesn’t do so well at that.  So using Host Profiles to do a push of a distributed virtual switch (only) ends up causing issues in ESXi consoles.  I ended up doing the ESXi hosts manually.  The real solution to the FT and HP type issues is to have a cluster that is all ESX or all ESXi.  And I am voting for ESXi in my lab.  Make no mistake, if you don’t listen to this you will have some issues that are not pleasant.

 

Using ESXi and ESX and FT in same cluster?  And FT broke with the 8/6/09 patches?

The only solution to this at this time is to separate your ESXi and ESX servers into their own clusters, or upgrade one or the other to be the same as the other – meaning all ESXi or all ESX – and your problem should go away.  If you have not installed the 8/6/09 patches yet, and you are using FT, and you have ESXi and ESX in your cluster, then change your cluster to be all ESXi or all ESX and then install the patches.  Not installing the patches until we fix this is NOT an option.  I have decided, as mentioned somewhere else in here, to redo my cluster as all ESXi.  It won’t take much time.  Some background on this issue can be found at http://communities.vmware.com/message/1335428#1335428.

Update on odd issue with HA not working if the vSphere ESX console was using certain IP addresses

I hope everyone has already heard that the vSphere bug described in http://kb.vmware.com/kb/1013013, which I think I mentioned in my last newsletter, now has a patch. This is the bug where, when a very specific IP address scheme is in use on management ports / service console with no other IP schemes in use and a host crashes, the VMs that should have been started by HA are not started at all.  I have not tested the fix, as I am wrestling with SRM and trying to get ready for VMworld.  To avoid this bug, only one of the addresses on your service console or management ports needs to be using something outside of the ‘special’ scheme.

vDS Implementation Cheat Sheet

I worked with the distributed switches in the past in a lab sense, but recently, for my future SRM testing, I got it going for real in my lab.  And it was hard, confusing, and not intuitive at all.  So I wrote a cheat sheet so you would not have to suffer.  It is attached.  I have used it a few times and am happy with it so hopefully it will make things quicker and easier.  Let me know if you need improvements or changes in it.  http://www.virtualinsanity.com/wp-content/uploads/vDS-Implementation-Cheat-Sheet-b.pdf

Data Recovery Issue – which stops backups from happening

If you ever have an issue with writing to your destination when doing backups, you may see the restore point in red with a (Damaged) beside it.  This can cause your backup to not work again.   The events part of the Reports will show file access errors – 3902.  The solution to this is not in the documentation for vDR but it is here. Expand the display of restore points to be bigger than the default 5.  I used 25 when I had this issue.  Now click all of the restore points that show as damaged.  Then select the Mark for Deletion button in the top right of the screen.  Now change to the Configuration \ Destinations screen and select the destination that is associated with your backup, and use the Integrity Check option near the top right of the screen.  It will take a while.  Once it is complete with no errors – check the Events view of Reports – you need to restart the appliance.  Now your backups should work!

YAPOTAV – Yet another post on why to attend VMworld

Find this at http://blogs.vmware.com/vmtn/2009/08/yapowtav-yet-another-post-on-why-to-attend-vmworld.html.

New vSphere document reference card

Forbes Guthrie has done a wonderful job on a reference card for vSphere documentation stuff.  It pulls stuff out of the documentation and highlights it as a result.  Very handy and well done.  Find it at http://www.vreference.com/public/vsphere4-notes1.0.pdf

View Design Best Practices training

Would you like to learn more about designing a View infrastructure?  The more people you have that depend on it the more important training and experience becomes.  Get some ideas on design at http://mylearn.vmware.com/descriptions/EDU_DATASHEET_ViewDesignBestPractices_V3.pdf

SRM FAQ online now thanks to Duncan at Yellow-Bricks

This is from information I have shared with Duncan but it is great information and I appreciate him sharing with everyone.  Find it at http://www.yellow-bricks.com/srm-faq/.  Duncan’s web site is one of the few you should read frequently. He is a PSO guy in Europe and is very smart, and knows what to communicate – does it real well and I appreciate it.

 

vSphere and VM snapshots and block size

This is something else that Duncan has done.  There is a behavior difference between 3.5x and 4.0 that could catch someone.  Find out more from Duncan at http://www.yellow-bricks.com/2009/08/24/vsphere-vm-snapshots-and-block-size/.

VMware View Cheat Sheet

I have had some help to update my VMware View Cheat Sheet and it has gone very well.  Our next update of this will have a lot more but this is a good document to get you going with View.  www.virtualinsanity.com/wp-content/uploads/VMware-View-Cheat-Sheet-a.pdf

 

Important patch for Celerra when using NFS with VMware

You can find more information about this at Virtual Geek, but it is important to understand that you need to upgrade your Celerra DART OS before you enable NFS datastores with VMware.  Find out more at http://virtualgeek.typepad.com/virtual_geek/2009/08/important-patch-for-celerranfsvmware.html

Lab Manager 4 Upgrade issue

The installer during an upgrade of LM4 assumes all the default roles are present and unmodified.  If the customer removes or changes any of them, the upgrade installer will fail.

FT – Architecture and Performance

Do you know how to determine how many FT enabled VM’s your vSphere server can support?  Do you know how to design your FT environment for the best performance?  In fact, do you know what the performance overhead for FT is?  All of this and more is answered in http://www.vmware.com/resources/techresources/10058.

How can I determine the exact build number for my ESX 4.0.x hosts?

You can find out the way to determine the build numbers for components of ESX 4.0 hosts at http://kb.vmware.com/kb/1012514

VMware Data Recovery Evaluator’s guide

This is a very nice document for someone who needs some guidance for testing VDR.  It is a quick way to get started.  http://www.vmware.com/resources/techresources/10055.  My preso on VDR at VMworld is a combination of install / config / best practices and it will be very useful.  Look for the session, or the preso after VMworld.  It will fit with this eval guide nicely and is known as BC2142.

 

AppSpeed and Maintenance Mode

Currently AppSpeed has no way to listen to the ESX host it is working on, so when the host tries to enter Maintenance mode it will not be able to, since the AppSpeed sensor VM will not listen to it and will not VMotion off the host.  This is a very high priority for us to fix. You will need to manually turn off this sensor before trying to enter maintenance mode.

Need some help searching the VMware KB?  Find it at http://xtravirt.com/xd10112 – some interesting info.

NFS Storage Configuration Help

Do you need some help configuring NFS support for your ESX servers?  There is some help at http://communities.vmware.com/docs/DOC-7900.  This link has only a little info but it does include some troubleshooting info.

VUM and Cisco – conflict message

I got a conflict message from VUM when I tried to patch recently.  It was a conflict with the Cisco Nexus stuff which I do not have installed.  It turns out that I could just ignore it but it was a little bothersome.  We are going to change that message in the near future to be more informative.  That way if you know you don’t have Cisco (or whatever) installed you can just install with no issues.  The issue is we download all the meta data or patches for ESX without any granularity. So the Cisco patches come down too.  More info can be found at http://kb.vmware.com/kb/1013068.

Suggested VMware Employee Sessions at VMworld

This is a list that one of my co-workers put together. It might give you some ideas of what to look for. 

  • Michael White – BC2142 – Data Recovery intro and best practices
  • Tiffany To – DV1790 – View TCO-ROI expert
  • Mahesh Ramachandran – VM1724 – Capacity IQ Tech Preview
  • Chris Rimer – EA2342 – Oracle sessions (especially around questions of Support and Licensing)
  • Richard McDougall – TA3438 – vSphere Performance Guru
  • Jacob Jensen – TA2103 – Virtual Networking guru (especially around the Cisco Nexus 1000V)
  • Andy Banta – TA3264 – iSCSI Best Practices (THE iSCSI Engineer/Expert at VMware!)
  • Kaushik Banerjee – TA2942 – Performance Best Practices (This guy is a genius in performance and on the Perf. core team!)
  • Paul Manning – VM3566 – Storage Best Practices (Many of you have been on calls with Paul for storage related topics!)
  • Brian CS, Charu Charubal, and Rob Randell – VM2847, TA2544, DV2626 – Security Team extraordinaire
  • Mostafa Khalil – TA2509 – Storage Best Practices (Mostafa is one of the first VCDX members!)
  • Amir Sharif – TA3195, V13226 – ESXi PM – ESXi sessions
  • Monica Sharma – VM2408 – ConfigControl Tech Preview
  • Bill Call – VM2657 – LifeCycle Manager Uber-Guru!
  • Dean Flaming and Travis Sales – DV2478 – ThinApp (These are some of the best sessions I have ever seen historically from these guys!)
  • Gaetan Castelein – EA3605, EA3606 – Virtualizing Tier 1 applications
  • Srinivas Krishnamurti – VM2280 – Managing VI from your mobile phone! :)
  • Duncan Epping – TA2259 – Expert VI Design (Duncan runs the #1 Virtualization blog “Yellow-Bricks”)
  • Dean Yao – BC3369 – FT Real World design
  • Howie Xu – TA3521 – vNetwork Troubleshooting (Howie invented the vSwitch! – and wrote one of our TCP/IP stacks)
  • Banjot Chanana – BC3425 – High Availability Futures
  • Nicholas Jacques – PA4694 – AppSpeed PM
  • Eric Horschmann – TA3880 – vSphere vs Hyper-V/XenServer
  • Warren Ponder – DV2697 – View /VDI PM
  • Mike DiPetrillo – TA3326 – Cloud (Mike is another uber-rock star and talks all things Cloud!)
  • Rahul Ravulur – VM4380 – vCenter PM covering future of vCenter
  • Naeem Malik – VM3609 – Capacity Planner expert
  • Aaron Sweemer – DV3567 – How to convert old PCs to Thin Clients using a thin Linux OS and VMware View Open Client.

**** Reminders ******


Connections and Ports in ESX & ESXi

I got an email from Dudley Smith (a VMware TAM and the author of Troubleshooting ESX and Connections & Ports in VI3.5) informing me that he had recently updated one of his documents.  Wow, he sure did.  Check this puppy out (click the graphic to download) … 

 

image

Pretty slick, eh?  Well it gets even better.  He also created a version using The Brain in HTML … http://www.virtualinsanity.com/esx-connections-and-ports/.  Nice!  This is definitely a bookmark I’ll be keeping handy and I’d recommend you do the same.

Good work Dudley!  Thanks for making it available for everyone.  If you agree, be sure to leave a “Thank You” comment for Dudley Smith.


Scripted ESX Installation: Reconfiguring COS Networking with Kickstart

Frequently customers have specific NICs (like onboard NICs) that they’d like assigned to the COS, leaving the other NICs for VM traffic.  This is difficult, however, when using our automated kickstart deployment scripts as there is no way to explicitly define the vmnic assigned to the COS.  And to make matters worse, the VMkernel is not yet available to us during the %post section of the kickstart script, which makes COS networking configuration difficult! Recently I had a customer who was getting frustrated because …

  1. They would “rack and stack” a physical server and wire up their NICs accordingly (i.e. onboard NICs on the management VLAN, remaining NICs on production VLANs)
  2. PXE boot the server
  3. When kickstart completed, they’d lose connection to the COS.

This happens because during installation, ESX just assigns vmnic0 to the lowest PCI number, and then assigns vmnic0 to the COS. And this is often not the NIC the admin wants used for their COS. Of course, they could go back after the fact and reconfigure the COS networking, but this kind of defeats the purpose of a completely hands-free, automated deployment.

Here is one possible solution to the problem.  Below is a script I wrote to append to the %post section of a kickstart file.  Obviously, you’ll need to make modifications for your environment.

## This script should be appended to the %post section of an ESX kickstart file.
## For more info on kickstart and scripted ESX installations, see Appendix B of
## http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_installation_guide.pdf

##
## Essentially, this is a "script that creates a script." Because the VMkernel is
## not yet available to us during the %post section of the scripted install, we use
## %post to generate a script called /tmp/esx_post_install.sh that will launch via
## rc.local upon first boot (and only first boot).
##
## The esx_post_install.sh will first make a backup copy of esx.conf and then
## reconfigure the COS networking.  Please see the in-line comments below for
## tweaking esx_post_install.sh for your environment.
##
## If you have any questions, please email aaron [at] sweemer [dot] com.

%post

cat > /tmp/esx_post_install.sh << EOF
#!/bin/bash
cp /etc/vmware/esx.conf /etc/vmware/esx.conf.backup
/usr/sbin/esxcfg-vswitch -U vmnic0 vSwitch0
/usr/sbin/esxcfg-vswif -d vswif0

## If your kickstart file has vmportgroup=1, you *might* want to uncomment the
## next line

## /usr/sbin/esxcfg-vswitch -D "VM Network"

/usr/sbin/esxcfg-vswitch -A "VMkernel" vSwitch0

## You'll need to find which physical NICs you want assigned to your COS.  From
## the command line of an already installed ESX server, execute
## "/usr/sbin/esxcfg-nics -l" as root and look for something unique about the
## NICs.  For example, this could be the word "Broadcom" or it could be the
## actual PCI number.  In the next line, replace "search term" with this
## text.

/usr/sbin/esxcfg-nics -l | awk '\$0 ~ /search term/ {print \$1}' | xargs -n 1 /usr/sbin/esxcfg-vswitch vSwitch0 -L

## Note: if you want to test the line above from the command-line, you'll need
## to remove the leading "\" in front of $0 and $1. The \'s need to be here so
## the esx_post_install.sh script gets properly written by kickstart. But when
## executing directly on a command line, the \'s need to be removed.

## Replace the x.x.x.x after -i with the IP address and after -n with the
## subnet mask for your COS.

/usr/sbin/esxcfg-vswif -a vswif0 -p "Service Console" -i x.x.x.x -n x.x.x.x

## Replace the x.x.x.x after -i with the IP address and after -n with the subnet
## mask for your VMkernel port group.

/usr/sbin/esxcfg-vmknic -a -i x.x.x.x -n x.x.x.x VMkernel

## Replace x.x.x.x with the default gateway for the COS in both of the next two lines.
route add default gw x.x.x.x
echo "GATEWAY=x.x.x.x" >> /etc/sysconfig/network

mv /etc/rc.d/rc.local.save /etc/rc.d/rc.local
EOF

chmod +x /tmp/esx_post_install.sh
cp /etc/rc.d/rc.local /etc/rc.d/rc.local.save

cat >> /etc/rc.d/rc.local << EOF
cd /tmp/
/tmp/esx_post_install.sh
EOF
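The escaping rule from the note inside the script is easy to check outside of kickstart. Here is a minimal sketch, assuming a POSIX shell; the temp directory and the file names unquoted.sh and quoted.sh are made up for the demo. With an unquoted delimiter (<< EOF) the outer shell expands $1 while it writes the file, so it has to be written as \$1. If you quote the delimiter (<< 'EOF'), the body is written verbatim and no backslashes are needed.

```shell
#!/bin/sh
# Demo only: writes two tiny generated scripts into a throwaway directory.
workdir=$(mktemp -d)

# Unquoted delimiter: the outer shell expands variables while writing the
# file, so \$1 is needed to get a literal $1 into the generated script.
cat > "$workdir/unquoted.sh" << EOF
awk '{print \$1}'
EOF

# Quoted delimiter: the body is written verbatim, no backslashes required.
cat > "$workdir/quoted.sh" << 'EOF'
awk '{print $1}'
EOF

# Both generated files should contain exactly the same awk command.
cat "$workdir/unquoted.sh"
cat "$workdir/quoted.sh"
# Remove "$workdir" when you are done looking at the files.
```

Whether the quoted form suits your kickstart file depends on whether you want any variables expanded at install time; if you do, stay with the unquoted form and the backslashes.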

As an example, in my environment I have a server with 4 NICs and by default, ESX assigns vmnic0, which is mapped to PCI 02:00.00, to the service console. However, what is actually physically wired to my management network is vmnic3, which is mapped to PCI 02:03.00.  In the script above, I simply searched for the number 3 (i.e. replaced search term with 3) and now my scripted ESX installation works properly.
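Before committing a search term to the kickstart file, it helps to preview what it matches. The sketch below runs the same kind of awk filter against sample esxcfg-nics -l output copied from my lab host; sample_nics and match_nics are hypothetical helper names, and passing the term with -v and skipping the header row are my own additions, not part of the kickstart script.

```shell
#!/bin/sh
# Sample "esxcfg-nics -l" output, copied from the lab host in this post.
sample_nics() {
cat << 'EOF'
Name    PCI      Driver      Link Speed    Duplex MTU    Description
vmnic1  02:01.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic2  02:02.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic3  02:03.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic0  02:00.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
EOF
}

# match_nics TERM: print the vmnic names whose line matches the regex TERM.
# The header row (NR == 1) is skipped so it can never be picked up.
match_nics() {
    awk -v term="$1" 'NR > 1 && $0 ~ term {print $1}'
}

sample_nics | match_nics '3'            # loose term, as used in this post
sample_nics | match_nics '02:03[.]00'   # stricter: match on the PCI address
```

On a real host you would pipe /usr/sbin/esxcfg-nics -l into match_nics instead of the sample. A loose term like 3 only works here because no other line happens to contain that digit, so the PCI address is probably the safer thing to search for.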

Below is the configuration of my server before I redeployed with kickstart.  The NIC I want assigned to the COS is vmnic3.  What ESX assigns to the COS by default is vmnic0.

BEFORE (without %post section)


[root@vesx7 root]# esxcfg-nics -l
Name    PCI      Driver      Link Speed    Duplex MTU    Description
vmnic1  02:01.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic2  02:02.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic3  02:03.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic0  02:00.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM

[root@vesx7 root]# esxcfg-vswitch -l
Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       64          4           64                1500    vmnic0

PortGroup Name      VLAN ID  Used Ports  Uplinks
VM Network          0        0           vmnic0
Service Console     0        1           vmnic0

Now, here is the same output after I redeployed the server with my modifications to the %post section of the kickstart file. The scripted deployment of ESX now properly assigns vmnic3 to my service console.

AFTER (with %post section)

[root@vesx7 root]# esxcfg-nics -l
Name    PCI      Driver      Link Speed    Duplex MTU    Description
vmnic1  02:01.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic2  02:02.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic0  02:00.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM
vmnic3  02:03.00 e1000       Up   1000Mbps Full   1500   Intel Corporation 82545EM

[root@vesx7 root]# esxcfg-vswitch -l
Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       64          5           64                1500    vmnic3

PortGroup Name      VLAN ID  Used Ports  Uplinks
Production          0        0           vmnic3
Service Console     0        1           vmnic3

I hope this was helpful.  Let me know if you have any questions.

Well, I’d better sign off and start packing because I leave for Omaha, NE in a few hours.




VCDX Admin Exam Notes — Section 1.1

I finally got a chance to sit down and reformat some of my notes for the VCDX Admin Exam.  Below are my notes for Section 1.1 of the VMware Enterprise Administration Exam Blueprint v3.5.  Everything in Blue is a direct cut and paste from the exam blueprint.

Oh, and thanks to VirtualizationTeam (Blog) for the Disqus comment letting me know that Peter van den Bosch has a more recent version of his VMware Enterprise Administration Exam Study Guide 3.5.

 

Section 1 – Storage

Objective 1.1 – Create and Administer VMFS datastores using advanced techniques.

Knowledge

Describe how to identify iSCSI, Fibre channel, SATA and NFS configurations using CLI commands and log entries

Here are a few command line examples that I believe would work well …

1)  esxcfg-mpath -l
This command produces the following output on my server:

 

[root@cincylab-esx3 root]# esxcfg-mpath -l
Disk vmhba0:0:0 /dev/sdb (152627MB) has 1 paths and policy of Fixed
 Local 0:31.2 vmhba0:0:0 On active preferred

Disk vmhba32:0:0 /dev/sda (152627MB) has 1 paths and policy of Fixed
 Local 0:31.2 vmhba32:0:0 On active preferred

Disk vmhba35:0:0 /dev/sdc (923172MB) has 1 paths and policy of Fixed
 iScsi sw iqn.1998-01.com.vmware:cincylab-esx3-1d029e5f<->iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1 vmhba35:0:0 On active preferred
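If all you want from this output is "which disk is on which kind of storage," a little awk goes a long way. This is a rough sketch run against the output from my server; sample_mpath and summarize_paths are hypothetical names, and the only transport labels I can vouch for are the two that appear in my output (Local and iScsi).

```shell
#!/bin/sh
# Sample "esxcfg-mpath -l" output from the server in this post.
sample_mpath() {
cat << 'EOF'
Disk vmhba0:0:0 /dev/sdb (152627MB) has 1 paths and policy of Fixed
 Local 0:31.2 vmhba0:0:0 On active preferred

Disk vmhba32:0:0 /dev/sda (152627MB) has 1 paths and policy of Fixed
 Local 0:31.2 vmhba32:0:0 On active preferred

Disk vmhba35:0:0 /dev/sdc (923172MB) has 1 paths and policy of Fixed
 iScsi sw iqn.1998-01.com.vmware:cincylab-esx3-1d029e5f<->iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1 vmhba35:0:0 On active preferred
EOF
}

# summarize_paths: remember each "Disk" line, then print one row per path
# line that follows it: disk name, device file, first word of the path line.
summarize_paths() {
    awk '$1 == "Disk"         { disk = $2; dev = $3; next }
         NF > 0 && disk != "" { print disk, dev, $1 }'
}

sample_mpath | summarize_paths
```

On a live host, pipe esxcfg-mpath -l into summarize_paths. A disk with several paths will show up once per path.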


2)  esxcfg-info -s

The -s flag will narrow the scope of the output to just storage and disk related info.  But even with the narrowed scope, this command produces way too much output to be displayed here.  You’ll likely want to pipe the output into grep, or at a minimum to a more/less to get what you’re looking for.


3)  cat /var/log/vmkernel | grep vmhba | tail -10

This will search the vmkernel log file and display the last 10 lines containing the text vmhba.  If you want more (or fewer) lines, change the -10 to whatever suits your needs.

I found this one particularly useful when you’ve enabled the software iSCSI initiator at the command line, but don’t yet know what number has been assigned to the vmhba (e.g. vmhba35).
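The same search can be wrapped in a tiny function so the line count becomes an argument. Just a sketch: last_vmhba_lines is a name I made up, and the default of 10 simply mirrors the command above. It lets grep read the file directly, which does the same job without the extra cat.

```shell
#!/bin/sh
# last_vmhba_lines LOGFILE [COUNT]
# Print the last COUNT (default 10) lines of LOGFILE that contain "vmhba".
last_vmhba_lines() {
    grep vmhba "$1" | tail -n "${2:-10}"
}

# Example usage on an ESX host:
#   last_vmhba_lines /var/log/vmkernel 20
```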

4)  esxcfg-vmhbadevs -m  and  ls -lah /vmfs/volumes

The command esxcfg-vmhbadevs -m will show the mapping between vmhba numbers, device files and their UUIDs.  If you’d like a quick and easy way to see what UUIDs are mapped to their human readable name, you can follow that up with a ls -lah /vmfs/volumes.  The two commands back to back produce the following output on my server:

[root@cincylab-esx3 root]# esxcfg-vmhbadevs -m
vmhba35:0:0:1   /dev/sdc1                        4986310d-6525e5e6-ebbd-00237d0681e7
vmhba0:0:0:3    /dev/sdb3                        49e115fb-3e22358c-c10a-00237d0681e7
vmhba32:0:0:1   /dev/sda1                        4985c53e-e7b1904f-5042-00237d0681e7

 

[root@cincylab-esx3 root]# ls -lah /vmfs/volumes/
total 10M
drwxr-xr-x    1 root     root          512 Apr 20 23:07 .
drwxrwxrwt    1 root     root          512 Apr 11 18:12 ..
drwxr-xr-t    1 root     root         1.2K Feb  1 21:34 4985c53e-e7b1904f-5042-00237d0681e7
drwxr-xr-t    1 root     root         3.7K Apr 14 14:49 4986310d-6525e5e6-ebbd-00237d0681e7
drwxr-xr-t    1 root     root          980 Apr 11 18:13 49e115fb-3e22358c-c10a-00237d0681e7
lrwxr-xr-x    1 root     root           35 Apr 20 23:07 cincylab-esx3:storage1 -> 4985c53e-e7b1904f-5042-00237d0681e7
lrwxr-xr-x    1 root     root           35 Apr 20 23:07 cincylab-esx3:storage2 -> 49e115fb-3e22358c-c10a-00237d0681e7
lrwxr-xr-x    1 root     root           35 Apr 20 23:07 vol1 -> 4986310d-6525e5e6-ebbd-00237d0681e7
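Matching the UUIDs by eye gets old fast, so here is a sketch that joins the two outputs with awk and prints vmhba, device file and datastore name on one line. It runs against copies of the output from my server; join_names is a hypothetical helper, and it assumes datastore names contain no spaces.

```shell
#!/bin/sh
# Copies of the two outputs shown in this post, saved to temp files for the demo.
devs=$(mktemp)
links=$(mktemp)
cat > "$devs" << 'EOF'
vmhba35:0:0:1   /dev/sdc1                        4986310d-6525e5e6-ebbd-00237d0681e7
vmhba0:0:0:3    /dev/sdb3                        49e115fb-3e22358c-c10a-00237d0681e7
vmhba32:0:0:1   /dev/sda1                        4985c53e-e7b1904f-5042-00237d0681e7
EOF
cat > "$links" << 'EOF'
drwxr-xr-t    1 root     root         1.2K Feb  1 21:34 4985c53e-e7b1904f-5042-00237d0681e7
lrwxr-xr-x    1 root     root           35 Apr 20 23:07 cincylab-esx3:storage1 -> 4985c53e-e7b1904f-5042-00237d0681e7
lrwxr-xr-x    1 root     root           35 Apr 20 23:07 cincylab-esx3:storage2 -> 49e115fb-3e22358c-c10a-00237d0681e7
lrwxr-xr-x    1 root     root           35 Apr 20 23:07 vol1 -> 4986310d-6525e5e6-ebbd-00237d0681e7
EOF

# join_names DEVS_FILE LINKS_FILE
# First pass (LINKS_FILE): remember "name -> uuid" from the symlink lines.
# Second pass (DEVS_FILE): print vmhba, device file and the matching name.
join_names() {
    awk 'NR == FNR { if (NF >= 3 && $(NF-1) == "->") name[$NF] = $(NF-2); next }
         { print $1, $2, (($3 in name) ? name[$3] : "(no label)") }' "$2" "$1"
}

join_names "$devs" "$links"
```

On a real host you would fill the two files with esxcfg-vmhbadevs -m and ls -l /vmfs/volumes/ instead of the pasted samples.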

5)  vmkiscsi-ls

This one only applies to iSCSI storage, of course, and produces the following output on my server:

[root@cincylab-esx3 root]# vmkiscsi-ls

*************************************************************
        SFNet iSCSI Driver Version … 3.6.3 (27-Jun-2005 )
*************************************************************
TARGET NAME             : iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1
TARGET ALIAS            :
HOST NO                 : 4
BUS NO                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 10.10.8.200:3260
SESSION STATUS          : ESTABLISHED AT Sun Apr 12 11:35:09 2009
NO. OF PORTALS          : 1
PORTAL ADDRESS 1        : 10.10.8.200:3260,1
SESSION ID              : ISID 00023d000001 TSIH 1400
*************************************************************
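With more than one target configured, it can be handy to boil this output down to one line per target. Here is a small sketch against a few lines of the output above; list_targets is a made-up name, and I have only tried the idea on output shaped like mine.

```shell
#!/bin/sh
# A few lines of the vmkiscsi-ls output shown in this post.
sample_iscsi_ls() {
cat << 'EOF'
TARGET NAME             : iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1
TARGET ALIAS            :
HOST NO                 : 4
TARGET ADDRESS          : 10.10.8.200:3260
SESSION STATUS          : ESTABLISHED AT Sun Apr 12 11:35:09 2009
EOF
}

# list_targets: print "target-name address" for each target.
# The field separator is "optional spaces, colon, space", so the colons
# inside the IQN and the ip:port pair are left alone.
list_targets() {
    awk -F' *: ' '/^TARGET NAME/    { name = $2 }
                  /^TARGET ADDRESS/ { print name, $2 }'
}

sample_iscsi_ls | list_targets
```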


Describe the VMFS file system

There are many subsections here and before digging into each one, check out the following three links …

Metadata 
The simple definition of Metadata is “data about data.”  All file systems handle metadata differently.  VMFS uses metadata, stored in a special area of each volume, to manage all the files, directories (in VMFS-3 only), and attributes about the volume.  VMFS is a clustered file system, meaning more than one ESX server can access the same file system at the same time.  Therefore an update to the metadata requires locking of the LUN using a SCSI reservation.

Multi-access and locking
The following was taken from Advanced VMFS Configuration and Troubleshooting.

  Distributed Lock handling by VMFS3

  • Done in-band
  • Hosts mount a VMFS3 volume
  • Hosts’ ids posted to heartbeat region
  • Heartbeat records are updated at regular intervals by hosts
  • Host X locks a file, the lock is associated with its ID
  • If host X dies or loses access to volume the file lock is stale
  • Host Z attempts to lock the same file which is locked
  • Host Z checks the heartbeat record of Host X (~5 times)
  • If host X heartbeat record is not updated, Host Z will age the lock
  • All other hosts yield to host Z and do not attempt to lock the file
  • Lock is broken and Host Z acquires the lock
  • Journal is replayed by Host Z

 Extents
Extents are logical extensions of a file system.  They are typically used to grow a volume beyond the VMFS size limitations.  Essentially, an extent is the “joining” of two or more volumes into a single, logical VMFS volume.

Tree structure and files
The VMFS partition is mounted to the directory with the corresponding UUID found in /vmfs/volumes.  The human readable name of the volume is merely a symbolic link to that directory.  By default, all VMs are given a directory at the root of the partition.  So, for example, a VM with the name of AaronSweemer would have the directory /vmfs/volumes/UUID/AaronSweemer.  In this directory you will find all files specific and relevant to that VM.  This is the default behavior as some (not all) of these files can be configured to reside elsewhere.
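Since the friendly name is only a symbolic link, readlink will tell you which UUID directory it points to. The sketch below rebuilds the layout in a throwaway directory so it can be tried anywhere; label_to_uuid is a hypothetical helper, and the UUID is the one behind vol1 on my server.

```shell
#!/bin/sh
# Rebuild a miniature /vmfs/volumes layout in a temp directory (demo only).
volroot=$(mktemp -d)
mkdir "$volroot/4986310d-6525e5e6-ebbd-00237d0681e7"
ln -s 4986310d-6525e5e6-ebbd-00237d0681e7 "$volroot/vol1"

# label_to_uuid ROOT LABEL: print the UUID directory a datastore label points to.
label_to_uuid() {
    readlink "$1/$2"
}

label_to_uuid "$volroot" vol1
```

On an ESX host the equivalent call would be label_to_uuid /vmfs/volumes vol1.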

Here is a table of common files found on the VMFS file system. 

Extension   Usage
.dsk        VM disk file
.vmdk       VM disk file
.hlog       VMotion log file
.vswp       Virtual swap file
.vmss       VM suspend file
.vmtd       VM template disk file
.vmtx       VM Template configuration file
.REDO       Files used when VM is in REDO mode
.vmx        VM configuration file
.log        VM log file
.nvram      Nonvolatile RAM

Journaling
From Wikipedia …

A journaling file system is a file system that logs changes to a journal (usually a circular log in a dedicated area) before committing them to the main file system. Such file systems are less likely to become corrupted in the event of power failure or system crash.

Explain the process used to align VMFS partitions 

The following procedure was found in VMware Enterprise Administration Exam study guide 3.5 (page 5) and Advanced VMFS Configuration and Troubleshooting (slide 36).

Aligned partitions start at 128. If the Start value is 63 (the default), the partition is
not aligned. If you choose not to use the VI Client and create partitions with
vmkfstools, or if you want to align the default installation partition before use, take
the following steps to use fdisk to align a partition manually from the ESX Server
service console:
1. Enter fdisk /dev/sd<x> where <x> is the device suffix.
2. Determine if any VMware VMFS partitions already exist. VMware VMFS
partitions are identified by a partition system ID of fb. Type d to delete
these partitions.
Note: This destroys all data currently residing on the VMware VMFS partitions you
delete. Ensure you back up this data first if you need it.
3. Type n to create a new partition.
4. Type p to create a primary partition.
5. Type 1 to create partition No. 1.
Select the defaults to use the complete disk.
6. Type t to set the partition’s system ID.
7. Type fb to set the partition system ID to fb (VMware VMFS volume).
8. Type x to go into expert mode.
9. Type b to adjust the starting block number.
10. Type 1 to choose partition 1.
11. Type 128 to set it to 128 (the array’s stripe element size).
12. Type w to write label and partition information to disk.
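The arithmetic behind the 63 versus 128 rule is worth seeing once. Assuming 512-byte sectors (my assumption, not something the procedure states), a start of 128 puts the partition at 128 x 512 = 65,536 bytes, which is exactly 64 KB, while a start of 63 lands at 32,256 bytes, in the middle of a 64 KB chunk. A small sketch, with is_aligned as a made-up helper:

```shell
#!/bin/sh
# is_aligned START_SECTOR [BOUNDARY_KB]
# Succeeds if START_SECTOR falls on a BOUNDARY_KB boundary (default 64 KB).
# Assumes 512-byte sectors.
is_aligned() {
    start=$1
    boundary_bytes=$(( ${2:-64} * 1024 ))
    [ $(( start * 512 % boundary_bytes )) -eq 0 ]
}

for s in 63 128; do
    if is_aligned "$s"; then
        echo "start $s: aligned"
    else
        echo "start $s: not aligned"
    fi
done
```

If your array uses a different stripe element size, pass it as the second argument (in KB) instead of relying on the 64 KB default.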

Explain the use cases for round-robin load balancing

Multipathing is typically used for failover.  Meaning, if one storage path becomes unavailable the host can fail over to an alternate path.  However, multipathing can also be used in a round-robin fashion to balance load and achieve better utilization of the HBAs.  There are a couple of different configurable options that specify when an ESX server switches paths.  From the Round-Robin Load Balancing technical note …

When to switch – Specify that the ESX Server host should attempt a path switch after a specified number of I/O blocks have been issued on a path or after a specified number of read or write commands have been issued on a path. If another path exists that meets the specified path policy for the target, the active path to the target is switched to the new path. The --custom-max-commands and --custom-max-blocks options specify when to switch.

Which target to use – Specify that the next path should be on the preferred target, the most recently used target, or any target. The --custom-target-policy option specifies which target to use.

Which HBA to use – Specify that the next path should be on the preferred HBA, the most recently used HBA, the HBA with the minimum outstanding I/O requests, or any HBA. The --custom-HBA-policy option specifies which HBA to use.
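To get a feel for the "when to switch" setting, here is a toy model in shell. It is not how the VMkernel does it; it only illustrates the idea of moving to the next path after a fixed number of commands, and it ignores the target and HBA policies entirely. simulate_rr and the path names are invented for the illustration.

```shell
#!/bin/sh
# simulate_rr TOTAL_COMMANDS MAX_COMMANDS PATH...
# Toy model: issue TOTAL_COMMANDS commands, switching to the next path in
# the list after MAX_COMMANDS commands have gone down the current one.
simulate_rr() {
    total=$1
    max=$2
    shift 2
    sent=0      # commands issued on the current path so far
    n=1
    while [ "$n" -le "$total" ]; do
        if [ "$sent" -eq "$max" ]; then
            current=$1
            shift
            set -- "$@" "$current"   # rotate: current path goes to the back
            sent=0
        fi
        echo "cmd $n -> $1"
        sent=$((sent + 1))
        n=$((n + 1))
    done
}

simulate_rr 6 2 vmhba1:0:0 vmhba2:0:0
```

With a limit of 2 and two paths, the commands alternate in pairs between the two paths, which is the behavior the maximum-commands setting is described as producing.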

Skills and Abilities

Perform advanced multi-pathing configuration

  • Configure multi-pathing policy
  • Configure round-robin behavior using command-line tools
  • Manage active and inactive paths

Setting the Path Switching Policy
You can set the path-switching policy for failover and for load balancing by using the esxcfg-mpath command.

You can set the path switching policy on a per-LUN basis by using the esxcfg-mpath command’s --policy custom option. If you specify --policy custom, you must also specify one of the custom policy options. Because the path switching policy is set on a per-LUN basis, you must always specify the LUN using the --lun option.

Notes

If you set the custom-max-blocks and custom-max-commands options, the system attempts to switch paths as soon as one of the limits is reached.

If you set the target or the HBA policy to preferred, the system chooses the target or the HBA of the preferred path when possible. If a preferred policy is set on an active/passive SAN array, and the preferred target is not on the active SP (Storage Processor), the system does not select the preferred target but a target on the active SP.

Path switching is not performed if an outstanding SCSI reservation is on the target, or if a path failover is underway. Path switching is delayed until an I/O request is performed when no reservations or path failovers are pending.

 

 Configure and use NPIV HBAs

<<I don’t have NPIV in my lab.  Need to revisit this section>>

 
Manage VMFS file systems using command-line tools

The command-line tool you’ll use for managing VMFS file systems is vmkfstools.  It’s a very powerful tool with many options, so I suggest you read the man page.  The following examples (taken from the online documentation) are by no means exhaustive, just a quick sample of what the tool can do.

Example for Creating a VMFS File System
vmkfstools -C vmfs3 -b 1m -S my_vmfs /vmfs/devices/disks/vmhba1:3:0:1

Example for Extending a VMFS-3 Volume
vmkfstools -Z /vmfs/devices/disks/vmhba0:1:2:1 /vmfs/devices/disks/vmhba1:3:0:1

Upgrading a VMFS-2 to VMFS-3
-T --tovmfs3 -x --upgradetype [zeroedthick|eagerzeroedthick|thin]

Example for Creating a Virtual Disk
vmkfstools -c 2048m /vmfs/volumes/myVMFS/rh6.2.vmdk

Example for Cloning a Virtual Disk
vmkfstools -i /vmfs/volumes/templates/gold-master.vmdk /vmfs/volumes/myVMFS/myOS.vmdk

 

Configure NFS datastores using command-line tools

Assuming your NAS is configured properly, this is pretty easy.  The following command will mount an NFS datastore on an ESX host …

esxcfg-nas -a -o 10.10.8.25 -s /nfs/share NAS

In this example, -a adds the datastore, -o names the NAS host by IP address, and -s names the share exported by that host.  The final argument, NAS, is the datastore label.  Once the datastore is added, the NFS mount will be found at /vmfs/volumes/NAS

The following command will remove the datastore …

esxcfg-nas -d -o 10.10.8.25 NAS
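
If you want to script the mount, here is a minimal sketch that wraps the add command and then lists the configured NAS datastores with esxcfg-nas -l so you can confirm it took.  The RUN variable and the mount_nfs function are my own illustration, not part of any VMware tool.  By default the script only prints the commands; set RUN to an empty string on an ESX host to actually execute them.

```shell
#!/bin/sh
# Sketch only: mount an NFS datastore, then list NAS datastores.
# RUN defaults to "echo" (dry run: commands are printed, not executed).
# Run with RUN="" on an ESX host to execute the commands for real.
RUN="${RUN-echo}"

mount_nfs() {
    host="$1"    # IP address of the NAS host
    share="$2"   # exported share on that host
    label="$3"   # datastore label; mount appears under /vmfs/volumes/<label>
    $RUN esxcfg-nas -a -o "$host" -s "$share" "$label" &&
    $RUN esxcfg-nas -l   # list configured NAS datastores to verify
}

# Same values as the example above.
mount_nfs 10.10.8.25 /nfs/share NAS
```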

 
Configure iSCSI hardware and software initiators using command-line tools

I don’t know if I’ve seen an official, formal example of how to do this (though I’m sure it exists somewhere).  So, here’s how I do it …

Step 1:  Add the portgroup to vSwitch0 
esxcfg-vswitch --add-pg=VMkernel vSwitch0

Step 2:  Add the IP to the VMkernel portgroup
esxcfg-vmknic -a -i 10.10.8.202 -n 255.255.255.0 VMkernel

Step 4:  Enable iSCSI
esxcfg-swiscsi -e

Step 5:  Add the target
vmkiscsi-tool -D -a 10.10.8.200 vmhba34

Step 6:  Rescan the HBA
esxcfg-rescan vmhba34
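
The commands above can be strung together into one script.  This is just a sketch of how I might do it, using the same addresses and HBA name as the steps above; the setup_sw_iscsi function and the RUN variable are hypothetical helpers for illustration.  As written it only prints the commands (dry run), and the chain stops at the first command that fails.

```shell
#!/bin/sh
# Sketch only: run the software iSCSI setup commands in order.
# RUN defaults to "echo" (dry run: commands are printed, not executed).
# Run with RUN="" on an ESX host to execute the commands for real.
RUN="${RUN-echo}"

setup_sw_iscsi() {
    vmk_ip="$1"      # IP address for the VMkernel port
    netmask="$2"     # netmask for the VMkernel port
    target_ip="$3"   # iSCSI target (discovery) address
    hba="$4"         # software iSCSI HBA name, e.g. vmhba34
    # Each command runs only if the previous one succeeded.
    $RUN esxcfg-vswitch --add-pg=VMkernel vSwitch0 &&
    $RUN esxcfg-vmknic -a -i "$vmk_ip" -n "$netmask" VMkernel &&
    $RUN esxcfg-swiscsi -e &&
    $RUN vmkiscsi-tool -D -a "$target_ip" "$hba" &&
    $RUN esxcfg-rescan "$hba"
}

setup_sw_iscsi 10.10.8.202 255.255.255.0 10.10.8.200 vmhba34
```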

 

That’s it for section 1.1 … time to go reformat my notes for section 1.2!


Troubleshooting ESX

I was at the Louisville VMUG on Friday talking about Troubleshooting ESX.  In my preparation for the event, I was looking for a good PowerPoint presentation I could reuse and I stumbled across a sweet little gem of a document.  Dudley Smith, a VMware Technical Account Manager (TAM) out of Virginia, created a cool one-page Mind Map for Troubleshooting ESX.  Does it address every potential issue you’ll come across?  No, of course not.  But it’s a heck of a good place to start.  One look at his Mind Map and I thought to myself, “that would be a great thing to have printed out and hanging over every VMware admin’s desk.”

Well, long story short, I snagged it and threw it up on the big screen behind me as I was presenting.  During the presentation (and many times since the presentation) I had many requests to post the PDF for download. 

But since I couldn’t just start passing out someone else’s work as my own, I sent Dudley a quick email asking for permission to distribute.  He responded by saying, “Sure, publish away!  You might enjoy this too… ”  Attached was another one-page document that visually shows the TCP/UDP ports used in VI3.5.  Nice!  Again, another great document to have printed out and hanging over your desk, IMHO.

So, courtesy of the author, Dudley Smith, here are two documents that I would recommend you add to your tool belt.  (click the images to download the PDFs)

 

mind_map_vi35

 

 

connections_and_ports

 

If you like them, leave a comment for Dudley.


Bring on the 10Gig Ethernet!

VMware recently updated its networking performance tests to see if the ESX hypervisor could efficiently leverage the ever-expanding bandwidth available at the Ethernet level. In short, it sure can! A single VM can effectively saturate a 10Gbps link when jumbo frames are enabled. And it performs just as well with multiple virtual machines: throughput scaled nicely and was shared equitably across all VMs. This type of scalable performance is reassuring as customers continue to raise consolidation ratios within their datacenters and virtualize the largest of workloads.

To save you some reading, here is the summary from the whitepaper, which can be found at: http://www.vmware.com/pdf/10GigE_performance.pdf

Conclusion: The results presented in the previous sections show that virtual machines running on ESX 3.5 Update 1 can efficiently share and saturate 10Gbps Ethernet links. A single uniprocessor virtual machine can push as much as 8Gbps of traffic with frames that use the standard MTU size and can saturate a 10Gbps link when using jumbo frames. Jumbo frames can also boost receive throughput by up to 40 percent, allowing a single virtual machine to receive traffic at rates up to 5.7Gbps.

Our detailed scaling tests show that ESX scales very well with increasing load on the system and fairly allocates bandwidth to all the booted virtual machines. Two virtual machines can easily saturate a 10Gbps link (the practical limit is 9.3Gbps for packets that use the standard MTU size because of protocol overheads), and the throughput remains constant as we add more virtual machines. Scaling on the receive path is similar, with throughput increasing linearly until we achieve line rate and then gracefully decreasing as system load and resource contention increase.

Thus, ESX 3.5 Update 1 supports the latest generation of 10Gbps NICs with minimal overheads and allows high virtual machine consolidation ratios while being fair to all virtual machines sharing the NICs and maintaining 10Gbps line rates.
