VMFS
Get Thin Provisioning working for you in vSphere
Oct 12th
Going Thin and not looking back.
Yes, I am slowly losing my hair like many other aging men out there, but it wouldn’t be virtual insanity if I were blogging about my personal male pattern baldness issues. The latest release of VMware vSphere brings a lot of new features and functionality that can make our lives easier. One of these features, which I have personally been looking forward to for a while, is Thin Provisioning. If you aren’t familiar with this technology, jump over to Gestalt IT for a great explanation of what it is and how it works.
One of the exciting promises of thin provisioning is getting more “bang for your buck” out of the expensive enterprise storage you have been investing in for your ESX environment. But, as Bret Michaels once said, “Every rose has its thorn,” and there are some things to look out for and consider before implementing thin disk technologies.
Efficiencies are great if they work right and don’t overcomplicate the environment.
Do your homework and make sure you understand the characteristics of the virtual machine that you are considering migrating into a thin disk configuration. The last thing you want to do is convert every VM to thin disk, only to find four months down the road that all of your datastores are filling up and you’re scrambling for storage CAPEX. Some people hold that you should thin provision either on the host side (VMware) or on the storage array side, but not both. Take a gander at Chad Sakac’s blog, which discusses thin on thin and offers some thoughts around each of these approaches. I’m not going to go into all of the pluses and minuses of thin provisioning but rather focus on how to make it work for you.
Coffee Talk
So now that we have some of the basics out of the way, I wanted to share my thoughts on thin provisioning. Like many organizations, we get requests from our customers that err on the side of caution. They want to plan for the worst case and ensure that their project and/or application isn’t set up for failure. The easiest way to do that is to pad the request: ask for more than you might really need, just in case something comes up down the road. I don’t blame them, really; I do it myself all the time when I make coffee at home. I always end up making more coffee than I typically drink, just in case I might need that extra charge. Virtual machine disk storage in some cases fits this same profile. If my coffee maker granted me access to hot coffee on demand, I would stop making extra coffee. Thin disks can give your end users that capacity on demand so you can gain control of the padding effect that typically takes place in most corporate organizations.
Take it back…
So now you have done your research, you’re starting to get a feel for what this thin stuff is and how it might play out in your shop. It’s go time. If you’re a smaller VMware customer, you probably already have an idea of what are good target disks to convert. If you’re a larger environment, it might be a little more difficult to gauge where the bloated pigs are hiding.
I worked at GE for a couple of years and was exposed to some of the Six Sigma methodologies they preach as well as practice. Sounds boring, right? Not really. You can leverage DMAIC for a lot of IT-related problems, issues, and projects. You don’t have to take it to the extreme; just use the framework to help guide you on your quest:
DMAIC
The DMAIC project methodology has five phases:
- Define high-level project goals and the current process.
- Measure key aspects of the current process and collect relevant data.
- Analyze the data to verify cause-and-effect relationships. Determine what the relationships are and attempt to ensure that all factors have been considered.
- Improve or optimize the process based upon data analysis using techniques like Design of experiments.
- Control to ensure that any deviations from target are corrected before they result in defects. Set up pilot runs to establish process capability, move on to production, set up control mechanisms and continuously monitor the process.
We have already defined our project goals and what we are trying to accomplish. We need a good “Measure” tool to really find where we might benefit from thin provisioning. PowerShell is a great tool that most VMware administrators use, or have at least heard of, so this was the first place I turned to for assistance.
Alan Renouf of “Virtu-AL” http://www.virtu-al.net/ gave me a hand in writing the PowerShell script needed. (Thanks again, Alan!) Alan already had a one-liner script to produce a list of VMs, their assigned disks, and how much data each disk was consuming. I needed the ability to see this data outside a PowerShell window and analyze it in a better format. We have a decent-sized VMware environment, and exporting this out to a .csv for analysis is extremely helpful. Here is the script!
************************************************************************
# Set the filename for the exported data
$Filename = "C:\VMDisks.csv"

# Connect to the VI server (replace MYVIServer with your own server name)
Connect-VIServer MYVIServer

# Sort so the VM with the most guest disks comes first: Export-Csv takes
# its column headers from the first object it receives
$AllVMs = Get-View -ViewType VirtualMachine
$SortedVMs = $AllVMs | Select *, @{N="NumDisks";E={@($_.Guest.Disk.Length)}} | Sort NumDisks -Descending

$VMDisks = @()
ForEach ($VM in $SortedVMs) {
    # One output row per VM, with three columns per guest disk
    $Details = New-Object PSObject
    $Details | Add-Member -Name Name -Value $VM.Name -MemberType NoteProperty
    $DiskNum = 0
    ForEach ($Disk in $VM.Guest.Disk) {
        $Details | Add-Member -Name "Disk$($DiskNum)path" -MemberType NoteProperty -Value $Disk.DiskPath
        # Capacity and FreeSpace are reported in bytes; convert to MB
        $Details | Add-Member -Name "Disk$($DiskNum)Capacity(MB)" -MemberType NoteProperty -Value ([math]::Round($Disk.Capacity / 1MB))
        $Details | Add-Member -Name "Disk$($DiskNum)FreeSpace(MB)" -MemberType NoteProperty -Value ([math]::Round($Disk.FreeSpace / 1MB))
        $DiskNum++
    }
    $VMDisks += $Details
    Remove-Variable Details
}

$VMDisks | Export-Csv -NoTypeInformation $Filename
***********************************************************************
So now that you have this great spreadsheet, you can do all sorts of crazy sorting and reporting within Excel. Take some time on phase 3, “Analyze”, and study what you’re seeing. Talk to your VM stakeholders to see how things might be changing from their perspective. Try to plan for the surprises and position yourself accordingly.
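If you’d rather not do the sorting by hand, here is a minimal sketch of the “Analyze” step in Python (not part of Alan’s script). It assumes the column names produced by the script above, such as Disk0FreeSpace(MB), and the sample rows are made up. Guest free space is only a rough upper bound on what a thin conversion might give back, so treat the ranking as a list of candidates to investigate.

```python
import csv
import io
import re

# Matches the per-disk free-space columns written by the export script,
# e.g. "Disk0FreeSpace(MB)".
FREE_COLUMN = re.compile(r"Disk\d+FreeSpace\(MB\)")


def free_space_by_vm(csv_text):
    """Return (vm_name, total_guest_free_mb) pairs, largest first."""
    totals = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        free_mb = 0
        for column, value in row.items():
            # Cells are empty for VMs with fewer disks than the first row.
            if column and FREE_COLUMN.fullmatch(column) and value:
                free_mb += int(float(value))
        totals.append((row["Name"], free_mb))
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Made-up sample data in the same shape as the exported CSV.
SAMPLE = '''"Name","Disk0path","Disk0Capacity(MB)","Disk0FreeSpace(MB)","Disk1path","Disk1Capacity(MB)","Disk1FreeSpace(MB)"
"web01","/","40960","30000","/data","102400","90000"
"db01","/","40960","5000",,,
'''

for name, free_mb in free_space_by_vm(SAMPLE):
    print(f"{name}: {free_mb} MB free inside the guest")
```

To run it against a real export, read the file into a string and pass that to free_space_by_vm instead of SAMPLE.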
Next is the “Improve” phase of DMAIC (see, it’s easy!). This is the part where you actually do the work. It’s time to start leveraging the Storage VMotion APIs and reclaim some of that unused disk.
- Select the target VM in the VC client.
- Right click on the VM and select the option “Migrate”.
- Select the option “Change Datastore”.
- Select the destination, or click Advanced if you are targeting one particular disk.
- Select “Thin provisioned format”.
- Select Finish.
Rinse and Repeat for the rest of that spreadsheet you have worked so hard on.
The last phase of DMAIC is “Control”. This is one of the most important pieces of thin provisioning, in my opinion. At a minimum you need to set up Virtual Center alerts to monitor when your datastores are approaching critical levels. You can’t implement thin disks in your vSphere environment and walk away. The smart people over at VMware have given us the ability to monitor datastore disk space usage and over-allocation with the latest release of Virtual Center. Set up your monitors so you are e-mailed when some of these thin disks begin to grow and you need to take some action.
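To make the two numbers worth watching concrete, here is a small Python sketch of the arithmetic behind a usage alert and an over-allocation ratio. The 75% and 90% thresholds are my own placeholder values, not VMware defaults, and the function name is hypothetical.

```python
def datastore_status(capacity_gb, used_gb, provisioned_gb,
                     warn_pct=75, crit_pct=90):
    """Classify a datastore by real usage and report its overcommit ratio.

    used_gb        -- space actually consumed on the datastore
    provisioned_gb -- sum of the sizes promised to all (thin) disks
    warn_pct/crit_pct are illustrative thresholds, not product defaults.
    """
    used_pct = 100.0 * used_gb / capacity_gb
    overcommit = provisioned_gb / capacity_gb
    if used_pct >= crit_pct:
        level = "critical"
    elif used_pct >= warn_pct:
        level = "warning"
    else:
        level = "ok"
    return level, round(used_pct, 1), round(overcommit, 2)


# A 500 GB datastore with 400 GB written and 900 GB promised to VMs.
print(datastore_status(500, 400, 900))
```

An overcommit ratio above 1.0 simply means the disks could, in principle, grow past the datastore; the usage percentage tells you how close that day is.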
Eric Gray of VMware takes this to the next level; check out his blog post on using PowerShell to prevent datastore emergencies. My personal approach to this concept is to set up a “hot spare” datastore for your environment: try to reclaim enough storage from your thin disk migrations to free up one datastore and hold it in reserve. Implementing an automated recovery solution like Eric’s will help you sleep easier at night. Worried about what might happen if your script doesn’t work, or you do hit the perfect storm and end up with a full VMFS volume? Intelligence has been built into vSphere to automatically pause the virtual machines, which is impressive. Check out Eric’s video:
Wrapping it all up
Thin disk provisioning is a great feature that you should consider leveraging in your environment. With some forward thinking and best practices you can achieve higher ROI for your ESX storage. VMware vSphere lets you migrate from thick to thin with no downtime, so you can begin reclaiming storage on the fly. Keep it simple and start out with a high-level analysis of your infrastructure. Identify the candidates that are a good fit and worth focusing on. Set up your alerts on the datastores as soon as you migrate your first virtual machine so you are protecting yourself from problems down the road. Consider taking automated actions if your datastores are reaching critical thresholds.
I hope you found this article helpful, good luck!
Scott Sauer
VCDX Admin Exam Notes – Section 1.3
May 18th
Ugh. My brain hurts. I’ve spent the past few hours reviewing scripted ESX installations and working on a PowerShell script for a customer that will reorder vmnics after a scripted installation is complete (because I can’t find any other way to force their order during the install). It’s been a few months since I’ve done a scripted installation, so I definitely needed a refresher. Plus, according to the VMware Enterprise Administration Exam Blueprint v3.5, section 8.1 is all about automating ESX deployments. The good news is that section 8.1 is the last section of the blueprint, so I believe I’m almost done preparing for the VCDX Admin Exam, which I’m scheduled to take in a few days.
Anyway, going back to the beginning of the Blueprint, and continuing from where I left off, here is the next section of my study notes.
Objective 1.3 – Troubleshoot Virtual Infrastructure storage components.
Knowledge
Identify storage related events and log entries. Analyze storage events to determine related issues.
All storage related events will be recorded in the /var/log/vmkernel log file. Most of the messages in this log file are fairly cryptic and can be difficult to interpret. Furthermore, this log file contains all messages from the vmkernel, not just storage related messages, so you’ll have to filter through it. An easy way to do this is simply to search for SCSI. For example, the command cat /var/log/vmkernel | grep SCSI on one of my servers produces the following output (only showing the last 10 lines) …
[root@cincylab-esx3 root]# cat /var/log/vmkernel | grep SCSI | tail -10
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1069)iSCSI: login phase for session 0x8603f90 (rx 1071, tx 1070) timed out at 23051576, timeout was set for 23051576
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)iSCSI: session 0x8603f90 connect timed out at 23051576
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)<5>iSCSI: session 0x8603f90 iSCSI: session 0x8603f90 retrying all the portals again, since the portal list got exhausted
May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)iSCSI: session 0x8603f90 to iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1 waiting 1 seconds before next login attempt
May 18 12:57:48 cincylab-esx3 vmkernel: 2:16:01:56.748 cpu2:1071)iSCSI: bus 0 target 0 trying to establish session 0x8603f90 to portal 0, address 10.10.8.200 port 3260 group 1
May 18 12:58:00 cincylab-esx3 vmkernel: 2:16:02:09.355 cpu2:1071)iSCSI: bus 0 target 0 established session 0x8603f90 #3, portal 0, address 10.10.8.200 port 3260 group 1
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba35:C0:T0:L0 : 0x0 0x80 0x83
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Device id info for vmhba35:C0:T0:L0: 0x1 0x1 0x0 0x18 0x42 0x55 0x46 0x46 0x41 0x4c 0x4f 0x0 0x0 0x0 0x0 0x0 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x2 0x0 0x0 0x0
May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Id for vmhba35:C0:T0:L0 0x20 0x20 0x20 0x20 0x56 0x49 0x52 0x54 0x55 0x41
[root@cincylab-esx3 root]#
If you look closely, I clearly had some issues with my iSCSI appliance a few hours ago. I decided to make some configuration changes to the switch and then, all of a sudden, the ESX server lost connectivity to its storage. Weird!
Anyway, what does all this mean? There’s a really good VMworld Europe 2008 presentation (which you can get from www.vmworld.com) titled VI3 Advanced Log Analysis, which goes into detail about how to interpret VMware log files. From that presentation, I found this diagram, which describes the components of a message in the vmkernel log file.
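As a rough illustration of those components, here is a Python sketch that splits a vmkernel line into fields. The field names are my own labels, and I am reading the second timestamp as uptime in days:hours:minutes:seconds and the number after the colon as a world ID; treat both as assumptions rather than the presentation’s exact terminology. The third sample line is made up to show a non-storage message being filtered out.

```python
import re

# syslog timestamp, host, then an optional "uptime cpuN:world)" prefix
# (some lines, such as the "VMWARE SCSI Id" ones, do not carry it).
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\svmkernel:\s"
    r"(?:(?P<uptime>\d+:\d{2}:\d{2}:\d{2}\.\d{3})\scpu(?P<cpu>\d+):(?P<world>\d+)\))?"
    r"(?P<message>.*)$"
)


def parse_vmkernel_line(line):
    """Return a dict of fields, or None if the line is not a vmkernel entry."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def storage_events(lines, keyword="SCSI"):
    """Keep only parsed entries whose message mentions the keyword."""
    parsed = (parse_vmkernel_line(line) for line in lines)
    return [entry for entry in parsed if entry and keyword in entry["message"]]


SAMPLE_LINES = [
    "May 18 12:57:47 cincylab-esx3 vmkernel: 2:16:01:55.748 cpu2:1071)iSCSI: session 0x8603f90 connect timed out at 23051576",
    "May 18 12:58:01 cincylab-esx3 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba35:C0:T0:L0 : 0x0 0x80 0x83",
    # Hypothetical non-storage line, for illustration only.
    "May 18 12:58:02 cincylab-esx3 vmkernel: 2:16:02:10.000 cpu1:1040)Example: unrelated message",
]

for entry in storage_events(SAMPLE_LINES):
    print(entry["timestamp"], entry["world"], entry["message"])
```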
Skills and Abilities
Verify storage configuration and troubleshoot storage connection issues using CLI, VI Client and logs
- Rescan events
A rescan event can be initiated with the esxcfg-rescan command at the command line. The output should look like the following …
[root@cincylab-esx3 root]# esxcfg-rescan vmhba32
Rescanning vmhba32 …
On scsi1, removing: 0:0.
On scsi1, adding: 0:0.
Done.
[root@cincylab-esx3 root]# cat /var/log/vmkernel | grep SCSI | tail -3
May 18 19:13:09 cincylab-esx3 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T0:L0 : 0x0 0x80 0x83
May 18 19:13:10 cincylab-esx3 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T0:L0: 0x2 0x0 0x0 0x18 0x4c 0x69 0x6e 0x75 0x78 0x20 0x41 0x54 0x41 0x2d 0x53 0x43 0x53 0x49 0x20 0x73 0x69 0x6d 0x75 0x
May 18 19:13:10 cincylab-esx3 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T0:L0 0x36 0x52 0x58 0x36 0x4a 0x39 0x39 0x58 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x20 0x47 0x42 0x30 0x31 0x36 0x
[root@cincylab-esx3 root]#
- Failover events
I don’t have redundant paths in my lab to simulate this. So, again from the VMworld presentation, VI3 Advanced Log Analysis, here is a screen shot from the slide that covers this topic.
There will obviously be a lot of different types of error and event messages in /var/log/vmkernel. And I’m certainly not going to try and list every possible combination here. I highly suggest you download the VMworld preso because it does a great job of explaining how to further decipher the log files (like defining SCSI error codes).
Well, that’s about it for this section. Back to PowerShell scripting for another hour or so.
VCDX Admin Exam Notes – Section 1.2
May 11th
Last week I was in San Francisco with most (maybe all) of the VMware field technical folks for a three day technical summit. One of the evenings we had an awards ceremony and dinner. And guess what? The first eight VCDX certifications ever to be awarded were announced.
Now, VMware is a pretty big company, so I didn’t recognize seven of the eight names. But I definitely recognized one of them. Well, that is, I should say I recognized his name. Having never officially met him face to face, I couldn’t pick him out of a crowd of two. You might know him as the rock star blogger from Yellow Bricks. Congratulations Duncan Epping! I believe he said he is VCDX number seven and the first VCDX in Europe. Very cool. And well deserved, for sure.
I’m a little behind Duncan. I’m scheduled to take the Admin Exam later this month, which is the first of two exams. Then I’ll have to present and defend a successful design and deployment before a jury … er, I mean, panel of my peers.
Anyway, here are my notes for section 1.2 of the VMware Enterprise Administration Exam Blueprint v3.5. (Section 1.1 can be found here). Everything in Blue is a direct cut and paste from the exam blueprint.
Objective 1.2 – Implement and manage complex data security and replication configurations.
Knowledge
Describe methods to secure access to virtual disks and related storage devices
- Distributed Lock Handling

In the graphic below, notice how each ESX server sees and has access to the same LUN? This is achieved via VMFS, a clustered file system which leverages distributed file locking to allow multiple hosts to access the same storage. When a Virtual Machine is powered on, VMFS places a lock on its files, ensuring no other ESX server can access them.
Identify tools and steps necessary to manage replicated VMFS volumes
- Resignaturing
First, there’s a really good article on VMFS resignaturing by Duncan (go figure). Also, Chad Sakac over at Virtual Geek has a great article too. I’m not going to reinvent the wheel, so make sure you read their posts. You’ll need to understand this. For the exam, you’ll certainly need to know the following, which is from the Fibre Channel SAN Configuration Guide:
EnableResignature=0, DisallowSnapshotLUN=1 (default)
In this state:
- You cannot bring snapshots or replicas of VMFS volumes by the array into the ESX Server host regardless of whether or not the ESX Server has access to the original LUN.
- LUNs formatted with VMFS must have the same ID for each ESX Server host.
EnableResignature=1, (DisallowSnapshotLUN is not relevant)
In this state, you can safely bring snapshots or replicas of VMFS volumes into the same servers as the original and they are automatically resignatured.
EnableResignature=0, DisallowSnapshotLUN=0
This is similar to ESX 2.x behavior. In this state, the ESX Server assumes that it sees only one replica or snapshot of a given LUN and never tries to resignature. This is ideal in a DR scenario where you are bringing a replica of a LUN to a new cluster of ESX Servers, possibly on another site that does not have access to the source LUN. In such a case, the ESX Server uses the replica as if it is the original.
Do not use this setting if you are bringing snapshots or replicas of a LUN into a server with access to the original LUN. This can have destructive results including:
- If you create snapshots of a VMFS volume one or more times and dynamically bring one or more of those snapshots into an ESX Server, only the first copy is usable. The usable copy is most likely the primary copy. After reboot, it is impossible to determine which volume (the source or one of the snapshots) is usable. This nondeterministic behavior is dangerous.
- If you create a snapshot of a spanned VMFS volume, an ESX Server host might reassemble the volume from fragments that belong to different snapshots. This can corrupt your file system.
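As a memory aid for the three states quoted above, here is a tiny Python lookup. The function name and the short outcome labels are mine; the logic is only a restatement of the guide’s text, not anything ESX exposes.

```python
def snapshot_lun_behavior(enable_resignature, disallow_snapshot_lun):
    """Summarize how ESX treats a snapshot/replica LUN for a given setting pair.

    Returns one of three illustrative labels:
      "resignature"       -- copy is mounted with a new signature
      "reject"            -- copy is not brought into the host (the default)
      "mount-as-original" -- copy is used as if it were the original LUN
    """
    if enable_resignature == 1:
        # DisallowSnapshotLUN is not relevant in this state.
        return "resignature"
    if disallow_snapshot_lun == 1:
        return "reject"
    return "mount-as-original"


for settings in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    print(settings, "->", snapshot_lun_behavior(*settings))
```

The last case, (0, 0), is the one the guide warns about when the host can still see the original LUN.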
Skills and Abilities
Configure storage network segmentation
- FC Zoning
Zoning delivers access control in the SAN, restricting visibility to devices in the zone solely to other members of that zone. It is a common technique used to do things like group ESX servers into production/test/dev, increase security and decrease traffic, among other things.
- iSCSI/NFS VLAN
Storage segmentation for IP storage can be accomplished in one of two ways: VLANs or physical segmentation (i.e. separate layer 2 switches for storage).
Configure LUN masking
The Disk.MaskLUNs parameter should be used when you’re trying to mask specific LUNs from your ESX host. This is a useful option when you don’t want your ESX server to access a particular LUN, but are unwilling (or unable) to configure your FC switch.
To configure LUN masking in the VI Client go to Configuration –> Advanced Settings for the host you want to configure. You’ll find the Disk.MaskLUNs parameter under the section Disk. It looks like this in my VI Client.
Enter a value in the following format … <adapter>:<target>:<comma separated LUN range list>. Be sure to rescan when you’re done and verify the mask has been properly applied.
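If you have a long list of LUNs to hide, a small helper can collapse them into the range list for you. This Python sketch only builds a single entry in the format shown above; the helper names are hypothetical, the adapter and LUN numbers are made up, and you should check the resulting string in the VI Client before relying on it.

```python
def lun_ranges(luns):
    """Collapse LUN numbers into a comma separated range list, e.g. 1-3,5."""
    parts = []
    start = prev = None
    for lun in sorted(set(luns)):
        if start is None:
            start = prev = lun
        elif lun == prev + 1:
            prev = lun
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = lun
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def mask_entry(adapter, target, luns):
    """Build one <adapter>:<target>:<LUN range list> entry."""
    return f"{adapter}:{target}:{lun_ranges(luns)}"


# Made-up example: hide LUNs 1-3, 5, 10 and 11 behind vmhba1, target 0.
print(mask_entry("vmhba1", 0, [1, 2, 3, 5, 10, 11]))
```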
Use esxcfg-advcfg
This one’s easy. Just use the man page (type “man esxcfg-advcfg” at the command prompt). It’ll tell you everything you need to know.
Set Resignaturing and Snapshot LUN options
So, following along with the man page above, here is a cut and paste from my server …
[asweemer@cincylab-esx3 config]$ su -
Password:
[root@cincylab-esx3 root]# esxcfg-advcfg -s 0 /LVM/EnableResignature
Value of EnableResignature is 0
[root@cincylab-esx3 root]# esxcfg-advcfg -s 1 /LVM/EnableResignature
Value of EnableResignature is 1
[root@cincylab-esx3 root]#
[root@cincylab-esx3 root]# esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
Value of DisallowSnapshotLun is 0
[root@cincylab-esx3 root]# esxcfg-advcfg -s 1 /LVM/DisallowSnapshotLun
Value of DisallowSnapshotLun is 1
[root@cincylab-esx3 root]#
Manage RDMs in a replicated environment
RDMs can be created via the CLI with the following command …
vmkfstools -r /vmfs/devices/disks/vmhbaX:Y:Z:0 my-vm.vmdk
With the -r option, the RDM will be created in Virtual Compatibility Mode. Should you need and/or prefer Physical Compatibility Mode, you can change this by editing the VMDK file and changing the createType value to vmfsPassthroughRawDeviceMap. Alternatively, vmkfstools -z creates the mapping in Physical Compatibility Mode directly.
Use proc nodes to identify driver configuration and options
The proc filesystem is a pseudo filesystem; it’s not “real.” It consumes no storage space and is used to access process information from the kernel. You’ll find quite a bit of valuable data and configuration options in the many subdirectories of /proc/vmware/config. Here’s a quick example from my ESX server …
[asweemer@cincylab-esx3 LVM]$ pwd
/proc/vmware/config/LVM
[asweemer@cincylab-esx3 LVM]$ ls
DisallowSnapshotLun EnableResignature
[asweemer@cincylab-esx3 LVM]$ cat EnableResignature
EnableResignature (Enable Volume Resignaturing) [0-1: default = 0]: 0
[asweemer@cincylab-esx3 LVM]$
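If you want to collect these values from several hosts and compare them, the one-line format is easy to pick apart. Here is a minimal Python sketch that parses the line shown above; the regular expression is inferred from that single example, so other nodes may need adjustments.

```python
import re

# Shape inferred from: "Name (Description) [min-max: default = d]: value"
NODE_RE = re.compile(
    r"^(?P<name>\S+) \((?P<description>.*)\) "
    r"\[(?P<min>-?\d+)-(?P<max>-?\d+): default = (?P<default>-?\d+)\]: "
    r"(?P<value>-?\d+)$"
)


def parse_config_node(text):
    """Parse one /proc/vmware/config line into a dict, or return None."""
    match = NODE_RE.match(text.strip())
    if not match:
        return None
    fields = match.groupdict()
    for key in ("min", "max", "default", "value"):
        fields[key] = int(fields[key])
    return fields


line = "EnableResignature (Enable Volume Resignaturing) [0-1: default = 0]: 0"
node = parse_config_node(line)
print(node["name"], "=", node["value"], "(default", str(node["default"]) + ")")
```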
Use esxcfg-module
Just like esxcfg-advcfg, use the man page.
VCDX Admin Exam Notes — Section 1.1
Apr 27th
I finally got a chance to sit down and reformat some of my notes for the VCDX Admin Exam. Below are my notes for Section 1.1 of the VMware Enterprise Administration Exam Blueprint v3.5. Everything in Blue is a direct cut and paste from the exam blueprint.
Oh, and thanks to the Disqus comment from VirtualizationTeam (Blog), letting me know that Peter van den Bosch has a more recent version of his VMware Enterprise Administration Exam Study Guide 3.5.
Section 1 – Storage
Objective 1.1 – Create and Administer VMFS datastores using advanced techniques.
Knowledge
Describe how to identify iSCSI, Fibre channel, SATA and NFS configurations using CLI commands and log entries
Here are a few command line examples that I believe would work well …
1) esxcfg-mpath -l
This command produces the following output on my server:
[root@cincylab-esx3 root]# esxcfg-mpath -l
Disk vmhba0:0:0 /dev/sdb (152627MB) has 1 paths and policy of Fixed
Local 0:31.2 vmhba0:0:0 On active preferred
Disk vmhba32:0:0 /dev/sda (152627MB) has 1 paths and policy of Fixed
Local 0:31.2 vmhba32:0:0 On active preferred
Disk vmhba35:0:0 /dev/sdc (923172MB) has 1 paths and policy of Fixed
iScsi sw iqn.1998-01.com.vmware:cincylab-esx3-1d029e5f<->iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1 vmhba35:0:0 On active preferred
2) esxcfg-info -s
The -s flag will narrow the scope of the output to just storage and disk related info. But even with the narrowed scope, this command produces way too much output to be displayed here. You’ll likely want to pipe the output into grep, or at a minimum into more or less, to get what you’re looking for.
3) cat /var/log/vmkernel | grep vmhba | tail -10
This will search the vmkernel log file and display the last 10 lines containing the text vmhba. If you want more (or fewer) lines, change the -10 to whatever suits your needs.
I found this one particularly useful when you’ve enabled the software iSCSI initiator at the command line but don’t yet know which number has been assigned to the vmhba (e.g. vmhba35).
4) esxcfg-vmhbadevs -m and ls -lah /vmfs/volumes
The command esxcfg-vmhbadevs -m will show the mapping between vmhba numbers, device files and their UUIDs. If you’d like a quick and easy way to see which UUIDs are mapped to which human readable names, you can follow that up with ls -lah /vmfs/volumes. The two commands back to back produce the following output on my server:
[root@cincylab-esx3 root]# esxcfg-vmhbadevs -m
vmhba35:0:0:1 /dev/sdc1 4986310d-6525e5e6-ebbd-00237d0681e7
vmhba0:0:0:3 /dev/sdb3 49e115fb-3e22358c-c10a-00237d0681e7
vmhba32:0:0:1 /dev/sda1 4985c53e-e7b1904f-5042-00237d0681e7
[root@cincylab-esx3 root]# ls -lah /vmfs/volumes/
total 10M
drwxr-xr-x 1 root root 512 Apr 20 23:07 .
drwxrwxrwt 1 root root 512 Apr 11 18:12 ..
drwxr-xr-t 1 root root 1.2K Feb 1 21:34 4985c53e-e7b1904f-5042-00237d0681e7
drwxr-xr-t 1 root root 3.7K Apr 14 14:49 4986310d-6525e5e6-ebbd-00237d0681e7
drwxr-xr-t 1 root root 980 Apr 11 18:13 49e115fb-3e22358c-c10a-00237d0681e7
lrwxr-xr-x 1 root root 35 Apr 20 23:07 cincylab-esx3:storage1 -> 4985c53e-e7b1904f-5042-00237d0681e7
lrwxr-xr-x 1 root root 35 Apr 20 23:07 cincylab-esx3:storage2 -> 49e115fb-3e22358c-c10a-00237d0681e7
lrwxr-xr-x 1 root root 35 Apr 20 23:07 vol1 -> 4986310d-6525e5e6-ebbd-00237d0681e7
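Matching UUIDs by eye gets old quickly, so here is a hedged Python sketch that joins the two outputs above into one vmhba-to-name table. It assumes the column layout shown in my output and datastore names without spaces; the function name is hypothetical.

```python
def map_datastores(vmhbadevs_output, ls_output):
    """Map each vmhba partition to (device file, datastore name)."""
    uuid_to_name = {}
    for line in ls_output.splitlines():
        # Symlink lines look like: lrwxr-xr-x ... name -> uuid
        if line.startswith("l") and " -> " in line:
            left, uuid = line.rsplit(" -> ", 1)
            uuid_to_name[uuid.strip()] = left.split()[-1]
    mapping = {}
    for line in vmhbadevs_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            vmhba, device, uuid = fields
            # Fall back to the raw UUID if no symlink points at it.
            mapping[vmhba] = (device, uuid_to_name.get(uuid, uuid))
    return mapping


VMHBADEVS = """vmhba35:0:0:1 /dev/sdc1 4986310d-6525e5e6-ebbd-00237d0681e7
vmhba0:0:0:3 /dev/sdb3 49e115fb-3e22358c-c10a-00237d0681e7
vmhba32:0:0:1 /dev/sda1 4985c53e-e7b1904f-5042-00237d0681e7"""

LS = """lrwxr-xr-x 1 root root 35 Apr 20 23:07 cincylab-esx3:storage1 -> 4985c53e-e7b1904f-5042-00237d0681e7
lrwxr-xr-x 1 root root 35 Apr 20 23:07 cincylab-esx3:storage2 -> 49e115fb-3e22358c-c10a-00237d0681e7
lrwxr-xr-x 1 root root 35 Apr 20 23:07 vol1 -> 4986310d-6525e5e6-ebbd-00237d0681e7"""

for vmhba, (device, name) in map_datastores(VMHBADEVS, LS).items():
    print(vmhba, device, name)
```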
5) vmkiscsi-ls
This one only applies to iSCSI storage, of course, and produces the following output on my server:
[root@cincylab-esx3 root]# vmkiscsi-ls
*************************************************************
SFNet iSCSI Driver Version … 3.6.3 (27-Jun-2005 )
*************************************************************
TARGET NAME : iqn.2004-08.jp.buffalo:TS-IGLA68-001D7315AA68:vol1
TARGET ALIAS :
HOST NO : 4
BUS NO : 0
TARGET ID : 0
TARGET ADDRESS : 10.10.8.200:3260
SESSION STATUS : ESTABLISHED AT Sun Apr 12 11:35:09 2009
NO. OF PORTALS : 1
PORTAL ADDRESS 1 : 10.10.8.200:3260,1
SESSION ID : ISID 00023d000001 TSIH 1400
*************************************************************
Describe the VMFS file system
There are many subsections here and before digging into each one, check out the following three links …
- Advanced VMFS Configuration and Troubleshooting.
- Really advanced, but really good: Understanding VMFS Volumes
- An oldie but goodie: VMware Virtual Machine File System: Technical Overview and Best Practices
Metadata
The simple definition of Metadata is “data about data.” All file systems handle metadata differently. VMFS uses metadata, stored in a special area of each volume, to manage all the files, directories (in VMFS-3 only), and attributes about the volume. VMFS is a clustered file system, meaning more than one ESX server can access the same file system at the same time. Therefore an update to the metadata requires locking of the LUN using a SCSI reservation.
Multi-access and locking
The following was taken from Advanced VMFS Configuration and Troubleshooting.
Distributed Lock handling by VMFS3
- Done in-band
- Hosts mount a VMFS3 volume
- Hosts’ ids posted to heartbeat region
- Heartbeat records are updated at regular intervals by hosts
- Host X locks a file, the lock is associated with its ID
- If host X dies or loses access to volume the file lock is stale
- Host Z attempts to lock the same file which is locked
- Host Z checks the heartbeat record of Host X (~5 times)
- If host X heartbeat record is not updated, Host Z will age the lock
- All other hosts yield to host Z and do not attempt to lock the file
- Lock is broken and Host Z acquires the lock
- Journal is replayed by Host Z
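The sequence above can be sketched as a toy model in Python. This is a simplification for study purposes only: the class and method names are mine, heartbeats are plain counters, one loop iteration stands in for one heartbeat interval, and the journal replay step is not modeled at all.

```python
class Volume:
    """Toy model of heartbeat-based stale lock detection."""

    def __init__(self):
        self.heartbeat = {}  # host id -> heartbeat counter
        self.locks = {}      # file name -> id of the host holding the lock

    def mount(self, host):
        """A host mounts the volume and posts its id to the heartbeat region."""
        self.heartbeat[host] = 0

    def tick(self, alive_hosts):
        """One heartbeat interval passes; only live hosts update their record."""
        for host in alive_hosts:
            self.heartbeat[host] += 1

    def try_lock(self, host, filename, alive_hosts, checks=5):
        """Try to lock a file, breaking the existing lock only if it is stale."""
        owner = self.locks.get(filename)
        if owner in (None, host):
            self.locks[filename] = host
            return True
        seen = self.heartbeat[owner]
        for _ in range(checks):
            self.tick(alive_hosts)
            if self.heartbeat[owner] != seen:
                return False  # owner is still heartbeating; lock is valid
        self.locks[filename] = host  # stale lock: break it and take over
        return True


volume = Volume()
volume.mount("X")
volume.mount("Z")
print(volume.try_lock("X", "vm.vmdk", {"X", "Z"}))  # X takes the lock
print(volume.try_lock("Z", "vm.vmdk", {"X", "Z"}))  # X is alive, Z is refused
print(volume.try_lock("Z", "vm.vmdk", {"Z"}))       # X has died, Z takes over
```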
Extents
Extents are logical extensions of a file system. They are typically used to grow a volume beyond the VMFS size limitations. Essentially, an extent is the “joining” of two or more volumes into a single, logical VMFS volume.
Tree structure and files
The VMFS partition is mounted to the directory with the corresponding UUID found in /vmfs/volumes. The human readable name of the volume is merely a symbolic link to that directory. By default, all VMs are given a directory at the root of the partition. So, for example, a VM with the name of AaronSweemer would have the directory /vmfs/volumes/UUID/AaronSweemer. In this directory you will find all files specific and relevant to that VM. This is the default behavior, as some (not all) of these files can be configured to reside elsewhere.
Here is a table of common files found on the VMFS file system.
| Extension | Usage |
| .dsk | VM disk file |
| .vmdk | VM disk file |
| .hlog | VMotion log file |
| .vswp | Virtual swap file |
| .vmss | VM suspend file |
| .vmtd | VM template disk file |
| .vmtx | VM Template configuration file |
| .REDO | Files used when VM is in REDO mode |
| .vmx | VM configuration file |
| .log | VM log file |
| .nvram | Nonvolatile RAM |
Journaling
From Wikipedia …
A journaling file system is a file system that logs changes to a journal (usually a circular log in a dedicated area) before committing them to the main file system. Such file systems are less likely to become corrupted in the event of power failure or system crash.
Explain the process used to align VMFS partitions
The following procedure was found in VMware Enterprise Administration Exam study guide 3.5 (page 5) and Advanced VMFS Configuration and Troubleshooting (slide 36).
Aligned partitions start at 128. If the Start value is 63 (the default), the partition is not aligned. If you choose not to use the VI Client and create partitions with vmkfstools, or if you want to align the default installation partition before use, take the following steps to use fdisk to align a partition manually from the ESX Server service console:
1. Enter fdisk /dev/sd<x> where <x> is the device suffix.
2. Determine if any VMware VMFS partitions already exist. VMware VMFS partitions are identified by a partition system ID of fb. Type d to delete these partitions.
Note: This destroys all data currently residing on the VMware VMFS partitions you delete. Ensure you back up this data first if you need it.
3. Type n to create a new partition.
4. Type p to create a primary partition.
5. Type 1 to create partition No. 1. Select the defaults to use the complete disk.
6. Type t to set the partition’s system ID.
7. Type fb to set the partition system ID to fb (VMware VMFS volume).
8. Type x to go into expert mode.
9. Type b to adjust the starting block number.
10. Type 1 to choose partition 1.
11. Type 128 to set it to 128 (the array’s stripe element size).
12. Type w to write label and partition information to disk.
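To see why 128 is aligned and 63 is not, the arithmetic can be sketched in a few lines of Python. This assumes 512-byte sectors, so a start of 128 lands on a 64 KB boundary; the function names are my own.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def start_offset_kb(start_sector):
    """Byte offset of the partition start, expressed in KB."""
    return start_sector * SECTOR_BYTES / 1024


def is_aligned(start_sector, boundary_sectors=128):
    """True if the partition starts on a multiple of the boundary."""
    return start_sector % boundary_sectors == 0


for start in (63, 128):
    print(start, start_offset_kb(start), "KB", "aligned" if is_aligned(start) else "not aligned")
```

A start of 63 puts the partition at 31.5 KB, which straddles the 64 KB boundary, while 128 starts exactly on it.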
Explain the use cases for round-robin load balancing
Multipathing is typically used for failover: if one storage path becomes unavailable, the host can fail over to an alternate path. However, multipathing can also be used in a round-robin fashion to balance load and achieve better utilization of the HBAs. There are a couple of different configurable options that specify when an ESX server switches paths. From the Round-Robin Load Balancing technical note …
When to switch – Specify that the ESX Server host should attempt a path switch after a specified number of I/O blocks have been issued on a path or after a specified number of read or write commands have been issued on a path. If another path exists that meets the specified path policy for the target, the active path to the target is switched to the new path. The --custom-max-commands and --custom-max-blocks options specify when to switch.
Which target to use – Specify that the next path should be on the preferred target, the most recently used target, or any target. The --custom-target-policy option specifies which target to use.
Which HBA to use – Specify that the next path should be on the preferred HBA, the most recently used HBA, the HBA with the minimum outstanding I/O requests, or any HBA. The --custom-HBA-policy option specifies which HBA to use.
Skills and Abilities
Perform advanced multi-pathing configuration
- Configure multi-pathing policy
- Configure round-robin behavior using command-line tools
- Manage active and inactive paths
Again, from the Round-Robin Load Balancing technical note …
Setting the Path Switching Policy
You can set the path-switching policy for failover and for load balancing by using the esxcfg-mpath command. You can set the path switching policy on a per-LUN basis by using the esxcfg-mpath command’s --policy custom option. If you specify --policy custom, you must also specify one of the custom policy options. Because the path switching policy is set on a per-LUN basis, you must always specify the LUN using the --lun option.
…
Notes
If you set both the custom-max-blocks and custom-max-commands options, the system attempts to switch paths as soon as one of the limits is reached.
If you set the target or the HBA policy to preferred, the system chooses the target or the HBA of the preferred path when possible. If a preferred policy is set on an active/passive SAN array, and the preferred target is not on the active SP (Storage Processor), the system does not select the preferred target but a target on the active SP.
Path switching is not performed if an outstanding SCSI reservation is on the target, or if a path failover is underway. Path switching is delayed until an I/O request is performed when no reservations or path failovers are pending.
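To make the "as soon as one of the limits is reached" rule a bit more concrete, here is a toy model of it in shell. It is only a simulation and never talks to an ESX host. The fixed I/O size per command and the assumption that both counters restart after a switch are mine, not something the technical note spells out.

```shell
#!/bin/sh
# Toy model of the "when to switch" rule: a path switch is attempted as soon
# as EITHER the command limit OR the block limit is reached on the current path.
# Assumptions (not from the technical note): every command moves the same
# number of blocks, and both counters restart after each switch.
# Args: max_commands max_blocks blocks_per_command total_commands
count_switches() {
    max_commands="$1"; max_blocks="$2"; blocks_per_cmd="$3"; total="$4"
    commands=0; blocks=0; switches=0; i=0
    while [ "$i" -lt "$total" ]; do
        commands=$((commands + 1))
        blocks=$((blocks + blocks_per_cmd))
        if [ "$commands" -ge "$max_commands" ] || [ "$blocks" -ge "$max_blocks" ]; then
            switches=$((switches + 1))   # move on to the next path
            commands=0; blocks=0         # counters restart on the new path
        fi
        i=$((i + 1))
    done
    echo "$switches"
}

# 100 commands of 8 blocks each, with limits of 50 commands or 160 blocks:
# the block limit is reached first, after every 20 commands.
count_switches 50 160 8 100
```

With those numbers the block limit always trips before the command limit, so the model reports 5 switches. Raise the block limit far enough and the command limit takes over instead.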
Configure and use NPIV HBAs
<<I don’t have NPIV in my lab. Need to revisit this section>>
Manage VMFS file systems using command-line tools
The command line tool you’ll use for managing VMFS file systems is vmkfstools. It’s a very powerful tool and there are many options available, so I suggest you read the man page. The following examples (taken from the online documentation) are certainly not exhaustive, just a quick sample of what the tool can do.
Example for Creating a VMFS File System
vmkfstools -C vmfs3 -b 1m -S my_vmfs /vmfs/devices/disks/vmhba1:3:0:1
Example for Extending a VMFS-3 Volume
vmkfstools -Z /vmfs/devices/disks/vmhba0:1:2:1 /vmfs/devices/disks/vmhba1:3:0:1
Upgrading a VMFS-2 to VMFS-3
-T --tovmfs3 -x --upgradetype [zeroedthick|eagerzeroedthick|thin]
Example for Creating a Virtual Disk
vmkfstools -c 2048m /vmfs/volumes/myVMFS/rh6.2.vmdk
Example for Cloning a Virtual Disk
vmkfstools -i /vmfs/volumes/templates/gold-master.vmdk /vmfs/volumes/myVMFS/myOS.vmdk
Configure NFS datastores using command-line tools
Assuming your NAS is configured properly, this is pretty easy. The following command will mount an NFS datastore on an ESX host …
esxcfg-nas -a -o 10.10.8.25 -s /nfs/share NAS
In this example, -a adds the datastore, -o gives the IP address of the NAS host, -s gives the share exported by that host, and NAS is the label for the new datastore. Once the datastore has been added successfully, the NFS mount will be found at /vmfs/volumes/NAS
The following command will remove the datastore
esxcfg-nas -d -o 10.10.8.25 NAS
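If you mount the same kind of share on several hosts, it can help to pull the host, share, and label into parameters. The sketch below is a dry run that only prints the mount command and the path where the datastore should show up. The values are the example ones from above, and the helper function names are my own.

```shell
#!/bin/sh
# Dry-run sketch: prints the mount command and the expected datastore path.
# Nothing is executed against an ESX host. Function names are hypothetical;
# host, share, and label are the example values used in the text.
nas_add_cmd() {
    host="$1"; share="$2"; label="$3"
    echo "esxcfg-nas -a -o $host -s $share $label"
}

nas_mount_path() {
    # NFS datastores appear under /vmfs/volumes/<label>
    echo "/vmfs/volumes/$1"
}

nas_add_cmd 10.10.8.25 /nfs/share NAS
nas_mount_path NAS
```

After running the real command on the host, esxcfg-nas -l should list the new datastore.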
Configure iSCSI hardware and software initiators using command-line tools
I don’t know if I’ve seen an official, formal example of how to do this (though I’m sure it exists somewhere). So, here’s how I do it …
Step 1: Add the portgroup to vSwitch0
esxcfg-vswitch --add-pg=VMkernel vSwitch0
Step 2: Add the IP to the VMkernel portgroup
esxcfg-vmknic -a -i 10.10.8.202 -n 255.255.255.0 VMkernel
Step 4: Enable iSCSI
esxcfg-swiscsi -e
Step 5: Add the target
vmkiscsi-tool -D -a 10.10.8.200 vmhba34
Step 6: Rescan the HBA
esxcfg-rescan vmhba34
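For repeat builds, the steps above can be sketched as one script with the site-specific values pulled into variables. This version is a dry run: it prints the commands in order instead of executing them. The values are the ones from my lab, and the variable and function names are just ones I picked for the example.

```shell
#!/bin/sh
# Dry-run sketch of the software iSCSI steps listed above.
# Commands are printed, not executed. Variable and function names are
# hypothetical; the values are the lab values from the text.
PORTGROUP="VMkernel"
VSWITCH="vSwitch0"
VMK_IP="10.10.8.202"
VMK_MASK="255.255.255.0"
TARGET_IP="10.10.8.200"
HBA="vmhba34"

iscsi_setup_cmds() {
    echo "esxcfg-vswitch --add-pg=$PORTGROUP $VSWITCH"          # add the portgroup
    echo "esxcfg-vmknic -a -i $VMK_IP -n $VMK_MASK $PORTGROUP"  # give it an IP
    echo "esxcfg-swiscsi -e"                                    # enable software iSCSI
    echo "vmkiscsi-tool -D -a $TARGET_IP $HBA"                  # add the discovery target
    echo "esxcfg-rescan $HBA"                                   # rescan the HBA
}

iscsi_setup_cmds
```

The software iSCSI adapter name (vmhba34 here) can differ from host to host, so confirm it before pasting the output anywhere.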
That’s it for section 1.1 … time to go reformat my notes for section 1.2!
Project Minty Fresh (Desktop)
Dec 10th
The fresh flavor that lasts and lasts……that’s the goal behind one of my customer’s latest desktop projects. This customer has been working with the View3 pre-release code for some time now. Using View Composer, we can now easily and programmatically refresh a user’s desktop back to the original golden master image state. View Composer supports three primary operations after initial linked clone creation:
1: Refresh – A Refresh takes a desktop back to the original state of the master
2: Recompose – A Recompose takes a linked clone and re-homes it, if you will, to a new parent image (think instant OS updates or software rollouts)
3: Rebalance – A Rebalance takes all the linked clone VMs in a pool and rebalances them across a set of LUNs
For the sake of this conversation, we will focus on the Refresh operation.
Problem:
My customer’s goal is to maintain the integrity of their corporate desktop image deployed to users. Over time, their users have a particular habit of destroying their desktops. So much so that they had to put in place a mandatory, ongoing re-imaging program so that no desktop goes more than six months without a re-image. This policy has had some very positive results in terms of reduced help desk calls and less time spent just sustaining a rotting OS. That said, the effort required to sustain a perpetual, semi-manual re-imaging program is substantial.
Solution?:
Enter VMware View Refresh. Right now, they have rolled out a program for a set of 50 users to see how well it would work to refresh a user’s desktop much more aggressively (every 5 days to start). This means that after a linked clone is created and the user begins to use the VM, the VM will automatically refresh every 5 days back to its original state (configured in the desktop pool settings…screenshot to come). The goal is to make this a seamless event for the users. With View Composer’s User Identity Disk, C:\Documents and Settings\ is redirected to another, persistent (thin provisioned) .vmdk that is presented as the D: drive. This is configured when you create the pool as shown below:

Based upon our initial tests, this is working really well. We can refresh a user’s desktop without them ever knowing, as the next time they log on their profile is completely intact. Currently we are testing all of their applications to ensure this will work across the board. I am sure we will find some applications that do not, gasp!, save their preferences in the user’s profile (something TS/Citrix admins deal with constantly). For those applications, our plan is to ThinApp the application and set the User Sandbox to live in the user’s profile directory. We have also found that we need to re-register each VM with the anti-virus console after a refresh operation, which we are now achieving through a post-sync script.
I’ll be sure to keep everyone posted on our progress and experiences. It’s certainly something to consider and explore. Let me know what you think! Until then, I wish you a very minty fresh desktop experience!
