Archive for the ‘vTrooper’ Category
– The vTrooper Report –
I was asked a question about a specific use case where a second vCPU should be added to a VM in a virtualized environment. Generally it's an easy answer:
If the server can execute multiple threads and really uses the second vCPU for that other thread,
then it's probably OK to add the second vCPU to the VM.
Now, adding a vCPU to a server to make it SMP-capable is an elementary task in VMware, but it has a few impacts:
- It will change your metrics for reporting.
- It changes the HA slot size for your failover needs.
- It will modify your consolidation ratio per core, and indirectly per socket, affecting your capacity plans.
- It will make you redeploy the Ubuntu or other Linux server that you forgot to compile with an SMP kernel. (Don't take this lightly, or your server won't boot.)
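To make the consolidation-ratio impact concrete, here is a minimal Python sketch. The host size (two quad-core sockets) and the VM mix are hypothetical numbers chosen for illustration, and `vcpu_per_core` is a helper name made up for this example.

```python
def vcpu_per_core(vm_vcpus, sockets, cores_per_socket):
    """Return the vCPU-to-physical-core ratio for a list of per-VM vCPU counts."""
    physical_cores = sockets * cores_per_socket
    return sum(vm_vcpus) / physical_cores


# Hypothetical host: 2 sockets x 4 cores = 8 physical cores, 24 single-vCPU VMs.
before = [1] * 24
# The same 24 VMs after six of them get a second vCPU.
after = [2] * 6 + [1] * 18

print(vcpu_per_core(before, sockets=2, cores_per_socket=4))  # 3.0
print(vcpu_per_core(after, sockets=2, cores_per_socket=4))   # 3.75
```

If you hold your target vCPU:core ratio fixed, the same host now has room for fewer VMs, which is where the capacity-planning impact comes from.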
I was exploring the use case and impacts when a bit of information popped up:
Garbage collection in .NET applications requires a second vCPU to run in ‘Server Mode’ rather than ‘Workstation Mode’.
Explanation from MSDN: http://msdn.microsoft.com/en-us/library/bb680014.aspx
Managed code applications that use the server API receive significant benefits from using the server-optimized garbage collector (GC) instead of the default workstation GC.
Workstation is the default GC mode and the only one available on single-processor computers. Workstation GC is hosted in console and Windows Forms applications. It performs full (generation 2) collections concurrently with the running program, thereby minimizing latency. This mode is useful for client applications, where perceived performance is usually more important than raw throughput.
The server GC is available only on multiprocessor computers. It creates a separate managed heap and thread for each processor and performs collections in parallel. During collection, all managed threads are paused (threads running native code are paused only when the native call returns). In this way, the server GC mode maximizes throughput (the number of requests per second) and improves performance as the number of processors increases. Performance especially shines on computers with four or more processors.
This caught me by surprise and made me think: for every VM disk around the world that is out of whack (misaligned), there is an equal number of .NET app servers, virtualized with P2V tools across the globe, that are starved of the correct garbage collection mechanism….
OH, THE HUMANITY!!
Whoa! I gotta settle down!
OK.
Now you are going to ask where the special override switch or Regedit value is that fixes it. The answer is even easier: there isn't one. .NET sees one vCPU or more and decides for the app. You cannot override it. You can add vCPUs (even 4 of them) to improve its performance, but you cannot turn the behavior off.
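The rule quoted from MSDN can be written down as a tiny decision function. This is a hedged Python model of the documented behavior, not the CLR's actual code; `effective_gc_mode` and the `server_gc_requested` flag (standing in for an app that is configured or hosted to prefer server GC) are hypothetical names.

```python
from enum import Enum


class GcMode(Enum):
    WORKSTATION = "workstation"
    SERVER = "server"


def effective_gc_mode(logical_cpus: int, server_gc_requested: bool) -> GcMode:
    """Toy model of the MSDN rule: server GC is only available with more than one CPU."""
    if logical_cpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    if server_gc_requested and logical_cpus > 1:
        return GcMode.SERVER
    # Single-processor machines (and apps that don't ask for it) get workstation GC.
    return GcMode.WORKSTATION


# A P2V'd app server that wants server GC but landed on a 1-vCPU VM:
print(effective_gc_mode(1, server_gc_requested=True).value)  # workstation
print(effective_gc_mode(2, server_gc_requested=True).value)  # server
```

The point of the model is that no flag can change the first branch: with one vCPU, the answer is always workstation mode.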
This probably explains a few of the things that have already happened or will happen in your app development world:
- .NET development on dual-core workstations works fine, but when you move the application to a single-core VM, performance hits a hiccup while GC runs.
- VM admins who have been averse to a second, mostly idle vCPU now have a reason to deploy one, but they won't like it.
- It's a reason to migrate to vSphere sooner rather than later, due to the more relaxed CPU co-scheduling in 4.0.
- Additional vCPUs will put more ‘Eggs’ into your baskets. Do Not Panic.
As the world's workloads increase, it's inevitable that the number of vCPUs will increase as well. The push for 64-bit systems with over 4GB of RAM is driving up the size of the VM in most farms, as reflected in the new configuration maximums in the VMware vSphere release. Just remember that you can look for vCPU contention and NUMA pressure in the esxtop values.
There are plenty of announcements to talk about from the second half of Day 1 and from Day 2 of EMC World. Keynotes from Pat Gelsinger, Brian Gallagher, and Rich Napolitano all covered the “Why, What, and How” of the way the datacenter will change over the next year. Following the quick reference to VPLEX by Joe Tucci in the Day 1 morning keynote, Pat Gelsinger's official announcement and demo in the afternoon keynote explained the VPLEX technology and its purpose in today's datacenters.
– vTrooper Report — from EMC World
My self-imposed gag order has been lifted since my arrival to EMC.
I’m ready to share some of the great news from EMC World 2010. Stay tuned for all the announcements in detail but for now:
It’s my first time at EMC World. I’m an infrastructure guy at heart and have been to Cisco World and VM World a few times, but not the big storage show. That’s good because EMC is not just a storage company. My “Firehose” treatment over the past few weeks made certain of that fact. So I’ll warm up to EMC World with a little mood and follow it with content. All good fun starts with a party:
This is a continuation of the Capacity Conundrum; if you missed the first part, start here.
$ per Compute VM
So let’s cut to the chase. For the compute tiles of our Quad we have a price per vCPU and a $ per GB of RAM to settle. Keeping our example 2U server in play, we could expect to spend approximately $15,000 for a 2U fully loaded with 4GB DIMMs. Unfortunately, part of that $15K is consumed by I/O cards and maintenance, which needs to be pulled out to get the compute number. For our argument we will use $10K for the compute system without the I/O cards and maintenance costs; this is the CapX we will offset in our $/per values.
vCOMPUTE – FIREPOWAH!
We know how a CPU works, right? Move the process into memory, execute CPU cycles, churn, churn more, hand back to the I/O guys, rinse and repeat. Basically, this is where the hardware container happens in our data centers. I say container because it’s easy to show it as a box; it’s hard to define what it will always be in physical form. 1U, 2U, 4U, half blade, full blade, appliance, PC: you name it, it is probably in someone’s ‘datacenter’. The lowest common denominator I have been able to settle on for a common form factor is cores per RAM, grouped per socket. Grouping per socket fits because you are measuring the memory that is close to the CPU socket. The NUMA architectures of AMD and Intel, with on-board memory controllers and direct transports to the memory DIMMs that bypass the I/O controllers (e.g., the Northbridge), help define the grouping.
TECHNOTE: Every core has associated memory banks it will use, and every container (physical server) has a series of sockets that it controls. A hypervisor is limited in how well it can keep a VM's memory on the banks nearest its vCPUs. Generally the hypervisor will schedule a VM's vCPUs on the same socket and migrate the corresponding memory for those processes to that socket's memory banks. It does this for efficiency on x86 architectures. It can move the VM to another socket and readdress the memory, but such a move has a ‘cost’. The path of least resistance is to stay on the same socket.
If you create a 4 vCPU VM and run it on a single-core processor, it gets bogged down. If you run the same VM on a two-socket quad-core host (8 cores), the four cores used by the VM are likely to be all on socket 1 or all on socket 2. The cost to the scheduler of splitting the vCPUs between the two physical sockets is greater than that of running them on the same socket. AMD delivered on-die memory controllers earlier than Intel and sustained higher levels of virtualization consolidation “Per Host” than similar-class Intel systems could provide through the Northbridge. Core i7 is a new game for Intel, and the results of Nehalem show the improvements.
For more in-depth information, here is a good read: CPU Scheduler in VMware ESX
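A quick way to reason about whether a VM can stay on one socket is to compare its vCPUs and memory against a single socket's cores and local RAM. The Python sketch below assumes one NUMA node per socket and ignores hyperthreading and hypervisor memory overhead; `Socket` and `fits_in_one_socket` are illustrative names, not part of any VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    cores: int      # physical cores on this socket
    ram_gb: float   # memory attached to this socket's controller


def fits_in_one_socket(vm_vcpus: int, vm_ram_gb: float, socket: Socket) -> bool:
    """True if the VM's vCPUs and memory can both stay on a single socket (NUMA node)."""
    return vm_vcpus <= socket.cores and vm_ram_gb <= socket.ram_gb


quad_core = Socket(cores=4, ram_gb=32)           # hypothetical socket
print(fits_in_one_socket(4, 8, quad_core))       # True: can live on socket 1 or socket 2
print(fits_in_one_socket(6, 8, quad_core))       # False: vCPUs must span both sockets
print(fits_in_one_socket(2, 48, quad_core))      # False: memory spills to the remote node
```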
We have a host with a $10K CapX charge that has two sockets at a 4/45GB socket ratio, with approximately $5K spent on each socket. Looking at our hardware invoice, the CPU cores are about 25% of the cost of a socket, so we can assume that our per-socket cost breaks down into 25% cores and 75% memory. So our socket ratio yields a $1250 cost for 4 cores and $3750 for 45GB of memory:
Per Core CapX = $312.50; Per GB RAM CapX = $83.33
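The same split can be scripted so it is easy to rerun with your own invoice numbers. A minimal Python sketch, assuming the 25% core / 75% memory split and the $10K two-socket host above; `socket_unit_costs` is a hypothetical helper.

```python
def socket_unit_costs(host_capx, sockets, cores_per_socket, ram_gb_per_socket, core_share=0.25):
    """Split a host's CapX into a per-core and a per-GB-RAM cost.

    core_share is the fraction of each socket's cost attributed to CPU cores;
    the remainder is attributed to memory.
    """
    per_socket = host_capx / sockets
    per_core = per_socket * core_share / cores_per_socket
    per_gb = per_socket * (1 - core_share) / ram_gb_per_socket
    return per_core, per_gb


per_core, per_gb = socket_unit_costs(10_000, sockets=2, cores_per_socket=4, ram_gb_per_socket=45)
print(f"Per Core CapX = ${per_core:.2f}; Per GB RAM CapX = ${per_gb:.2f}")
# Per Core CapX = $312.50; Per GB RAM CapX = $83.33
```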
That gives us a bare-metal cost without a hypervisor charge on top, but we need a hypervisor to get a VM running. Adding the cost of a per-socket ESX Enterprise Plus license (worst case) adds $3500 per socket.
ESX Lic. Cost per socket CapX = $3500
The raw burn rate of the host would be $8500 per socket ($5000 hardware + $3500 license) if we never loaded a VM on it. Well, we bought it for a reason, so let’s get our money back. If we target the standard allocation for this host (4/45GB socket ratio), we get our target VM count of 16 per socket (1 vCPU/2.8 GB RAM). Also, keep in mind that we broke the socket cost down into 25% CPU and 75% memory, so we will keep that same split here. If we don’t do the split, then any VM that is deployed to the socket will bear the same cost regardless of its size.
ESX Lic. Cost per VM = $218.75 (3500 / 16)
-Or-
Split it by the 25/75% ratio we used previously for CPU and memory, and you get a slightly different calculation:
$3500 * 0.25 = $875, / 16 vCPUs ≈ $55 AND $3500 * 0.75 = $2625, / 45 GB ≈ $58
per vCPU = $55
per vMEM (GB) = $58
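Scripting the license split also shows that the two methods agree at the target ratio: with exactly 45/16 = 2.8125 GB per VM, the 25/75 split charges one target VM the same $218.75 as the flat $3500 / 16, so the small gap in the rounded sum comes from using 2.8 GB. A Python sketch under the same assumptions; `license_split` is a made-up name.

```python
def license_split(license_per_socket, vcpus_per_socket, ram_gb_per_socket, cpu_share=0.25):
    """Spread a per-socket hypervisor license over vCPUs and GB of RAM (25/75 by default)."""
    per_vcpu = license_per_socket * cpu_share / vcpus_per_socket
    per_gb = license_per_socket * (1 - cpu_share) / ram_gb_per_socket
    return per_vcpu, per_gb


per_vcpu, per_gb = license_split(3500, vcpus_per_socket=16, ram_gb_per_socket=45)
print(round(per_vcpu, 2), round(per_gb, 2))           # 54.69 58.33

# One target VM (1 vCPU, 45/16 GB) under the split equals the flat 3500 / 16.
print(round(per_vcpu * 1 + per_gb * 45 / 16, 2))      # 218.75
```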
Adding it up with our target ratios in tow we get the burn rate of the $ per Compute on a VM basis.
($312.50 per core / 4 vCPUs per core at the 4:1 ratio) + ($83.33 * 2.8 GB) + {($55 * 1 vCPU) + ($58 * 2.8 GB)} ≈ $530
Or summarized: (vCPU = $78) + (vMEM = $233) + (Hyper$ = $219) = (vCOMPUTE = $530)
Spread over 8760 hours (1 year), this VM would cost about $.06/hr in vCOMPUTE.
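Putting the pieces together, a per-VM vCOMPUTE function makes it easy to price other VM sizes. This Python sketch bakes in the example host's figures as defaults (all assumptions you should replace) and, like the text, spreads the CapX over one year of 8760 hours. Without intermediate rounding the target VM comes to $531.25, which is exactly the $8500 socket burn rate divided by 16 VMs; `vcompute_cost` is a hypothetical name.

```python
HOURS_PER_YEAR = 8760  # one year, as in the text


def vcompute_cost(vcpus, ram_gb, *,
                  per_core=312.50,                  # hardware CapX per physical core
                  vcpus_per_core=4,                 # 4:1 vCPU:core target ratio
                  per_gb=3750 / 45,                 # hardware CapX per GB of RAM
                  lic_per_vcpu=3500 * 0.25 / 16,    # 25% of the license over 16 vCPUs
                  lic_per_gb=3500 * 0.75 / 45):     # 75% of the license over 45 GB
    """Annual vCOMPUTE CapX for one VM: CPU share + RAM share + hypervisor share."""
    cpu = vcpus * per_core / vcpus_per_core
    mem = ram_gb * per_gb
    hypervisor = vcpus * lic_per_vcpu + ram_gb * lic_per_gb
    return cpu + mem + hypervisor


target = vcompute_cost(1, 45 / 16)          # the 1 vCPU / ~2.8 GB target VM
print(round(target, 2))                     # 531.25
print(round(target / HOURS_PER_YEAR, 3))    # 0.061 ($/hr)
```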
Let's apply that to some other VM systems and see if it sticks. If we plan for the following VM deployment on our socket:

The costs would spit out as such:

Or slice it up into a per hour number:

So based on this analysis, some of my VMs probably cost only $.05 per hour for vCOMPUTE. Interesting. What is more interesting is that the memory cost associated with a VM scales more accurately with consumption. You can have as much memory as you like for your new 4 and 8 GB aspirations (e.g., memory leaks); you just need to pay for it accordingly.
Too bad that only pays for the top half of my total cost model. That said, the benefit here is that this model spans hypervisors: any hypervisor on the market can be split up the same way to show the cost of a VM on a Xen, KVM, Virtual Iron, Parallels, or Hyper-V infrastructure.
I will be working on a few PowerShell scripts and Excel calculators that can make this model more repeatable. At the very least, it is a model that I will use to evaluate CapacityIQ and third-party products like the offering from VKernel, and the output they measure, especially if they add costs on a per-socket basis, which I can now calculate as overhead.
Alas, there is more to consider. Stay tuned for Part III – “the I/O that binds”.
In an effort to gel up an internal billing and allocation model (GaaS – Gouging as a Service), I’ve been struggling with the concept of cost per VM. I asked a simple question in the Twittersphere about that idea, and it turned into a discussion and, well… got out of hand. I apologize for that, as this is a better format to explain it. (Special thanks to @asweemer for a dumping ground.)
If I had a Nickel for each VM…
At VMworld 2009 there was a presentation in the keynote that showed the price of a VM hosted with Terremark at $.05 per hour. I thought, wow. A nickel per hour. If I had a nickel per VM-hour, how much would I have available to spend on coffee?
Then I thought, wait. I have VMs. How much do they really cost me per hour? Well, the answer is… it depends. Old servers with high power consumption and low density, versus new systems with Intel 5500s packed into blades, have different burn rates visible to different systems (power, cooling, depreciation). I haven’t found a model that breaks those units down to my satisfaction yet. I need another way.
As a general practice I create in my mind some maxims that I follow in the creation of a VM.
- S – 1vCPU, 1GB RAM, 1GB Net, 10GB Disk
- M – 2vCPU, 2GB RAM, 1GB Net, 20GB Disk
- L – 4vCPU, 4GB RAM, 2GB Net, 40GB Disk
Seems simple enough, but it doesn’t really generate a cost model on a consistent basis. Hardware continues to change, and each VM consumes resources at different rates and at different times of the day. A VM that isn’t doing anything isn’t really ‘consuming’ anything, right? I thought I would try to break it down further by creating a four-quadrant block with two macro categories: Compute (CPU and Memory) and I/O (Network and Disk).
Each resource area could increase or decrease for a reason without changing the size of the original maxim it was created under. This allows for small variations in size without having a customer yell that their bill went up by $2 this month.
The Measurable Unit
Use a unit of measure to identify the four quadrants: vCPU : vMEM : vNET : vDISK or C:M:N:D . Then overlay the VM creation to count up the units. This way the growth of a ‘VM’ during its lifecycle can be adequately allocated back into the proper IT metric. Using the VM creation maxims up above this may be:
- S – 1:1:1:1
- M – 2:2:1:2
- L – 4:4:2:4
This isn’t perfect, but it at least allows the average CPU cost to be allocated separately from the memory, network, and disk costs. After all, you don’t usually get to upgrade all four parts of the quadrant in the same fiscal year. This also allows you to trend the average cost rate per unit over months and years to see which cost areas are improving. It is an interesting metric for both the business and IT: win-win in my book, even if no one internally ever has to pay the values back (showback). It also helps police which VM is consuming too much of a specific resource, which would skew the numbers if you simply took the cost of the ESX hosts and divided it by the number of VMs.
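The unit counts above line up with the maxims if one unit is 1 vCPU, 1 GB of RAM, 1 Gb of network and 10 GB of disk. That unit size is my reading of the list, not something stated outright, so treat this Python sketch as an assumption-laden illustration; `Units` and `to_units` are hypothetical names.

```python
import math
from typing import NamedTuple


class Units(NamedTuple):
    """C:M:N:D allocation units for one VM (assumed unit sizes noted per field)."""
    c: int  # vCPUs
    m: int  # GB of RAM
    n: int  # Gb of network
    d: int  # 10 GB blocks of disk

    def __str__(self) -> str:
        return f"{self.c}:{self.m}:{self.n}:{self.d}"


def to_units(vcpus: int, ram_gb: float, net_gb: float, disk_gb: float) -> Units:
    """Round each resource up to whole units (1 vCPU, 1 GB, 1 Gb, 10 GB)."""
    return Units(vcpus, math.ceil(ram_gb), math.ceil(net_gb), math.ceil(disk_gb / 10))


MAXIMS = {"S": (1, 1, 1, 10), "M": (2, 2, 1, 20), "L": (4, 4, 2, 40)}
for name, spec in MAXIMS.items():
    print(name, to_units(*spec))   # prints S 1:1:1:1, M 2:2:1:2, L 4:4:2:4 on separate lines

# An "S" VM whose disk later grows to 25 GB is still an S, but now counts 1:1:1:3.
print(to_units(1, 1, 1, 25))
```

This is what lets a VM grow during its lifecycle without being re-labelled: the maxim stays the same while the unit count tracks the actual allocation.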
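Here is a minimal showback sketch in Python that allocates each quadrant's cost in proportion to the units a VM holds, and compares it with the naive host-cost-divided-by-VM-count approach. The pool costs and VM names are invented for illustration, and proportional allocation is just one reasonable way to apply the C:M:N:D idea; `showback` is a hypothetical helper.

```python
from typing import Dict, Tuple

QUADRANTS = ("C", "M", "N", "D")


def showback(pool_costs: Dict[str, float],
             vms: Dict[str, Tuple[int, int, int, int]]) -> Dict[str, float]:
    """Allocate each quadrant's cost to VMs in proportion to the units they hold."""
    totals = {q: sum(units[i] for units in vms.values()) for i, q in enumerate(QUADRANTS)}
    bills = {}
    for name, units in vms.items():
        bills[name] = sum(pool_costs[q] * units[i] / totals[q]
                          for i, q in enumerate(QUADRANTS) if totals[q])
    return bills


# Hypothetical monthly pool costs and three VMs sized with the S/M/L maxims.
pools = {"C": 300.0, "M": 600.0, "N": 100.0, "D": 200.0}
vms = {"web01": (1, 1, 1, 1), "app01": (2, 2, 1, 2), "db01": (4, 4, 2, 4)}
bills = showback(pools, vms)
naive = sum(pools.values()) / len(vms)   # host cost divided by VM count
for name, amount in bills.items():
    print(f"{name}: ${amount:.2f} (naive split: ${naive:.2f})")
```

The bills still add up to the full pool cost, but the large VM now carries its share instead of being subsidized by the small ones.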
Apples , Oranges, Lemons, and Grapes = Frutti Results
So you have a unit of measure and a type of system to match the measurement up towards over a period of time. Here’s where the fruit cart and the horse get hooked up.
This is all very complex, why can’t I just buy the same server I have purchased for the last 5 years?
Sorry, kids. They don’t build ’em like they used to. In today's market, the UCS system from Cisco is generating new buzz alongside the original players IBM, HP, and Dell. How do you sort any of that out among the offerings, and how do you select the right platform for your new ESX system? By the socket! Every x86 system from both the Intel and AMD families has them. And now that you have to pay for your hypervisor and additional tools (CapacityIQ, AppSpeed, Nexus 1000V) per socket, it matters more. I need to squeeze the value out of those sockets.
Still staying in the upper half of the Quad, let's measure cores and RAM as a ratio, assuming dual-rank 4GB DIMMs, and compare some of the standard 2-socket servers.
Standard Intel X5450
2 Socket – 4 Core – 16 DIMMs (8 per socket) produces 4 cores/32 GB RAM per socket
Standard Intel Nehalem X5500
2 Socket – 4 Core – 18 DIMMs (9 per socket) produces 4 cores/45 GB RAM per socket
Cisco UCS extension on X5500
2 Socket – 4 Core – 48 DIMMs (24 per socket) produces 4 cores/96 GB RAM per socket
What this shows is that for every ESX license consumed in the environment, there are different amounts of memory available for a VM to use. The UCS approach allows a much higher allowance of memory per VM at the same licensing cost. Sure, you could buy 4-way servers and claim that 256 GB of RAM gives the VM more allowance, but in reality the VM will still face vCPU and memory contention ratios within each of the 4 sockets. You can change the size of the container by moving to a 4-way, but it won’t change the value of the ratio for that container with regard to cores and memory.
CPU Contention
The idea of CPU contention is becoming more visible to most administrators of virtualized environments because the desire to pack VMs onto a host is so strong. If I can get 10 VMs on a host for $5000, then getting 25 VMs on the same host lowers my cost per VM. It could also be cheating your customers of the performance they paid for, especially if you have multiple vCPUs assigned to those 25 VMs. This is where the ratio of VMs per host becomes obsolete and vCPUs per core makes more sense.
Using the example containers above you can generate an expected number of VM’s per socket. There is no reason to do a 1:1 ratio of cores to VM because the point of virtualization is to run more with less. I think a good ratio to start with is 4:1 for a production VM and 16:1 for a VDI implementation:
Standard Intel X5450 - (4/32 GB SocketRatio) yields 16 VMs with a 1 vCPU/2 GB RAM configuration per socket
Standard Intel Nehalem X5500 - (4/45 GB SocketRatio) yields 16 VMs with a 1 vCPU/2.8 GB RAM configuration per socket
Cisco UCS - (4/96 GB SocketRatio) yields 16 VMs with a 1 vCPU/6 GB RAM configuration per socket
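The expected-VM math is simple enough to script, which helps when comparing more host types. A Python sketch, taking the socket ratios as stated above and the 4:1 vCPU:core starting point (pass `vcpu_per_core=16` for the VDI case); `expected_vms` is a hypothetical helper.

```python
def expected_vms(cores_per_socket: int, ram_gb_per_socket: float, vcpu_per_core: int = 4):
    """Expected 1-vCPU VMs per socket and the RAM each gets before overcommitment."""
    vms = cores_per_socket * vcpu_per_core
    return vms, ram_gb_per_socket / vms


for label, ram in [("X5450", 32), ("X5500", 45), ("UCS X5500", 96)]:
    vms, gb = expected_vms(4, ram)
    print(f"{label}: {vms} VMs at 1 vCPU / {gb:.1f} GB")
# X5450: 16 VMs at 1 vCPU / 2.0 GB
# X5500: 16 VMs at 1 vCPU / 2.8 GB
# UCS X5500: 16 VMs at 1 vCPU / 6.0 GB
```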
You can always adjust your actual deployment if these ratios don’t match your environment. The expected deployment number helps determine how large the pizza slices are for the team, not how many slices each of them consumes. In these configurations you can see where the RAM density per socket (SocketRatio) of the UCS allows much larger VM configurations before overcommitment, a nice fit for the new 64-bit installations. These expected numbers of VMs per socket help determine the burn rate of a C:M:N:D value for the CapX spend you made.
BurnRate
To fully understand how much a VM costs, one has to look at what was spent in the CapX of the host and agree on the measuring stick for the C:M:N:D value of the created VM. If a series of hosts from different families are in service and at different stages of their lifecycle, there may have to be some averaging. The SocketRatio of cores/RAM is a consistent way to measure systems from different form factors and families and to level-set the expected allocation of VMs. The expected allocation of VMs for a host helps determine what density ratio is desired for vCPU:vMEM.
This is the end of Part 1. In Part Deux I will take a deeper dive into the Compute and I/O areas and assign a more detailed cost-per-VM model.
