You Down With NTP?

(Yea! You know me!)

image

NTP. How can I explain it? I’ll take you frame by frame it.

I’m sure that no readers of Virtual Insanity would ever neglect to set up NTP properly on every single ESXi host.  But occasionally, our NTP source hiccups, or something happens to skew the time.  Recently I found a host with the NTP service stopped.

image

Why?  No idea really.  Maybe it was stopped while someone was troubleshooting.  Maybe it just crashed.  But it will cause issues with backups, and with applications running during backups or vMotions.

When a snapshot is taken, or a VM is vMotioned, the time is sync’d inside the guest by default.  This can be a problem if your host NTP time is off.  My Windows guests get their time from Active Directory, and my Linux guests use an AD domain controller for NTP, so I do not rely on guest time syncing up to my ESXi hosts.  Or so I thought. . .

Even if you have your guests configured NOT to do periodic time syncs with VMware Tools, Tools will still force the guest clock to sync to the host on snapshot operations, suspend/resume, and vMotion.  There is a way to prevent VMware Tools from syncing the time for these events, but it’s better just to make sure NTP is up and running, and getting the correct time.  There is a clear reason VMware insists on doing these syncs at times when I/O is quiesced, or transferred to another host: timekeeping in a hypervisor environment, where you’re sharing CPU cycles, is no trivial task.
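
If you do decide you need to turn off those event-driven syncs, the usual approach is adding advanced settings to the VM’s configuration (.vmx).  The options below are the ones commonly cited in VMware’s KB article on disabling time synchronization; treat this as a sketch and verify the exact list against the KB for your Tools version, since it has changed over the years.

time.synchronize.continue = "FALSE"
time.synchronize.restore = "FALSE"
time.synchronize.resume.disk = "FALSE"
time.synchronize.shrink = "FALSE"
time.synchronize.tools.startup = "FALSE"

Depending on the Tools version, the KB shows these values as “FALSE” or “0”.  Inside a Linux guest, vmware-toolbox-cmd timesync status will tell you whether the periodic Tools sync is enabled, which is handy for confirming what your guests are actually doing.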

If you use a backup solution that snapshots the VM, VSS quiesces the I/O inside that guest.  When it does, there’s a VSS timeout for the snapshot to complete.  If the snapshot takes longer than that, VSS will time out, and your job will fail with error code 4 (quiesce aborted).

image

By default, this timeout is set to 10 minutes on Windows guests.  Of course, my time was off on the ESXi host by 12 minutes, so when the backup job started, VSS kicked off, and then VMware Tools sync’d the time 12 minutes forward.  VSS timed out instantly.  If you see this error code on your backups, an easy thing to check first is NTP.

image

I recommend setting the NTP service to start and stop automatically.

image

Previously, I had set this to start and stop with the host.  But if the service stops, or gets stopped for some reason, it will not restart until the host restarts.
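
If you want a quick sanity check from the ESXi shell instead of the client, something like the following works (a minimal sketch; the ~ # prompt is just the ESXi shell):

~ # /etc/init.d/ntpd status
~ # date -u
~ # /etc/init.d/ntpd restart

Compare the output of date -u against a clock you trust.  If the host is off by minutes, fix the upstream NTP source first, then restart the service.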

So who’s down with NTP?

Hopefully all the homies. . .

Upgrading SRM from 5.0 to 5.5

If you’re one of those shops that skipped over 5.1, and are now catching up and going to 5.5, you will run into problems with your SRM upgrade.  Here’s how to fix them.

After you perform the upgrade, per the VMware Site Recovery Manager Installation Guide, you may run into permissions issues when launching the SRM plugin.

The error message is: Connection Error: Lost connection to SRM Server x.x.x.x:8095  -  Permission to perform this operation was denied.

image

To fix this, you’ll have to go into the database, and do some editing.

First, stop the VMware Site Recovery Manager service.

Connect to the SRM database server.  These steps are specific to Microsoft SQL Server; if you’re using a different database, adjust the procedure accordingly.

Before touching anything, make sure you BACK UP your database.
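
If you’d rather script that backup than click through SSMS, a sqlcmd one-liner along these lines does the job.  The server name, database name, and backup path here are just examples; substitute your own:

sqlcmd -S localhost -Q "BACKUP DATABASE [SRM_DB] TO DISK = N'C:\Backups\SRM_pre55.bak' WITH INIT"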

Under your SRM database, you’ll see a table called dbo.pd_acedata.  Right click that table, and choose Select Top 1000 Rows.

image

In your Results window, you’ll see that the only permissions that exist are old school pre-SSO “Administrators”.  We need to fix that.

To fix it, we’re going to delete that row. Right click the table again, and select Edit Top 200 Rows.

image

Now, select that row with the old Administrators permission, and right click to delete it.

image

Click yes.

image

Now we have to do the same thing to the dbo.pd_authorization table. Edit the first 200 rows.

image

Delete the first row that says DR_ACEData, and click Yes again.

Now go start the SRM service.  This will automatically populate your database with the new permissions you’ll need to launch the SRM plugin, and connect. 

If you go back to the table, you can see it has the correct permissions.

image

For some reason, this is a known issue, but the KB is not public.  So here’s your KB.

Don’t forget to back up your database, so you can restore it if you blow it up.  Happy upgrading.

Feedback is old and busted. Feed Forward is the new hotness.

We’ve all been there. You just sat through a terribly boring presentation that could have been so much better. If only the feedback form you’re about to fill out had made it to the presenter yesterday.

image

Since most of us spend all our free time on IT, and virtually none on quantum physics, there’s no way we can accomplish that kind of preemptive feedback.  Sure, you can run through a presentation with the wife, or with your Uncle Si.  But they’re not able to give you the kind of feedback you really need to make your presentation a huge hit. If only you had that time machine.

Apparently Duncan Epping, Scott Lowe, and Mike Laverick have been studying physics in their spare time, because they have come up with a solution.  It’s called Feed Forward, and it’s about to take off in a major way with the VMUG organization around the globe.

What exactly is Feed Forward?  It’s a program where a potential presenter can get help and feedback from pros before giving a presentation.  The program is just getting off the ground, but some of the early experiences have been great.  I believe this program will be a way to get more, and better, content to VMUG’s.  A lot of people who have experiences, or relevant expertise to share, are reluctant to step up.  This program would allow them to pitch their presentation to others without risk, and get feedback that helps them understand how their presentation would benefit the group.

As an ardent supporter of the National Forensics League, I believe strongly that public speaking and presentation skills are invaluable.  Unfortunately, not everyone has access to programs like that growing up.  Too often, people reach a point in their career where their inability to present holds them back.  We all need to be able to present, because in the end, we are all salespeople.  Whether we’re selling the boss on a new idea, or simply selling ourselves to a prospective employer, practice helps tremendously.

Feed Forward is one way to get that practice, and get some constructive feedback from people who are adept at presenting, and who understand the subject matter that is most relevant to your prospective audience.

I am not sure if this will be limited to VMUG presentations in the long run, or if it will expand beyond that to VMworld, and even presentations for other groups. But I have to say, I am on board 100%, and I strongly encourage our readers to sign up here to stay abreast of Feed Forward developments. Also, if you have ideas or comments on Feed Forward, I am sure the guys would love to hear them.

Up Close and Personal With IBM PureApplication PaaS

The converged infrastructure value proposition, by now, is pretty evident to everyone in the industry. Whether that proposition can be realized is highly dependent on your particular organization, and your specific use case.

Over the past several months, I have had an opportunity to be involved with a very high-profile pilot, with immovable, over-the-top deadlines.  In addition, the security requirements were downright oppressive, and necessitated a completely isolated, separate environment. Multi-tenancy was not an option.

With all this in mind, a pre-built, converged infrastructure package became the obvious choice. Since the solution would be built upon a suite of IBM software, they pitched their new PureApplication system. My first reaction was to look at it as an obvious IBM competitor to the venerable vBlock. But I quickly dismissed that, as I learned more.

The PureApplication platform is quite a bit more than a vBlock competitor. It leverages IBM’s services expertise to provide a giant catalog of pre-configured multi-tiered applications that have been essentially captured, and turned into what IBM calls a “pattern”. The simplest way I can think of to describe a pattern is like the application blueprint that Aaron Sweemer was talking about a few months back. The pattern consists of all tiers of an application, which are deployed and configured simultaneously, and on-demand.

As an example, if one needs a message broker app, there’s a pattern for it. After it is deployed (usually within 20-30 mins.), what’s sitting there is a DataPower appliance, web services, message broker, and database. It’s all configured, and ready to run. Once you load up your specific BAR files, and configure the specifics of how inbound connections and messages will be handled, you can patternize all that with script packages, so that next time you deploy, you’re ready to process messages in 20 minutes.  If you want to create your own patterns, there’s a pretty simple drag and drop interface for doing so.

image

I know what you’re thinking. . . There are plenty of other ways to capture images, vApps, etc. to make application deployment fast. But what PureApp brings to the table is the (and I hate using this phrase) best-practices from IBM’s years of consulting and building these solutions for thousands of customers. There’s no ground-up installation of each tier, with the tedious hours of configuration, and the cost associated with those hours. That’s what you are paying for when you buy PureApp.

Don’t have anyone in house with years of experience deploying SugarCRM, Business Intelligence, Message Broker, SAP, or BPM from the ground up? No problem. There are patterns for all of them. There are hundreds of patterns so far, and many more are in the pipeline from a growing list of global partners.

The PureApplication platform uses IBM blades, IBM switching, and IBM V7000 storage. The hypervisor is VMware, and they even run vCenter. Problem is, you can’t access vCenter, or install any add-on features. They’ve written their own algorithms for HA, and some of the other things that you’d expect vCenter to handle. The reasoning for this, ostensibly, is so they can support other hypervisors in the future.

For someone accustomed to running VMware and vCenter, it can be quite difficult to get your head around having NO access to the hosts, or vCenter to do any troubleshooting, monitoring, or configuration. But the IBM answer is, this is supposed to be a cloud in a box, and the underlying infrastructure is irrelevant. Still, going from a provider mentality, to an infrastructure consumer one, is a difficult transition, and one that I am still struggling with personally.

Licensing on this system is handled simply: you can use as many licenses for Message Broker, DB2, Red Hat, and the other IBM software pieces as you can possibly consume with the box. It’s a smart way to implement licensing.  You’re never going to be able to run more licenses than you “pay for” with the finite resources included with each system.  It’s extremely convenient for the end user, as there is no need to keep up with licensing for the patternized software.

Access to the PureApp platform is via the PureApp console, or CLI. It’s a good interface, but it’s also definitely a 1.x interface. There is very extensive scripting support for adding to patterns, and individual virtual machines. There are also multi-tenancy capabilities by creating multiple “cloud groups” to carve up resources.  There are things that need to be improved, like refresh, and access to more in-depth monitoring of the system.  Having said that, even in the past six months, the improvements made have been quite significant.  IBM is obviously throwing incredible amounts of resources at this platform. Deploying patterns is quite easy, and there is an IBM Image Capture pattern that will hook into existing ESXi hosts to pull off VM’s to use in Pure, and prepare them for patternization.

image

Having used the platform for a while now, I like it more every day. A couple of weeks ago, we were able to press a single button, and upgrade firmware on the switches, blades, ESXi, and the v7000 storage with no input from us. My biggest complaint so far is that I have no access to vCenter to install things like vShield, backup software, monitoring software, etc.  But again, it’s just getting used to a new paradigm that’s hard for me.  IBM does have a monitoring pattern that deploys Tivoli, which helps with monitoring, but it’s one more thing to learn and administer. That said, I do understand why they don’t want people looking into the guts on a true PaaS.

Overall, I can say that I am impressed with the amount of work that has gone into building the PureApplication platform, and am looking forward to the features they have in the pipeline. The support has been great so far as well, but I do hope the support organization can keep up with the exponential sales growth. I have a feeling there will be plenty more growth in 2014.

If it sounds too good to be true, it can still be true – PernixData FVP

 

If you said your mission was to “bring scale out storage to every virtualized datacenter”, you can bet that your Twitter follower count would drop immediately, and your peers would think you had gone off the deep end.  That is, unless your name was Satyam Vaghani, and you had already invented VMFS, brought VAAI to fruition, and helped introduce the concept of VVOL’s at VMware.  If you were Satyam, and you said that, people would just throw money at you.

 

Fast-forward to today, and that money has been poured into a startup called PernixData, which is about to unleash its Flash Virtualization Platform (FVP) onto the world.  I hear you groaning over there.  “Ohh, not another flash startup. . .”  Aren’t there tons of flash startups these days, all promising to revolutionize storage, and handle flash “like no one else”?  Indeed.  But you’re going to want to pay attention to this one.

 

There is no debate anymore about whether flash should be implemented in the datacenter.  The more interesting debate happens when we talk about HOW to implement that flash.  You have several choices when it comes to using flash to bring storage performance to your applications.  You can buy some flash from your traditional storage vendor at breathtaking markups, and watch them shoehorn that into an existing, legacy array.  But then you have to live with paying a huge premium for performance you’ll never see.  Legacy arrays weren’t built with flash in mind, and they all will very quickly reach their limit when you start adding flash.  While you will no doubt see a performance boost, it doesn’t scale.  A few months later, after adding more hosts, and more VM’s, you will inevitably hit a wall again.  Then what?

 

You can go to one of the recent startups, like PureStorage, and get a nice array of SSD’s with great features, and replace your current storage array for close to the same price as spinning disks.  You can go to XtremIO (now owned by EMC), or NetApp, and buy one of their flash arrays, and either may suit your needs just fine.  Violin or Nimbus will gladly sell you an all flash array, but as you will see, there are some drawbacks to these approaches.

 

Vaghani believes that SAN attached storage is too far from the application.  Consider that a piece of flash can process an I/O in less than 100 microseconds.  Would any rational person want to add 400-800% latency on top of that, just to traverse a network?  There is a valid reason for doing so, and that is so you don’t have to change your existing data storage strategy.

 

If you’re using EMC already, and you want speed, buying some XtremIO and tiering it with your VMAX is not a bad decision.  You keep all the same tools, and the same methodology for storing and protecting your data, that you already use.  No need to learn another storage operating platform, or change your data protection strategy, just to get a speed boost.  But with that particular product, you’re only accelerating your reads.

 

Per Vaghani’s theory, and just basic physics, to get the best performance, flash should be as close to the application as possible.  So you could just go out and buy some SSD’s, or PCIe flash cards, and pop them right into the server.  That way, you are certain to get every technologically possible IOP out of your new, expensive flash.  Then there is that pesky problem of trying to figure out how to use this new storage without having to re-architect the way data gets protected and stored.  Oftentimes, this means making changes to your applications, which should be loads of fun, and super-easy.
So what do you do?  Create a new VMDK on the new flash?  How do you protect it?  If it’s local, how do you vMotion?  Uggh. . .

 

If only there was a way to change the performance of the existing storage infrastructure, without changing. . . the existing storage infrastructure.  Introducing PernixData FVP.

 


FVP takes whatever existing flash you have inside your VMware host, and uses it to accelerate I/O from your back end storage arrays.  It integrates seamlessly into the VMware kernel, and aggregates flash storage from all hosts in a cluster into a single pool of flash resources.  FVP then hijacks the I/O from a VM before it goes out to the storage, and decides whether that I/O should be served from the pool of flash, or by the storage array behind the flash.  So it’s native to the hypervisor, and can be used to accelerate VM’s on a per VM or per VMDK basis. 

 

Although FVP has a clustered file system, it does not have any centralized management or metadata functions.  All nodes are autonomous, and do not need to communicate to other nodes to declare ownership of a block, or for any other reason.  This means clustering does not have an impact on the level of performance you will see from your flash devices.  Other clustered flash solutions on the market have some piece of their management functions centralized, so all hosts must communicate with the central authority, over that slow network we were talking about earlier, resulting in the same type of latency we would see using SAN-based flash.  Essentially what we have is what PernixData calls a Data-In-Motion tier, or as Enrico Signoretti says, a CAN (cache area network).

 

FVP offers both write-through, and write-back modes.  Write-through means FVP will not intercept, or accelerate write operations.  It passes those on to the storage array where the VMDK is housed.  In write-back mode, writes will be accelerated, and distributed to other nodes in the pool for data protection.  Reads are accelerated in both modes, and all this is completely configurable per VM.  You can select whether you want to have up to 2 additional replicas elsewhere in the cluster, or no replicas to completely maximize performance on a per VM basis.  The amount of capacity used in the flash pool is configurable by VM as well.  This level of flexibility is unmatched, anywhere in the market.

 

While there are one or two products out there offering write-back capability, they seem to ignore the fact that the virtual environment is highly dynamic, and VM’s frequently relocate to different hosts.  You didn’t think a former VMware guy would ignore vMotion, did you?

 

Once a VM vMotions to another host, the warm data residing in cache on the old host is compared to the data in cache on the VM’s new host.  Data that does not exist on the new host is transferred over the vMotion network, to the new host.  There may be a slight performance impact while the data is transferred, but the impact is minimal, compared to warming cache all over again.  Also, the severity of the impact is limited to the original, pre-FVP response time of your back end storage array.  So for a few seconds, you can reminisce about how slow things were back in the day, before you got FVP.  In the demo video below from Storage Field Day 3, performance started ramping almost immediately, as the data was being copied over.

 

In addition to vMotion, FVP also supports Storage DRS, HA, SIOC, VADP, snapshots, and VAAI.  It doesn’t get into the way of anything you are currently doing.  In fact, it is so transparent, you can actually yank out your SSD’s on a host WHILE a VM is running, and nothing will happen.  Try that with a LUN.

 

FVP is a truly comprehensive solution for VMware customers, and can be deployed in minutes, with no changes whatsoever to your infrastructure, or the way you currently handle data.  Simply add any flash, and run VMware Update Manager, and in minutes, you are reaping the benefits.  While PernixData does plan to support more than just VMware in the future, it was an obvious decision to start there.  Without exception, all vendors presenting at SFD3 mentioned VMware as being best-in-class, and I am inclined to agree.

 

Check out the vids below for some performance numbers, and a demo.  In addition to being a passionate, technically brilliant advocate for his product, Satyam could also moonlight as a stand-up comedian, so even though I am not in the videos, you will not be bored.  Apparently at dinner the night before, the Dutch Storage Syndicate (Arjan, Ilja, Marco, and Roy) poisoned me in retaliation for Justin Bieber’s disrespect of Anne Frank, so I had to watch from upstairs.  And I’m not even Canadian!

 

 

Although Gestalt IT covers delegates’ expenses for Field Days, delegates are not obligated to write, review, or produce content on any of the products or vendors we see at these events.

One Blox To Rule Them All. . .

 

 

This past week, I had the good fortune of attending Storage Field Day 3 (SFD) in Denver, Colorado. Field Days are events where a group of independent IT professionals chosen by a committee of like-minded people are brought together with start-ups, and mature companies who have new, innovative products. These companies all have something to say, and most are interested in direct, independent feedback on what they’re offering. The discussions are two-way, and are usually quite a bit more in-depth, and candid than one might expect during most vendor briefings.

Something was abundantly clear to me during all the vendor meetings at SFD: the industry is being disrupted in a huge way. If you’re stuck in the storage world as it was in 2010 – 2011, you are lost. But have no fear. If you’re willing to move beyond what is rapidly being considered “legacy” storage, you can catch up. I’m here to help.

One of the most exciting moments at SFD was the launch of a brand new storage company called Exablox. These guys are way out there in front of any other company providing storage solutions for midsized companies. Imagine object based, content addressable storage with an enterprise feature set, at a price for small to midsized businesses.

If you’re into storage at all, you’re probably already aware of the advantages of object-based storage.  For those who may be less familiar, object storage is differentiated from traditional file-system storage by ditching the traditional hierarchy, and storing files as independent objects. There are a couple of major advantages to using this method to store files.

  1. Scalability – Amazon’s S3 currently stores 1.3 trillion objects, and adds over 1 billion new objects daily. The sheer amount of overhead that would be required to store that amount in one, or more file systems would be staggering. Keeping track of all the pointers, metadata, and ensuring the integrity of the file systems would require processing power that would be cost prohibitive. Amazon is able to deliver S3 at a compelling price point, as a result of this scalability.
  2. Reliability – When there is no requirement to grow, prune, and maintain the integrity of a gigantic file system, reliability inherently increases. Additionally, the fact that our objects can be located essentially anywhere, and retrieved with a simple object ID, means that we can break a file into as many atomic chunks as we would like, and even distribute it geographically, like S3. When we break these files up into atomic units, we can determine how much data protection, or parity, we want to assign. Most of the object storage out there will allow us to tailor how many of these chunks we can lose, and still rebuild the entire file. So if we have some important data that’s really critical, we can split it into enough chunks, with enough parity, to tolerate failures on a scale that would completely devastate RAID protected file systems.

While this is admittedly a quite simplified, and possibly flawed, introduction to the concept, it should be enough to get us to a point where we can understand some of what makes the Exablox product unique, and groundbreaking. The product is called OneBlox. It’s designed to give the advantages of object storage to businesses who don’t have an army of storage guys dedicated to the task.  OneBlox is the brainchild of CTO and Co-founder Tad Hunt, who was very adept at explaining all the ins and outs of how these boxes work together to create a storage system that is punching far above its weight class.

Normally, to use object storage, one needs to architect an application to use objects, as opposed to a traditional file system. That’s probably not going to be something a midsized customer would want to do up front, considering the investment. OneBlox gets around this requirement by providing a CIFS/SMB interface. You can use this thing like any traditional CIFS/SMB target, integrate it with Active Directory, and still get all the benefits of object storage, right out of the box.

No. . .seriously. . .I’m not talking out of the box, after hours of frustration. I mean, seconds after it boots up, you have storage space available to dump files onto from Windows, or any CIFS compatible OS.

The one thing that seemed to divide some of the delegates was the management system.  Called OneSystem, it’s cloud-based, and comes with the unit.  As soon as a customer receives a OneBlox, and boots it up, it presents storage instantly, and also calls home to OneSystem.  Of course this assumes that a firewall is allowing it to call home.  Once you pull up the site, it’s as easy as pairing a device to Netflix.  You just punch in the 5 digit code on the front of the OneBlox, and that device joins your ring.

 

 

If you get another device later, just plug it in, do the same, and bam.  It joins the ring.  The OneSystem management interface is really simple, and clean.

Some SFD delegates questioned Exablox’s decision to make the only management interface for the product one that was cloud-based. From my perspective, I think it’s perfect for the market they are targeting, and it also enables them to come in at an amazingly attractive price point per unit, while selling the management separately.

The OneBlox is packed with features.  In addition to SMB/CIFS, it does real-time replication, dedupe, and encryption by default, and even CDP-like snapshots!  Users can access the snapshots directly within the file structure in their Windows Explorer, or Mac Finder window.

 

 

Each OneBlox can support up to 32TB RAW of any type of disk from any manufacturer, although as Lauren Malhoit points out, the system doesn’t do any tiering, so it’s not setup to put hot data on SSD’s or anything like that.  It’s very Drobo-like in its simplicity, complete with red or green LED’s to tell you the status of the drives at a glance.  The chassis is a work of art, and is not some off the shelf rebadged 2U server.  The feet actually slide into place on the unit below for stackability without a rack.  The whole chassis has a solid, hi-fi component feel to it, even though it’s cheaper than many hi-fi components.

 

 

I could write a dozen more pages on the inner complexities of how this thing works (the ring), and how amazing it is, but honestly, Tad does such a good job explaining it, I’m going to link to the whiteboard and let him show you.

 

 

And here’s a demo of the system, which is equally cool:

 

 

 

Although Gestalt IT covers delegates’ expenses for Field Days, delegates are not obligated to write, review, or produce content on any of the products or vendors we see at these events.

vShield App Upgrade Tips for the Paranoid*

 

If you haven’t yet upgraded vShield App to version 5.1.2, here are a few tips not included in the instructions that could save you some pain during the process.

Before you do anything, the obvious first step would be to snapshot your vShield Manager VM.

Right after that, I recommend going in and setting the FailSafe policy to Allow.

 

 

This setting ensures that if the vShield Manager is not available, or has failed, all traffic is allowed.  If you’re in an environment where security is absolutely paramount, and this setting is unacceptable, you will want to ensure you have a maintenance window that would allow for the loss of connectivity in case of problems.

This next step may be unnecessary, but if you weren’t paranoid, you wouldn’t have read this far.  ;-)

I go to every host and force a resync in vShield Manager so that the service VM knows about the setting I just changed.

 

 

Now you’re ready to start the upgrade procedures on page 37 of the vShield Installation and Upgrade Guide.

Once you get your vShield Manager upgraded, go ahead and test an update on a host.  Once it finishes, migrate a VM back to it while pinging, to ensure connectivity is there.  If it’s successful, finish the rest of your hosts.
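
A simple way to watch for drops during that test migration is a continuous ping from your workstation against the VM you’re moving.  The address below is just a placeholder:

ping -t 192.168.1.50

(-t is the Windows flag for a continuous ping; on Linux or OS X, plain ping runs continuously by default.)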

You can do multiple hosts at once, but sometimes the web client can be unreliable, so I recommend opening multiple browser windows if you’re going to do multiple host updates simultaneously.

Make sure you wait until the first host is already into maintenance mode before starting a subsequent one.  This will ensure you don’t have any conflicts where a VM is trying to migrate to a host going INTO maintenance mode.  Here’s a pic showing what I’m talking about.

 

 

 

*Disclaimer: Author makes no inference that the reader has any actual psychological disorder, nor does the author intend any slight or affront to actual patients being treated for paranoia.  Author is merely inferring that if one has been in IT long enough, in an environment where downtime is measured in dollars, one could be considered to have the characteristics of the aforementioned patients.  Author is not engaged in practicing mental health, dispensing, or prescribing actual mental health conditions.  Virtual Insanity, its principals, and their employers are not responsible for the content of this blog post.  Please drink responsibly.  Qualified buyers only.  I crack myself up.  Use only as directed.  Restrictions may apply.

Keep SRM From Renaming Datastores on Recovery

VMware Site Recovery Manager is one of the best products in the VMware arsenal.  If you’re using SRM, there have been some welcome changes in recent versions.  One is the sheer magic of automatically resignaturing your datastores, and managing the whole process transparently.

The only problem is, when the datastores fail over, they get renamed like a snapshot would.

 

This might not be a problem for you, since VMware takes care of the vmx, and everything else, in the background.  But depending on what you use for backups, not having the same datastore name could have a huge impact on your recovery.

There used to be an XML file you could change to fix this behavior, but in the 5.x versions, they moved the setting into the GUI.  Just to avoid the pain of poking around trying to find it, I thought I’d throw out a blog post.  There don’t seem to be many out there on this.

All you need to do is right click on the site, and go to Advanced Settings.

Put a check in the box that says storageProvider.fixRecoveredDatastoreNames.

Next time you do a failover, you won’t have the snap prefix on your datastores.  If you still have some residual ones with the wrong name, you will need to rename those manually before doing your next failover.

Limiting vCOps Access to vCenter Resources Using Permissions

If you’ve found your way to this blog post, you have likely already read, or even implemented the VMware KB articles on this.  I am not going to link them here, as there are some missing pieces in each of them.  With several hours of trauma under my belt, and a few e-mails back and forth with VMware support, I’ve got the missing pieces.  I’ll start from the beginning, so if you’ve already done some work on this, make sure you can retrace your steps so you don’t get lost.

First thing we need to do is figure out what resources we do not want vCOps to see.  In my example here, I want to limit access to my development environment (the DEV1 cluster).  Those guys are spinning up VM’s so fast, they’re causing vCOps licensing issues.

 

First thing to do is create a collection role for vCOps. It is best to have a user account specifically for this role in Active Directory.  I’ve created one called SVC_VCOPS.  We don’t have to give it any rights in AD.

Going into vCenter, we need to create a role for vCOps collection.

Right click on the Read-only role, and clone it.

You should now have a role called Clone of Read-only.  Rename that to vCOps Collection, or something like that.

Now, let’s go edit that role.

You need to check all of the following privileges.  This is important, and this is where some of the KB’s are missing info.

Once you have those privileges assigned, add the user we created in AD to the vCOps Collection role we just setup.

Go into vCOps Admin via https://vcopsserveraddress/admin and click the Update button next to your vCenter server to change the collection user to the AD user we created.

Once you get that set for your vCenter server, restart your vCOps services.

Now let’s go into vCenter and start applying these permissions.

In Hosts and Clusters, right click on your vCenter and Add Permission.

Add the AD user you created and give them the vCOps Collection role.

Make sure you leave the Propagate box checked.

Now, click on the cluster, or resource you want to exclude and click the permissions tab.  What you’ll see there is the permission you just defined at the vCenter level.  Double click it and change to No access.

Again, ensure the Propagate box is checked.  When you click okay, you’ll get a warning saying the permission is defined higher up, and it’s going to replace the existing one.  Click okay.

The next step is vital, and seems to be a glitch in the vCenter permissions setup.  Remember that Propagate checkbox you made sure was checked on that last step?  It probably didn’t propagate.  Here’s where you save a month of troubleshooting and phone calls.

Go in and check permissions on a VM in the cluster you just excluded.


Don’t panic.  There’s a solid workaround.

Go into VM’s and Templates view.  Add the permission there, and propagate it.

For some reason, some VMware people say not to do this.  It’s the only way I was able to get it to work, short of changing permissions on EVERY VM.  And since they’re spinning up 5 a day, I’m just not doing that.  This works.

With the permissions sorted out, the next piece is getting vCOps to purge the objects it can no longer see.  To do that, SSH into the Analytics VM.  Login with “root” and your password.

Next, type the command shown below.  The prompt shows “secondvm-external”, which indicates you’re on the Analytics VM.

secondvm-external:/ # vi /usr/lib/vmware-vcops/user/conf/controller/controller.properties

Take a look at this file, and find the following line:

deleteNotExisting = false

Change false to true

If you’re in a hurry, you can play with the deletionSchedulePeriod setting, or you can just wait 24 hours, and the objects you wanted deleted will be deleted.

When you’ve made the change, save the file and exit vi by typing the following:

:wq
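
If you’d rather not edit the file interactively, a sed one-liner should accomplish the same change.  This is a sketch, not an official procedure; copy the file first, and confirm the line reads exactly as shown above before running it:

secondvm-external:/ # cp /usr/lib/vmware-vcops/user/conf/controller/controller.properties /usr/lib/vmware-vcops/user/conf/controller/controller.properties.bak
secondvm-external:/ # sed -i 's/deleteNotExisting = false/deleteNotExisting = true/' /usr/lib/vmware-vcops/user/conf/controller/controller.properties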

 

One last step for good measure.

Back at the secondvm-external:/ # prompt, type the following:

ssh 172.20.20.1

Note the prompt now changes to firstvm-external:/ #

Type in: su - admin

Now you’re at the admin@firstvm-external:~> prompt.

Type:

vcops-admin restart

24 hours from now, objects that vCOps cannot see will not exist in vCOps.

So now, let’s go into vCOps and get rid of the objects we don’t want to see anymore.

Login to the custom UI via https://vcopsserveraddress/vcops-custom

Navigate to Environment Overview.

Search for, and select the objects that you isolated via permissions, and click on the nearly invisible 8 pixel delete button.

It looks like this, magnified 1000x.  ;-)

Now relax.

If you’re getting the over licensed usage watermark, it’ll go away in 24 hours.

If the objects you just deleted reappear after the next 5 minute collection cycle, you missed a step.

Happy vCOpsing!

 

Set Your MaxHWTransferSize for vSphere Hosts on VMAX

 

I finally got around to trying the new settings for MaxHWTransferSize on my VMAX connected vSphere hosts, and it really is a shocking performance boost when doing Storage vMotions.

Basically we’re telling the VAAI hardware assist to use 4x larger chunks to do this data copy in the background.  Net result is any VAAI copy operations finish up quite a bit faster than they did before.

If you haven’t, I recommend you go read Chad’s article on the topic here:

http://virtualgeek.typepad.com/virtual_geek/2012/12/vmax-and-vsphere-vaai-xcopy-update.html

If you don’t feel like reading the article, and just want to get this going in the lab as fast as possible, SSH into your vSphere hosts and issue the following command:

esxcfg-advcfg -g /DataMover/MaxHWTransferSize

This will tell you what it’s currently set to:

Value of MaxHWTransferSize is 4096

Should be 4096.  If not, make a note of it in case you need to roll back.

Then enter this to change to the new setting:

esxcfg-advcfg -s 16384 /DataMover/MaxHWTransferSize

You’ll see the following:

Value of MaxHWTransferSize is 16384
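
If you prefer esxcli over esxcfg-advcfg, the same check and change can be done like this on 5.x hosts (same setting, just a different front end; verify on your build before scripting it):

esxcli system settings advanced list -o /DataMover/MaxHWTransferSize
esxcli system settings advanced set -o /DataMover/MaxHWTransferSize -i 16384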

Now go test some SvMotions.

If this somehow breaks, (not that I’ve seen it) change it back to what it was before.

In my preliminary tests, I’m seeing SvMotions that were taking 1:30 to complete, finishing in 26 seconds.

This is an impressive tweak.  Shouts to Chad for the post, and Cody Hosterman for the boost!

Ohh, and remember, do this only if your hosts are exclusively connecting to VMAX arrays.  It could break on other arrays.