A couple of months ago, after the introduction of the new EMC VNX arrays, I posted my thoughts on them here. One of the engineering choices I questioned was the use of SSDs for extending cache rather than a PCI card. It was always obvious why SSDs would be easier to add or replace, but I questioned the throughput potential of a SAS interface versus a PCI one.
I got some interesting feedback on that from several people, and I appreciate it. It wasn't until the other day that I realized my argument didn't carry that much merit. In a moment of blinding brilliance, it dawned on me that the only time the interface might make a real difference is when warming the cache.
How did I come to this realization? I was in a VNX deep-dive session presented by Chad Sakac, and I had every intention of asking him about PCI versus SAS when it comes to cache. Lucky for me, he brought it up during the session before I could ask. Before he was done with the rest of the presentation, I had spotted the error in my prior way of thinking.
Chad pointed out that the time it takes for an IO to go through the controller and the loops and hit the flash is measured in nanoseconds (10^-9 seconds). Once it gets there, the flash itself has latencies in the microseconds (10^-6 seconds). The interconnect is such a small slice of the total service time that there is not likely to be a significant difference in latency between SSD and PCI when it comes to cache.
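To make that concrete, here is a quick back-of-envelope sketch in Python. The specific figures (flash access time, path overheads) are my own illustrative assumptions, not numbers from the session, but they show why the path to the flash barely registers once the flash itself dominates the service time.

```python
# Back-of-envelope latency comparison. All figures are illustrative assumptions:
# the point is only that when the flash media dominates the service time, the
# extra hops through the SAS back end barely move the total.

FLASH_LATENCY_US = 50.0   # assumed flash media access time, microseconds
SAS_PATH_NS = 500.0       # assumed controller + SAS loop overhead, nanoseconds
PCI_PATH_NS = 100.0       # assumed PCI path overhead, nanoseconds

sas_total_us = FLASH_LATENCY_US + SAS_PATH_NS / 1000.0
pci_total_us = FLASH_LATENCY_US + PCI_PATH_NS / 1000.0

print(f"SAS-attached flash: {sas_total_us:.2f} us per IO")
print(f"PCI-attached flash: {pci_total_us:.2f} us per IO")
print(f"Difference: {sas_total_us - pci_total_us:.2f} us "
      f"({100 * (sas_total_us - pci_total_us) / sas_total_us:.1f}% of the total)")
```

With numbers like these, the SAS path adds well under one percent to each IO, which is noise compared to the flash access itself.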
PCI obviously has greater throughput potential, which is why I asked the question in the first place. But a realization jumped up and bit me while I was sitting through this presentation: cache IOs are usually small chunks of data that benefit from the reduced latency of flash and DRAM. They aren't giant read/write operations that demand extremely wide bandwidth. Will the increased bandwidth of PCI make a difference? I have my doubts that it will be noticeable on the vast majority of workloads, but this is just my opinion, as an outsider without the benefit of a storage engineering background.
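Here is another rough sketch along the same lines. The link speeds, lane counts, and IO size below are my own assumptions, picked only to show the shape of the math: at small block sizes, even a SAS back end has far more bandwidth headroom than a cache workload is ever likely to need.

```python
# Rough check on whether back-end bandwidth is the bottleneck for small cache IOs.
# All figures below are illustrative assumptions, not EMC's numbers.

IO_SIZE_BYTES = 8 * 1024      # assumed typical cached IO size
SAS_LANE_MBPS = 600.0         # assumed usable throughput of one 6 Gb/s SAS lane, MB/s
SAS_LANES = 4                 # assumed number of lanes to the flash drives
PCIE_MBPS = 4000.0            # assumed usable throughput of a PCIe x8 class card, MB/s

def iops_to_saturate(link_mbps):
    """Small-block IOPS needed before raw bandwidth becomes the limit."""
    return (link_mbps * 1e6) / IO_SIZE_BYTES

print(f"SAS back end saturates at ~{iops_to_saturate(SAS_LANE_MBPS * SAS_LANES):,.0f} IOPS")
print(f"PCIe card saturates at   ~{iops_to_saturate(PCIE_MBPS):,.0f} IOPS")
# Either ceiling sits far above the small-block IOPS a mid-range array pushes at
# its cache, so the extra PCI headroom rarely shows up for this kind of traffic.
```

In other words, the wider pipe only matters once you are pushing enough small IOs to saturate the narrower one, and with these assumptions that point is hundreds of thousands of IOPS away.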
I am looking forward to seeing the SPC-1 benchmarks from the VNX. I believe they will objectively tell the whole truth. A slight difference on an anomalous workload is not significant enough to outweigh the benefits of an SSD-based cache over a PCI one. It's easily swapped, and it's non-volatile, so it only needs to be warmed once. If a controller fails, the cache doesn't die with it. Replace the controller, and there's no need to rewarm the cache.
Like I was alluding to in my last post on this... every design decision, whether it is in storage engineering, vSphere design, or automobile design, is one of compromise. The SPC-1 results will tell the whole story, but I think what we'll see here is that this particular compromise was, overall, a good one. What do you think? Let me know in the comments.