Lately I’ve been reading a huge amount of information about Flash storage: articles, blogs, and whitepapers written by analysts, bloggers, and companies that produce and sell solutions based on NAND memories. Each of them has its own solution: all have been designed to cure the long-standing slowness of storage based on spinning disks, yet the way they solve that same problem differs completely from one to the other.
So, to someone asking “I think I may need flash storage, but which one?”, what can I say?
What do you have to replace in your datacenter?
For sure, every infrastructure architect loves being able to design a new infrastructure from scratch (a so-called “greenfield”), without any ties to existing components. Adding something to an existing infrastructure is a challenge, but also a risk; and it’s even more dangerous when you have to touch something as critical as storage.
However, the most common situation is a “brownfield”. There are many considerations to take into account when you introduce a new technology, and in an already running infrastructure the invasiveness of a new solution is certainly an important characteristic. Personally, when I’m helping a customer choose the right solution for him, one of the questions I ask is this:
“what do you have to replace in your datacenter?”
In fact, when you are in the middle of a hardware upgrade or replacement at the end of a life cycle, it’s certainly much easier to introduce a new technology. Be careful! I’m evaluating these solutions exclusively based on how easily they can be inserted into an existing environment. There are obviously many other aspects to evaluate for the final choice; I’m simply not listing them in this article.
I made a simple table to describe some scenarios. The arrow describes how many components you have to replace. This is simply a short summary of the main use cases; for sure you can use the same solutions in other scenarios as well.
Server Side Caching
This is for sure the least invasive solution, and the one offering a noticeable performance gain for an existing storage array quickly and on a small budget. At a minimum, you only need one SSD per ESXi server and the licenses of the chosen software (PernixData, FlashSoft, Proximal Data, or even VMware vFRC).
This is the right choice if you are not willing to renew any other component of your infrastructure, and if the specific problem you need to solve is the performance of your existing storage. It’s a really targeted solution: if your storage is not suffering from a lack of free space, adding disks only to improve performance is really inefficient. On the other hand, thanks to server-side caching you can let the centralized storage manage capacity, and delegate performance to the caching solution.
Each of the available solutions has its own peculiarities and differences: some accelerate only reads while others accelerate writes too, some support VMFS or NFS filesystems (or both), some support only VMware vSphere while others support additional hypervisors… What’s common to all of them is the low invasiveness: none of these modern solutions requires any agent or driver inside the virtual machines; instead they all operate at the hypervisor level. This makes deployment really easy, and does not require any modification to the virtual machines.
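To make the read-acceleration idea concrete, here is a toy sketch of a write-through LRU cache; it is purely illustrative (class and variable names are mine, not any vendor’s implementation), but it shows the core mechanism: hot blocks are served from local flash, misses fall through to the slow central array, and writes go straight through so the array remains authoritative.

```python
from collections import OrderedDict

class ReadCache:
    """Toy LRU read cache: hot blocks served locally, misses hit the array."""

    def __init__(self, capacity_blocks, backend):
        self.capacity = capacity_blocks
        self.backend = backend          # dict-like stand-in for the central array
        self.cache = OrderedDict()      # block_id -> data, kept in LRU order
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)     # mark as recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.backend[block_id]            # slow path: central storage
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict the coldest block
        return data

    def write(self, block_id, data):
        # Write-through: the array stays authoritative, so losing the
        # local flash device never loses data.
        self.backend[block_id] = data
        if block_id in self.cache:
            self.cache[block_id] = data
```

Write-back variants (which also accelerate writes) add the complexity of protecting dirty data on the local device, which is exactly where the commercial products differ most.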
If you only need to increase the performance of an existing infrastructure, this is the right solution for you.
Flash Storage Array
There are several storage solutions on the market designed from the ground up to use NAND memories. They are mainly divided into two groups: hybrid (a mix of flash memory and mechanical disks) and the so-called AFA (All Flash Array, built entirely with NAND memories).
The choice between these two types depends on several elements, such as price per GB or price per IO; hybrid systems are cheaper, while AFAs are faster. It’s outside the scope of this article to evaluate the pros and cons of the two kinds of storage.
Rather, it’s more interesting to evaluate how these solutions can be integrated into an existing environment. Even if each has its own internal design choices and they all seem different from one another, one of their advantages is that they all appear to clients as just another “classic” storage array: they can all be connected to ESXi servers using FC, iSCSI, FCoE, or NFS. Also, many of them support VAAI accelerated libraries, and some (not all) have enterprise features like snapshots and replicas.
The higher invasiveness compared to server-side caching does not come from their introduction into an existing environment, but from the fact that at some point you have to migrate virtual machines onto them in order to use the new arrays. Since VMware has enabled Storage vMotion in the lower license tiers, today this feature is only missing from Essentials licenses. Nonetheless, the migration of virtual machines is not trivial: you are going to produce a high I/O load on the existing storage, and since the reason you are replacing it is its slowness, this could be a problem. You will certainly have to plan the migration really carefully.
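A rough back-of-the-envelope calculation helps with that planning. The sketch below is mine, with entirely hypothetical numbers: if you can only dedicate a fraction of the old array’s throughput to the migration (the rest must keep serving production I/O), wall-clock time grows quickly.

```python
def migration_hours(total_gb, array_mbps, production_share=0.7):
    """Rough wall-clock estimate for a Storage vMotion wave when most of
    the old array's throughput must keep serving production I/O."""
    usable_mbps = array_mbps * (1 - production_share)   # budget for migration
    return total_gb * 1024 / usable_mbps / 3600          # MB / (MB/s) -> hours

# Hypothetical scenario: 4 TB of VMs, a 400 MB/s array, 70% reserved
# for production traffic -> only 120 MB/s left for the migration.
print(f"{migration_hours(4000, 400):.1f} hours")   # ≈ 9.5 hours
```

Even this crude model makes the point: a migration that would take hours on an idle array can stretch across days of maintenance windows on a busy one.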
Converged Infrastructure
People use this term to describe many different solutions, and lately I’m seeing even many primary vendors using it.
Honestly, I use it only when I’m talking about two companies, Nutanix and SimpliVity. All the other solutions, like Vblock, FlexPod, and the like, sound to me more like a bundle of existing products carefully assembled by a vendor and sold as a single product together with some management layer. On the contrary, Nutanix and SimpliVity have been designed from the ground up as integrated systems, with both a hypervisor and distributed storage inside every node of the cluster.
A single server has a certain amount of storage, and the cluster can be extended by simply adding as many servers as you like. In this way, you increase both computing power and storage (the latter in both capacity and performance).
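The sizing logic of such a scale-out cluster reduces to simple arithmetic. The sketch below uses a hypothetical node spec (the 8 TB and 20,000 IOPS figures are invented for illustration): since every node adds both capacity and performance, the stricter of the two requirements dictates the node count.

```python
import math

def cluster_size(required_tb, required_iops, node_tb, node_iops):
    """Number of identical scale-out nodes needed: capacity and performance
    grow together, so the stricter of the two requirements wins."""
    return max(math.ceil(required_tb / node_tb),
               math.ceil(required_iops / node_iops))

# Hypothetical node: 8 TB usable and 20,000 IOPS per server.
print(cluster_size(40, 100_000, 8, 20_000))   # balanced workload -> 5 nodes
print(cluster_size(40, 150_000, 8, 20_000))   # IOPS-bound -> 8 nodes
```

Note the flip side visible in the second call: an IOPS-heavy workload forces you to buy capacity (and CPU) you may not need, which is part of the “self-contained” trade-off discussed below.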
These solutions are incredibly effective when you are building new infrastructures: they are really easy and fast to deploy, but most of all they can scale over time, so you can avoid excessive startup costs. This is their best advantage from a financial standpoint.
But if we are talking about integration into existing environments, these solutions are in a way “self-contained”. They are perfect when they do not have to interact with other components, but, for example, they are not designed to be used as external storage for other ESXi servers. This simple fact limits their usage in brownfield scenarios; they certainly have specific use cases where their adoption is a sure success, but the idea that they can be “THE” solution for every situation is totally wrong. Instead, if we think of new projects inside existing environments as “micro new environments”, then they are a good fit even in brownfield scenarios.
Flash in my old array? No thanks.
For sure you have already noticed I skipped one of the most common options you see around lately: the addition of flash disks to existing storage arrays. The reason is that, if the storage was not designed to use flash memories, the gain coming from these disks is almost zero (to be fair, it’s too low for the price of the disks). Yes, you are going to see a performance gain, but not as high as expected, and you will end up with just more disks, only faster.
The speed of flash memories poses two essential problems that can be managed only by a storage system designed for them: the I/O of a single memory module is high enough to completely saturate a communication bus, regardless of whether it’s SATA or SAS, and a flash memory puts a much higher load on a controller CPU than disks do. If an array controller has not been sized to handle this increased performance, and its software was not designed for it, the addition of flash is probably going to create more problems than benefits.
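The bus-saturation point is easy to verify with round numbers. The figures below are illustrative ballparks (not vendor specs or measurements), but the ratio is what matters: one SSD can nearly fill the link that was designed to be shared comfortably among mechanical disks.

```python
def bus_utilization(device_mbps, bus_mbps):
    """Fraction of a single link's usable bandwidth one device can consume."""
    return device_mbps / bus_mbps

# Illustrative round numbers, assumed for the sake of the argument:
SATA3_MBPS = 600   # ~6 Gb/s SATA III link, ignoring encoding overhead
SSD_MBPS   = 550   # one typical SATA SSD, sequential read
HDD_MBPS   = 150   # one 7,200 rpm mechanical disk

print(f"One SSD fills {bus_utilization(SSD_MBPS, SATA3_MBPS):.0%} of a SATA3 link")
print(f"One HDD fills {bus_utilization(HDD_MBPS, SATA3_MBPS):.0%} of the same link")
```

With these assumptions a single SSD consumes over 90% of the link while a mechanical disk uses about a quarter of it, which is why an array designed around dozens of spindles per controller chokes on a handful of flash drives.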
What will happen in the long run?
If we think about these solutions over a longer time frame, maybe some ideas would look different. For example, is server-side caching a quick fix for today’s problems, or could it become a new way to design storage? Are we all headed towards converged solutions in the long run, especially since VMware is going to heavily promote these same concepts and make customers accept them thanks to VSAN?
The lifecycle of technology solutions has been stretched lately, and it’s common to see customers renewing their solutions once every 4-5 years, no longer every 2-3 as was happening only a few years ago. This means that when you introduce a Flash solution, whatever it is, you need to assume you are probably going to keep it for several years.
Anyway, you really need to add Flash solutions to your design choices today, if you are not already doing so, and stop buying capacity (in the form of mechanical disks) to solve performance problems. Disks are not going to help you; NAND memories will.