I’ve been invited to Storage Field Day for its 4th edition in San Jose, California, and I met Coho Data at their offices in Sunnyvale. This post is a really detailed review, and as you will see also a really passionate one about what I saw. I know it’s really long, even by my standards, but any attempt to shorten it would have been a crime against the high quality of the content I got.
Who is Coho Data
Coho Data (previously known as Convergent.IO) was founded in 2011 by Andy Warfield and others; for those of you who do not know who he is, he is among the creators of the Xen hypervisor. After attending his presentation, I cannot talk about their solution without also talking about Andy.
It’s common at Tech Field Day for a presentation to be given by a CTO or a founder of the company. It’s not so common, however, to find out this person is also a university professor, so that his presentation easily becomes a lesson. That’s what happened with Andy, who literally taught us about scale-out storage and the use of Flash memory inside storage, and also gave one of the best explanations of what SDN is for I have ever heard (funny indeed that this came out of a storage presentation rather than a networking one).
A “studied” storage
Andy supported his presentation with several facts and figures, coming out of the tests run by his team, in order to justify the design choices they made. I’m not saying other solutions have no such basis, but listening to those results and how they influenced the design choices gave me the feeling I was in front of a product that only a university environment would have been able to create.
Thanks to their past work on Xen, one of the main goals for the Coho Data guys has been, since the beginning, to bring to storage the same benefits virtualization brought to servers. This means designing a storage system that is scalable and able to “decouple” the management of the underlying hardware from the data it holds, basically “virtualizing” it. This has been possible by taking advantage of the three main pillars of the modern datacenter: commodity hardware, Ethernet connections, and Flash memory.
The storage has been designed to be used like a utility (think about electricity) and to arm companies and service providers with the same kind of storage available at giant providers like Google, Facebook or Amazon, without the need to design it internally.
A huge emphasis has been placed on commodity hardware. The advantage is the freedom to buy version 1.0 of the product today (it will be generally available at the end of November) while being able, in the future, to connect it with newer models as soon as they become available, thanks to the “glue” being made completely of software. Today the hardware model is named DataStream 1000, and it is basically a SuperMicro 2U Twin: a 2U chassis with two independent servers inside. Each server, named a MicroArray, has 2 Intel 910 PCIe Flash SSD cards, two Intel Xeon CPUs, 6 × 3 TB spinning disks and 2 × 10G Ethernet connections.
Harmonize the data stream
Andy showed us (using graphs and numbers) how a single Flash card is powerful enough to completely saturate a whole 10G Ethernet link, and can also stress a powerful CPU. Also, with the existing caching algorithms, an oversized cache relative to the storage it accelerates is useless: above a certain percentage, performance does not increase much, so the additional expense of buying bigger cards is unjustifiable. Another interesting finding is that Flash devices are faster with a higher CPU clock speed rather than with additional cores.
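Just to give an idea of why the saturation claim is plausible, here is a back-of-the-envelope check with my own numbers (not Coho Data’s figures): a 10GbE link tops out around 1.25 GB/s of payload, while the sequential read throughput of a PCIe Flash card of that generation sits comfortably above that.

```python
# Back-of-the-envelope check with assumed numbers (not Coho Data's figures):
# can a single PCIe Flash card saturate a 10 Gbit/s Ethernet port?

LINK_GBPS = 10                           # nominal 10GbE line rate
link_bytes_per_s = LINK_GBPS * 1e9 / 8   # ~1.25 GB/s, ignoring protocol overhead

flash_read_bytes_per_s = 2.0e9           # assumed ~2 GB/s sequential read for a
                                         # PCIe Flash card of that generation

print(f"10GbE payload ceiling : {link_bytes_per_s / 1e9:.2f} GB/s")
print(f"Assumed Flash reads   : {flash_read_bytes_per_s / 1e9:.2f} GB/s")
print("Flash can saturate the link:", flash_read_bytes_per_s >= link_bytes_per_s)
```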
All these findings led Coho Data to design the single server, the MicroArray, so that each component is balanced against all the others. In fact, we have here 2 PCIe Flash cards, 2 Intel Xeon CPUs and 2 10G network cards. If you look at it from a different perspective, CoHo has two streams (as in the name of the product) in each server, each one pairing a Flash card with a CPU and a 10G port.
The complete Data Stream (look again at the product name) uses every component so that none of them becomes a bottleneck or puts the others under stress.
If you agree with this idea, it becomes obvious that the storage needs to be scale-out: if you change any component of the Stream (for example, a bigger Flash card) you compromise the balance; it’s much better to create several parallel Streams and make them act as one.
Storage and Network, finally friends!
A buzzword fanatic from a marketing division would say that CoHo is probably the first SDS + SDN solution: Software Defined Storage plus Software Defined Networking. I hate these labels, so I’m going to describe it in a different way.
Back to where we stopped before: the Stream is completely balanced until it exits the Ethernet card of each MicroArray. However, a storage system does not exist for its own sake; its purpose is to serve data to the clients connecting to it. So we have to guarantee an optimal data stream all the way to the clients! From this simple observation comes another disruptive idea of this solution: CoHo also uses an Ethernet switch, driven by the storage itself, in order to dynamically manage data streams, towards both the clients and the other MicroArrays.
In this way, the storage is no longer “at the mercy” of a network it has no control over; instead it can leverage the network to its own advantage. Thanks to an OpenFlow 10G switch (right now an Arista, but it could be something else), Coho Data can publish to clients (VMware ESXi hypervisors, the supported platform at GA) a single NFS connection that is served at the same time by all the MicroArrays.
This is made possible by programming the switch via OpenFlow. NFS (at least v3) has no multipathing: a single ESXi server and an NFS datastore are connected by only one IP-to-IP relationship. In a storage system like CoHo’s, with many MicroArrays, this would lead to a huge waste of resources and would defeat the very idea of scale-out, since an ESXi host would connect to only one MicroArray. Here comes OpenFlow: even if the exposed NFS share has only one IP address, Coho Data dynamically configures the switch to distribute the connections coming from the ESXi servers across several MicroArrays, using parameters like load, bandwidth congestion, and so on. ESXi servers see only one IP address, but behind the switch there are many MicroArrays. That’s a really cool use case of SDN.
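To make the idea concrete, here is a minimal sketch of what such a controller decision could look like. It is purely my own illustration (names, IPs and the rule format are invented, and this is not Coho Data’s actual controller code): every client flow targets the same virtual NFS IP, and the storage picks a MicroArray per flow and asks the switch to rewrite that flow accordingly.

```python
# Illustrative sketch only -- not Coho Data's actual controller logic.
# Idea: clients all mount one virtual NFS IP; the storage programs the OpenFlow
# switch so each client flow is rewritten towards a specific MicroArray.

from dataclasses import dataclass

VIRTUAL_NFS_IP = "10.0.0.100"   # the single IP the ESXi hosts see (assumed value)

@dataclass
class MicroArray:
    ip: str
    active_flows: int = 0

arrays = [MicroArray("10.0.0.1"), MicroArray("10.0.0.2"), MicroArray("10.0.0.3")]

def pick_target(arrays):
    """Least-loaded choice; the real system also weighs bandwidth and more."""
    return min(arrays, key=lambda a: a.active_flows)

def flow_rule_for(client_ip):
    """Build an abstract rewrite rule the controller would push to the switch."""
    target = pick_target(arrays)
    target.active_flows += 1
    return {
        "match": {"src_ip": client_ip, "dst_ip": VIRTUAL_NFS_IP, "tcp_dport": 2049},
        "actions": [{"set_dst_ip": target.ip}, {"output_port": f"port-of-{target.ip}"}],
    }

# Each ESXi host still believes it talks to VIRTUAL_NFS_IP.
for esxi in ["10.0.1.11", "10.0.1.12", "10.0.1.13"]:
    print(esxi, "->", flow_rule_for(esxi)["actions"][0]["set_dst_ip"])
```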
A storage towards the future
All the MicroArrays talk to each other and coordinate themselves, exchanging data blocks and client connections, so that from the outside they appear as a single storage system. But how does it work?
Under the hood this is an object store: each piece of data, regardless of its size, is chopped into variable-size chunks, and those chunks are the objects themselves. Each object also has metadata identifying its position inside the storage and other information, and that metadata is saved on two MicroArrays for redundancy. There is not even a file system over the disks or the Flash memory: Coho Data writes directly to the physical media, and then saves the position in the metadata. This makes the storage pretty fast, since there is no overhead created by a file system. Also, there is a single namespace across all the storage nodes, and all MicroArrays see the same data even though the copies of each object are written only on some of them.
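A simple way to picture that metadata layer (my own simplification for illustration, not Coho Data’s on-disk format) is a map from each object to the raw extents holding its copies:

```python
# Simplified mental model of the metadata layer -- not Coho Data's real format.
from dataclasses import dataclass
from typing import List

@dataclass
class Extent:
    array_id: str      # which MicroArray holds this copy
    device: str        # e.g. "flash0" or "disk3" -- raw media, no file system
    offset: int        # byte offset on the raw device
    length: int        # size of the chunk (objects are variable-sized)

@dataclass
class ObjectMeta:
    object_id: str
    copies: List[Extent]          # two copies, on two different MicroArrays

# The metadata itself is also kept on two MicroArrays for redundancy.
meta = ObjectMeta(
    object_id="vmdk-42/chunk-0007",
    copies=[
        Extent(array_id="ma-01", device="flash0", offset=4 * 1024**2, length=512 * 1024),
        Extent(array_id="ma-05", device="disk2",  offset=9 * 1024**3, length=512 * 1024),
    ],
)

# A read resolves the object to a raw (device, offset, length) triple directly,
# with no file-system layer in between.
primary = meta.copies[0]
print(f"read {primary.length} bytes from {primary.array_id}:{primary.device}@{primary.offset}")
```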
The system can survive several failures, and it continuously rebalances data among the arrays so that it is always protected and available. There is no RAID at all; redundancy is achieved by spreading object copies across different MicroArrays. For example, there is a heartbeat between the two arrays of the same chassis, so they know they are “neighbours” and do not save two copies of the same object on both of them (the only exception is when you start with only two MicroArrays). In this way, any problem with the shared power supplies does not lead to data loss.
Keeping two copies of any object is not meant only for redundancy, but also for performance. Whenever you connect additional MicroArrays, objects are redistributed onto the new arrays; the final result is a storage system with huge overall computing power, whose performance increases as you add nodes.
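As a toy illustration of that neighbour-avoidance rule (my own logic, not the actual placement algorithm): choose two target arrays per object so that, whenever possible, they do not share a chassis, and let utilization drive the choice.

```python
# Toy placement sketch -- not Coho Data's actual algorithm.
# Rule illustrated: the two copies of an object should land on MicroArrays that
# do not share a chassis (and its power supplies), unless only one chassis exists.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class MicroArray:
    name: str
    chassis: str
    used_bytes: int = 0

arrays = [
    MicroArray("ma-01", chassis="c1"), MicroArray("ma-02", chassis="c1"),
    MicroArray("ma-03", chassis="c2"), MicroArray("ma-04", chassis="c2"),
]

def place_copies(arrays, obj_size):
    # Prefer the least-used pair that spans two different chassis.
    pairs = list(combinations(arrays, 2))
    cross_chassis = [p for p in pairs if p[0].chassis != p[1].chassis]
    candidates = cross_chassis or pairs      # fall back if only one chassis exists
    a, b = min(candidates, key=lambda p: p[0].used_bytes + p[1].used_bytes)
    a.used_bytes += obj_size
    b.used_bytes += obj_size
    return a.name, b.name

for i in range(4):
    print(f"object {i} ->", place_copies(arrays, obj_size=512 * 1024))
```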
And even with this awesome design, there is something even more interesting:
NFS has been chosen as the first supported protocol only because it is a common language that lets this visionary storage talk to a “legacy” platform like VMware. I know it sounds weird to use the term “legacy” about VMware, but it fits, since it still uses NFS v3, a 12-year-old protocol. If possible, CoHo would rather use fewer abstraction layers and talk to clients with a lower-level protocol, maybe one directly available inside the application (look at examples like Mongo and Hadoop in that chart), or even have applications speak the same language as the storage (“Direct Integration”). Also, this object store does not suffer from the problem other object stores have (Ceph, for example) of a gateway that translates client protocols into the object store language while being a single, separate system, and thus a possible bottleneck: here every MicroArray also acts as a gateway, and thanks to SDN they are all active at the same time; as a final result, the gateways scale out together with the MicroArrays.
The decoupling of the object store from the communication protocols has led to these opportunities. And we are only at the beginning of this project…
Storage Analytics!
With a team of researchers designing such a storage system, you would imagine all their resources went into creating the architecture, and that management would only be possible through some complicated command line, as in some open source projects. No! CoHo Data has a complete HTML5 interface, simple to use, that deliberately exposes to users only the information they need to care about (a broken disk, a down link…), leaving automated tasks directly to the storage. The storage sees the virtual machines directly, and you can apply tags to them and group them by tag. So you can summarize space and I/O consumption by group, get a real “showback” system, and use it as a reporting tool.
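Just to illustrate what tag-based showback means in practice (a made-up example, not Coho Data’s interface or API): take per-VM figures and aggregate them by tag.

```python
# Tiny illustration of tag-based "showback" -- not Coho Data's actual API.
from collections import defaultdict

# Per-VM stats as a storage system might expose them (made-up numbers).
vms = [
    {"name": "web-01",  "tags": ["prod", "web"], "gb_used": 120, "iops": 850},
    {"name": "web-02",  "tags": ["prod", "web"], "gb_used": 110, "iops": 790},
    {"name": "db-01",   "tags": ["prod", "db"],  "gb_used": 400, "iops": 2300},
    {"name": "test-01", "tags": ["test"],        "gb_used": 60,  "iops": 90},
]

# Sum space and I/O consumption per tag.
report = defaultdict(lambda: {"gb_used": 0, "iops": 0})
for vm in vms:
    for tag in vm["tags"]:
        report[tag]["gb_used"] += vm["gb_used"]
        report[tag]["iops"]    += vm["iops"]

for tag, totals in sorted(report.items()):
    print(f"{tag:5s}  {totals['gb_used']:4d} GB  {totals['iops']:5d} IOPS")
```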
I would also like to focus on a feature that will not be available in version 1.0, which Andy previewed for us: a complete data analytics system!
An I/O trace uses only 4 bytes, and it can be compressed further to 2. Since the Flash memory is so big, and I/O tracing creates basically no overhead, Coho Data records every I/O trace above the warning threshold (like a surveillance camera that starts recording only when it detects movement), and when the accumulated traces are as big as a NAND cell, it writes them all into it. Then, when the storage has some spare CPU cycles, it automatically starts to analyze that data: first it analyzes the I/O history and identifies problems on single VMs. But then it also runs “what if” simulations, showing what would happen if one or two additional arrays were added.
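Here is a rough sketch of that “record only above the threshold, flush in NAND-sized batches” behaviour, as I understood it. The sizes and thresholds are invented for illustration; this is not Coho Data’s implementation.

```python
# Rough sketch of threshold-triggered I/O tracing with batched flushes.
# My own interpretation of the description -- sizes and thresholds are invented.
import random

TRACE_SIZE_BYTES   = 4            # one I/O trace record (per the talk: 4 B, ~2 B compressed)
NAND_PAGE_BYTES    = 4 * 1024     # assumed flush unit ("as big as a NAND cell")
LATENCY_WARNING_MS = 5.0          # assumed warning threshold that triggers recording

buffer = []

def flush(traces):
    # In the real system this batch would be written to Flash in one go.
    print(f"flushing {len(traces)} traces ({len(traces) * TRACE_SIZE_BYTES} bytes)")

def record_io(vm, latency_ms):
    """Record a trace only when the I/O is above the warning threshold."""
    if latency_ms < LATENCY_WARNING_MS:
        return                      # like a camera that only records on movement
    buffer.append((vm, latency_ms))
    if len(buffer) * TRACE_SIZE_BYTES >= NAND_PAGE_BYTES:
        flush(buffer)
        buffer.clear()

# Simulated I/O stream: only the slow I/Os end up in the trace buffer.
for _ in range(20000):
    record_io(vm="db-01", latency_ms=random.expovariate(1 / 3.0))
```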
This is a proper modeling system, one that can help a user evaluate the benefits of an additional array BEFORE buying it. It’s the first time I have seen such a thing in a storage product, and it’s damn cool.
Final notes
I have to admit it: I literally fell in love with this solution. The elegance of the architectural design and the study behind every choice are admirable.
For sure Coho Data is in its infancy: the product will only become available at the end of November 2013, and it will be a 1.0 version, which means limited enterprise features (no replication between clusters or deduplication, for example), but it will already have the VMware VAAI-NAS libraries. They already have many new features in their roadmap, like replication and deduplication; they are also going to add a prioritization system, or QoS if you prefer, and even the ability to react to recurring workloads by pre-loading hot data into the Flash memory, thus anticipating clients’ needs thanks to the storage analytics.
I really hope Coho Data will succeed; it would be a real pity if such beautiful ideas went to waste. Good luck!