Back in June, I attended a great presentation by Aidan Finn during E2EVC, where I was finally able to learn more about Microsoft SOFS (Scale-Out File Server), and I reported my findings in the post An introduction to Microsoft Scale Out File Server. One of the outcomes of that post was that the name Scale-Out in the Microsoft solution was not really correct, since in reality there were shared SAS JBODs between the storage nodes. As I explained in the post, the main concept of a true scale-out architecture is Shared-Nothing: there should be no shared component between nodes, and the failure of a component should not impact the overall storage solution. Well, I’m sure Microsoft didn’t use my post as a roadmap suggestion, but during the just-ended TechEd Europe event in Barcelona a new and really interesting solution has been announced, and its name is promising: Storage Spaces Shared Nothing.
A quick look
Since there is really little information around on this new solution so far, I extracted most of the information in this post from this video. I really encourage you to watch the entire presentation because it is really interesting; for example, there is also a quick introduction to Storage QoS, which will be available in Windows Next. But if you just want to hear about the topic of this post, go to 1h2min. Some of the designs you will see in this post are taken from that presentation.
The first thing I learned from the presentation is about CSVFS, an acronym for Cluster Shared Volume File System, better known as the cluster-wide file system. This component is the foundation of the already existing SOFS solution, and it was a good opportunity to learn more about it, since it also has a huge role in the upcoming Shared Nothing solution.
What does it do? If you think about the history of NTFS, and of Microsoft in general, we have always lived and died by drive letters: A: for the floppy disk (for those of you old enough to know what that is), C: for the operating system partition, and so on. The drive letter has always been the boundary of the NTFS file system, and the only way to spread a single volume over multiple disks was to use RAID or another storage technology underneath the file system and present it a single large disk volume.
With CSVFS, the boundaries of the drive letters are finally removed! No more running out of drive letters when trying to use many disks without RAID aggregation, and no more other pesky limits.
With it, all nodes in the cluster can access the same volume/file system at the same time, even if write I/O is limited to one node until there is a failover. It sounds a little bit like VMware VMFS, even if, at least from what I understood, there is no inter-node locking, so all writes are done by only one node, something like a “master” in the cluster (a term I’m using to identify it, not one the presenter used). I’m not sure if this has a performance impact compared to a solution where all nodes can also write at the same time, but at least read I/O is distributed among the nodes.
One important note here, and another foundation of the shared storage, is the “single consistent namespace”: a single namespace that spans all the nodes forming the cluster.
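To make these two ideas more concrete, here is a minimal Python sketch of how a cluster-wide volume of this kind could behave. This is purely my own illustration based on the description above, not actual CSVFS code, and all the names (CsvVolume, node1, the example path) are hypothetical: every node sees the same namespace and can serve reads, writes are funneled through the current coordinator node, and the coordinator role moves on failover.

```python
# Conceptual model of a cluster-wide volume: one namespace visible to all
# nodes, reads served by any node, writes coordinated by a single node.
# Illustrative sketch only; real CSVFS works inside the Windows kernel.

class CsvVolume:
    def __init__(self, nodes, coordinator):
        self.nodes = set(nodes)          # all cluster nodes see the same namespace
        self.coordinator = coordinator   # the node that orchestrates write I/O
        self.files = {}                  # path -> data, one namespace for everyone

    def read(self, node, path):
        # Any node can satisfy read I/O directly.
        assert node in self.nodes
        return self.files.get(path)

    def write(self, node, path, data):
        # Write I/O is routed through the coordinator node.
        assert node in self.nodes
        print(f"{node} forwards write of {path} to coordinator {self.coordinator}")
        self.files[path] = data

    def failover(self, new_coordinator):
        # If the coordinator fails, another node takes over the role.
        assert new_coordinator in self.nodes
        self.coordinator = new_coordinator


vol = CsvVolume(nodes=["node1", "node2", "node3"], coordinator="node1")
vol.write("node2", r"C:\ClusterStorage\Volume1\vm1.vhdx", b"...")
print(vol.read("node3", r"C:\ClusterStorage\Volume1\vm1.vhdx"))
vol.failover("node2")
```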
Shared Nothing
Time for the main topic of this post. In a nutshell, this image I created summarizes the improvements compared to SOFS:
The huge improvement is, as you can see, that there is no longer a requirement for a JBOD disk shelf connected to all the nodes of the cluster. Each node has its own direct-attached disks (both SSD and HDD), and thanks to the new software they are pooled together to form the final storage. The cluster can thus be expanded beyond the previous limit of 8 nodes, since there is no longer a limit imposed by the number of SAS connections available in a given JBOD chassis. The cluster can start small and be expanded or even shrunk dynamically, with the new software re-distributing and re-balancing the copies of the blocks among the available nodes.

Shared Nothing means that the software storage is now tolerant to entire node failures, and it can be configured with replication: every virtual disk is made up of several “slabs”. Each slab is a 1 GB chunk, and replication is done per slab. So the granularity is already quite good, and not done at the entire file level, even if other scale-out solutions, for example, use much smaller chunks. When a failure or a rebalancing occurs, the cluster has to replicate at least 1 GB of data, but honestly, if the primary goal is to host virtual disks of virtual machines, there is little chance that a disk will ever be smaller than 1 GB. Different slabs of the same disk can be spread across different nodes, both to protect the copies from multiple failures and to increase rebalancing performance. The presenter talked about a 3-copy configuration, where each slab has two additional copies and the cluster can survive the loss of two nodes. The replication factor can probably also be changed if desired.
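To visualize the slab mechanism, here is a small Python sketch of how the 1 GB slabs of a virtual disk could be placed with three copies on distinct nodes and then rebalanced after a node failure. This is my own simplified model of the behavior described in the presentation, not the actual placement algorithm, and the node names and functions are hypothetical.

```python
# Simplified model of slab placement: a virtual disk is split into 1 GB slabs,
# each slab gets 3 copies on distinct nodes, and copies lost with a failed node
# are rebuilt on the surviving nodes. Illustrative only.
import itertools
import math

SLAB_SIZE_GB = 1   # replication granularity described in the presentation
COPIES = 3         # the 3-copy configuration the presenter talked about

def place_slabs(disk_size_gb, nodes):
    """Assign each 1 GB slab of a virtual disk to COPIES distinct nodes."""
    slab_count = math.ceil(disk_size_gb / SLAB_SIZE_GB)
    rotation = itertools.cycle(range(len(nodes)))
    placement = {}
    for slab in range(slab_count):
        start = next(rotation)
        # pick COPIES different nodes, wrapping around the node list
        placement[slab] = [nodes[(start + i) % len(nodes)] for i in range(COPIES)]
    return placement

def rebalance_after_failure(placement, failed_node, surviving_nodes):
    """Re-create the copies that lived on a failed node somewhere else."""
    for slab, owners in placement.items():
        if failed_node in owners:
            owners.remove(failed_node)
            # choose any surviving node that does not already hold this slab
            replacement = next(n for n in surviving_nodes if n not in owners)
            owners.append(replacement)
    return placement

nodes = ["node1", "node2", "node3", "node4"]
placement = place_slabs(disk_size_gb=5, nodes=nodes)   # a 5 GB virtual disk -> 5 slabs
placement = rebalance_after_failure(placement, "node2",
                                    [n for n in nodes if n != "node2"])
print(placement)
```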
Replication is done over Ethernet, using SMB3. This protocol has gained enhancements like multi-channel connections that can improve performance a lot. SMB3 is also used for Storage Replica (an async replication solution for DR purposes, also described in the same presentation). I think Microsoft is doing a great job in pushing SMB3 to become the common protocol for all Windows communications between machines.
I don’t have any additional information for now, for example about the “prescriptive configurations”, which sound like there will be an HCL (hardware compatibility list) or some pre-defined server models. During the presentation it was said that Microsoft is working with several vendors and testing different configurations, and the suggested configurations will be available when Windows Next is released. This makes total sense, and it’s a great way for Microsoft to avoid problems when customers or partners use wrong or sub-optimal hardware and then start complaining about the software.
Final notes
The software has just been announced, and it’s in its really early stages. For example, thin provisioning will not be available, and there is not yet a precise idea of what the maximum number of nodes will be (it was said to be somewhere between 10 and 16). Many of the features explained are not new at all, especially if, like me, you are already following the advancements in scale-out storage solutions. They are not “disruptive”; they simply make sense in a scale-out design (read more in The future of storage is Scale Out).
But the biggest news here is not in the technical details; it’s the news itself that Microsoft is about to launch this solution! First with SOFS, and now with Storage Spaces Shared Nothing, Microsoft is becoming a real and serious player in the storage market. Welcome to the scale-out world, Microsoft, this time for real!