In my previous article, "Windows 2016 and Storage Spaces as a Veeam backup repository", I talked about the advantages that Veeam Backup & Replication can bring when combined with Windows Server 2016 and the new ReFS 3.1 filesystem. Several people have already asked for practical examples of how to design a solution using these technologies, so I thought it was time to give you one storage design.
One warning: as in many situations, this is just ONE of the possible designs that can be created. It’s not meant to be the official design that Veeam is suggesting to everyone, as it has many advantages but also some limits. For example, ReFS as of today has no support for deduplication, so if you are storing many virtual machine backups and you still want to leverage global deduplication, you may want to use a different solution, like a deduplication appliance, or leverage NTFS and Windows Data Deduplication. Also, as in the past, I’ve used ThinkMate as the source for my simulations. I have NO relationship with them, nor do I get any fee for suggesting their products. It’s just that they have a nice website where you can configure servers with multiple combinations. You can ask one of your preferred vendors for the same design, as this type of server is available from any major server vendor (HP, Dell and Cisco, to name the best known).
The machine
Modern servers can hold so much CPU and memory that there is rarely a need anymore to split workloads over multiple servers for compute reasons, but the network can still be a limit, and above all you have to evaluate the failure domain carefully. Unless the storage is part of Storage Spaces Direct, where data can be replicated between multiple nodes, a single server is still a single point of failure for the data it holds. The concern is not just the chance of losing data but, more frequently, any downtime that a single server can have. For example, if Windows Server 2016 needs to be rebooted for maintenance, all the backups hosted on this machine will be temporarily unavailable. On the other hand, each server has a starting price that depends on the sum of chassis, CPU and memory, while disks are a variable component. The more disks we can put into a single system, the fewer chassis, CPUs and memory modules we have to buy.
I think that servers hosting between 30 and 50 disks are a good balance, so as a starting point I chose this server:
http://www.thinkmate.com/system/stx-cl-xe36-2460
It has two CPU sockets, 8 memory slots, 2 disk bays for the operating system and 36 for data. These are the components that I chose, and the reasons for each choice:
CPU: 2 * 8-core Intel E5-2640 v2. Windows 2016 licensing is per core, and the initial license covers up to 16 cores. With two of these processors, we can have 16 concurrent incoming streams and still limit license expenses.
Memory: 64 GB. Following Veeam best practices for repository sizing, we usually recommend 4 GB for each core. Do your math: 16 cores * 4 GB is 64 GB.
Boot disk: a hardware Raid-1 using two Micron M1100 256GB SSDs. Plenty of space and speed to host Windows 2016. Also remember that ReFS cannot be used for the boot volume, so this volume will be formatted with NTFS.
Network: a dual 10Gb Ethernet card is needed, as this machine can surely ingest more than 125 MB/s, which is the theoretical limit of a 1 Gb connection. Also, remember that the target might receive data from multiple sources at once, so it needs to be able to handle the combined load.
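The memory and network figures in the list above are simple arithmetic, and a minimal Python sketch makes them easy to rerun for a different CPU or network card. The function names are hypothetical helpers made up for this illustration, and the link calculation assumes decimal units and ignores protocol overhead.

```python
# Sizing sketch for the repository server described above.
# Function names are illustrative only, not part of any Veeam tool.

GB_PER_CORE = 4  # rule of thumb quoted in the article: 4 GB of RAM per core


def repository_memory_gb(sockets: int, cores_per_socket: int) -> int:
    """RAM suggested for a repository with the given CPU layout."""
    return sockets * cores_per_socket * GB_PER_CORE


def link_throughput_mb_per_s(link_gbit: float) -> float:
    """Theoretical ceiling of a link in MB/s (decimal units, no overhead)."""
    return link_gbit * 1000 / 8


print(repository_memory_gb(2, 8))      # 64
print(link_throughput_mb_per_s(1))     # 125.0
print(link_throughput_mb_per_s(10))    # 1250.0
```

A single 10 Gb port raises the theoretical ceiling tenfold compared with 1 Gb, which is why the faster card is the safer choice for a box with 36 data disks.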
Storage
Obviously, the most interesting part of the design is the storage, and it’s also the place where you can see the advantages of using Microsoft Storage Spaces.
First of all, the storage controller. Even if the disks are going to be exposed one by one to the operating system without any raid or cache, it doesn’t mean that the underlying controller is not important anymore. On the contrary, with more than 30 disks to control, and some of them being high-speed SSDs, the controller is still very important. This server is equipped with an integrated LSI 2308 SAS 2.0 6Gb/s Host Bus Adapter. This is a good controller, but if the server you are choosing has different options, please verify that it can handle a good amount of IO, and that it can expose the connected disks to the operating system without any raid configuration. Some controllers in fact require complex configurations like one Raid-0 volume for each disk, which is something really stupid to do and, most of all, to maintain. It’s a good idea to check if your controller is certified for Windows Server 2016 at https://www.windowsservercatalog.com .
Talking about disks, we first need to talk about another great feature of Storage Spaces: Storage Tiering. By introducing two layers of storage in a server, like SSDs and HDDs, Storage Spaces can create so-called Storage Tiers. With these two layers combined in a so-called “hybrid” volume, data is always ingested first by the SSD tier, and then Storage Spaces transparently moves blocks to the slower HDD tier. As a result, storage tiers can dramatically increase performance without sacrificing the ability to store large quantities of data on inexpensive HDDs.
Thanks to this design, incoming Veeam backups will be received by the high-speed tier, so ingestion can be massively improved. If you want to learn more, Carsten Rachfahl and Didier Van Hoye have published a video in which they test automatic tiering, go watch it here.
NOTE: I’ve been informed that Storage Tiering is NOT supported by Microsoft in Storage Spaces on a single server, that is, without Storage Spaces Direct. So, think about this hybrid storage configuration as a possible future design IF Microsoft decides to support it.
Back to the real design. I went for 6 * 1.0TB Intel SSD DC S3100 Series 2.5″ SATA 6.0Gb/s Solid State Drives, and I will create a pool using mirror mode. This means 3 TB of SSD tier. Ideally, you would design this tier to be big enough to receive any incremental backup that is created by Veeam, so that they are all written to the fast tier and then tiered to the capacity layer.
For the capacity layer, I chose 30 * 6.0TB SATA 6.0Gb/s 7200RPM – 3.5″ – Western Digital Se (code WD6001F9YZ). For the capacity tier I go for Dual Parity. As I explained in my previous article, this is comparable to Raid-6. Since 6TB disks take more than an entire day to be rebuilt, I’d never store data on a single parity volume. Dual parity could have some write penalty in theory, but the overall solution is not going to suffer from random IO: first because incrementals are going to be ingested by the SSD tier (which is configured in mirror mode), and also because, thanks to ReFS and block cloning, the transform operations of Veeam backups are going to be mere metadata updates, without any real IO happening on the storage.
Also, don’t make the mistake of creating a large capacity tier out of only a few large 7.2K HDDs. To have acceptable IOPS in the capacity tier you have to strike a balance between capacity and a sufficient number of disks. As an example, it might be better to choose 30 * 4TB HDDs instead of 15 * 8TB HDDs. This will help during real-time tiering.
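To make the trade-off between disk count and disk size concrete, here is a rough Python sketch comparing the two layouts mentioned above. The per-disk IOPS figure is purely an assumption chosen for illustration (real numbers depend on the drive model and the workload), and the model simply adds up spindles without accounting for parity overhead.

```python
# Rough comparison of two capacity-tier layouts with the same raw capacity.
# ASSUMED_IOPS_PER_DISK is a hypothetical value for a 7.2K RPM SATA disk,
# not a measured or vendor-published number.

ASSUMED_IOPS_PER_DISK = 75


def layout(disks: int, tb_per_disk: float) -> dict:
    """Raw capacity and naive aggregate IOPS for a set of identical disks."""
    return {
        "raw_tb": disks * tb_per_disk,
        "aggregate_iops": disks * ASSUMED_IOPS_PER_DISK,
    }


many_small = layout(30, 4)
few_large = layout(15, 8)

print(many_small)
print(few_large)
```

Whatever per-disk figure you plug in, both layouts give 120 TB raw, but the 30-disk one has twice the spindles and therefore twice the aggregate IOPS in this simple model.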
Storage Spaces works with “columns”, which can be compared to the stripe width of storage systems that do wide-striping. The maximum number of columns is 17, which means that any block written in the capacity tier will be spread across the disks with a 15+2 schema: 15 parts for data plus 2 parts for parity. This value cannot be increased even if we have more than 17 disks. What will happen is that each block will be spread over 17 disks, the disks used will be different for each write, and ultimately all of the 30 disks will be used. The efficiency is 88% (15 out of 17), which means that out of 30 * 6TB disks, we have 180 TB of raw space but 158.4 TB of usable space.
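The capacity math for both tiers can be checked with a short Python sketch. It is a simplified model, assuming usable space is just raw space multiplied by the layout efficiency, with no allowance for metadata or reserved space; the function names are hypothetical. Using the exact 15/17 ratio instead of the rounded 88% gives a slightly higher figure than the one quoted above.

```python
# Simplified usable-capacity model for the two tiers in this design.
# Ignores pool metadata and any reserve space Storage Spaces may keep.


def mirror_usable_tb(disks: int, tb_per_disk: float, copies: int = 2) -> float:
    """Usable space of a mirror layout that keeps `copies` copies of data."""
    return disks * tb_per_disk / copies


def dual_parity_usable_tb(disks: int, tb_per_disk: float,
                          max_columns: int = 17, parity_columns: int = 2) -> float:
    """Usable space of a dual-parity layout with a capped column count."""
    columns = min(max_columns, disks)        # the column cap is from the article
    efficiency = (columns - parity_columns) / columns
    return disks * tb_per_disk * efficiency


ssd_tier = mirror_usable_tb(6, 1.0)
hdd_tier = dual_parity_usable_tb(30, 6.0)

print(f"SSD tier: {ssd_tier:.1f} TB")
print(f"HDD tier: {hdd_tier:.2f} TB (efficiency {15 / 17:.1%})")
```

Passing a smaller disk count than the column cap (for example 10 disks) shows how the efficiency drops when fewer columns are available, since the two parity columns then weigh more.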
The final result is that I have 3 TB of SSD tier and 158 TB of HDD tier. This is 161 TB of usable disk space, and the final price of such a server is 22762 USD. Add a Windows 2016 Standard license at 699 USD, and the final price is 23461 USD. That works out to about 4.05 USD per TB per month over a 3-year time frame, or 0.004 USD per GB per month. You still have to add cooling and power, but this is a really low price. And remember that ReFS block cloning will give you even more space savings.
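The cost figures can be reproduced with a few lines of Python, which also makes it easy to plug in a quote from a different vendor. The function name is a hypothetical helper; the prices and capacity are the ones quoted in this article, and power, cooling and support contracts are deliberately left out.

```python
# Cost-per-TB sketch using the prices quoted in the article.
# Excludes power, cooling and any support contracts.


def cost_per_tb_month(hardware_usd: float, license_usd: float,
                      usable_tb: float, months: int) -> float:
    """Purchase cost spread evenly over usable capacity and lifetime."""
    return (hardware_usd + license_usd) / usable_tb / months


per_tb = cost_per_tb_month(22762, 699, 161, 36)
per_gb = per_tb / 1000  # decimal units: 1 TB = 1000 GB

print(f"{per_tb:.2f} USD per TB per month")
print(f"{per_gb:.4f} USD per GB per month")
```

Changing `months` to 60 models a five-year lifetime, which lowers the monthly figure further as long as the hardware stays in service.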
When people start discussing public cloud storage and how cheap it is, show them these numbers. And if you are a provider and you think you cannot compete with the hyperscale clouds, well, think again: this type of design can give you a really fast and really cheap solution to store your Veeam backup files.