Since Veeam Backup & Replication 9.5 became available and people started using the ReFS BlockClone API, one of the most common questions has been: “How much space am I saving? Is there any way to measure it?” Finally, there’s a way to answer this question!
NOTE: I’m just an ambassador here. I decided to use my blog and its readership to promote this tool as much as possible, but all the credit goes to my colleague Timothy Dewin. You probably already know him for his great Restore Point Simulator, which almost everyone uses for sizing Veeam deployments. Well, now you have another reason to thank him!
ReFS, blockclone API and space savings
I have previously written a couple of articles about ReFS 3.1 and its BlockClone API; in “Windows 2016 and Storage Spaces as a Veeam backup repository” I explained how the BlockClone API works. An incremental chain gives no space savings, since every block in each incremental backup is a new one. Synthetic operations are where the space savings come into play. Let’s suppose I’m using a typical configuration for my primary backup: daily backups, incrementals during the week, and a synthetic full created once per week. After a week, the synthetic operation is executed:
Veeam reads all the blocks of all the incrementals, merges them with the previous full backup file (.VBK), and builds the new full. That’s why it’s called synthetic: the full is created synthetically, without running a full backup against the production environment. Without the BlockClone API, each block would actually be read and written again on the storage, consuming a lot of I/O as well as the space needed by a completely new file:
With ReFS and BlockClone, however, only the block pointers are updated, and the involved blocks are never written twice:
The new VBK is assembled from new pointers to existing blocks. At any given point in time, only one copy of each block exists on the filesystem: a block can have many pointers, and each extra pointer only consumes a little metadata, while the large amount of space taken by the actual data is not consumed again. Also, as long as a block has at least one pointer, it is not deleted from the filesystem. This technology can result in significant space savings.
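To make the pointer idea concrete, here is a toy Python model of one weekly cycle. It is only an illustration, not how ReFS or Veeam work internally: each backup file is a map from disk position to a physical block ID, and the volume stores each ID once no matter how many files point to it. The ToyVolume class, the 64 KB block size, the 1,000-block disk and the 50 changed blocks per day are all hypothetical values.

```python
from collections import Counter

BLOCK = 64 * 1024  # hypothetical block size in bytes, for illustration only


class ToyVolume:
    """Toy model: every file maps a disk position to a physical block ID."""

    def __init__(self):
        self.files = {}
        self._next_id = 0

    def _alloc(self, positions):
        # Write brand-new physical blocks for the given positions.
        out = {}
        for pos in positions:
            out[pos] = self._next_id
            self._next_id += 1
        return out

    def write_full(self, name, disk_blocks):
        self.files[name] = self._alloc(range(disk_blocks))

    def write_incremental(self, name, changed_positions):
        self.files[name] = self._alloc(changed_positions)

    def synthetic_full(self, name, chain, block_clone):
        # Latest version of each position wins: full first, then incrementals in order.
        merged = {}
        for src in chain:
            merged.update(self.files[src])
        if block_clone:
            # BlockClone-style: the new file only gets new pointers to existing blocks.
            self.files[name] = dict(merged)
        else:
            # Without block cloning: every block is written again as a new block.
            self.files[name] = self._alloc(merged.keys())

    def usage(self):
        refs = Counter(b for f in self.files.values() for b in f.values())
        logical = sum(refs.values()) * BLOCK  # sum of file sizes
        physical = len(refs) * BLOCK          # one stored copy per block ID
        return logical, physical


def week(block_clone):
    vol = ToyVolume()
    vol.write_full("day0.vbk", 1000)
    chain = ["day0.vbk"]
    for day in range(1, 7):
        name = f"day{day}.vib"
        vol.write_incremental(name, range(day * 50, day * 50 + 50))
        chain.append(name)
    vol.synthetic_full("day7.vbk", chain, block_clone)
    return vol.usage()


for clone in (False, True):
    logical, physical = week(clone)
    print(f"block_clone={clone}: logical={logical // BLOCK} blocks, "
          f"physical={physical // BLOCK} blocks")
```

In this toy run, the cloned synthetic full adds 1,000 blocks of logical size but no new physical blocks, so the volume holds 1,300 blocks instead of 2,300, while the summed file sizes are the same in both cases.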
How much space am I saving, exactly?
This is a question we have heard many times since then, and now we can finally answer it with precise numbers. To do so, let me introduce Timo’s latest creation: blockstat.
Blockstat is a small command-line tool that can analyze the contents of a folder (it has additional options too, more on this later), read the ReFS metadata, and tell you how much space has been saved thanks to blocks shared by more than one file. Amazing, isn’t it? Let’s see it in action.
In my lab, I have a backup job configured on purpose to generate a lot of synthetic fulls: it runs every 2 hours to protect 8 VMs, and creates a new synthetic full daily:
This job writes to a Veeam Scale-Out Backup Repository with four extents and is configured to use per-VM backup chains. The placement policy is set to Data Locality; otherwise we would lose any benefit of ReFS block cloning, since a block can only be cloned within the same volume. This configuration also means that the backup chains of my 8 VMs are spread across multiple extents, so let’s focus on one of them for simplicity:
This folder holds 3 VM backup chains, and after one day of execution I already have two files for each chain. If I try to run the tool now, I get this:
Actually, I use the PowerShell script to run the tool instead of calling the tool directly; this lets me analyze entire folders instead of manually comparing two files. I just need to edit the first two lines to set the folder with the backup files and the location of the tool, like this:
$dir = "Q:\ReFS Locality Extent 3\Backup_ReFS_Synthetic"
$exe = "C:\refsc\blockstat.exe"
So, what is the tool telling me? There’s a total of 41.6 GB of files in that folder, and I can confirm this number by summing the file sizes in Windows Explorer:
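If you prefer to script the Explorer check, a minimal Python sketch could look like this. It assumes a flat folder (no recursion) and simply adds up the apparent file sizes, so every cloned block is counted once per file that references it; the function name logical_size is hypothetical.

```python
import os


def logical_size(folder, extensions=None):
    """Sum the apparent (logical) size of the files in a folder, non-recursively.

    If extensions is given (e.g. {".vbk", ".vib"}), only those files are counted.
    This is the figure Explorer shows when summing files: block cloning is ignored,
    so shared blocks are counted once for every file that points to them.
    """
    total = 0
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            ext = os.path.splitext(entry.name)[1].lower()
            if extensions is None or ext in extensions:
                total += entry.stat().st_size
    return total
```

For example, logical_size(r"Q:\ReFS Locality Extent 3\Backup_ReFS_Synthetic") / 2**30 would give the total in the binary gigabytes that Explorer labels "GB".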
Then, the tool gives me additional information:
1x 44707086336 means that 44707086336 bytes (41.6 GB) of blocks have only one pointer each.
Total Saving 0: combined with the previous value, this tells me that no space has been saved yet. Each block is used only once.
Total fragments 818: expect this value to grow over time. ReFS will become a fragmented filesystem, and it couldn’t be otherwise, since it’s a CoW (Copy-on-Write) filesystem.
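The two figures on the 1x line are the same quantity in different units: blockstat reports bytes, while the 41.6 GB figure uses binary gigabytes (2^30 bytes), which is what Windows labels "GB". A quick check in Python:

```python
def to_gib(n_bytes):
    """Convert bytes to binary gigabytes (2**30 bytes), shown as "GB" by Windows."""
    return n_bytes / 2**30


one_pointer = 44707086336  # bytes on the "1x" line
print(f"{to_gib(one_pointer):.1f} GB (binary)")    # 41.6 GB (binary)
print(f"{one_pointer / 10**9:.1f} GB (decimal)")   # 44.7 GB (decimal)
```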
As I explained before, an incremental chain like this one doesn’t produce any space savings; I need to wait for some synthetic full backups to be created. So, I waited a few days and checked my folder again:
I now have 60 files (+1 VBM file) for 3 different backup chains, 20 files for each VM. As you can see from the screenshot, each chain has multiple VBK files. The total space “used” by the files is 170 GB, but this figure doesn’t take into account the savings made possible by block cloning. Usually, people would check the properties of the entire volume (Q: in my case) to see the real disk usage:
So, I am indeed getting space savings: my files add up to 170 GB, but they only consume 67 GB on disk. But can I get more details about these savings? Let’s run blockstat again.
Here, I immediately see my savings and how they are distributed. 1x 7.1 GB means that these blocks are referenced only once, as before, but now the output shows additional lines. 1.5 GB of blocks are used twice, which means a saving of 1.5 GB; 1 GB of blocks is used three times, which saves another 2 GB; and even better, 39.31 GB of blocks are used four times, which means about 118 GB saved (3 × 39.31 GB). The total savings are 121.4 GB. Also, note that there are many more fragments than before.
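The arithmetic behind these numbers is simple: data referenced n times is stored once but counted n times when you sum file sizes, so it saves (n - 1) times its size. A small Python sketch of that calculation, using the values quoted above (the savings function name is hypothetical, and the GB values are rounded as in the text):

```python
def savings(histogram):
    """Space figures from a blockstat-style histogram.

    histogram maps a reference count (1x, 2x, ...) to the amount of unique data
    referenced that many times. Returns (logical, unique, saved) in the same unit.
    """
    unique = sum(histogram.values())                              # stored once
    logical = sum(n * size for n, size in histogram.items())      # counted per file
    saved = sum((n - 1) * size for n, size in histogram.items())  # logical - unique
    return logical, unique, saved


hist = {1: 7.1, 2: 1.5, 3: 1.0, 4: 39.31}  # GB, values quoted in the text
logical, unique, saved = savings(hist)
print(f"logical {logical:.1f} GB, saved {saved:.1f} GB")  # logical 170.3 GB, saved 121.4 GB
```

The logical total comes out at about 170 GB, consistent with the sum of the file sizes, and the saving at 121.4 GB, matching blockstat's total. The same function returns a saving of 0 for the earlier output, where only a 1x line was present.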
I’ve run the tool against the entire folder, but when a folder contains multiple chains, as in my case, I may want to know how much space a single VM backup is consuming. I can do this with this command line:
I passed the tool the list of the 4 VBK files in this VM’s chain, and in return I got these numbers, which show that I’m saving 46.5 GB of space for this specific VM.
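Conceptually, restricting the analysis to a list of files just means counting references only among those files. The sketch below illustrates that idea in Python; the files mapping is a hypothetical stand-in for the ReFS extent metadata that blockstat actually reads, and refcount_histogram and saved_bytes are illustrative names, not part of blockstat.

```python
from collections import Counter


def refcount_histogram(files, selected, block_size):
    """Build a blockstat-like histogram for a subset of files.

    files: hypothetical mapping of file name -> list of physical block IDs.
    selected: the files to analyze, e.g. the VBKs of one VM chain.
    Returns {reference count: bytes of unique data with that count}.
    """
    refs = Counter(b for name in selected for b in files[name])
    hist = Counter()
    for count in refs.values():
        hist[count] += block_size
    return dict(hist)


def saved_bytes(hist):
    # Data referenced n times saves (n - 1) copies.
    return sum((n - 1) * size for n, size in hist.items())


files = {
    "vm1_w1.vbk": [0, 1, 2, 3],
    "vm1_w2.vbk": [0, 1, 2, 4],  # second full reuses blocks 0-2 via cloning
    "vm2_w1.vbk": [10, 11],
}
hist = refcount_histogram(files, ["vm1_w1.vbk", "vm1_w2.vbk"], 64 * 1024)
print(hist, saved_bytes(hist))
```

References from files outside the selected list are ignored, so the savings are attributed only to the files you pass in.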
Final notes
As you can imagine, I can use the tool to build even complex calculations that measure the actual savings for each virtual machine, for showback or chargeback purposes. For example, I know many service providers who want to use ReFS repositories for Veeam Cloud Connect and would like to show their customers the savings they are getting. Until now they had no way to make these calculations, but thanks to Timo and his tool, they now have a starting point for building reporting scripts.
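As a starting point for such a reporting script, here is a hedged Python sketch that rolls per-VM figures up into a per-customer showback table. It assumes you have already collected, for each VM chain, the summed file size and the total saving reported by blockstat; how you collect and parse them is out of scope here. The VmUsage type, its fields and the showback function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    customer: str  # hypothetical tenant name
    vm: str
    logical: int   # bytes: sum of the file sizes in the VM chain
    saved: int     # bytes: total saving reported for the chain


def showback(rows):
    """Aggregate per-VM figures into {customer: (logical, on_disk, saved_pct)}.

    on_disk is estimated as logical - saved, i.e. the unique data of the chains.
    """
    totals = {}
    for r in rows:
        logical, saved = totals.get(r.customer, (0, 0))
        totals[r.customer] = (logical + r.logical, saved + r.saved)
    report = {}
    for customer, (logical, saved) in totals.items():
        pct = 100.0 * saved / logical if logical else 0.0
        report[customer] = (logical, logical - saved, pct)
    return report
```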
If you want to grab the software, check the source code, or even contribute to improving it, Timo has a dedicated page on his website: http://dewin.me/refs/. Thanks, Timo!