One of the most common questions I get regarding Veeam and the use of the XFS file system with its reflink capabilities is “how do I measure the real used space?”. There are some techniques and blog posts around, but I noticed that none of them were written with operations in mind. Time to fix this.
The situation in my lab
In my home lab, I have a Linux repository where I store some backups created with Veeam Backup & Replication. It uses XFS as the file system, with reflinks enabled. As a result, Veeam uses fast clone when it saves backups to this volume.
However, when we look into the filesystem, we see the files as they are written and registered, with no information about the fast clone activity:
root@lnxrepo:/backup/backups/DummyLinux Local Backup# ll -h
total 7.2G
drwxr-xr-x 2 veeam veeam 4.0K May 27 20:08 ./
drwxr-xr-x 7 veeam veeam  147 May 26 10:20 ../
-rw-r--r-- 1 veeam veeam 1.5G Mar 10  2022 'DummyLinux Local BackupD2022-03-10T121037_50DF.vbk'
-rw-r--r-- 1 veeam veeam 1.6M Mar 10  2022 'DummyLinux Local BackupD2022-03-10T220033_145A.vib'
-rw-r--r-- 1 veeam veeam 1.6M May 23 08:40 'DummyLinux Local BackupD2023-05-23T103152_4E75.vib'
-rw-r--r-- 1 veeam veeam 663M May 23 20:08 'DummyLinux Local BackupD2023-05-23T220020_C2FE.vib'
-rw-r--r-- 1 veeam veeam 1.5G May 24 20:08 'DummyLinux Local BackupD2023-05-24T220014_EC29.vib'
-rw-r--r-- 1 veeam veeam 123M May 25 20:08 'DummyLinux Local BackupD2023-05-25T220030_C6D2.vib'
-rw-r--r-- 1 veeam veeam 148M May 26 20:08 'DummyLinux Local BackupD2023-05-26T220025_35F9.vib'
-rw-r--r-- 1 veeam veeam 3.4G May 27 20:08 'DummyLinux Local BackupD2023-05-27T220131_CD1E.vbk'
-rw-r--r-- 1 veeam veeam  63K May 27 20:08 'DummyLinux Local Backup.vbm'
-rw-r--r-- 1 root  root   997 May 27 20:08 .veeam.11.lock
We can see that both VBK files are listed at their full size, even though we know that the second one is not really using all of its 3.4 GB.
This leads to bigger numbers than what is actually stored in the volume. As I work mainly with service providers, they need to calculate the real consumption in order to give accurate reports to their customers.
The commands in bash
Thanks to the power of the Linux bash shell, we can create a quick script to gather the numbers we need.
First, the files.
My Linux repository uses a dedicated mount point, /backup.
To get the sum of all the files in the volume, in bash I can run du -c against the folder path where Veeam saves the backups.
A note on the command: I’ve seen some posts around using the ls command to obtain the same information. I prefer to use du because it shows the space actually allocated on disk, as the file system sees it. ls reports the apparent size of each file, which may be different in some situations, like sparse files.
The number we want is the last one, the total. But note that it is surrounded by a lot of other information that is interesting when checking these values manually, but cannot really be used in a script. Give me a minute and I’ll show you how to parse this output.
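The difference between the two views is easy to reproduce with a sparse file. The following is a minimal sketch, assuming GNU coreutils (truncate, stat, du) and a file system that supports sparse files; the temporary file is purely illustrative and has nothing to do with the Veeam repository.

```shell
#!/bin/sh
# Create a 10 MiB sparse file: it has a size, but no data blocks are written.
demo_file=$(mktemp)
truncate -s 10M "$demo_file"

# Apparent size in bytes, the same number ls -l would print.
apparent_bytes=$(stat -c %s "$demo_file")
# Space allocated on disk in KiB, as du counts it.
allocated_kib=$(du -k "$demo_file" | cut -f 1)

echo "apparent: $apparent_bytes bytes, allocated: $allocated_kib KiB"
rm -f "$demo_file"
```

The apparent size is the full 10 MiB, while the allocated space stays close to zero.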
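As a first taste of the parsing, here is a small sketch that keeps only the total from the du output. The helper name du_total_kib and the throwaway directory are my own assumptions for the demo; on the repository the argument would be the backup folder.

```shell
#!/bin/sh
# Hypothetical helper: print only the grand total (in KiB) reported by du.
du_total_kib() {
    # -s summarizes each argument, -c appends a "total" line, -k forces KiB.
    # awk keeps the first field of the last line, which is the total.
    du -sck "$1" | awk 'END { print $1 }'
}

# Demo on a throwaway directory; on the repository this would be /backup.
demo_dir=$(mktemp -d)
head -c 65536 /dev/urandom > "$demo_dir/sample.bin"
total_kib=$(du_total_kib "$demo_dir")
echo "total: $total_kib KiB"
rm -rf "$demo_dir"
```

Forcing the unit with -k keeps the number comparable with the df output, regardless of any block size set in the environment.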
The second number we need is the used space of the volume. My Veeam repo uses /backup, that is the mount point of the volume /dev/sdb.
Obviously, to make things work, that volume has to be used only by Veeam; any other file stored in the same volume but not created by Veeam will impact the calculations. The number we need can be obtained with df /dev/sdb:
The value we need is the one under the Used column, but again there is too much additional text here. Time for the script.
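The same kind of filtering works for df. This is only a sketch: df_used_kib is a hypothetical helper name, and the root file system is used here just so the example runs anywhere; on the repository the argument would be /dev/sdb or /backup.

```shell
#!/bin/sh
# Hypothetical helper: print the Used column (in KiB) for the file system
# that contains the given path or device.
df_used_kib() {
    # -P selects the POSIX output format (one line per file system),
    # -k forces 1024-byte units; NR == 2 skips the header line.
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

used_kib=$(df_used_kib /)
echo "used: $used_kib KiB"
```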
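#!/bin/sh
# Run from the base folder of the Veeam repository (e.g. /backup).
veeam_repo=$(pwd)
# Device backing the repository mount point.
veeam_volume=$(mount | grep "$veeam_repo" | cut -d' ' -f 1)
# Sum of the file sizes as reported by du (skip the first line, keep the total).
sum_files=$(du -sc "$veeam_repo" | awk 'BEGIN { getline } { print $1 }')
# Used space of the volume (skip the df header line).
used_space=$(df "$veeam_volume" | awk 'BEGIN { getline } { print $3 }')
ratio=$(echo "scale=2; $used_space / $sum_files * 100" | bc)
echo "XFS reflink ratio is $ratio %"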
Let’s explain it.
At the beginning, we use two variables to store the path of the Veeam repository and the volume that is mounted there. We don’t need to set them by hand: we just place the script in the base folder of the Veeam repository, like /backup in my case, and run it from there. The second line looks up the repository path among the mounted volumes and returns the corresponding device.
This means obviously that the Veeam repository has to be in a dedicated volume.
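Matching the path with grep can return more than one line if another mount point contains the same string. One possible alternative, offered as a sketch rather than as the method used in the script, is to ask df which file system holds the path. The helper name volume_of is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the device (first df column) backing a path.
volume_of() {
    # -P keeps each file system on a single line; NR == 2 skips the header.
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

veeam_repo=$(pwd)
veeam_volume=$(volume_of "$veeam_repo")
echo "$veeam_repo is stored on $veeam_volume"
```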
Then, we do the same two measurements I explained before, but we grab only the numbers we need. To do so we use awk, one of the most powerful commands in the entire Linux system.
Once we have those two numbers, we do a simple division; but since shell arithmetic only handles integers, we use bc, a separate calculator utility. scale=2 tells bc to keep two decimal digits in the result of the division; the extra digits are truncated, not rounded. Because the division is done before the multiplication by 100, the final percentage always ends in .00.
Finally, with echo we write the value to the output. Since it’s multiplied by 100, it is a percentage, not the raw ratio between the two values.
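If more precision is wanted, one option is to multiply before dividing and let awk do the rounding. This is a sketch of an alternative, not the calculation used in the script, and the two numbers are illustrative values rather than measurements from my lab.

```shell
#!/bin/sh
# Illustrative values in KiB, not measurements from the lab.
used_space=5437
sum_files=10000

# Multiply before dividing and let printf round to two decimals.
ratio=$(awk -v used="$used_space" -v sum="$sum_files" \
    'BEGIN { printf "%.2f", used * 100 / sum }')
echo "XFS reflink ratio is $ratio %"
```

With these sample values the script prints a ratio of 54.37 %.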
root@lnxrepo:/backup# ./reflink.sh
XFS reflink ratio is 54.00 %
The real usage of the volume is 54% of what we observe in the file system, that is, about 46% smaller.
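Since the ratio is the used space divided by the sum of the file sizes, the space saved by fast clone is its complement to 100. A small sketch of that last step follows; reflink_savings is a hypothetical helper, and the arguments are sample values chosen to match a 54 % ratio.

```shell
#!/bin/sh
# Hypothetical helper: given the used space and the sum of the file sizes
# (same unit), print the percentage of space saved by block sharing.
reflink_savings() {
    awk -v used="$1" -v sum="$2" \
        'BEGIN { printf "%.2f", 100 - used * 100 / sum }'
}

# Illustrative values matching a 54 % ratio.
savings=$(reflink_savings 54 100)
echo "Space saved by fast clone: $savings %"
```

For a 54 % ratio this reports 46.00 % of space saved.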