In VMware environments, image-based backups are now the most common approach: they save VMDK blocks directly instead of relying on agents installed inside the Guest OS to save individual files. Using Changed Block Tracking (CBT), first introduced with vSphere 4.0, the backup software can identify only the blocks that have changed since the previous backup and save just those. This enables incremental backups with low disk usage and short backup times.
However, the interaction between CBT and the Guest OS filesystem has some drawbacks when it comes to backup optimization.
Let’s see an example: we have a Windows 2008 R2 VM acting as a file server. Its VMDK disk has been formatted with the NTFS filesystem using the default 4 KB cluster size, and hosts thousands of files. A user modifies a small 12 KB file and the changes need to be written to disk. That file is made of 3 NTFS clusters, and because of fragmentation, those 3 clusters could be spread across 3 different VMFS blocks:
In this case a simple 12 KB change to a file can result in a 3 MB change on the VMFS virtual disk. A classic backup software with a dedicated agent would have detected the single file change, while an image-based backup has to save all 3 MB of modified blocks.
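The worst-case amplification described above can be sketched with a quick calculation (assuming a 1 MB VMFS block size, which matches the 3 MB figure in the example; block sizes vary by VMFS version and configuration):

```python
import math

# Sizes assumed from the example: 4 KB NTFS clusters, 1 MB VMFS blocks.
NTFS_CLUSTER = 4 * 1024
VMFS_BLOCK = 1024 * 1024
FILE_CHANGE = 12 * 1024  # the 12 KB file modified by the user

# The 12 KB change spans 3 NTFS clusters...
clusters_touched = math.ceil(FILE_CHANGE / NTFS_CLUSTER)

# ...and if fragmentation puts each cluster in a different VMFS block,
# CBT marks all 3 of those 1 MB blocks as changed.
worst_case_changed = clusters_touched * VMFS_BLOCK

print(clusters_touched)                      # 3 clusters
print(worst_case_changed // (1024 * 1024))   # 3 MB backed up
print(worst_case_changed // FILE_CHANGE)     # 256x amplification
```

With a defragmented filesystem all three clusters would likely sit in the same VMFS block, so only 1 MB would be marked as changed instead of 3 MB.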
These issues are inherent in the way CBT works (which, on the other hand, also brings a great set of benefits), but it is possible to optimize this behaviour and reduce the number of CBT blocks marked as changed.
You can basically do two things:
– Defragment the guest partitions. Defragmentation reorders the clusters of the guest filesystem so they fill CBT blocks as densely as possible, resulting in the smallest possible number of modified VMFS blocks. Since defrag itself touches many blocks and CBT will mark all of them as changed, it should be run only before a full backup; running it before an incremental would produce a backup nearly as large as a full one.
– sdelete: this tool overwrites unused clusters with zeroes. Besides being a solid secure-deletion method, it is also useful for cleaning VMFS blocks and optimizing disk space. It should NOT be run on thin-provisioned disks, however: sdelete would write to all the assigned space, inflating the disk to its full provisioned size, just like a thick disk.
Scheduling a script that runs these two tasks before each full backup can lead to great improvements in backup disk usage and speed.
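Such a script could be sketched as follows. This is a minimal illustration, not a production tool: the drive letter, the exact defrag switches, and the sdelete flag for zeroing free space should be checked against your Windows and Sysinternals versions, and the thin-provisioning check here is just a boolean you supply yourself:

```python
# Sketch of a pre-full-backup maintenance script for a Windows guest.
# Command-line flags are assumptions to verify against your tool versions.
import subprocess


def build_commands(drive: str, thin_provisioned: bool) -> list:
    """Return the maintenance commands to run before a full backup."""
    # Defragment the guest partition so clusters pack into as few
    # VMFS/CBT blocks as possible.
    cmds = [["defrag", f"{drive}:", "/U", "/V"]]
    if not thin_provisioned:
        # Zero-fill free space so unused blocks compress/dedupe well.
        # Skipped on thin disks: writing all free space would inflate
        # the VMDK to its full provisioned size.
        cmds.append(["sdelete", "-z", f"{drive}:"])
    return cmds


def run_maintenance(drive: str, thin_provisioned: bool) -> None:
    for cmd in build_commands(drive, thin_provisioned):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_maintenance("C", thin_provisioned=False)
```

Scheduled via Task Scheduler shortly before the full-backup job, this keeps incremental backups small without marking extra CBT blocks between fulls.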