Lately, many of the design projects I worked on had data protection as their main topic: VMware backups, remote backups, long-term retention, replicas, Disaster Recovery and so on.
Reading through customers’ “data protection” requirements, two of them have always been the most frequent:
- higher backup speed than the current solution, which is often “classic” backup software that was simply upgraded and extended with some support for virtualized environments
- the desire for a single-pass backup, able to save both the whole virtual machine and its data
The second requirement may make you smile if you have been using “image-based” backup software for many years and know it is an absolutely standard feature, but the underlying message from the customer is, in the end:
Backups need to take less time! Is it possible, with today’s technologies, to keep lowering the RPO of a backup?
vStorage API, joys and pains
First, let’s recap how VMware backups work today.
The vStorage APIs enable hot backups of running virtual machines without any interruption. The backup workflow is as follows:
- a snapshot of the virtual machine is taken
- the vmdk disk is no longer locked and can be read, while new writes are saved in a “delta disk”
- via the VADP libraries, the backup software copies the vmdk disk (fully or incrementally) to a different location
- when the backup is finished, the changes recorded in the delta disk are written back to the parent vmdk disk and the snapshot is deleted (this is what is called a snapshot commit)
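The four steps above can be sketched with a toy model. This is only an illustration of the idea of redirecting writes to a delta disk: the class and method names are my own invention, and it does not use the real VADP libraries or reflect VMware's on-disk format.

```python
class SnapshottedDisk:
    """Toy model of a vmdk with at most one open snapshot (hypothetical, not VMware code)."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # parent vmdk: block number -> data
        self.delta = None         # delta disk, present only while a snapshot is open

    def take_snapshot(self):
        # Step 1: from now on the parent disk is frozen.
        self.delta = {}

    def write(self, block, data):
        # Step 2: while the snapshot is open, writes go to the delta disk.
        if self.delta is not None:
            self.delta[block] = data
        else:
            self.base[block] = data

    def read(self, block):
        # The running VM sees the delta first, then the parent.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]

    def backup(self):
        # Step 3: the backup copies only the frozen parent disk.
        return dict(self.base)

    def commit(self):
        # Step 4: merge the delta back into the parent and drop the snapshot.
        merged = len(self.delta)
        self.base.update(self.delta)
        self.delta = None
        return merged


disk = SnapshottedDisk({0: "a", 1: "b", 2: "c"})
disk.take_snapshot()
disk.write(1, "B")              # the VM keeps running during the backup
copy = disk.backup()            # still contains the old content of block 1
merged_blocks = disk.commit()   # number of blocks written back to the parent
```

The point to notice is that the work done by `commit()` grows with the number of blocks written while the snapshot was open, not with the size of the backup itself.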
This procedure has proven its quality and efficiency over the years: it can be used by any backup software, with no need for it to be specifically written for a given application hosted in the VM; the only requirement is support for the vStorage APIs. Also, any storage system supported by VMware is automatically supported by these libraries.
This way a company can switch its storage, or even its backup software, without losing any data protection functionality in its virtualized environment.
But there are also some problems and limits, and they are slowly becoming serious as time goes by.
Limits. Relying on snapshots means backups inherit the limits of snapshots: since you cannot take a snapshot of independent disks, FT-protected VMs, physical RDM disks or linked clones, none of these can be backed up via the vStorage APIs. Small and medium customers do not see this as a problem, since they rarely use those configurations; bigger customers see these limits as real and concerning problems, which force them to keep agent-based backup software installed inside those VMs, doing backups as they used to on physical servers.
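When sizing a project, I find it useful to list up front which VMs fall under these limits. Here is a minimal sketch of such a check, assuming the inventory has already been exported into a simple structure; `VmConfig` and its fields are hypothetical names of mine, not part of any VMware API.

```python
from dataclasses import dataclass


@dataclass
class VmConfig:
    """Hypothetical inventory record; the field names are illustrative only."""
    name: str
    has_independent_disk: bool = False
    fault_tolerance: bool = False
    has_physical_rdm: bool = False
    is_linked_clone: bool = False


def snapshot_blockers(vm):
    """Return the reasons why this VM cannot be backed up via snapshots."""
    checks = [
        (vm.has_independent_disk, "independent disk"),
        (vm.fault_tolerance, "FT-protected VM"),
        (vm.has_physical_rdm, "physical RDM disk"),
        (vm.is_linked_clone, "linked clone"),
    ]
    return [reason for flagged, reason in checks if flagged]


inventory = [
    VmConfig("web01"),
    VmConfig("db01", has_physical_rdm=True, fault_tolerance=True),
]
# VMs that would still need an agent-based backup inside the guest
needs_agent = {vm.name: snapshot_blockers(vm) for vm in inventory if snapshot_blockers(vm)}
```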
But the biggest problem is probably another one: even with an optimized vStorage API based backup, a job can last longer than a customer can afford, and the cause is right in point 4 of the workflow I listed above.
Let’s see how it happens. Progress in storage hardware and increasingly efficient algorithms in backup software have brought faster and faster backups, especially if you back up to disk instead of tape.
On the other hand, image-based backups suffer when a virtual machine has high I/O on the virtual disk: for the whole duration of the backup job, new writes are saved on the delta disk. The longer the backup, the more writes are saved, and all of them have to be written back into the parent vmdk after the backup.
It’s becoming more and more common to find environments where a 1 hour backup is followed by a 3-4 hour snapshot commit. If you have an MS Exchange Server (especially 2010) among your VMs, chances are you have already seen this behaviour. In these situations, a guaranteed RPO is almost impossible to reach: a commit operation can last a few minutes one day and several hours the day after; this variability is sometimes even worse than a poor but fixed RPO.
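A back-of-the-envelope model helps to see where those hours come from. The sketch below is deliberately simplified and rests on my own assumptions: every write during the backup lands on a new block, the VM keeps writing at the same rate during the commit, and the datastore can merge the delta at a fixed rate. All the figures are made up for illustration.

```python
def delta_size_gb(write_rate_mb_s, backup_minutes):
    """Data accumulated in the delta disk while the snapshot stays open."""
    return write_rate_mb_s * backup_minutes * 60 / 1024


def commit_minutes(delta_gb, commit_rate_mb_s, write_rate_mb_s):
    """Time to merge the delta while the VM keeps writing at the same rate."""
    net_rate = commit_rate_mb_s - write_rate_mb_s
    if net_rate <= 0:
        raise ValueError("the commit can never catch up with incoming writes")
    return delta_gb * 1024 / net_rate / 60


# Busy day: 15 MB/s of writes during a 60 minute backup, 20 MB/s merge rate
busy = commit_minutes(delta_size_gb(15, 60), 20, 15)

# Quiet day: same backup and same storage, but only 1 MB/s of writes
quiet = commit_minutes(delta_size_gb(1, 60), 20, 1)
```

With these hypothetical numbers the busy day ends with a commit of about 180 minutes after a 60 minute backup, while the quiet day needs only about 3 minutes: same VM, same storage, completely different RPO.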
In the future, I feel these problems will become even worse: production environments will host heavier workloads, requiring more I/O or I/O patterns that are unfriendly to how a virtual disk works (right now Exchange 2010 is probably the most striking example). And even if your virtual machine produces little I/O, if it is hosted on the same datastore as more demanding VMs the total amount of I/O could still be a problem.
To quote Einstein: “The significant problems we face cannot be solved at the same level of thinking we were at when we created them.” We probably need a new approach to backup problems.
New backup methods?
Basically, we need a deeper integration between the backup software and the storage hosting our vmdk files, in order to minimize or maybe even eliminate the impact of VMware snapshot commits.
In a nutshell: no more VMware snapshots, no more painful commits, and a guaranteed, steady RPO!
Among the implementations of these ideas, I had the opportunity to test CommVault Simpana SnapProtect and Veeam Explorer for SAN Snapshots. Let’s first briefly see how they work.
Simpana SnapProtect
- vCenter takes a snapshot of the virtual machine
- Simpana asks the storage to take a snapshot of the LUN where the vmdk is hosted
- as soon as the LUN snapshot is taken, Simpana asks vCenter to commit the VM snapshot. Since these operations take maybe less than a minute, the vCenter commit can be really fast even on high I/O VMs
- Simpana mounts the storage snapshot as a new LUN on an ESXi server acting as a “proxy”, and from there it can run the backup with ease. Even if the backup lasts 1 hour, the restore point is still the one recorded when vCenter took the VM snapshot.
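What changes compared to the classic workflow is how long the VMware snapshot stays open. The sketch below compares the two sequences; the step names and all durations (in seconds) are placeholders I chose for illustration, not measurements from Simpana or vCenter.

```python
def snapshot_open_seconds(steps):
    """Seconds between the end of 'vm_snapshot' and the start of 'vm_commit'."""
    names = [name for name, _ in steps]
    start = names.index("vm_snapshot") + 1
    end = names.index("vm_commit")
    return sum(seconds for _, seconds in steps[start:end])


# Classic vStorage API backup: the delta disk grows for the whole backup
classic = [
    ("vm_snapshot", 10),
    ("backup", 3600),
    ("vm_commit", 0),  # duration not needed for this comparison
]

# Storage-assisted backup: the VM snapshot is committed right after the LUN snapshot
storage_assisted = [
    ("vm_snapshot", 10),
    ("lun_snapshot", 30),
    ("vm_commit", 0),
    ("mount_on_proxy", 60),
    ("backup", 3600),
]

classic_window = snapshot_open_seconds(classic)
assisted_window = snapshot_open_seconds(storage_assisted)
```

In this example the delta disk collects writes for 3600 seconds in the classic case and for only 30 seconds in the storage-assisted one, even though the backup itself takes exactly the same time.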
Veeam Explorer for SAN Snapshots
- the storage takes a LUN snapshot
- when you need a restore, the LUN snapshot is mounted on an ESXi server, and Veeam reads its content to perform the restore.
As of today Veeam does not extract backups from the snapshot, so the data stays inside the production storage, unless you configure the storage itself to replicate snapshots to a secondary storage. I hope and believe that in future releases Veeam will be able to extract backups from storage snapshots.
Right now, the most advanced implementation is certainly CommVault Simpana.
Looking closely at the requirements of both products, a few considerations can be made.
“Storage-Aware” Backups
While the vStorage APIs guarantee storage-independent backups, these new technologies rely heavily on a deep interaction between the backup software and the storage itself. Both CommVault and Veeam have a short and strict list of supported storage models, and you cannot use these technologies on just any storage supported by VMware (although you can always fall back on the vStorage APIs as usual).
I have already met customers who listed these “storage-aware” backups among their requirements. This will certainly feed back into the choice of the production storage for the virtualized environment.
In the future, we will probably see an even deeper integration between these two elements.