Virtual Volumes, also commonly called VVOLs, have been one of the biggest additions in VMware vSphere 6. Even though they have been around for about a year, they have a much longer history: the development and beta phases of this technology took quite some time, and while the technology was not yet available, there were a lot of discussions around it. Now VVOLs are available, and if your storage array supports them, you can start to play with them and decide if it’s time to migrate from monolithic VMFS volumes to this exciting new storage technology.
VVOLs have several advantages over regular VMFS volumes, from the granularity of volume management (essentially, we now have one “LUN” per virtual disk) to policy-based management, and so on. One aspect people haven’t focused on much is the impact of VVOLs on backup operations. Anyone working in a vSphere environment knows that vSphere snapshots are far from perfect. They were designed as a protection mechanism, to be able to revert to a previous version during tests or upgrades; they eventually became the main solution for image-level backups, but they have a huge performance impact during backup operations. Different solutions have been developed to mitigate these limits, like Veeam Backup from Storage Snapshots (my colleague Timo Dewin has a great article that also explains how vSphere snapshots work, have a look here: What is the buzz around Backup from Storage Snapshots anyway?), while VMware itself greatly improved the snapshot technology in vSphere 6 (I talked about it in my post With vSphere 6, snapshot consolidation issues are a thing of the past!). But still, native storage snapshots are probably going to be more efficient, and VVOLs are being adopted more and more.
VVOLs rely completely on the underlying storage system. This means that snapshots are also created at the storage layer, so depending on the storage solution in use, a snapshot can be a completely transparent operation, as many modern storage systems have really great snapshot technology. For sure better than vSphere snapshots. Eric Siebert wrote a nice article about VVOLs and backups: How VMware Virtual Volumes (VVols) will impact your backups. Indeed, the snapshot technology in VVOLs sounds great, especially compared with the good old redo-log snapshots we are used to.
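Why can a storage snapshot be “completely transparent”? Here is a toy Python model of a redirect-on-write snapshot, the kind of metadata-based approach many arrays use. It is a sketch under assumptions: the class and method names are hypothetical, and this is not how Nimble or any other vendor actually implements VVOL snapshots. The only point it illustrates is that taking or deleting such a snapshot touches the block map, never the data blocks.

```python
class PointerSnapshotVolume:
    """Toy redirect-on-write volume (hypothetical, not any vendor's real code)."""

    def __init__(self, blocks):
        self.store = {}      # physical location -> data
        self.live_map = {}   # logical block -> physical location
        self.snapshots = {}  # snapshot name -> frozen copy of the block map
        self._next = 0
        for block, data in blocks.items():
            self.write(block, data)

    def _alloc(self, data):
        loc = self._next
        self._next += 1
        self.store[loc] = data
        return loc

    def write(self, block, data):
        # New data always lands in a fresh location; old locations stay intact,
        # so any snapshot still pointing at them keeps seeing the old data.
        self.live_map[block] = self._alloc(data)

    def read(self, block, snapshot=None):
        block_map = self.snapshots[snapshot] if snapshot is not None else self.live_map
        return self.store[block_map[block]]

    def snapshot(self, name):
        # Creating a snapshot copies only metadata (the block map), not data.
        self.snapshots[name] = dict(self.live_map)

    def delete_snapshot(self, name):
        # Removing it just drops the map: nothing is copied back into the live
        # volume. (A real array would also reclaim unreferenced locations in the
        # background; this toy never frees them.)
        del self.snapshots[name]
        return 0  # data blocks copied


vol = PointerSnapshotVolume({i: b"old" for i in range(1000)})
vol.snapshot("backup")
for i in range(600):  # heavy write activity while the backup reads the snapshot
    vol.write(i, b"new")
print(vol.read(0, "backup"), vol.read(0))
print(vol.delete_snapshot("backup"))
```

However many blocks were overwritten while the snapshot was open, deleting it copies nothing, which is the behaviour the VVOL tests below hint at.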
But my curiosity was not satisfied by a few lines of text; I wanted to check the performance of VVOL backups compared to regular VMFS backups. Time to spin up some VMs and run some tests in the lab!
Setting up the lab
In order to run some tests and be sure that they were comparable, I first of all needed a storage system able to expose both VMFS and VVOL volumes. In our shared company lab we have a Nimble CS300, and thanks to the latest updates we applied to the machine (firmware at the time of these tests is 3.5.2), we are now able to consume VVOLs from the same machine that is also exposing multiple VMFS volumes. In this way, I can run both storage technologies on the same array and obtain more comparable results:
Then, on the VMFS volume I created a Windows 2012 R2 virtual machine with 8 vCPUs, 16 GB of memory and a 40 GB hard disk. In addition, there is a second disk with 200 GB of space to run some tests with Microsoft Jetstress 2013. I configured Jetstress to run this test: 80% storage capacity consumption (a 127.90 GB database was created by the tool), suppress tuning and use 100 threads, performance test type, run background database maintenance during the test, 1 database with 1 copy, 2 hours duration. It’s maybe not the best test for the size of the virtual machine, but I was just looking for a test that would put some I/O on the storage.
1. Performance of the live VM over VMFS
First, I ran the test without any backup operation involved. After two hours, the result was 7943 Achieved Transactional I/O per Second, and this was the graph from the Nimble console:
I could probably have pushed the test a bit higher by using more threads or more databases, but this was not the goal. I only needed an initial value to use as a reference during the other tests.
2. Performance of the live VM over VVOL
To be sure the test had some meaning, I ran the exact same test with the virtual machine now running on the VVOL container. After a simple Storage vMotion between the two storage types hosted by the Nimble array, I re-ran the test. The result this time was 8067 Achieved Transactional I/O per Second, a value that is practically the same as the previous test. The performance graph on the Nimble array is really similar to the previous one:
Note the filter applied to the virtual disk named “jetstress-1.vmdk”. With VVOLs we have complete visibility down to the single virtual disk, as it is now an object that exists in the storage array, compared to the previous test where we only had visibility of the entire VMFS volume. Another nice consequence of moving from VMFS volumes to VVOLs.
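To put “practically the same” into numbers, here is a quick Python calculation of the relative difference between the two Jetstress results quoted above. Only the two IOPS values come from the tests; the rest is plain arithmetic.

```python
# Achieved Transactional I/O per Second reported by Jetstress in the two runs
vmfs_iops = 7943
vvol_iops = 8067

# Relative difference of the VVOL run versus the VMFS baseline, in percent
delta_pct = (vvol_iops - vmfs_iops) / vmfs_iops * 100
print(f"VVOL vs VMFS: {delta_pct:+.2f}%")
```

The gap is about 1.6%, small enough for a single run on each platform to be treated as equivalent.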
3. Backup performance over VMFS
After the initial tests, I set up a simple Veeam Backup job to protect this virtual machine. The idea was to run the backup in the middle of the Jetstress benchmark, so that the virtual machine had a lot of I/O and thus many changed blocks on the virtual disk. This is in fact one of the scenarios where standard vSphere snapshots have problems, as I/O is spread across both the snapshot and the base disk. I executed the backup using every default option in Veeam, and each time I used a new Jetstress database and a Veeam active full backup. The first test was done on the VMFS volume:
I started the Jetstress performance test at 10:57 and the backup job of the Jetstress VM at 11:16. The backup itself was really fast: it took less than 10 minutes to back up the 130 GB Exchange database, at a speed of 230 MB/s over network mode (yes, this is a physical proxy with 10 Gb links to our lab). But this is not the most notable part; the one that probably every experienced VMware administrator will recognize is the snapshot removal time: 27 minutes, almost three times longer than the backup operation!!! And note that this is a vSphere 6.0 cluster, so it has all the snapshot enhancements I talked about in my post With vSphere 6, snapshot consolidation issues are a thing of the past!. The VM never became unresponsive, but because of the insane amount of I/O that Jetstress created, vSphere had to commit all those changed blocks back into the original disk, and it took a lot of time.
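Why does the commit take so long? A deliberately simplified Python model of a redo-log snapshot helps: while the snapshot is open, writes land in a delta, reads must check the delta before the base disk, and the commit has to copy every changed block back. This is a toy under stated assumptions (hypothetical class and method names, dictionaries of blocks instead of real VMDK files), not VMware’s implementation; it only shows that commit work grows with the number of blocks changed while the snapshot was open.

```python
class RedoLogDisk:
    """Toy redo-log snapshot (hypothetical, not VMware's implementation)."""

    def __init__(self, base):
        self.base = dict(base)  # block number -> data; frozen while a snapshot is open
        self.delta = None       # None means no snapshot is open

    def snapshot(self):
        self.delta = {}

    def write(self, block, data):
        if self.delta is None:
            self.base[block] = data
        else:
            self.delta[block] = data  # writes are redirected to the delta

    def read(self, block):
        # Reads must check the delta first, then fall back to the base disk.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base.get(block)

    def commit(self):
        # Every changed block has to be copied back into the base disk.
        copied = len(self.delta)
        self.base.update(self.delta)
        self.delta = None
        return copied


disk = RedoLogDisk({i: b"old" for i in range(1000)})
disk.snapshot()
for i in range(600):  # heavy write activity while the backup reads the base disk
    disk.write(i, b"new")
print(disk.commit())  # prints 600: one copy per changed block
```

With a workload like Jetstress, the number of changed blocks during a 10 minute backup is large, and in this model the commit cost scales with exactly that number.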
But there is another interesting graph:
I added the vertical lines based on the Veeam logs:
11:17:28, snapshot creation is started
11:17:30, snapshot creation is completed
11:27:45, snapshot commit is started
11:54:57, snapshot commit is completed
As you can easily spot, performance had a notable decay during the snapshot commit. Not as much as it would have in the vSphere 5.x days, but the workload was still impacted. Maybe committing a snapshot is not going to create major issues, but performance is still affected when the VM runs on a snapshot.
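The Veeam log timestamps of the VMFS run can be turned into durations with a few lines of Python. The only inputs taken from the test are the timestamps and the 130 GB / 230 MB/s figures; the helper names are mine, and treating 1 GB as 1024 MB is an assumption.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Veeam log timestamps for the backup on VMFS
events = {
    "create_start": "11:17:28",
    "create_end": "11:17:30",
    "commit_start": "11:27:45",
    "commit_end": "11:54:57",
}
t = {name: datetime.strptime(value, FMT) for name, value in events.items()}


def seconds(start, end):
    """Elapsed whole seconds between two logged events."""
    return int((t[end] - t[start]).total_seconds())


def mm_ss(total):
    minutes, secs = divmod(total, 60)
    return f"{minutes} min {secs:02d} s"


on_snapshot = seconds("create_end", "commit_start")  # VM running on the snapshot
commit = seconds("commit_start", "commit_end")       # snapshot consolidation

print("snapshot creation:", mm_ss(seconds("create_start", "create_end")))
print("VM on snapshot:   ", mm_ss(on_snapshot))
print("snapshot commit:  ", mm_ss(commit))
print(f"commit / on-snapshot ratio: {commit / on_snapshot:.1f}x")

# Sanity check of the reported throughput (assumption: 1 GB = 1024 MB)
transfer_s = 130 * 1024 / 230
print(f"expected transfer time at 230 MB/s: {transfer_s / 60:.1f} min")
```

With these inputs the VM ran on the snapshot for 10 min 15 s and the commit took 27 min 12 s, about 2.7 times longer; the throughput check gives roughly 9.6 minutes, consistent with the “less than 10 minutes” of the backup itself.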
4. Backup performance over VVOLs
Now, let’s repeat the exact same test after migrating the virtual machine back to VVOL. Here are the results:
Forget the absolute performance, as the backup on VVOL was executed during the night, so there were other backups running against the same storage. The important part is the snapshot removal, which took only 5 seconds, compared to 27 minutes before! This is because the snapshot is completely delegated to the underlying storage, which uses its own technology to create and remove snapshots of virtual volumes. And what about the performance of Jetstress during the backup?
The backup operations were:
23:00:53, snapshot is created
23:17:41, snapshot commit is started
23:17:45, snapshot commit is completed
Snapshot commit start and finish appear as a single line on the graph, as they happened only 4 seconds apart. And you can see from the graph that no performance decrease was measured.
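The same timestamp arithmetic can be applied to the VVOL log, with the VMFS commit timestamps from the previous test for comparison. A small Python sketch: the function name is hypothetical, only the timestamps come from the Veeam logs, and both runs are assumed to stay within a single day.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start, end):
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


vvol_backup = seconds_between("23:00:53", "23:17:41")  # snapshot created -> commit starts
vvol_commit = seconds_between("23:17:41", "23:17:45")
vmfs_commit = seconds_between("11:27:45", "11:54:57")  # from the VMFS run

print(f"VVOL: VM on snapshot {vvol_backup} s, commit {vvol_commit} s")
print(f"VMFS commit was {vmfs_commit // vvol_commit}x longer ({vmfs_commit} s)")
```

Even though the VVOL backup window was longer (about 16 minutes, because of the other nightly jobs), the commit itself shrinks from 1632 seconds to a few seconds.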
Final notes
VVOLs bring a lot of benefits to VMware environments, from granular management down to the single virtual disk, to policy-based management; and, as I hope I was able to show you with my tests, backup operations are also going to benefit a lot from this new technology. If you are looking for justifications to move towards VVOLs, the possibility to run backups faster and without impacting your production environment can be a good one.