Since Patch 3 of Veeam Backup & Replication v7 (build 7.0.0.839) there has been a new mode to manage hotadd backups over NFS, available via a registry key. Per the original release notes:
Intelligent load balancing can now be configured to give preference to backup proxy located on the same host using the EnableSameHostHotaddMode (DWORD) registry value.
I’ve kept this post on hold for a while, since with the upcoming v9, DirectNFS will be a much better option than virtual proxies to back up virtual machines running on NFS shares. But there are situations where this key may still be needed, for example when people still want to use virtual proxies against NFS. So, what is this key, and what can you do with it?
The problems with NFS
The key was originally developed to overcome some issues that arise when NFS storage arrays are used in a vSphere environment and the chosen backup mode is hotadd. Until Veeam Backup & Replication v9 is released and Direct NFS becomes available as an alternative, the only possible modes with NFS arrays are hotadd and network mode. When 10 Gb networks are available, network mode is fast enough, and for this reason we usually suggest it. But there are situations where a user still wants to leverage hotadd: maybe the Ethernet connections to the storage are 1 Gb, or in a purely virtualized environment they want Veeam proxies deployed as VMs and do not want to add physical proxies to run network mode effectively.
In a hotadd scenario, NFS storage is the worst enemy you can have. Because of the NFS v3 locking mechanisms (is this solved with NFS v4.1 in vSphere 6? I still need to verify this…), if the proxy and the VM you are trying to back up are not on the same host, you might experience stuns when the VMDK is released after the backup. VMware has an interesting KB article that you can find here: Virtual machines residing on NFS storage become unresponsive during a snapshot removal operation. VMware suggests either forcing NBD mode (what Veeam calls network mode) or making sure that a proxy using hotadd only processes VMs running on the same host where it is running.
Easier said than done.
The snowball effect
The introduction of the registry key back in Veeam Backup & Replication v7 Patch 3 seemed very promising, as it was supposed to force a hotadd proxy to process only local VMs. Everyone was excited, and in fact some of my colleagues and others blogged about it. The problem was that reality was a little bit different. As always, I was skeptical until proven wrong, and I wanted to properly test the key first. I was lucky, because in the same period I was working together with Derek Seaman from Nutanix on a paper about best practices for using Veeam with Nutanix on VMware vSphere. We tested the key extensively, but in the end we decided to still suggest network mode. The registry key was working, but with a problem that I called, during our conversations, “the snowball effect”.
The scenario is not limited to Nutanix; you can think of any setup with one proxy on every ESXi host connected to any NFS storage array, but the data locality capabilities of Nutanix make it even more relevant here. At first, we deployed one Veeam proxy on each Nutanix node and created DRS exceptions to keep each proxy local to its assigned ESXi server. Then, we configured backup jobs using hotadd mode and processing VMs that were distributed across several hosts. At the beginning the job worked as expected, but the proxies were still affected by the NFS locks: especially during the release of the VMDK disks, proxies were sometimes busy for a long period and not available to process additional disks. With many VMs in a job, the central Veeam server looked for available processing slots, and if remote proxies (proxies not on the same host where the protected VM was running) happened to have free slots, VMs were assigned to them.
You can immediately see where the problem is: the proxy is not local to the processed VM, data locality is lost, and the NFS locks start to affect the entire environment. The more VMDKs were assigned to remote proxies, the more NFS locks were raised and the more stun issues surfaced, up to an almost complete lock of all the running VMs in some of the tested scenarios. As I said, a snowball effect.
Still, we wanted a better solution than network mode. NBD mode is not ideal, especially on a hyperconverged platform like Nutanix: since everything is virtualized and storage is converged, data belonging to remote VMs flows in and out of the same network multiple times, which does not optimize performance. It is better than crashing everything with the pesky NFS locks, but still not ideal.
A new value for the registry key
The description of the registry key already contains the explanation of the snowball effect: the key gives preference to local proxies, it does not make their use mandatory. It’s a logical choice, because otherwise, if there were no local proxies, backups would never complete and VMs would be left unprotected. After all, you still want backups to be completed. What was missing was an additional option, and I’m happy that, after the tests we made and the feedback we gave to the Veeam developers, a new option was added some months ago (again, sorry for holding this post for so long): one of the gems added to Veeam Backup & Replication v8 Update 2 is in fact a new value for this registry key. Now the possible values are:
HKLM\SOFTWARE\Veeam\Veeam Backup and Replication\EnableSameHostHotaddMode (DWORD):
0 = any available proxy is used, regardless of whether it is local or remote to the processed VMs
1 = preference is given to a local proxy. If a local proxy is not deployed or not available when the VM has to be processed, the job fails over to a remote proxy, still using hotadd
2 = preference is given to a local proxy. If a local proxy is not deployed or not available when the VM has to be processed, the job fails over to a remote proxy using network mode
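As a small illustration of how the value could be set, here is a minimal Python sketch that builds the arguments for the standard Windows "reg add" command, using the key path and value name listed above. The helper name build_reg_command is hypothetical and only for illustration; which machine the key has to be set on (I assume the Veeam backup server) and whether any service needs a restart afterwards are not covered here, so check that against the release notes before using it.

```python
# Key path and value name as given in the post.
KEY_PATH = r"HKLM\SOFTWARE\Veeam\Veeam Backup and Replication"
VALUE_NAME = "EnableSameHostHotaddMode"

# Meaning of each DWORD value, summarized from the list above.
MODES = {
    0: "any available proxy",
    1: "prefer local proxy, fail over to remote hotadd",
    2: "prefer local proxy, fail over to remote network mode",
}


def build_reg_command(mode: int) -> list[str]:
    """Return the argument list for a `reg add` call that sets the DWORD.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    return [
        "reg", "add", KEY_PATH,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", str(mode),
        "/f",  # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    # On a Windows machine the list could be passed to subprocess.run();
    # here we only print it for review.
    print(build_reg_command(2))
```

Building the command as a list rather than a single string avoids quoting problems with the spaces in the key path.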
A simple change in the behaviour of the failover to a remote proxy is the “key”. No more lock issues, because there will never be a remote proxy using hotadd: it will be either local hotadd or remote network mode. The process of assigning a proxy is now going to be:
1- find the host the VM runs on (or where it is registered if it’s powered off)
2- look at the available proxies to see if any of them is local to the VM
3- if YES, run the backup using hotadd mode
4- if NO, select a remote proxy and back up the VM using network mode
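To make the difference between the three values more concrete, the decision above can be modelled in a few lines of Python. This is only my simplified sketch of the behaviour described in this post, not Veeam's actual scheduler: the Proxy class, the free_slots field and the assign_proxy function are names I made up for the example, and the real load balancing certainly takes more factors into account.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proxy:
    name: str
    host: str        # ESXi host the proxy VM runs on
    free_slots: int  # simplified stand-in for available processing slots


def assign_proxy(vm_host: str, proxies: list[Proxy],
                 mode: int = 2) -> Optional[tuple[Proxy, str]]:
    """Pick a proxy and a transport mode for a VM running on vm_host.

    mode mirrors EnableSameHostHotaddMode (0, 1 or 2).
    Returns None when no proxy has a free slot.
    """
    available = [p for p in proxies if p.free_slots > 0]
    if not available:
        return None
    if mode == 0:
        # No locality preference: first available proxy, hotadd.
        return available[0], "hotadd"
    local = [p for p in available if p.host == vm_host]
    if local:
        # Steps 2 and 3: a local proxy exists, use hotadd.
        return local[0], "hotadd"
    # Step 4: no local proxy. Value 1 keeps hotadd on a remote proxy
    # (the snowball case), value 2 switches to network mode.
    return available[0], "hotadd" if mode == 1 else "network"


if __name__ == "__main__":
    proxies = [Proxy("proxy1", "esx1", 0), Proxy("proxy2", "esx2", 2)]
    # proxy1 is busy, so a VM on esx1 has no local proxy available.
    print(assign_proxy("esx1", proxies, mode=1))
    print(assign_proxy("esx1", proxies, mode=2))
```

With value 1 the VM on esx1 ends up on a remote proxy still using hotadd, which is exactly where the NFS locks come from; with value 2 the same VM goes to the same remote proxy, but over network mode.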
The paper we released together with Nutanix, which covers Veeam Backup & Replication v8, still suggests using Network Mode / NBD. At that time the new value of the registry key was too new and not tested enough to recommend it, especially after the issues I faced with the previous version.
But for sure in the future, this new value of the registry key is the way to go to use virtual proxies over NFS storage.
UPDATE: if you are running at least Veeam Backup & Replication 9.0, I highly suggest you use DirectNFS if possible.