I recently received this request from one of our service providers: “Is there a way to trace the failover actions in Veeam Cloud Connect, so that I can figure out the consumption of the virtual environment?” As I had never tried to figure this one out myself, I thought it was time to hit the lab and learn more about it.
The Failover events
Veeam Cloud Connect Replication (VCC-R) uses a virtualized environment to host tenants' replicated virtual machines, ready to be powered on when needed. Most of the time, the main provider resources consumed are bandwidth and storage: bandwidth to transfer the data, and storage to keep it in the virtual disks of the replicas. But when a failover is started, whether a test failover or a real one, CPU and memory are consumed as well. And since a service provider wants to measure and bill tenants for the service, this information needs to be traced.
Whenever a failover is performed, the operation is logged in VBR, under the History section of the software, like this:
However, the provider asked specifically about the Windows Event Viewer. Well, as a proper Windows application, VCC-R (technically Veeam Backup & Replication, of which VCC-R is a sub-component) records many events in the Event Viewer, under a dedicated log:
By looking at the timestamps of the failover I invoked, we can clearly identify two records in the Veeam log of the Event Viewer.
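If you would rather pull these records from a script than browse them in the Event Viewer, Windows ships the `wevtutil` command, which can query a log with an XPath filter and print the matching events as XML. The sketch below only builds the command line for the two event IDs discussed in this post; the log name "Veeam Backup" is taken from the `<Channel>` element shown in the event XML, and running it assumes you are on the VBR server itself.

```python
import subprocess
import sys

# Event IDs discussed in this post: 26800 = replica started, 26900 = failover stopped.
FAILOVER_EVENT_IDS = (26800, 26900)
# Log name as reported in the <Channel> element of the event XML.
LOG_NAME = "Veeam Backup"


def build_wevtutil_query(log_name: str = LOG_NAME,
                         event_ids: tuple = FAILOVER_EVENT_IDS) -> list:
    """Build a wevtutil command line that exports the matching events as XML."""
    id_filter = " or ".join(f"EventID={eid}" for eid in event_ids)
    xpath = f"*[System[({id_filter})]]"
    return ["wevtutil", "qe", log_name, f"/q:{xpath}", "/f:xml"]


if __name__ == "__main__":
    cmd = build_wevtutil_query()
    print(" ".join(cmd))
    if sys.platform == "win32":
        # wevtutil only exists on Windows; elsewhere we just print the command.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(result.stdout)
```

As far as I know, `wevtutil qe` with `/f:xml` prints one `<Event>` element after another without a common root element, so wrap the output in a dummy root before handing it to an XML parser.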
Let’s start with a partial failover. The start of the failover is logged in Event 26800:
The event details include the XML content of the event, which carries additional information; I've kept only the important fields here:
```xml
<Provider Name="Veeam MP" />
<EventID Qualifiers="0">26800</EventID>
<TimeCreated SystemTime="2018-03-18T08:20:59.000000000Z" />
<EventRecordID>2137</EventRecordID>
<Channel>Veeam Backup</Channel>
<Computer>vbr.cloudconnect.local</Computer>
<Data>vm-157</Data>
<Data>Trying to start tenant's replica</Data>
```
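To feed these events into a billing process, the interesting fields can be extracted with Python's standard XML parser. This is a minimal sketch: the fields come from the excerpt above, but wrapping them in the usual Windows `System` and `EventData` elements and the standard event namespace is my assumption about the full event, since the excerpt was cut. Tag names are matched without their namespace so the parser does not depend on that detail.

```python
import xml.etree.ElementTree as ET

# Sample event: fields from the excerpt; the wrapper elements are assumed.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Veeam MP" />
    <EventID Qualifiers="0">26800</EventID>
    <TimeCreated SystemTime="2018-03-18T08:20:59.000000000Z" />
    <EventRecordID>2137</EventRecordID>
    <Channel>Veeam Backup</Channel>
    <Computer>vbr.cloudconnect.local</Computer>
  </System>
  <EventData>
    <Data>vm-157</Data>
    <Data>Trying to start tenant's replica</Data>
  </EventData>
</Event>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_failover_event(xml_text: str) -> dict:
    """Extract event ID, UTC timestamp, server name and <Data> values."""
    root = ET.fromstring(xml_text)
    fields = {"data": []}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "EventID":
            fields["event_id"] = int(elem.text)
        elif name == "TimeCreated":
            fields["system_time"] = elem.get("SystemTime")
        elif name == "Computer":
            fields["computer"] = elem.text
        elif name == "Data":
            fields["data"].append(elem.text)
    # In the events shown here, the first <Data> value is the replica's vCenter ID.
    fields["vm_moref"] = fields["data"][0] if fields["data"] else None
    return fields


print(parse_failover_event(SAMPLE)["vm_moref"])  # vm-157
```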
A couple of notes: the time is UTC-based, and since my servers run on Central European Time, the on-screen time is 09:20 while the log states 08:20, but they are in reality the same moment. Also, the log contains vm-157, the vCenter managed object ID of the VM; if we go into vCenter to check which machine this is, we find that it is indeed the replicated VM:
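The UTC-to-local conversion from the note above can be checked in a few lines. `SystemTime` carries nine fractional digits, more than Python's `datetime` supports, so the sketch truncates them to microseconds. It uses a fixed UTC+1 offset for CET, which is correct for 18 March 2018 (daylight saving time started on 25 March that year), but a real report should use a proper time zone database to handle CEST.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset for CET; it ignores daylight saving time (CEST, UTC+2).
CET = timezone(timedelta(hours=1), "CET")


def parse_system_time(value: str) -> datetime:
    """Parse a SystemTime like 2018-03-18T08:20:59.000000000Z into an aware UTC datetime."""
    value = value.rstrip("Z")
    base, _, fraction = value.partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # Windows writes up to 9 fractional digits; datetime keeps only 6.
    micros = int((fraction + "000000")[:6]) if fraction else 0
    return dt.replace(microsecond=micros, tzinfo=timezone.utc)


utc_time = parse_system_time("2018-03-18T08:20:59.000000000Z")
print(utc_time.strftime("%H:%M"))                  # 08:20
print(utc_time.astimezone(CET).strftime("%H:%M"))  # 09:20
```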
By correlating this information with Cloud Connect, a service provider can identify which VM was started during a failover and which tenant the VM belongs to. By also querying vCenter for the CPU and memory assigned to the VM, they can compute CPU-minute and memory-minute values, and eventually charge the tenant.
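The CPU-minute and memory-minute arithmetic is simple once the start and stop times are known. In this sketch the VM size (2 vCPUs, 4 GB of RAM) and the stop time are hypothetical values for illustration; in practice the size would come from vCenter for the replica ID found in the event, and the stop time from the matching stop event.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class VmSize:
    # Hypothetical sizing: read the real values from vCenter for the replica.
    vcpus: int
    memory_gb: float


def usage(size: VmSize, started: datetime, stopped: datetime) -> dict:
    """Runtime in minutes, multiplied by the VM's vCPUs and memory."""
    minutes = (stopped - started).total_seconds() / 60
    return {
        "minutes": minutes,
        "cpu_minutes": size.vcpus * minutes,
        "memory_gb_minutes": size.memory_gb * minutes,
    }


start = datetime(2018, 3, 18, 8, 20, 59, tzinfo=timezone.utc)  # from event 26800
stop = datetime(2018, 3, 18, 9, 5, 59, tzinfo=timezone.utc)    # hypothetical stop time
print(usage(VmSize(vcpus=2, memory_gb=4), start, stop))
# {'minutes': 45.0, 'cpu_minutes': 90.0, 'memory_gb_minutes': 180.0}
```

Whether to bill per started minute, per second or per hour is a business choice; the sketch keeps fractional minutes and leaves rounding to the billing system.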
In the same way, Event 26900 is logged when the failover is stopped:
The XML details are the same as before, except for the event ID.
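Because the stop event carries the same details, including the VM ID, a start and a stop can be paired per VM to obtain a runtime session. This sketch assumes the events have already been reduced to `(event_id, vm_moref, timestamp)` tuples (for example with a parser like the one shown earlier); the function name and tuple layout are my own choice, not anything Veeam provides.

```python
from datetime import datetime, timedelta

START_ID, STOP_ID = 26800, 26900


def pair_failover_events(events):
    """Match each start (26800) with the next stop (26900) for the same VM.

    `events` is an iterable of (event_id, vm_moref, timestamp) tuples.
    Returns (vm_moref, started, stopped) tuples; stopped is None while the
    replica is still running. A second start for a VM that has no stop yet
    replaces the earlier start.
    """
    open_starts = {}
    sessions = []
    for event_id, vm, ts in sorted(events, key=lambda e: e[2]):
        if event_id == START_ID:
            open_starts[vm] = ts
        elif event_id == STOP_ID and vm in open_starts:
            sessions.append((vm, open_starts.pop(vm), ts))
    # Anything still open has no stop event yet.
    sessions.extend((vm, ts, None) for vm, ts in open_starts.items())
    return sessions


t0 = datetime(2018, 3, 18, 8, 20, 59)
events = [
    (26900, "vm-157", t0 + timedelta(minutes=30)),  # hypothetical stop time
    (26800, "vm-157", t0),
]
print(pair_failover_events(events))
```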
What about a full failover? In my example, a VCC failover plan contains three virtual machines:
Looking at the Windows Event Viewer, the situation is actually very similar to the partial failover:
We have the same EventID 26800, this time one for each virtual machine in the failover plan, and each of them again identifies which VM was powered on. This is good: a failover plan may introduce delays between the boots of its VMs, so a provider that counts power-on time to the second can trace it accurately.
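To see why the per-VM start events matter, here is a small per-second calculation for a three-VM plan. The IDs vm-158 and vm-159, the 60-second boot delays and the one-hour duration are hypothetical. The sketch also assumes that a single stop time (the 26900 event at the end of the failover) applies to all VMs; if your logs show one 26900 per VM, pair them per VM instead.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical failover plan: three replicas booted with a delay between them.
plan_start = datetime(2018, 3, 18, 8, 20, 59, tzinfo=timezone.utc)
boot_times = {
    "vm-157": plan_start,
    "vm-158": plan_start + timedelta(seconds=60),   # hypothetical ID and delay
    "vm-159": plan_start + timedelta(seconds=120),  # hypothetical ID and delay
}
failover_end = plan_start + timedelta(hours=1)  # assumed time of the 26900 event


def seconds_powered_on(boots: dict, end: datetime) -> dict:
    """Per-VM power-on time in whole seconds, measured from each VM's own 26800 event."""
    return {vm: int((end - boot).total_seconds()) for vm, boot in boots.items()}


per_vm = seconds_powered_on(boot_times, failover_end)
print(per_vm)                # {'vm-157': 3600, 'vm-158': 3540, 'vm-159': 3480}
print(sum(per_vm.values()))  # 10620
```

Billing all three VMs from the plan start would overcount by 180 seconds here, which is exactly the difference the per-VM 26800 events let a provider avoid.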
And at the end, just like for the partial failover, the EventID 26900 is recorded for the end of the failover.