Recently I upgraded a customer's vSphere 5.0 cluster to 5.1. The upgrade itself was quick and smooth, but I ran into a problem with the HP StoreVirtual Failover Manager, or FOM for short.
The HP StoreVirtual FOM (or LeftHand, if you are nostalgic for the old name) is a small Linux-based virtual machine whose only role is to act as a witness in a StoreVirtual multi-node cluster, giving the cluster an odd number of managers and thus a proper quorum. It is meant to be used when there are 2 or 4 StoreVirtual nodes: by adding the FOM you end up with an odd number of managers, and you are safe from split-brain scenarios.
At this customer there were two physical P4300 servers and a FOM. The StoreVirtual cluster is the only shared storage of the whole vSphere cluster, and all the ESXi servers are installed on SD cards. Installing the FOM inside the same StoreVirtual cluster it is supposed to protect is a grave mistake that can lead to serious problems (what happens if the storage fails and the FOM sits inside that non-working storage?), so the first ESXi host of the cluster has a small local datastore of about 70 GB, used only to host the FOM.
During the upgrade to vSphere 5.1, I powered down the FOM and moved it onto one of the LUNs of the StoreVirtual cluster, a temporary solution just for the time needed to upgrade Server 1. Once the upgrade was finished (to be honest, we reinstalled and reconfigured ESXi from scratch), I moved the FOM back to the local datastore and powered it on.
Once the FOM started up, the StoreVirtual console gave me this error:
For those not so familiar with LeftHand, this means the FOM was trying to re-join its assigned cluster, but the other two nodes (the two P4300 servers) were refusing it. Checking the console, I started to figure out the problem:
The registered FOM was missing, while another FOM with the same DNS name and IP address had shown up, and this one was not accepted by the cluster. Selecting the missing FOM, the console clearly showed me the error:
In a StoreVirtual cluster, the various components are registered and licensed based on their MAC addresses, which the console calls the “Serial number”. The FOM used to have the address 00:0C:29:FA:A9:CF but, as you can see in the console screenshot, it was now announcing itself as 00:50:56:BF:22:7D.
At least the reason was clear: the MAC address of the FOM had changed after the upgrade from vSphere 5.0 to 5.1. Looking around, I found this post by Cormac Hogan, where he explains that MAC address generation changed in vSphere 5.1 and the old ranges are no longer used. The FOM, which previously had an address from the 00:0C:29 block, was now using an address from the 00:50:56 block.
My first test was to check whether the old MAC address was still usable, so I set it back by hand as a static address.
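Setting the MAC by hand in the vSphere Client translates into VMX entries like these (a sketch using the values from this FOM, as they later appeared in its configuration file):

ethernet0.addressType = "static"
ethernet0.address = "00:0c:29:fa:a9:cf"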
I then saw that every time I tried to power on the FOM, the operation was denied with this error:
Reading the comments on Cormac Hogan’s post, I found a way to reconfigure the FOM without deleting and reinstalling it. Here are the steps (a shell-only sketch of the whole procedure follows the list):
1) Power down the FOM and remove it from vCenter with the “Remove from Inventory” command.
2) Via shell or SSH, go into the FOM’s directory on the datastore and edit the VMX file.
3) Here, there are three lines we need to take care of:
ethernet0.addressType = "static"
uuid.bios = "56 4d a0 1b 56 3d 38 e5-54 a0 d3 cc a0 fa a9 cf"
ethernet0.address = "00:0c:29:fa:a9:cf"
Look at the final values of the uuid.bios line: they are the same hexadecimal values as the old MAC address. I didn’t check the VMX file before trying to force a static MAC address, so I don’t know whether the value was already set that way before my tests.
4) Anyway, you need to change those three lines in this way:
ethernet0.addressType = "generated"
uuid.bios = "56 4d a0 1b 56 3d 38 e5-54 a0 d3 cc a0 fa a9 cf"
ethernet0.generatedAddress = "00:0c:29:fa:a9:cf"
using the values of the old MAC address you want to restore.
5) Finally, register the virtual machine again in vSphere and check how the configuration has changed.
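By the way, the whole procedure can also be done without leaving the ESXi shell. Here is a minimal sketch using vim-cmd; the VM ID, datastore name and paths are examples and will be different in your environment:

# 1) Find the FOM's VM ID, power it down and remove it from the inventory
vim-cmd vmsvc/getallvms | grep -i fom
vim-cmd vmsvc/power.off 42        # 42 = the Vmid returned by getallvms
vim-cmd vmsvc/unregister 42

# 2-4) Edit the VMX file and apply the three changes described above
vi /vmfs/volumes/Local-ESX1/FOM/FOM.vmx

# 5) Register the virtual machine again (the command returns the new Vmid)
vim-cmd solo/registervm /vmfs/volumes/Local-ESX1/FOM/FOM.vmx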
I was able to restore the old MAC address, but now vSphere treats it as a generated address and not a static one. There were no more conflicts with the reserved MAC ranges, and in fact the next attempt to power on the FOM was successful. After a few seconds, the FOM was able to join the cluster again, and its status was back to “Manager Normal”.
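To double-check from the shell that the change stuck (paths as in the sketch above), a quick grep of the configuration file should show, among its output, the addressType and generatedAddress lines edited earlier:

grep -i ethernet0 /vmfs/volumes/Local-ESX1/FOM/FOM.vmx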
One more check of the overall status of the cluster showed all the managers up and running. Quorum was guaranteed again.