Also available in this series:
Part 1: Introduction
Part 2: Architecture for Dummies
Part 3: Design the nodes
Part 4: deploy the nodes in the Lab
Part 5: install Ceph in the lab
Part 6: Mount Ceph as a block device on linux machines
Part 7: Add a node and expand the cluster storage
Part 8: Veeam clustered repository
Part 10: Upgrade the cluster
In Part 8, I showed you how to create the clustered front-end for our repository. In this part, we'll look at different failover scenarios and see what happens to running Veeam jobs.
Configure the cluster in Veeam
Connecting the new repository in Veeam is as easy as usual; the only difference is that we point to the virtual IP or the virtual hostname instead of a single physical node. When we configure it as a Linux repository, Veeam first shows us the SSH fingerprint:
There's a simple trick to verify that both servers are exposing the same public SSH key. On a Linux or Mac computer, connect to both nodes via ssh. On the first connection to each node, the ssh client asks us to trust the host key and then stores it in the known_hosts file. We can then inspect this file with a command like this:
cat /Users/luca/.ssh/known_hosts | grep -i 10.2.50.16
10.2.50.161 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPoLdOCqm9QA+133DsNuc5yKUjRqGR9dU/TuB7BIE5sMIqxEUeZI1N9TWLdXyhYPk1dET/g/SAYdozF1Bf5qw/vwaiv2Dw5KNe39JkePriVp8//Ceod9XEpJ+Y6TxRe4d6+/1ypGsW6sMflFetxdBwtmQzkymrdaoQ9atrdd5b8cw+ft+cONRBw0Eln4KAKQnEuhwM0/pK5UUPExdL4LkmNGM1MJ3oWUurBfb+Mtk5KywuWp5M1V9bwrdFN2dn/pHCaF8xN/h85/lptV++skTr0RgfUsy5MQkgGX9pI01Mw9XXsBL+2RNuWkGDkbeGJSwufpVC8P9fiIL07+/z9Dz/
10.2.50.162 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPoLdOCqm9QA+133DsNuc5yKUjRqGR9dU/TuB7BIE5sMIqxEUeZI1N9TWLdXyhYPk1dET/g/SAYdozF1Bf5qw/vwaiv2Dw5KNe39JkePriVp8//Ceod9XEpJ+Y6TxRe4d6+/1ypGsW6sMflFetxdBwtmQzkymrdaoQ9atrdd5b8cw+ft+cONRBw0Eln4KAKQnEuhwM0/pK5UUPExdL4LkmNGM1MJ3oWUurBfb+Mtk5KywuWp5M1V9bwrdFN2dn/pHCaF8xN/h85/lptV++skTr0RgfUsy5MQkgGX9pI01Mw9XXsBL+2RNuWkGDkbeGJSwufpVC8P9fiIL07+/z9Dz/
As you can see, both nodes are exposing the same SSH key.
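As an alternative quick check (my own addition, not part of the Veeam wizard), you can grab each host key with ssh-keyscan and let ssh-keygen print its fingerprint, then compare the two fingerprints with the one shown by Veeam:
ssh-keyscan -t rsa 10.2.50.161 > /tmp/repo1.key
ssh-keyscan -t rsa 10.2.50.162 > /tmp/repo2.key
ssh-keygen -lf /tmp/repo1.key
ssh-keygen -lf /tmp/repo2.key
If the two fingerprints are identical, the nodes are presenting the same host key.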
Moving on in the wizard, the active node (repo2 in my case) lets us connect and see the available space of the RBD block device:
If for any reason I fail over the active node to repo1 and then rescan the repository, Veeam completes the operation successfully:
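How you trigger the failover depends on the cluster stack you built in Part 8. As a rough sketch, assuming the virtual IP and the RBD mount are managed as Pacemaker resources, a planned switch-over can be done by putting the active node in standby and then bringing it back:
pcs cluster standby repo2
# wait for the virtual IP and /mnt/veeamrepo to come up on repo1, then:
pcs cluster unstandby repo2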
This is obviously a static situation, with no activity running. Far more interesting is what happens when a failover occurs while a Veeam backup job is running.
Run the first backup
As I've shown in Part 5, a Ceph cluster can continue its operations even when a node fails, as long as the surviving nodes can handle all the objects of the cluster. To test this failover in a "pseudo-production" environment, I first configured a backup job that uses the Ceph cluster as its repository. I'm protecting a single VM, 50 GB in size, running an active full backup every time and deleting the backup file after each run to keep the Ceph cluster empty, with no additional guest processing. The backup mode is forever forward incremental. First, I executed the job without failing any cluster component, just to check its behaviour:
Ignore the performance figures: my Ceph cluster is made of VMs running on the same storage array as the rest of my infrastructure, that is, the same array hosting the protected VM and the Veeam components executing the job. What's interesting are the changes happening in the Ceph cluster. You can use ceph -w to monitor the cluster in real time, but since the job is going to last for a while, your shell buffer may not be large enough; alternatively, you can read the log afterwards on one of the MON servers. In my case, I retrieved the interesting parts from mon1, reading the log saved in /var/log/ceph/ceph-mon.mon1.log. You just need to filter the lines containing "pgmap":
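For example, a simple grep on the MON log does the job (the path is the one mentioned above; any equivalent filter works):
grep pgmap /var/log/ceph/ceph-mon.mon1.log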
2015-03-15 16:36:38.398469 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19017: 256 pgs: 256 active+clean; 45916 kB data, 554 MB used, 1198 GB / 1199 GB avail
2015-03-15 16:41:07.288751 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19047: 256 pgs: 256 active+clean; 190 MB data, 1119 MB used, 1198 GB / 1199 GB avail; 4941 kB/s wr, 21 op/s
2015-03-15 16:45:54.694453 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19258: 256 pgs: 256 active+clean; 2430 MB data, 7582 MB used, 1191 GB / 1199 GB avail; 19604 kB/s wr, 87 op/s
2015-03-15 16:50:55.052215 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19489: 256 pgs: 256 active+clean; 4882 MB data, 12772 MB used, 1186 GB / 1199 GB avail; 10682 kB/s wr, 46 op/s
2015-03-15 16:55:55.182653 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19713: 256 pgs: 256 active+clean; 7230 MB data, 16992 MB used, 1182 GB / 1199 GB avail; 7825 kB/s wr, 34 op/s
2015-03-15 17:00:48.650250 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19856: 256 pgs: 256 active+clean; 7875 MB data, 16440 MB used, 1183 GB / 1199 GB avail; 4777 B/s wr, 0 op/s
2015-03-15 17:05:48.705349 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19903: 256 pgs: 256 active+clean; 7964 MB data, 16498 MB used, 1183 GB / 1199 GB avail; 489 kB/s wr, 2 op/s
2015-03-15 17:10:51.922792 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v19955: 256 pgs: 256 active+clean; 8049 MB data, 16666 MB used, 1183 GB / 1199 GB avail; 1550 B/s wr, 0 op/s
2015-03-15 17:15:52.850740 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v20006: 256 pgs: 256 active+clean; 8145 MB data, 16851 MB used, 1182 GB / 1199 GB avail; 555 B/s rd, 454 kB/s wr, 2 op/s
2015-03-15 17:20:52.051202 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v20068: 256 pgs: 256 active+clean; 8457 MB data, 17709 MB used, 1182 GB / 1199 GB avail; 257 kB/s wr, 1 op/s
2015-03-15 17:25:52.111999 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v20118: 256 pgs: 256 active+clean; 8558 MB data, 17730 MB used, 1182 GB / 1199 GB avail; 15445 B/s wr, 1 op/s
2015-03-15 17:26:49.155942 7fe93e949700 0 log_channel(cluster) log [INF] : pgmap v20124: 256 pgs: 256 active+clean; 8558 MB data, 17702 MB used, 1182 GB / 1199 GB avail
This was the first file ever hitting the cluster, so the volume was only using about 50 MB for filesystem metadata. As the job runs, data is ingested by the cluster up to 8.5 GB, which is the final size of the backup file:
[root@repo1 ceph-backup-test]# ll
total 8745012
-rw-rw-rw-. 1 root root       5695 Mar 15 17:25 ceph-backup-test.vbm
-rw-r--r--. 1 root root 8954877952 Mar 15 17:25 ceph-backup-test2015-03-15T163909.vbk
The used size is double the data size, since I configured Ceph with a replication factor of 2, that is, 2 copies of each object. Finally, for the entire duration of the job the cluster stayed in the active+clean state, which means all OSDs were up and running and contributing to the overall cluster.
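You can verify the replication factor of the pool backing the RBD image at any time by querying the pool (here I assume it is the default rbd pool; use whatever pool name you created in Part 5):
ceph osd pool get rbd size
size: 2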
Back-End failure
Once the cluster had been tested in a stable scenario, let's see what happens when there's a failure in the back-end. I repeated the exact same job, again as an active full, and in the middle of the job I stopped one of the OSD nodes, osd3 in my case, for 15 minutes:
In the Veeam report you just see the job completing successfully as before, only taking longer. This is, first and foremost, proof that the failure in the back-end of the Ceph cluster was invisible to the front-end! But what happened on the Ceph cluster? The job started at 11:00, and the same kind of log as before was recorded:
2015-03-16 10:59:33.607489 mon.0 [INF] pgmap v20228: 256 pgs: 256 active+clean; 8558 MB data, 17592 MB used, 1182 GB / 1199 GB avail
2015-03-16 11:01:17.719263 mon.0 [INF] pgmap v20234: 256 pgs: 256 active+clean; 8558 MB data, 17593 MB used, 1182 GB / 1199 GB avail; 715 kB/s wr, 5 op/s
2015-03-16 11:02:17.018245 mon.0 [INF] pgmap v20253: 256 pgs: 256 active+clean; 8558 MB data, 17594 MB used, 1182 GB / 1199 GB avail; 12068 kB/s wr, 53 op/s
2015-03-16 11:03:17.853354 mon.0 [INF] pgmap v20301: 256 pgs: 256 active+clean; 8558 MB data, 17598 MB used, 1182 GB / 1199 GB avail; 4392 kB/s wr, 19 op/s
2015-03-16 11:04:17.762674 mon.0 [INF] pgmap v20352: 256 pgs: 256 active+clean; 8558 MB data, 17602 MB used, 1182 GB / 1199 GB avail; 8862 kB/s wr, 38 op/s
2015-03-16 11:05:18.322565 mon.0 [INF] pgmap v20402: 256 pgs: 256 active+clean; 8558 MB data, 17606 MB used, 1182 GB / 1199 GB avail; 3662 kB/s wr, 15 op/s
2015-03-16 11:06:18.428764 mon.0 [INF] pgmap v20444: 256 pgs: 256 active+clean; 8558 MB data, 17604 MB used, 1182 GB / 1199 GB avail; 6528 kB/s wr, 28 op/s
2015-03-16 11:07:18.389620 mon.0 [INF] pgmap v20495: 256 pgs: 256 active+clean; 8610 MB data, 17784 MB used, 1182 GB / 1199 GB avail; 9141 kB/s wr, 40 op/s
2015-03-16 11:08:18.847947 mon.0 [INF] pgmap v20540: 256 pgs: 256 active+clean; 9054 MB data, 19391 MB used, 1180 GB / 1199 GB avail; 9026 kB/s wr, 39 op/s
2015-03-16 11:09:18.358717 mon.0 [INF] pgmap v20588: 256 pgs: 256 active+clean; 9054 MB data, 19488 MB used, 1180 GB / 1199 GB avail; 14901 kB/s wr, 61 op/s
2015-03-16 11:10:07.108054 mon.0 [INF] pgmap v20631: 256 pgs: 256 active+clean; 9054 MB data, 19307 MB used, 1180 GB / 1199 GB avail; 15711 kB/s wr, 67 op/s
At 11:10 I powered down the node osd3, and the monitor nodes immediately logged this:
2015-03-16 11:10:07.484938 mon.0 [INF] osd.6 marked itself down
2015-03-16 11:10:07.486626 mon.0 [INF] osd.8 marked itself down
2015-03-16 11:10:07.497299 mon.0 [INF] osd.7 marked itself down
2015-03-16 11:10:08.152916 mon.0 [INF] osdmap e94: 12 osds: 9 up, 12 in
All three OSDs hosted on osd3 were now missing, and the placement groups were no longer all in the active+clean state:
2015-03-16 11:10:09.250237 mon.0 [INF] pgmap v20634: 256 pgs: 59 stale+active+clean, 197 active+clean; 9054 MB data, 19307 MB used, 1180 GB / 1199 GB avail
2015-03-16 11:10:13.701588 mon.0 [INF] pgmap v20635: 256 pgs: 37 stale+active+clean, 38 peering, 181 active+clean; 9054 MB data, 19303 MB used, 1180 GB / 1199 GB avail; 5417 kB/s wr, 23 op/s
2015-03-16 11:10:14.748311 mon.0 [INF] pgmap v20636: 256 pgs: 54 active+undersized+degraded, 75 peering, 127 active+clean; 9054 MB data, 19272 MB used, 1180 GB / 1199 GB avail; 15911 kB/s wr, 66 op/s; 512/4614 objects degraded (11.097%)
2015-03-16 11:10:18.656609 mon.0 [INF] pgmap v20637: 256 pgs: 92 active+undersized+degraded, 37 peering, 127 active+clean; 9054 MB data, 19269 MB used, 1180 GB / 1199 GB avail; 13601 kB/s wr, 57 op/s; 832/4614 objects degraded (18.032%)
2015-03-16 11:10:19.818390 mon.0 [INF] pgmap v20638: 256 pgs: 129 active+undersized+degraded, 127 active+clean; 9054 MB data, 19253 MB used, 1180 GB / 1199 GB avail; 14266 kB/s wr, 63 op/s; 1171/4614 objects degraded (25.379%)
2015-03-16 11:11:19.714768 mon.0 [INF] pgmap v20662: 256 pgs: 129 active+undersized+degraded, 127 active+clean; 9054 MB data, 19033 MB used, 1180 GB / 1199 GB avail; 14021 kB/s wr, 57 op/s; 1171/4614 objects degraded (25.379%)
2015-03-16 11:12:19.726590 mon.0 [INF] pgmap v20686: 256 pgs: 129 active+undersized+degraded, 127 active+clean; 9413 MB data, 19494 MB used, 1180 GB / 1199 GB avail; 12883 kB/s wr, 53 op/s; 1226/4794 objects degraded (25.574%)
2015-03-16 11:13:19.713060 mon.0 [INF] pgmap v20710: 256 pgs: 129 active+undersized+degraded, 127 active+clean; 9554 MB data, 19953 MB used, 1179 GB / 1199 GB avail; 12365 kB/s wr, 53 op/s; 1242/4862 objects degraded (25.545%)
2015-03-16 11:14:19.853709 mon.0 [INF] pgmap v20730: 256 pgs: 129 active+undersized+degraded, 127 active+clean; 9978 MB data, 21109 MB used, 1178 GB / 1199 GB avail; 9627 kB/s wr, 40 op/s; 1290/5074 objects degraded (25.424%)
The cluster was degraded by 25%, exactly because 1 node out of 4 was missing. For the following minutes, the rest of the log was a mix of entries like these:
2015-03-16 11:15:11.246891 mon.0 [INF] pgmap v20743: 256 pgs: 129 active+undersized+degraded, 127 active+clean; 10218 MB data, 16851 MB used, 883 GB / 899 GB avail; 1318/5194 objects degraded (25.375%)
2015-03-16 11:15:12.280995 mon.0 [INF] pgmap v20744: 256 pgs: 129 active+undersized+degraded, 127 active+clean; 10218 MB data, 16851 MB used, 883 GB / 899 GB avail; 1318/5194 objects degraded (25.375%)
2015-03-16 11:15:16.754291 mon.0 [INF] pgmap v20745: 256 pgs: 91 active+undersized+degraded, 30 active+degraded, 128 active+clean, 7 active+recovering+degraded; 10218 MB data, 16845 MB used, 883 GB / 899 GB avail; 1785/5194 objects degraded (34.367%); 29181 kB/s, 7 objects/s recovering
2015-03-16 11:15:17.802814 mon.0 [INF] pgmap v20746: 256 pgs: 1 inactive, 104 active+degraded, 5 peering, 128 active+clean, 18 active+recovering+degraded; 10218 MB data, 16791 MB used, 883 GB / 899 GB avail; 5957 B/s wr, 0 op/s; 2781/5194 objects degraded (53.543%); 61079 kB/s, 15 objects/s recovering
2015-03-16 11:15:22.048257 mon.0 [INF] pgmap v20747: 256 pgs: 1 inactive, 99 active+degraded, 5 peering, 130 active+clean, 21 active+recovering+degraded; 10218 MB data, 16807 MB used, 883 GB / 899 GB avail; 6552 B/s wr, 0 op/s; 2730/5194 objects degraded (52.561%); 61436 kB/s, 15 objects/s recovering
2015-03-16 11:15:23.066830 mon.0 [INF] pgmap v20748: 256 pgs: 98 active+degraded, 135 active+clean, 23 active+recovering+degraded; 10218 MB data, 17292 MB used, 882 GB / 899 GB avail; 2765/5194 objects degraded (53.235%); 74365 kB/s, 18 objects/s recovering
2015-03-16 11:15:26.865555 mon.0 [INF] pgmap v20749: 256 pgs: 97 active+degraded, 139 active+clean, 20 active+recovering+degraded; 10218 MB data, 17150 MB used, 882 GB / 899 GB avail; 2671/5194 objects degraded (51.425%); 87863 kB/s, 21 objects/s recovering
2015-03-16 11:15:27.939319 mon.0 [INF] pgmap v20750: 256 pgs: 97 active+degraded, 139 active+clean, 20 active+recovering+degraded; 10218 MB data, 17433 MB used, 882 GB / 899 GB avail; 2665/5194 objects degraded (51.309%); 44210 kB/s, 10 objects/s recovering
2015-03-16 11:15:31.798139 mon.0 [INF] pgmap v20751: 256 pgs: 96 active+degraded, 140 active+clean, 20 active+recovering+degraded; 10218 MB data, 17473 MB used, 882 GB / 899 GB avail; 2640/5194 objects degraded (50.828%); 15127 kB/s, 3 objects/s recovering
2015-03-16 11:15:32.808174 mon.0 [INF] pgmap v20752: 256 pgs: 92 active+degraded, 143 active+clean, 21 active+recovering+degraded; 10218 MB data, 17630 MB used, 882 GB / 899 GB avail; 2571/5194 objects degraded (49.499%); 36534 kB/s, 8 objects/s recovering
2015-03-16 11:15:36.748112 mon.0 [INF] pgmap v20753: 256 pgs: 91 active+degraded, 144 active+clean, 21 active+recovering+degraded; 10218 MB data, 17653 MB used, 882 GB / 899 GB avail; 2551/5194 objects degraded (49.114%); 33578 kB/s, 8 objects/s recovering
2015-03-16 11:15:38.138830 mon.0 [INF] pgmap v20754: 256 pgs: 90 active+degraded, 145 active+clean, 21 active+recovering+degraded; 10218 MB data, 17741 MB used, 882 GB / 899 GB avail; 2532/5194 objects degraded (48.749%); 17377 kB/s, 4 objects/s recovering
2015-03-16 11:15:41.733473 mon.0 [INF] pgmap v20755: 256 pgs: 88 active+degraded, 145 active+clean, 23 active+recovering+degraded; 10218 MB data, 17741 MB used, 882 GB / 899 GB avail; 2524/5194 objects degraded (48.595%); 15564 kB/s, 3 objects/s recovering
2015-03-16 11:16:39.935016 mon.0 [INF] pgmap v20782: 256 pgs: 57 active+undersized+degraded, 12 active+recovery_wait+undersized+degraded, 32 peering, 90 active+clean, 20 active+recovery_wait+degraded, 35 active+recovering+degraded, 10 undersized+degraded; 10247 MB data, 18121 MB used, 881 GB / 899 GB avail; 229 kB/s wr, 1 op/s; 2095/5210 objects degraded (40.211%); 77/2605 unfound (2.956%); 2197 kB/s, 0 objects/s recovering
Basically, Ceph was already rebalancing the cluster, now down from 1200 GB to 900 GB of available space, by replicating the unprotected objects onto other OSDs. Meanwhile, there was still data activity on the nodes, this time created both by the Veeam backup job and by the background resync of the cluster. At 11:25 the cluster was again completely balanced, even with a missing node:
2015-03-16 11:25:49.251365 mon.0 [INF] pgmap v21061: 256 pgs: 256 active+clean; 10346 MB data, 22093 MB used, 877 GB / 899 GB avail
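If you want to see this state from the cluster's point of view rather than from the log, ceph -s gives the overall summary, while ceph osd tree should show the three OSDs hosted on osd3 as down (and, once the rebalance is complete, out) while everything else is up:
ceph -s
ceph osd tree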
At 11:25, I powered osd3 back on:
2015-03-16 11:25:39.597838 mon.0 [INF] from='client.? 10.2.50.203:0/1001065' entity='osd.6' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=osd3", "root=default"], "id": 6, "weight": 0.1}]: dispatch
2015-03-16 11:25:41.829844 mon.0 [INF] from='client.? 10.2.50.203:0/1002388' entity='osd.7' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=osd3", "root=default"], "id": 7, "weight": 0.1}]: dispatch
2015-03-16 11:25:43.554166 mon.0 [INF] from='client.? 10.2.50.203:0/1002628' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=osd3", "root=default"], "id": 8, "weight": 0.1}]: dispatch
2015-03-16 11:25:45.046448 mon.0 [INF] osd.6 10.2.50.203:6800/2147 boot
2015-03-16 11:25:45.049485 mon.0 [INF] osdmap e118: 12 osds: 10 up, 10 in
2015-03-16 11:25:46.056989 mon.0 [INF] osd.7 10.2.50.203:6803/2435 boot
2015-03-16 11:25:46.057581 mon.0 [INF] osdmap e119: 12 osds: 11 up, 11 in
2015-03-16 11:25:47.107895 mon.0 [INF] osd.8 10.2.50.203:6806/2674 boot
2015-03-16 11:25:47.116572 mon.0 [INF] osdmap e120: 12 osds: 12 up, 12 in
2015-03-16 11:25:48.094400 mon.0 [INF] osdmap e121: 12 osds: 12 up, 12 in
2015-03-16 11:25:49.239387 mon.0 [INF] osdmap e122: 12 osds: 12 up, 12 in
And again, Ceph started to rebalance the cluster. I'll skip another bunch of log lines this time; just look at this one:
2015-03-16 11:26:30.702531 mon.0 [INF] pgmap v21084: 256 pgs: 3 inactive, 2 peering, 145 active+clean, 28 active+degraded, 40 active+recovery_wait+degraded, 38 active+recovering+degraded; 10352 MB data, 26770 MB used, 1173 GB / 1199 GB avail; 1016/5262 objects degraded (19.308%); 26294 kB/s, 6 objects/s recovering
The size of the cluster was back to 1200 GB, and Ceph was again using the entire available space to balance and protect all 256 placement groups. By the end of the backup job, the state was back to normal:
2015-03-16 12:08:52.550625 mon.0 [INF] pgmap v21605: 256 pgs: 256 active+clean; 10578 MB data, 22922 MB used, 1177 GB / 1199 GB avail; 334 B/s wr, 0 op/s.
So, as long as there are enough surviving nodes (and enough free space) to re-replicate all the placement groups, a Ceph cluster can survive the failure of entire nodes and still serve the front-end!
Front-End failure
The previous test would also have been possible with a single front-end mounting the Ceph storage. Time now to test a front-end failure. Again, the job was started as an active full, and the Ceph cluster was exposed to Veeam through the Linux front-end repo1:
[root@repo1 ceph-backup-test]# ip addr show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:b9:06:08 brd ff:ff:ff:ff:ff:ff
    inet 10.2.50.161/24 brd 10.2.50.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 10.2.50.160/24 scope global secondary ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:feb9:608/64 scope link
       valid_lft forever preferred_lft forever
[root@repo1 ceph-backup-test]# df -h | grep rbd
/dev/rbd0        50G  8.4G   42G  17% /mnt/veeamrepo
Ten minutes after the start of the Veeam job, I powered down repo1. Both the virtual IP and the mount point of the Ceph cluster failed over to repo2:
[root@repo2 bin]# ip addr show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:b9:38:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.2.50.162/24 brd 10.2.50.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 10.2.50.160/24 scope global secondary ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:feb9:38a4/64 scope link
       valid_lft forever preferred_lft forever
[root@repo2 bin]# df -h | grep rbd
/dev/rbd0        50G  2.0G   49G   4% /mnt/veeamrepo
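To double-check that the RBD image itself followed the failover, you can also look at the kernel RBD mappings and at the mount on the surviving node (generic checks, nothing specific to this setup):
rbd showmapped
mount | grep /mnt/veeamrepo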
The job, however, failed, as the Veeam binaries running on repo1 were no longer able to see the back-end storage:
For this job I configured a schedule, so the default option of 3 retries every 10 minutes applies:
At the next retry cycle, the job restarted and completed successfully using repo2. In fact, the Veeam executables were visible among the running processes on repo2:
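A quick way to get this list is an ordinary process listing filtered on the agent name (my own sketch, any equivalent filter works):
ps ax | grep -i veeam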
11343 ?  Ss  0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11346 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11347 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11350 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11351 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11358 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11359 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11378 ?  Ss  0:00 bash -c (cd /tmp && perl veeam_soap16d046f1-4614-4684-b3b5-3fb8dd89d918.pl -d -c -l lib16d046f1-4614-4684-b3b5-3fb8dd89d918 -e /tmp/veeam_error16d046f1-4614-4684-b3b5-3fb8dd89d918 2>> /tmp/veeam_error16d046f1
11381 ?  S   0:00 bash -c (cd /tmp && perl veeam_soap16d046f1-4614-4684-b3b5-3fb8dd89d918.pl -d -c -l lib16d046f1-4614-4684-b3b5-3fb8dd89d918 -e /tmp/veeam_error16d046f1-4614-4684-b3b5-3fb8dd89d918 2>> /tmp/veeam_error16d046f1
11382 ?  S   0:00 perl veeam_soap16d046f1-4614-4684-b3b5-3fb8dd89d918.pl -d -c -l lib16d046f1-4614-4684-b3b5-3fb8dd89d918 -e /tmp/veeam_error16d046f1-4614-4684-b3b5-3fb8dd89d918
11393 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11459 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11462 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11464 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11588 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11661 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11664 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11666 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11700 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11701 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11702 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11703 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11704 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11705 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11706 ?  R   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
11707 ?  S   0:00 /tmp/VeeamAgent5426b049-bbfe-4f34-b0d9-9eb1c0819144 -l flush,/var/log/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Target__cli.log,/tmp/VeeamBackup/ceph_Bbackup_Btest/Agent.ceph_Bbackup_Btest.Targe
This is the main reason to use this kind of failover: Veeam binaries are deployed on Linux at runtime and executed directly from the /tmp folder. Because they are not permanently deployed and registered as daemons, there's no way to "clusterize" them. With this configuration, however, a running job may fail together with the clustered front-end node, but each following retry will be attempted on the surviving node. If you do not change the default configuration of your jobs, all of them will complete even during a front-end failure. My job had just one VM in it, so it was retried completely; in regular jobs with multiple VMs, only the VMs that were not completed are retried, since all the processed VMs are already stored in the backup file.
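If you want to convince yourself that there really is nothing to clusterize, a quick check on either repository node (my own addition, nothing more) shows no Veeam unit registered with systemd, while the agent binaries only appear under /tmp for the duration of a job:
systemctl list-units --all | grep -i veeam
ls /tmp/VeeamAgent* 2>/dev/null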