In my two previous posts about the new Ceph 12.2 release, named Luminous, I first described the new BlueStore storage technology, and I then upgraded my cluster to the 12.2 release. By default, Ceph can run OSDs using both FileStore and BlueStore, so existing clusters can be safely migrated to Luminous. In the long run, however, users who have previously deployed FileStore will likely want to transition to BlueStore to take advantage of its improved performance and robustness. An individual OSD cannot be converted in place, though: the “conversion” is in reality the destruction of a FileStore OSD and the creation of a BlueStore one. Each time, the cluster takes care of evacuating the old OSD, re-replicating its content onto the other OSDs, and then rebalancing the data once the new BlueStore OSD is added to the cluster.
Rip and replace Ceph OSDs
The simplest approach to migrating to BlueStore is to mark out each device, one by one, wait for the data to re-replicate across the cluster, reprovision the OSD, and mark it back in again. It is simple and easy to automate, but it requires more data migration than strictly necessary, because the data has to be moved twice: out, and then back in again. For my cluster of 12 OSDs I decided to go with this method anyway. On this page the Ceph team describes two additional methods, if you are interested.
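Before touching the first OSD, it is worth making sure the cluster is healthy and fully clean, since the whole approach relies on the remaining OSDs holding complete replicas while each device is rebuilt. The standard status commands are enough for this quick check:
ceph health
ceph -s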
1 – Let’s start by looking at what we have, choosing one FileStore OSD to be replaced as the first example; we will then repeat the same procedure for all the other OSDs. For a given OSD ID, this command shows its id, the host it lives on, and the object store it uses:
ceph osd metadata $ID | grep -e id -e hostname -e osd_objectstore
Running it across my OSDs, from 0 to 11, shows the host hosting each of them and the fact that they are all using FileStore. During the migration you can check the balance between FileStore and BlueStore with ceph osd count-metadata osd_objectstore. In my example it will start with FileStore = 12 and, hopefully, at the end it will tell me BlueStore = 12.
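If you would rather check every OSD in one pass instead of typing one ID at a time, a small shell loop over ceph osd ls (which simply prints the list of OSD IDs) wraps the same metadata command; this is only a convenience, not something the procedure requires:
for ID in $(ceph osd ls); do ceph osd metadata $ID | grep -e id -e hostname -e osd_objectstore; done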
2 – Let’s mark OSD 0 out of the cluster with ceph osd out 0. We then wait for the data to migrate off the OSD in question; we can check the progress with:
while ! ceph osd safe-to-destroy 0 ; do sleep 60 ; done
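While that loop is waiting, you can follow the data movement from another terminal; the regular cluster status shows the recovery and backfill activity as objects are re-replicated away from the OSD, for example:
watch -n 10 ceph -s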
3 – Stop the OSD:
systemctl kill ceph-osd@0
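Before moving on, you can double-check that the daemon is really down; systemctl should no longer report the unit as active:
systemctl is-active ceph-osd@0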
4 – Make a note of which device this OSD is using:
mount | grep /var/lib/ceph/osd/ceph-0
In my case, it returns:
/dev/sdc1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime,attr2,inode64,noquota)
5 – We then unmount the OSD:
umount /var/lib/ceph/osd/ceph-0
6 – This is the point of no return: here we destroy the OSD, and there is no way back if we picked the wrong device. Be careful:
ceph-disk zap /dev/sdc
7 – Tell the cluster the OSD has been destroyed:
ceph osd destroy 0 --yes-i-really-mean-it
8 – Time to reprovision a BlueStore OSD in place of the previous FileStore, with the same OSD ID:
ceph-disk prepare --bluestore /dev/sdc --osd-id 0
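Once the new OSD has been provisioned and is back up, the same metadata query from step 1 should now report bluestore for it:
ceph osd metadata 0 | grep osd_objectstore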
9 – Go back to step 1 and repeat for each OSD. Each time you check the progress with ceph osd count-metadata osd_objectstore, you will see the FileStore count shrink and the BlueStore count grow.
To recap, this is the complete procedure for each OSD, replacing the ID and the device each time:
ceph osd out 1
while ! ceph osd safe-to-destroy 1 ; do sleep 60 ; done
systemctl kill ceph-osd@1
umount /var/lib/ceph/osd/ceph-1
ceph-disk zap /dev/sdd
ceph osd destroy 1 --yes-i-really-mean-it
ceph-disk prepare --bluestore /dev/sdd --osd-id 1
At the end of the procedure, my entire cluster has been migrated to BlueStore.
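The quickest way to confirm it is the same count-metadata check from step 1, which at this point should no longer list any FileStore OSDs:
ceph osd count-metadata osd_objectstore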
As noted by the Ceph team: “you can allow the refilling of the replacement OSD to happen concurrently with the draining of the next OSD, or follow the same procedure for multiple OSDs in parallel, as long as you ensure the cluster is fully clean (all data has all replicas) before destroying any OSDs. Failure to do so will reduce the redundancy of your data and increase the risk of (or potentially even cause) data loss.”