After a change in the autoscale setting, the cluster started adapting to a new pg_num and reporting slow operations on osd.44.
The cluster stabilized on HEALTH_WARN, with some PGs unable to get allocated and osd.44 misbehaving.
Tried restarting the osd.44 service on cloudcephosd1005 and ended up with the service down due to:
● ceph-osd@44.service - Ceph object storage daemon osd.44
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
   Active: active (running) since Wed 2020-11-25 08:37:24 UTC; 5min ago
  Process: 7686 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 44 (code=exited, status=0/SUCCESS)
 Main PID: 7690 (ceph-osd)
    Tasks: 59
   Memory: 1.7G
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@44.service
           └─7690 /usr/bin/ceph-osd -f --cluster ceph --id 44 --setuser ceph --setgroup ceph

Nov 25 08:37:24 cloudcephosd1005 systemd[1]: Starting Ceph object storage daemon osd.44...
Nov 25 08:37:24 cloudcephosd1005 systemd[1]: Started Ceph object storage daemon osd.44.
Nov 25 08:37:30 cloudcephosd1005 ceph-osd[7690]: 2020-11-25 08:37:30.314 7f56c8a01c80 -1 osd.44 106484 log_to_monitors {default=true}
Nov 25 08:37:30 cloudcephosd1005 ceph-osd[7690]: 2020-11-25 08:37:30.322 7f56c8a01c80 -1 osd.44 106484 mon_cmd_maybe_osd_create fail: 'osd.44 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
The hdd class does not really exist in the cluster (afaics):
root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush class ls
[
    "ssd"
]
And the osd.44 is already in the ssd class:
root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush get-device-class osd.44
ssd
Tried removing the class and re-adding it for that osd, with no change:
root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush rm-device-class osd.44
done removing class of osd(s): 44
root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44
set osd(s) 44 to class 'ssd'
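Since the error says the daemon itself is trying to reset the class to 'hdd', a possible next check is to compare the class bound in the CRUSH map with the class the OSD reports in its own metadata. The following is only a sketch and was not run as part of this task: check_osd_class is a hypothetical helper name, and it assumes that `ceph osd metadata` on this release includes a "default_device_class" key.

```shell
#!/bin/sh
# Sketch: compare the device class recorded in the CRUSH map with the class
# the OSD daemon reports about itself.
# Assumption: `ceph osd metadata <id>` output contains a
# "default_device_class" key on this Ceph release.

check_osd_class() {
    osd_id="$1"

    # Class currently bound in the CRUSH map (same command as used above).
    crush_class="$(ceph osd crush get-device-class "osd.${osd_id}")"

    # Class the daemon detected for itself; extracted with sed so the
    # sketch does not depend on jq being installed.
    detected_class="$(ceph osd metadata "${osd_id}" \
        | sed -n 's/.*"default_device_class": *"\([^"]*\)".*/\1/p')"

    if [ "${crush_class}" = "${detected_class}" ]; then
        echo "osd.${osd_id}: classes match (${crush_class})"
    else
        echo "osd.${osd_id}: CRUSH has '${crush_class}', daemon reports '${detected_class}'"
        return 1
    fi
}

# Usage (on a node with an admin keyring):
#   check_osd_class 44
```

If the two values differ, the daemon is detecting the underlying device as a different type than the one set in CRUSH, which would match the 'can not reset class to hdd' message on startup.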