
Ceph eqiad cluster: osd.44 failing to start
Closed, Resolved, Public

Description

After a change in the autoscale setting, the cluster started adapting to a new pg_num and reporting slow operations on osd.44.

The cluster stabilized in HEALTH_WARN with some PGs unable to be allocated and osd.44 misbehaving.

Tried restarting the osd.44 service on cloudcephosd1005 and ended up with the service down due to:

● ceph-osd@44.service - Ceph object storage daemon osd.44
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
   Active: active (running) since Wed 2020-11-25 08:37:24 UTC; 5min ago
  Process: 7686 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 44 (code=exited, status=0/SUCCESS)
 Main PID: 7690 (ceph-osd)
    Tasks: 59
   Memory: 1.7G
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@44.service
           └─7690 /usr/bin/ceph-osd -f --cluster ceph --id 44 --setuser ceph --setgroup ceph

Nov 25 08:37:24 cloudcephosd1005 systemd[1]: Starting Ceph object storage daemon osd.44...
Nov 25 08:37:24 cloudcephosd1005 systemd[1]: Started Ceph object storage daemon osd.44.
Nov 25 08:37:30 cloudcephosd1005 ceph-osd[7690]: 2020-11-25 08:37:30.314 7f56c8a01c80 -1 osd.44 106484 log_to_monitors {default=true}
Nov 25 08:37:30 cloudcephosd1005 ceph-osd[7690]: 2020-11-25 08:37:30.322 7f56c8a01c80 -1 osd.44 106484 mon_cmd_maybe_osd_create fail: 'osd.44 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
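
For reference, the restart and the status/journal output above were gathered with the usual systemd tooling; this is a sketch of the presumed commands, not an exact transcript:

# restart the single OSD daemon and check its state
systemctl restart ceph-osd@44.service
systemctl status ceph-osd@44.service
# follow the recent journal entries for that unit
journalctl -u ceph-osd@44.service --since "10 minutes ago"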

The hdd class does not really exist in the cluster (afaics):

root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush class ls
[
    "ssd"
]

And osd.44 is already in the ssd class:

root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush get-device-class osd.44
ssd

Tried removing the class and re-adding it for that osd, with no change:

root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush rm-device-class osd.44
done removing class of osd(s): 44

root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44
set osd(s) 44 to class 'ssd'

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2020-11-25T08:45:42Z] <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened T268722)

Mentioned in SAL (#wikimedia-cloud) [2020-11-25T08:54:29Z] <_dcaro> Unsetting noup/nodown to allow re-shuffling of the pgs that osd.44 had, will try to rebuild it (T268722)
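
For reference, the cluster flags mentioned in the SAL entries above are toggled with the standard Ceph CLI; a sketch, not the exact commands that were run:

# keep OSDs from being marked out and stop data movement while debugging
ceph osd set noout
ceph osd set norebalance
# allow OSDs to change up/down state again (undo noup/nodown)
ceph osd unset noup
ceph osd unset nodown
# once the cluster is healthy again, clear the remaining flags
ceph osd unset noout
ceph osd unset norebalance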

Mentioned in SAL (#wikimedia-cloud) [2020-11-25T09:31:19Z] <_dcaro> The OSD seems to be up and running actually, though there's that misleading log, will leave it see if the cluster comes fully healthy (T268722)

dcaro triaged this task as High priority. Nov 25 2020, 9:57 AM

It looks like our drives are reporting the wrong rotational flag:

# cat /sys/block/sdd/queue/rotational
1

(ceph checks that here https://github.com/ceph/ceph/blob/25ac1528419371686740412616145703810a561f/src/common/blkdev.cc#L222)
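
To see what the kernel reports for every drive on the host, a quick loop over sysfs works (a sketch; 1 means rotational/HDD, 0 means non-rotational/SSD):

# list the rotational flag for all sd* block devices
for dev in /sys/block/sd*/queue/rotational; do
    echo "$dev: $(cat "$dev")"
done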

So on startup, since the OSD has this option enabled (the default):

# ceph daemon osd.44 config show | jq ".osd_class_update_on_start"
"true"

It tries to register the osd with the detected class (hdd) and fails, though it continues and eventually comes up anyway.
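
That behaviour can be disabled so the OSD keeps whatever class was set manually; a sketch, assuming the centralized config store is in use (not what was done here):

# stop OSDs from re-registering their device class on every start
ceph config set osd osd_class_update_on_start false
# equivalent ceph.conf setting, under [osd]:
#   osd_class_update_on_start = false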

The rotational flag can be manually overridden (https://lwn.net/Articles/408428/, https://www.mail-archive.com/ceph-users@ceph.io/msg07631.html); that should also improve how the OS handles the SSDs.
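
A minimal sketch of such an override via udev, assuming plain sd* devices (the rule file name and the device match are illustrative only):

# /etc/udev/rules.d/60-rotational-override.rules (hypothetical file name)
# force the kernel to treat these devices as non-rotational (SSD)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}="0"

# or as a one-off per device, lost on reboot:
echo 0 > /sys/block/sdd/queue/rotational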

The error is gone now that the servers have been rebuilt and the SSDs are detected correctly :)

\o/