[cookbook,ceph] depool_and_destroy ceph cookbook failed to destroy a single osd
Open, Medium, Public

Description

Tried to destroy OSD 66 on node cloudcephosd1004. The cookbook depooled the OSD and removed it from the cluster, but then failed to zap its drive:

No changes were made to the cluster, skipping waiting for rebalance.
Destroying OSDs with ids in [66] on cloudcephosd1004 from eqiad1
Not cleaning up host bucket, as it still has some OSDs in it
Depooled and destroyed OSD daemons [66].
Exception raised while executing cookbook wmcs.ceph.osd.depool_and_destroy:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 265, in _run
    raw_ret = runner.run()
  File "/srv/deployment/wmcs-cookbooks/wmcs_libs/common.py", line 825, in _wrapped_run
    return object.__getattribute__(self, __name)(*args, **kwargs)
  File "/srv/deployment/wmcs-cookbooks/wmcs_libs/common.py", line 843, in run
    return self.run_with_proxy()
  File "/srv/deployment/wmcs-cookbooks/cookbooks/wmcs/ceph/osd/depool_and_destroy.py", line 261, in run_with_proxy
    self._zap_drives(devices=devices)
  File "/srv/deployment/wmcs-cookbooks/cookbooks/wmcs/ceph/osd/depool_and_destroy.py", line 305, in _zap_drives
    raise Exception("No devices found to zap, aborting")
Exception: No devices found to zap, aborting
END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)

Detail log:

2025-08-21 12:50:13,909 dcaro 3429876 [DEBUG spicerack.remote:750 in _execute] Executing commands [cumin.transports.Command('sudo -i ceph osd purge 66 --yes-i-really-mean-it -f json', ok_codes=[0])] on 1 hosts: cloudcephmon1004.eqiad.wmnet
2025-08-21 12:50:13,909 dcaro 3429876 [INFO cumin.transports.clustershell.ClusterShellWorker:78 in execute] Executing commands [cumin.transports.Command('sudo -i ceph osd purge 66 --yes-i-really-mean-it -f json', ok_codes=[0])] on '1' hosts: cloudcephmon1004.eqiad.wmnet
2025-08-21 12:50:13,914 dcaro 3429876 [DEBUG cumin.transports.clustershell.SyncEventHandler:590 in ev_pickup] node=cloudcephmon1004.eqiad.wmnet, command='sudo -i ceph osd purge 66 --yes-i-really-mean-it -f json'
2025-08-21 12:50:14,890 dcaro 3429876 [DEBUG cumin.transports.clustershell.SyncEventHandler:783 in ev_hup] node=cloudcephmon1004.eqiad.wmnet, rc=0, command='sudo -i ceph osd purge 66 --yes-i-really-mean-it -f json'
2025-08-21 12:50:14,891 dcaro 3429876 [INFO cumin.transports.clustershell.SyncEventHandler:853 in ev_timer] Completed command 'sudo -i ceph osd purge 66 --yes-i-really-mean-it -f json'
2025-08-21 12:50:14,891 dcaro 3429876 [DEBUG cumin.transports.clustershell.SyncEventHandler:759 in end_command] This was the last command
2025-08-21 12:50:14,893 dcaro 3429876 [DEBUG spicerack.remote:750 in _execute] Executing commands [cumin.transports.Command('sudo -i ceph osd tree -f json', ok_codes=[0])] on 1 hosts: cloudcephmon1004.eqiad.wmnet
2025-08-21 12:50:14,894 dcaro 3429876 [INFO cumin.transports.clustershell.ClusterShellWorker:78 in execute] Executing commands [cumin.transports.Command('sudo -i ceph osd tree -f json', ok_codes=[0])] on '1' hosts: cloudcephmon1004.eqiad.wmnet
2025-08-21 12:50:14,899 dcaro 3429876 [DEBUG cumin.transports.clustershell.SyncEventHandler:590 in ev_pickup] node=cloudcephmon1004.eqiad.wmnet, command='sudo -i ceph osd tree -f json'
2025-08-21 12:50:15,655 dcaro 3429876 [DEBUG cumin.transports.clustershell.SyncEventHandler:783 in ev_hup] node=cloudcephmon1004.eqiad.wmnet, rc=0, command='sudo -i ceph osd tree -f json'
2025-08-21 12:50:15,655 dcaro 3429876 [INFO cumin.transports.clustershell.SyncEventHandler:853 in ev_timer] Completed command 'sudo -i ceph osd tree -f json'
2025-08-21 12:50:15,656 dcaro 3429876 [DEBUG cumin.transports.clustershell.SyncEventHandler:759 in end_command] This was the last command
2025-08-21 12:50:15,667 dcaro 3429876 [INFO cookbooks.wmcs.ceph.osd.depool_and_destroy:300 in _destroy_osds] Not cleaning up host bucket, as it still has some OSDs in it
2025-08-21 12:50:15,668 dcaro 3429876 [INFO spicerack_sal_logger:464 in log] Depooled and destroyed OSD daemons [66].
2025-08-21 12:50:15,668 dcaro 3429876 [DEBUG wmcs_libs.common:828 in _wrapped_run] Cleaning up recorder.
2025-08-21 12:50:15,669 dcaro 3429876 [ERROR spicerack._menu:292 in _run] Exception raised while executing cookbook wmcs.ceph.osd.depool_and_destroy:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 265, in _run
    raw_ret = runner.run()
  File "/srv/deployment/wmcs-cookbooks/wmcs_libs/common.py", line 825, in _wrapped_run
    return object.__getattribute__(self, __name)(*args, **kwargs)
  File "/srv/deployment/wmcs-cookbooks/wmcs_libs/common.py", line 843, in run
    return self.run_with_proxy()
  File "/srv/deployment/wmcs-cookbooks/cookbooks/wmcs/ceph/osd/depool_and_destroy.py", line 261, in run_with_proxy
    self._zap_drives(devices=devices)
  File "/srv/deployment/wmcs-cookbooks/cookbooks/wmcs/ceph/osd/depool_and_destroy.py", line 305, in _zap_drives
    raise Exception("No devices found to zap, aborting")
Exception: No devices found to zap, aborting
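
The log above shows the OSD being purged before the zap step discovers it has no devices to work on, which leaves the operation half done. The sketch below does not diagnose the root cause and is not the cookbook's actual code. It only illustrates one possible ordering, assuming the OSD-to-device mapping can be looked up before anything destructive runs. All names (`resolve_devices`, `depool_and_destroy`, `NoDevicesFoundError`, the `destroy_osd` and `zap_device` callbacks, and the `/dev/example` path) are hypothetical.

```python
from typing import Callable, Dict, List


class NoDevicesFoundError(Exception):
    """Raised when an OSD id cannot be mapped to any block device."""


def resolve_devices(osd_ids: List[int], osd_to_devices: Dict[int, List[str]]) -> List[str]:
    """Return the devices backing the given OSDs, failing if any OSD has none."""
    devices: List[str] = []
    missing: List[int] = []
    for osd_id in osd_ids:
        found = osd_to_devices.get(osd_id, [])
        if not found:
            missing.append(osd_id)
        devices.extend(found)
    if missing:
        raise NoDevicesFoundError(
            f"No devices found for OSDs {missing}, aborting before any change"
        )
    return devices


def depool_and_destroy(
    osd_ids: List[int],
    osd_to_devices: Dict[int, List[str]],
    destroy_osd: Callable[[int], None],
    zap_device: Callable[[str], None],
) -> List[str]:
    """Hypothetical flow: validate the device list first, then destroy, then zap."""
    # Resolve devices up front so a missing mapping aborts with the cluster untouched.
    devices = resolve_devices(osd_ids, osd_to_devices)
    for osd_id in osd_ids:
        destroy_osd(osd_id)
    for device in devices:
        zap_device(device)
    return devices


if __name__ == "__main__":
    # Stand-in callbacks that only print; a real run would execute remote commands.
    depool_and_destroy(
        osd_ids=[66],
        osd_to_devices={66: ["/dev/example"]},  # assumed mapping, for illustration only
        destroy_osd=lambda osd_id: print(f"destroy osd {osd_id}"),
        zap_device=lambda device: print(f"zap {device}"),
    )
```

With this ordering, an empty mapping raises before `destroy_osd` is ever called, so the failure would surface while the OSD is still in the cluster rather than after the purge.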