Try to reverse wipefs on host using DRAC/iLO and document
Closed, Declined (Public)

Description

I accidentally decommissioned an important host in the parent ticket. @BTullis correctly pointed out that the decom cookbook only wipes the first few bytes of the disks, so it should be possible to reverse this operation. Creating this ticket to:

  • Call the decom cookbook on an unused host (most likely wdqs2025)
  • Attempt to restore the filesystem from OOB management console (DRAC/iLO)
  • Document the results.
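
To illustrate why the wipe should be reversible in principle, here is a minimal sketch that simulates it on an in-memory buffer rather than a real disk. It relies on one known fact: ext4 stores its magic number 0xEF53 (little-endian) at byte offset 0x438. Everything else is an assumption for illustration: the helper names are hypothetical, the fake image is not a real filesystem, and nothing here reflects what the decom cookbook actually runs. On a real host, partition table, RAID and LVM signatures would likely need the same treatment.

```python
# Simulated "wipefs"-style signature erase and restore on an in-memory image.
# Hypothetical helpers for illustration only; not the decom cookbook's code.

EXT4_MAGIC_OFFSET = 0x438  # 1024-byte superblock offset + 56-byte field offset
EXT4_MAGIC = b"\x53\xef"   # 0xEF53 stored little-endian


def make_fake_image(size: int = 4096) -> bytearray:
    """Build a dummy image: filler bytes plus an ext4 magic at the usual offset."""
    image = bytearray(b"\xaa" * size)
    image[EXT4_MAGIC_OFFSET:EXT4_MAGIC_OFFSET + len(EXT4_MAGIC)] = EXT4_MAGIC
    return image


def has_ext4_magic(image: bytearray) -> bool:
    """Return True if the ext4 magic bytes are present at the expected offset."""
    end = EXT4_MAGIC_OFFSET + len(EXT4_MAGIC)
    return bytes(image[EXT4_MAGIC_OFFSET:end]) == EXT4_MAGIC


def wipe_signature(image: bytearray, offset: int, length: int) -> bytes:
    """Zero `length` bytes at `offset` and return the bytes that were there."""
    backup = bytes(image[offset:offset + length])
    image[offset:offset + length] = bytes(length)
    return backup


def restore_signature(image: bytearray, offset: int, backup: bytes) -> None:
    """Write previously saved (or known) signature bytes back at `offset`."""
    image[offset:offset + len(backup)] = backup


if __name__ == "__main__":
    img = make_fake_image()
    saved = wipe_signature(img, EXT4_MAGIC_OFFSET, len(EXT4_MAGIC))
    print("after wipe, magic present:", has_ext4_magic(img))
    restore_signature(img, EXT4_MAGIC_OFFSET, saved)
    print("after restore, magic present:", has_ext4_magic(img))
```

The rest of the buffer is untouched by the wipe, which is the property the ticket hopes to exploit. For ext4 the erased value is a fixed constant, so it could be rewritten even without a saved copy; other signature types may not be that simple.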

Event Timeline

This task might take a long time to achieve, for something that might seldom (if ever) be used again.
There are also a lot of variables between hosts which might come into play, such as differing iDRAC firmware capabilities (e.g. attaching virtual media).

Perhaps a more useful outcome might be a cookbook (or a modification to an existing cookbook) that would allow us to boot a generic Debian kernel image into rescue mode on a host that is not present in netbox.

We briefly tried running the sre.hosts.dhcp cookbook to boot the accidentally decommissioned host with PXE.
However, this failed quickly because there was no IP address in netbox for the host to use in its DHCP fragment.

Maybe there is some way that we could use the Netbox Provision Server Network script or something similar to generate an ephemeral configuration that is suitable for booting a host into rescue mode.

Whatever we do, I suggest that we seek input from the Infrastructure-Foundations team before going any further.

Gehel moved this task from Incoming to Toil / Automation on the Data-Platform-SRE board.

Agreed on all of @BTullis' points above. While chaos engineering is a lot of fun, this one doesn't seem worth the effort. Closing...