Decommissioning two hosts ends up with: Failed to wipe swraid
Closed, Resolved · Public

Authored By

	• Marostegui
	Jun 29 2022, 8:30 AM

Description

While decommissioning db2075 (T311591) and db2071 (T311589), I ended up with the following error on both hosts:

----- OUTPUT of 'lsblk --all --ou...-all --force %*'' -----
/dev/sda: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sda: 8 bytes were erased at offset 0x3a2c7fffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/sda: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sda1: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef
wipefs: /dev/sda2: failed to erase swap magic string at offset 0x00000ff6: Text file busy
================
PASS |                                                                                                 |   0% (0/1) [00:00<?, ?hosts/s]
FAIL |█████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.55hosts/s]
100.0% (1/1) of nodes failed to execute command 'lsblk --all --ou...-all --force %*'': db2075.codfw.wmnet
0.0% (0/1) success ratio (< 100.0% threshold) for command: 'lsblk --all --ou...-all --force %*''. Aborting.
0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
**Failed to wipe swraid, partition-table and filesystem signatures, manual intervention required to make it unbootable**: Cumin execution failed (exit_code=2)

The rest of the decom process went fine.
Both hosts were up and running normally before the decommissioning.

Details

	Subject	Repo	Branch	Lines +/-
	Disable swap before running wipefs	operations/cookbooks	master	+1 -0

Related Objects

Mentioned In: rCCKB3d25d43bbd4a: Disable swap before running wipefs
Mentioned Here: T311589: decommission db2071
T311591: decommission db2075

Event Timeline

• Marostegui created this task. Jun 29 2022, 8:30 AM

Restricted Application added a project: Infrastructure-Foundations. Jun 29 2022, 8:30 AM

Restricted Application added a subscriber: Aklapper.

Maybe we need to run "swapoff -a" prior to the wipefs call?

I have a few more hosts to decommission. I can try to do so, but we'd not know whether it helped or it would have just worked without it too :)
Up to you!

In T311593#8036155, @Marostegui wrote:

I have a few more hosts to decommission. I can try to do so, but we'd not know whether it helped or it would have just worked without it too :)

It's more of a shot in the dark/hunch based on the fact that the device which fails is a swap partition. It could still be a useful hint, so if you could manually run "swapoff -a" before the next decom, that would be great.

Wilco!

I ran swapoff -a on db2081 and it went fine. Could be coincidence or it could've been the fix. Hard to know.
However, I guess it doesn't hurt to include it as part of the normal decommissioning process anyways if that doesn't require much work. Up to you!
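One way to make the hunch easier to confirm on the next host would be to check whether the swap partition is actually active before the wipe. The following is a minimal sketch, not part of the cookbook: the function name `active_swap_devices` is hypothetical, and the sample sizes are made up. It only relies on the standard layout of /proc/swaps (one header line, then one line per active swap area).

```python
def active_swap_devices(proc_swaps_text):
    """Return the paths listed as active swap in /proc/swaps-style content."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header (Filename, Type, Size, Used, Priority).
    return [line.split()[0] for line in lines[1:] if line.strip()]


# Hypothetical sample: /dev/sda2 is the swap partition from the error above;
# the size and priority values are invented for illustration.
sample = """Filename                                Type            Size    Used    Priority
/dev/sda2                               partition       7811068 0       -2
"""
print(active_swap_devices(sample))
```

On a real host the input would come from reading /proc/swaps. A non-empty result before the wipe would support the idea that wipefs fails because the swap partition is still in use.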

Change 809599 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] Disable swap before running wipefs

https://gerrit.wikimedia.org/r/809599

gerritbot added a project: Patch-For-Review. Jun 29 2022, 1:26 PM

RhinosF1 subscribed. Jun 29 2022, 2:29 PM

Change 809599 merged by Muehlenhoff:

[operations/cookbooks@master] Disable swap before running wipefs

https://gerrit.wikimedia.org/r/809599
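The ordering that the patch title describes can be sketched as follows. This is an illustration under assumptions, not the merged change: the real cookbook command is truncated in the log above and is not reproduced here, and `build_wipe_commands` and the device list are hypothetical. The only point it shows is that "swapoff -a" runs before wipefs, so the swap partition is no longer busy when its signature is erased.

```python
import shlex


def build_wipe_commands(devices):
    """Return shell commands in execution order: disable swap, then wipe signatures."""
    if not devices:
        raise ValueError("no devices to wipe")
    quoted = " ".join(shlex.quote(device) for device in devices)
    return [
        # Deactivate all swap areas first so the swap partition is not in use.
        "swapoff -a",
        # Then erase partition-table and filesystem signatures.
        f"wipefs --all --force {quoted}",
    ]


# Hypothetical device list, taken from the names that appear in the error output.
for command in build_wipe_commands(["/dev/sda", "/dev/sda1", "/dev/sda2"]):
    print(command)
```

Returning the commands as an ordered list keeps the sequencing explicit and easy to check without touching any disks.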

Maintenance_bot removed a project: Patch-For-Review. Jun 30 2022, 12:30 PM

@Marostegui Did this happen again for any reimage after I merged my patch above?

Nope, it all went fine! Good to close. I need to decom a lot more in the upcoming days, will reopen if needed.
Thanks for fixing it!

Ack, closing then :-)

Muehlenhoff mentioned this in rCCKB3d25d43bbd4a: Disable swap before running wipefs. Dec 14 2022, 3:30 PM
