While I was decommissioning Elastic hosts in T358882, I made an operational error that caused one of our Elastic clusters to lose quorum (see this incident).
I realized the problem was related to the decommission, so I wanted to stop the cookbook as soon as possible. The cookbook was waiting at a prompt that read: Type "go" to proceed or "abort" to interrupt the execution. I typed "abort" to halt the cookbook, but instead of stopping, it moved on to the next step and wiped the filesystem on one of the master hosts, which significantly complicated the recovery effort.
I'm creating this ticket to request that the sre.hosts.decommission cookbook abort completely when the user types "abort" at this prompt.
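To illustrate the requested behavior, here is a minimal, hypothetical Python sketch. It is not the actual Spicerack or cookbook code: `AbortError`, `ask_go_or_abort`, and `run_steps` are made-up names, and it assumes each destructive step is gated by the go/abort prompt. The point it shows is that an "abort" answer should raise an exception that no per-step handler catches, so execution stops instead of falling through to the next step.

```python
class AbortError(Exception):
    """Hypothetical exception raised when the operator types "abort"."""


def ask_go_or_abort(read=input):
    """Hypothetical prompt helper: re-prompt until the answer is "go" or "abort"."""
    while True:
        answer = read('Type "go" to proceed or "abort" to interrupt the execution\n> ').strip()
        if answer == "go":
            return
        if answer == "abort":
            # Raise instead of returning, so the caller cannot silently continue.
            raise AbortError("Operator aborted the execution")


def run_steps(steps, read=input):
    """Hypothetical runner: confirm before each (name, action) step.

    AbortError is deliberately NOT caught here, so an "abort" answer
    ends the whole run before any later (possibly destructive) step runs.
    """
    completed = []
    for name, action in steps:
        ask_go_or_abort(read)
        action()
        completed.append(name)
    return completed
```

With answers "go" then "abort", only the first step runs and `AbortError` propagates to the caller; the second step (for example, a filesystem wipe) never executes.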
I've captured the decommission logs and my tmux buffer in /home/bking/decom/ on cumin2002. Line 1977 of the tmux buffer shows my "abort" input.
Thanks for taking a look; please let me know if you need more info.