To address my anxiety about T207536, I'm thinking about ways we can step up our emergency options when a VM becomes unreachable (e.g. if puppet and cumin both misbehave at once and every ssh key is scrambled, etc.)
I have one small idea and one big idea.
Small idea: Install guestfish and libguestfs on all cloudvirts. That will make reading and modifying the file systems of busted VMs much easier. My only reservation is that it pulls in a long dependency chain: when I installed it just now on cloudvirt1015, it prompted me about restarting the system disk array, which seems... weird? I told it not to and all seems well, but it put me on edge.
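As an illustration of what guestfish would buy us, here is a minimal sketch that builds a non-interactive guestfish run which appends an emergency ssh key to root's authorized_keys inside a VM's disk image. Assumptions, all hypothetical: the helper names, the example key, and the disk living at something like `/var/lib/nova/instances/<uuid>/disk` (nova's usual libvirt layout, not verified for our cloudvirts). Since writing to a live VM's disk with guestfish can corrupt it, this assumes the VM has been shut off first.

```python
"""Hypothetical helper: build a guestfish invocation that appends an ssh public
key to root's authorized_keys inside a (shut-off) VM disk image."""


def guestfish_quote(s: str) -> str:
    # guestfish double-quoted arguments accept C-style escapes, so escape
    # backslashes and double quotes, and encode newlines as \n.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def build_key_injection(disk_path: str, pubkey: str) -> tuple[list[str], str]:
    """Return (argv, script) for running guestfish non-interactively.

    --rw opens the image writable, -a adds the disk, and -i asks libguestfs
    to inspect the image and mount the guest's filesystems automatically.
    The script is fed to guestfish on stdin, one command per line.
    """
    argv = ["guestfish", "--rw", "-a", disk_path, "-i"]
    key_line = pubkey.strip() + "\n"
    commands = [
        "mkdir-p /root/.ssh",
        # Leading 0 means octal in guestfish numeric arguments.
        "chmod 0700 /root/.ssh",
        # write-append creates the file if it does not exist yet.
        f"write-append /root/.ssh/authorized_keys {guestfish_quote(key_line)}",
        "chmod 0600 /root/.ssh/authorized_keys",
    ]
    return argv, "\n".join(commands) + "\n"


if __name__ == "__main__":
    argv, script = build_key_injection(
        "/var/lib/nova/instances/EXAMPLE-UUID/disk",  # hypothetical path
        "ssh-ed25519 AAAA... emergency@example",  # hypothetical key
    )
    print(" ".join(argv))
    print(script, end="")
```

On a cloudvirt this could then be run with `subprocess.run(argv, input=script, text=True, check=True)`. Feeding commands on stdin keeps the whole repair in one guestfish session, so the appliance only boots once.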
The big (but obvious) idea: have puppet install a single, shared root password on every VM, and store that password in pwstore. Then figure out how to launch a console from the appropriate cloudvirt. This is a much-scaled-down version of my previous support-remote-web-shells attempt; the advantage of scaling it down is that the security implications should be easier to understand. And we already use a global root password for prod, so how could this be worse?
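For the "launch a console from the appropriate cloudvirt" step, a rough sketch under stated assumptions: the VM's libvirt domain exposes a serial console that `virsh console` can attach to, and its name follows nova's `instance_name_template` (upstream default `instance-%08x`; our deployment may override it). As an admin, `openstack server show` reports both the hypervisor (`OS-EXT-SRV-ATTR:host`) and the domain name (`OS-EXT-SRV-ATTR:instance_name`) directly, which is more reliable than formatting the name by hand. The helper names and the use of `sudo` are hypothetical.

```python
"""Hypothetical helper: derive a nova instance's libvirt domain name and build
the command to attach to its serial console from the hypervisor."""

# nova's upstream default for instance_name_template; deployments may override it.
DEFAULT_TEMPLATE = "instance-%08x"


def libvirt_domain_name(instance_id: int, template: str = DEFAULT_TEMPLATE) -> str:
    # nova formats the instance's integer id into the template (here: 8 hex digits).
    return template % instance_id


def console_command(cloudvirt: str, domain: str) -> list[str]:
    # ssh -t allocates a tty so the interactive console works; virsh console
    # attaches to the domain's serial console (detach with Ctrl-]).
    return ["ssh", "-t", cloudvirt, "sudo", "virsh", "console", domain]


if __name__ == "__main__":
    domain = libvirt_domain_name(255)
    print(" ".join(console_command("cloudvirt1015", domain)))
```

At the resulting login prompt, the shared root password from pwstore is what would get us in, without depending on ssh keys, puppet, or cumin.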