Ensure/confirm a way to shell into unpuppetized VMs
Closed, Resolved · Public

Description

I am pretty sure that if a VM fails its initial puppet run, it never gets around to enabling the root shell. We need a way to shell into VMs that have never run puppet properly in order to, e.g., create an initial puppetmaster.

Event Timeline

Andrew created this task. Mon, May 20, 4:00 PM
Restricted Application removed a project: Patch-For-Review. Mon, May 20, 4:00 PM
Andrew claimed this task. Mon, May 20, 4:00 PM

(It may be worth noting that while this is not directly necessary for the migration in the parent ticket, it is important to maintain the ability to bootstrap a realm that has no puppetmasters, either for a brand new realm or in the event of a disaster wiping out all the existing puppetmasters. Previously this was done by just making a new production host that the realm is allowed access to. With the move away from the model of puppetmasters for other realms sitting in production, this becomes important.)

Andrew added a comment. Edited Fri, May 24, 8:50 PM

The good news is that once the firstboot script exits we should be able to get a local console. The bad news is that if the initial cert sign doesn't work the firstboot script may NEVER exit:

# puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff --waitforcert=1 --certname=consoletest-01.testlabs.eqiad.wmflabs --server=thisisnotarealpuppetmaster
Info: Creating a new SSL key for consoletest-01.testlabs.eqiad.wmflabs
Error: Could not request certificate: Failed to open TCP connection to thisisnotarealpuppetmaster:8140 (getaddrinfo: Name or service not known)
Error: Could not request certificate: Failed to open TCP connection to thisisnotarealpuppetmaster:8140 (getaddrinfo: Name or service not known)
Error: Could not request certificate: Failed to open TCP connection to thisisnotarealpuppetmaster:8140 (getaddrinfo: Name or service not known)
Error: Could not request certificate: Failed to open TCP connection to thisisnotarealpuppetmaster:8140 (getaddrinfo: Name or service not known)
Error: Could not request certificate: Failed to open TCP connection to thisisnotarealpuppetmaster:8140 (getaddrinfo: Name or service not known)
Error: Could not request certificate: Failed to open TCP connection to thisisnotarealpuppetmaster:8140 (getaddrinfo: Name or service not known)
<etc. literally forever>

If I remove the --waitforcert entirely then it exits. So I could replace --waitforcert with some kind of explicit loop wrapping the puppet agent call.
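For illustration, here is a minimal sketch of what such an explicit loop could look like. It is not the merged change: the function name `retry_bounded`, the attempt count, and the delay are all hypothetical, and it assumes the agent exits non-zero when the certificate request fails.

```shell
#!/bin/sh
# Hypothetical helper (not from the actual patch): run a command up to
# max_attempts times, sleeping delay seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_bounded() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
        # Only sleep if another attempt is coming.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example call (commented out; 5 attempts and 30 seconds are assumed values).
# Same flags as the run above, minus --waitforcert:
# retry_bounded 5 30 puppet agent --onetime --verbose --no-daemonize \
#     --no-splay --show_diff \
#     --certname=consoletest-01.testlabs.eqiad.wmflabs \
#     --server=thisisnotarealpuppetmaster
```

The point of bounding the loop is that firstboot always exits eventually, even when no puppetmaster is reachable, so the console becomes available either way.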

Change 512441 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud image firstboot: don't --waitforcert on first puppet run

https://gerrit.wikimedia.org/r/512441

Change 512441 merged by Andrew Bogott:
[operations/puppet@production] cloud image firstboot: don't --waitforcert on first puppet run

https://gerrit.wikimedia.org/r/512441

Andrew closed this task as Resolved. Fri, May 24, 9:49 PM

With the attached patch in place, a new VM with no valid puppetmaster will flounder for a bit but then boot up such that a local virsh console can be attached. That's enough to allow us to bootstrap an initial puppetmaster.