
New Cloud VPS instance has "Failed to start Execute cloud user/final scripts." in its error log
Closed, Invalid | Public | BUG REPORT

Description

After bringing up a new deployment-prep instance (deployment-rdb01), I see

[FAILED] Failed to start Execute cloud user/final scripts.
See 'systemctl status cloud-final.service' for details.

in the Horizon log.

systemctl status gives

● cloud-final.service - Execute cloud user/final scripts
     Loaded: loaded (/lib/systemd/system/cloud-final.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2023-10-26 18:32:53 UTC; 23h ago
   Main PID: 1554 (code=exited, status=2)
        CPU: 40.416s

Oct 26 18:32:53 deployment-rdb01 ec2[9978]: 256 SHA256:pS5NAT1Qg3cY7xZXxdvlIV/uIQ/9ltafSD4DxvLVQWk root@deployment-rdb01 (ECDSA)
Oct 26 18:32:53 deployment-rdb01 ec2[9978]: 256 SHA256:Rihm3mIDX2VJdgq8XM75L56pjBGsA5O/sUPZvSasHS4 root@deployment-rdb01 (ED25519)
Oct 26 18:32:53 deployment-rdb01 ec2[9978]: 3072 SHA256:avCIM37rkj/Z8dQ5pue+kIWUZXWQiASEy1LzNR83CMs root@deployment-rdb01 (RSA)
Oct 26 18:32:53 deployment-rdb01 ec2[9978]: -----END SSH HOST KEY FINGERPRINTS-----
Oct 26 18:32:53 deployment-rdb01 ec2[9978]: #############################################################
Oct 26 18:32:53 deployment-rdb01 cloud-init[1558]: Cloud-init v. 20.4.1 finished at Thu, 26 Oct 2023 18:32:53 +0000. Datasource DataSourceOpenStackLocal [net,ver=2].  Up 206.83 seconds
Oct 26 18:32:53 deployment-rdb01 systemd[1]: cloud-final.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 26 18:32:53 deployment-rdb01 systemd[1]: cloud-final.service: Failed with result 'exit-code'.
Oct 26 18:32:53 deployment-rdb01 systemd[1]: Failed to start Execute cloud user/final scripts.
Oct 26 18:32:53 deployment-rdb01 systemd[1]: cloud-final.service: Consumed 40.416s CPU time.

sudo puppet agent -tv gives a bunch of errors:

2023-10-27 18:35:01.195393 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Info: Retrieving pluginfacts
Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve file metadata for puppet:///pluginfacts: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Info: Retrieving plugin
Error: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Error: /File[/var/lib/puppet/lib]: Could not evaluate: Could not retrieve file metadata for puppet:///plugins: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Info: Loading facts
Error: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not send report: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]

Event Timeline

I also got this error as part of the login message, but it seems unrelated:

-bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
_____________________________________________________________________
WARNING! Your environment specifies an invalid locale.
 The unknown environment variables are:
   LC_CTYPE=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_ALL=en_US.UTF-8
 This can affect your user experience significantly, including the
 ability to manage packages. You may install the locales by running:

 sudo dpkg-reconfigure locales

 and select the missing language. Alternatively, you can install the
 locales-all package:

 sudo apt-get install locales-all

To disable this message for all users, run:
   sudo touch /var/lib/cloud/instance/locale-check.skip
_____________________________________________________________________

Okay so I found https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster#Step_2:_Setup_a_puppet_client and followed the instructions:

rdb01$ sudo rm -rf /var/lib/puppet/ssl
rdb01$ sudo -i puppet agent -tv
puppetmaster$ sudo -i puppet cert list
puppetmaster$ sudo -i puppet cert sign deployment-rdb01.deployment-prep.eqiad1.wikimedia.cloud
rdb01$ sudo -i puppet agent -tv

and that fixed it. It would be nice to get an error pointing to the standalone puppetmaster docs if the puppetmaster cannot be reached.
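As a rough illustration of the kind of hint I mean, here is a minimal shell sketch. The function name puppet_ssl_hint and the variable DOCS_URL are made up for this example and are not part of puppet or of the Cloud VPS images; it only looks for the "certificate verify failed" text that shows up in the agent output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of puppet or the Cloud VPS base image.
# It reads puppet agent output on stdin and, if the output contains the
# "certificate verify failed" error seen in this task, prints a pointer to
# the standalone puppetmaster docs.

# URL taken from this task; the variable name is an assumption.
DOCS_URL='https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster#Step_2:_Setup_a_puppet_client'

puppet_ssl_hint() {
    # grep -q prints nothing and exits 0 only when the pattern is found.
    if grep -q 'certificate verify failed'; then
        echo "Hint: certificate verification against the puppetmaster failed; see $DOCS_URL"
        return 0
    fi
    # No matching error in the input: print nothing and report "not found".
    return 1
}
```

It could be fed from a saved agent log, or from something like `sudo puppet agent -tv 2>&1 | puppet_ssl_hint`, with the caveat that piping this way hides the original agent output.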

Mentioned in SAL (#wikimedia-cloud) [2023-11-21T21:41:52Z] <mutante> - cert issue on new machine related to having local puppetmaster, like T349937#9288547 except "rm -rf /var/lib/puppet/ssl" was enough since puppetmaster did auto-sign new CSR - T327068