pontoon.traffic.eqiad1.wikimedia.cloud unable to run puppet agent due to certificate mismatch
Closed, Resolved | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • ssh pontoon.traffic.eqiad1.wikimedia.cloud
  • sudo puppet agent -t

What happens?:

Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key. Did you forget to run as root?
Certificate fingerprint: 4A:51:D6:13:45:20:6B:1E:C2:EC:41:EA:AA:F8:69:F8:19:25:81:2D:62:87:0D:1B:FC:35:3D:27:95:6C:2F:28
To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certificate.
On the master:
  puppet cert clean pontoon.traffic.eqiad1.wikimedia.cloud
On the agent:
  1a. On most platforms: find /var/lib/puppet/ssl -name pontoon.traffic.eqiad1.wikimedia.cloud.pem -delete
  1b. On Windows: del "\var\lib\puppet\ssl\certs\pontoon.traffic.eqiad1.wikimedia.cloud.pem" /f
  2. puppet agent -t

What should have happened instead?:

  1. A clean agent run, or
  2. Certificate regeneration when running the advised commands

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

Debian Buster, VM launched 2021-11-16

Event Timeline

BCornwall triaged this task as Low priority.

I took a look at the puppet master at pontoon.traffic.eqiad1.wikimedia.cloud and got puppet to run; however, a self-signed certificate error now shows up.

# run-puppet-agent 
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: puppet]
Info: Retrieving pluginfacts
Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: puppet]
Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve file metadata for puppet:///pluginfacts: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: puppet]
...

I checked the other hosts in the stack, such as cptext.traffic.eqiad1.wikimedia.cloud, and they have switched back to talking to puppetmaster.cloudinfra.wmflabs.org rather than the Pontoon master.
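
For anyone repeating this check on other hosts, here is a minimal sketch of how it could be scripted. The helper names `server_matches` and `report_server` are made up for illustration, and the sketch assumes the agent's `server` setting is the thing to compare; `puppet config print server --section agent` is the stock way to read that setting.

```shell
#!/bin/sh
# Sketch: report whether an agent is configured to use the expected puppet master.
# server_matches and report_server are hypothetical helpers, not existing tooling.

server_matches() {
    # $1 = expected server name, $2 = server name configured on the agent
    [ "$1" = "$2" ]
}

report_server() {
    expected=$1
    actual=$2
    if server_matches "$expected" "$actual"; then
        echo "OK: agent points at $expected"
    else
        echo "MISMATCH: agent points at $actual, expected $expected"
    fi
}

# Usage on an agent (requires puppet to be installed):
#   report_server pontoon.traffic.eqiad1.wikimedia.cloud \
#       "$(sudo puppet config print server --section agent)"
```

A MISMATCH line on cptext or cpupload would confirm that the host has been re-pointed at the cloud-wide master.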

At any rate, my recommendation at this point is to nuke this instance along with cptext and cpupload, then decide whether you wish to bring back a fresh Pontoon traffic stack as per https://wikitech.wikimedia.org/wiki/Puppet/Pontoon depending on the team's needs. My recommendation is obviously to do so, and I'm happy to help!

@ssingh @KOfori Is there a need/desire to have these three instances around? If so, is there any objection to following the above and terminating/replacing the instances?

I also saw certificate errors pop up in a different project that uses a local puppetmaster, and we felt like we had not touched anything. I did not get to look yet, but this seemed similar enough, and I was already suspecting some change related to self-puppetmaster.

I spoke with @Vgutierrez and they would like to keep these instances around. I'll do a little more digging into fixing this, particularly since @Dzahn's comment suggests that this is an issue that will pop up again.

Keeping the instances SGTM @BCornwall, thanks for looking into it. Personally I'd recommend starting afresh with a Pontoon stack (i.e. keep new instances around), feel free to reach out for assistance (with this or even reviving the instances you have now)

@Vgutierrez Indeed, do you have any reason to keep these *specific* instances around, or are you okay with a replacement?

I'm happy with replacing the instances, feel free to proceed :)

The instances have been replaced.