
Cloud VPS puppet breakage on 2025-02-04 related to puppet-enc
Closed, Resolved (Public)

Description

Puppet agent runs are failing fleet-wide with:

aborrero@toolsbeta-test-k8s-worker-nfs-8:~$ sudo run-puppet-agent
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Failed when searching for node toolsbeta-test-k8s-worker-nfs-8.toolsbeta.eqiad1.wikimedia.cloud: Exception while executing '/usr/local/bin/puppet-enc': Cannot run program "/usr/local/bin/puppet-enc" (in directory "."): error=0, Failed to exec spawn helper: pid: 578360, exit value: 1
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Event Timeline

aborrero triaged this task as Medium priority. Feb 4 2025, 9:17 AM
aborrero added a subscriber: taavi.

@taavi pointed out that this may be caused by an unattended upgrade of Java:

Start-Date: 2025-02-04  06:26:11
Commandline: /usr/bin/unattended-upgrade
Upgrade: openjdk-17-jre-headless:amd64 (17.0.13+11-2~deb12u1, 17.0.14+7-1~deb12u1)
End-Date: 2025-02-04  06:26:15
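
To look for the same entry on another puppetserver, the apt history log can be filtered by package name. This is a minimal sketch, assuming the log lives at the usual /var/log/apt/history.log and has not been rotated yet; the function name `find_pkg_upgrades` is hypothetical.

```shell
# find_pkg_upgrades PACKAGE [LOGFILE]
# Print every apt history stanza that has an "Upgrade:" line mentioning PACKAGE.
# LOGFILE defaults to /var/log/apt/history.log (assumed location).
find_pkg_upgrades() {
    pkg="$1"
    log="${2:-/var/log/apt/history.log}"
    # Stanzas are separated by blank lines: RS="" puts awk in paragraph mode,
    # and FS="\n" makes each line of the stanza one field.
    awk -v pkg="$pkg" '
        BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Upgrade: / && index($i, pkg)) { print; next }
            }
        }
    ' "$log"
}

# Example: find_pkg_upgrades openjdk-17-jre-headless
```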

That seems to be correct. Restarting the puppet server fixes the problem, for example:

aborrero@toolsbeta-puppetserver-1:~$ sudo systemctl restart puppetserver.service
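
Since a restart clears the error, one way to spot hosts that still need it is to check whether puppetserver has been running since before the package upgrade. This is only a sketch: it assumes GNU `date`, and it assumes that a JVM started before the upgrade is the affected one, which is an inference from this task rather than a confirmed root cause. The helper name `started_before` is hypothetical.

```shell
# started_before START UPGRADE
# Exit 0 when timestamp START is earlier than timestamp UPGRADE, 1 otherwise,
# and 2 when either timestamp cannot be parsed by GNU date -d.
started_before() {
    start_epoch=$(date -d "$1" +%s) || return 2
    upgrade_epoch=$(date -d "$2" +%s) || return 2
    [ "$start_epoch" -lt "$upgrade_epoch" ]
}

# Usage on a puppetserver host (End-Date taken from the apt history entry):
# svc_start=$(systemctl show -p ActiveEnterTimestamp --value puppetserver.service)
# started_before "$svc_start" '2025-02-04 06:26:15' && echo "restart needed"
```

Note that apt history timestamps carry no time zone, so the comparison is only meaningful when both values are interpreted in the same zone.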

I started to receive emails for wmcz-stats about this:

urbanecm@wmcz-stats-wikinside01:~$ sudo run-puppet-agent
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Failed when searching for node wmcz-stats-wikinside01.wmcz-stats.eqiad1.wikimedia.cloud: Exception while executing '/usr/local/bin/puppet-enc': Cannot run program "/usr/local/bin/puppet-enc" (in directory "."): error=0, Failed to exec spawn helper: pid: 1366681, exit value: 1
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
urbanecm@wmcz-stats-wikinside01:~$

Mentioned in SAL (#wikimedia-cloud) [2025-02-04T09:22:41Z] <arturo> fleet-wide restart of puppetservers T385553

Restarting puppetservers fleet-wide with:

aborrero@cloudcumin1001:~ $ sudo cumin -x --force O{*} 'systemctl list-units --all | grep -q puppetserver.service && systemctl try-restart puppetserver.service || echo "no puppetserver service"'
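
The per-host command above chains `&&` and `||`, so the "no puppetserver service" message would also be printed if `try-restart` itself failed. The same logic can be written more explicitly as below. This is an illustrative sketch, not what was run; the function name `restart_if_present` and the `SYSTEMCTL` override variable are hypothetical, the latter existing only to allow dry runs.

```shell
# restart_if_present UNIT
# Try-restart UNIT only when it appears in the unit listing; report otherwise.
# SYSTEMCTL may name a replacement command (hypothetical knob for dry runs).
restart_if_present() {
    unit="$1"
    systemctl_cmd="${SYSTEMCTL:-systemctl}"
    if "$systemctl_cmd" list-units --all | grep -qF "$unit"; then
        # try-restart only restarts units that are currently running.
        "$systemctl_cmd" try-restart "$unit"
    else
        echo "no ${unit%.service} service"
    fi
}

# Example: restart_if_present puppetserver.service
```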

Change #1117122 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] profile::puppetserver::wmcs: refresh apt pin for openjdk

https://gerrit.wikimedia.org/r/1117122

Change #1117122 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] profile::puppetserver::wmcs: refresh apt pin for openjdk

https://gerrit.wikimedia.org/r/1117122

aborrero claimed this task.