This task is to check why it was not updated by acme-chief and to fix it.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
acme_chief: add log_level to the config | operations/software/acme-chief | master | +31 -23
Event Timeline
I see these errors in the logs of root@tools-acme-chief-01:
root@tools-acme-chief-01:/etc/acme-chief# systemctl status acme-chief-certs-sync
● acme-chief-certs-sync.service - Sync acme-chief certificates
   Loaded: loaded (/lib/systemd/system/acme-chief-certs-sync.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2022-05-02 08:32:15 UTC; 2min 0s ago
  Process: 12024 ExecStart=/usr/local/bin/acme-chief-certs-sync (code=exited, status=255/EXCEPTION)
 Main PID: 12024 (code=exited, status=255/EXCEPTION)

May 02 08:32:14 tools-acme-chief-01 systemd[1]: Started Sync acme-chief certificates.
May 02 08:32:14 tools-acme-chief-01 acme-chief-certs-sync[12024]: Could not create directory '/nonexistent/.ssh'.
May 02 08:32:15 tools-acme-chief-01 acme-chief-certs-sync[12024]: Connection closed by 172.16.0.18 port 22
May 02 08:32:15 tools-acme-chief-01 systemd[1]: acme-chief-certs-sync.service: Main process exited, code=exited, status=255/EXCEPTION
May 02 08:32:15 tools-acme-chief-01 acme-chief-certs-sync[12024]: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
May 02 08:32:15 tools-acme-chief-01 acme-chief-certs-sync[12024]: rsync error: unexplained error (code 255) at io.c(235) [sender=3.1.3]
May 02 08:32:15 tools-acme-chief-01 systemd[1]: acme-chief-certs-sync.service: Failed with result 'exit-code'.
Checking the service file, it's using a user that exists but has no home:
root@tools-acme-chief-01:/etc/acme-chief# cat /lib/systemd/system/acme-chief-certs-sync.service;
[Unit]
Description=Sync acme-chief certificates

[Service]
User=acme-chief
ExecStart=/usr/local/bin/acme-chief-certs-sync
root@tools-acme-chief-01:/etc/acme-chief# id acme-chief
uid=497(acme-chief) gid=497(acme-chief) groups=497(acme-chief)
root@tools-acme-chief-01:/etc/acme-chief# cd ~acme-chief
-bash: cd: /nonexistent: No such file or directory
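The manual check above (look up the account, then try to enter its home) could be wrapped in a small helper for reuse on other hosts. This is only a minimal sketch and not part of acme-chief: the function names `home_of_entry` and `check_home` are hypothetical, and it assumes `getent` is available and that the account has a standard passwd entry.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with acme-chief.

# Print the home directory field (6th colon-separated field)
# of a passwd(5)-style entry passed as the first argument.
home_of_entry() {
    printf '%s\n' "$1" | cut -d: -f6
}

# Report whether the home directory of the given account exists.
# Returns 0 if it exists, 1 if it is missing, 2 if the account is unknown.
# Assumes getent(1) is installed on the host.
check_home() {
    entry=$(getent passwd "$1") || { echo "$1: no such user"; return 2; }
    home=$(home_of_entry "$entry")
    if [ -d "$home" ]; then
        echo "$1: home $home exists"
    else
        echo "$1: home $home is missing"
        return 1
    fi
}
```

Running `check_home acme-chief` on tools-acme-chief-01 would be expected to flag the home as missing, given that `cd ~acme-chief` fails with `/nonexistent: No such file or directory`.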
looking
Mentioned in SAL (#wikimedia-cloud) [2022-05-02T08:54:11Z] <taavi> restart acme-chief.service T307333
Restarting the acme-chief service and re-running puppet on toolserver-proxy-01 did the trick; this is probably related to T273956.
Acme-chief logs before restarting:
taavi@tools-acme-chief-01:~ $ sudo journalctl -u acme-chief.service
-- Logs begin at Sun 2022-05-01 17:27:25 UTC, end at Mon 2022-05-02 08:52:38 UTC. --
May 01 19:59:59 tools-acme-chief-01 acme-chief-backend[18675]: Refreshing live OCSP response for certificate toolse
May 01 19:59:59 tools-acme-chief-01 acme-chief-backend[18675]: live OCSP response refreshed successfully for toolse
May 01 19:59:59 tools-acme-chief-01 acme-chief-backend[18675]: Refreshing live OCSP response for certificate toolse
May 01 19:59:59 tools-acme-chief-01 acme-chief-backend[18675]: live OCSP response refreshed successfully for toolse
Change 788294 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/software/acme-chief@master] acme_chief: add log_level to the config
For some reason, reload-acme-chief-backend.timer isn't being triggered on tools-acme-chief-01. That's not related to T273956.
Change 788294 abandoned by David Caro:
[operations/software/acme-chief@master] acme_chief: add log_level to the config
Reason:
Not needed