
prometheus-openstack-stale-puppet-certs crashing on deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud
Closed, Resolved | Public | BUG REPORT

Description

root@deployment-puppetserver-1:~# sudo journalctl -u prometheus_openstack_stale_puppet_certs --no-pager --since=yesterday
...
Jan 06 01:23:27 deployment-puppetserver-1 systemd[1]: Starting prometheus_openstack_stale_puppet_certs.service - Regular job to collect information about stale Puppet certificates...
Jan 06 01:23:35 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 13.824384773954527 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-c6a7bc95-6ef5-49d4-b6f7-6c5c9fda8da2).
Jan 06 01:23:48 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 9.798903876948248 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-59249383-fd9d-44bf-b2c7-b6ab33605a85).
Jan 06 01:23:58 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 13.937448053308282 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-998e6781-63a8-4a0a-beb2-6f6bd173803b).
Jan 06 01:24:12 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 8.033557668942898 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-df56ab97-e800-4c20-ac37-d6bb933a9176).
Jan 06 01:24:21 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 9.317775458884913 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-0eb8707e-7f0d-44a1-bc63-d2c5b38e6601).
Jan 06 01:24:30 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 13.748912841153267 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-56fca66b-484f-4c1c-94c8-435b17799471).
Jan 06 01:24:44 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 10.728456782034158 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-0dfc6bf2-ab0d-4136-9d1e-10408ec7751a).
Jan 06 01:24:55 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Retrying mwopenstackclients.Clients.novaclient in 10.395034838579656 seconds as it raised NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-46541bbb-f15d-4b29-b030-562d2731748b).
Jan 06 01:25:05 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]: Traceback (most recent call last):
Jan 06 01:25:05 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]:   File "/usr/local/sbin/prometheus-openstack-stale-puppet-certs", line 91, in <module>
Jan 06 01:25:05 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]:     sys.exit(main())
Jan 06 01:25:05 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]:              ^^^^^^
Jan 06 01:25:05 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]:   File "/usr/local/sbin/prometheus-openstack-stale-puppet-certs", line 82, in main
Jan 06 01:25:05 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[276352]:     collect_openstack_cert_data(registry, Path(ssl_dir()))
Jan 06 01:25:05 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:     p = self._get(url + query, self.key)
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:   File "/usr/lib/python3/dist-packages/keystoneclient/base.py", line 166, in _get
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:     resp, body = self.client.get(url, **kwargs)
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:   File "/usr/lib/python3/dist-packages/keystoneauth1/adapter.py", line 393, in get
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:     return self.request(url, 'GET', **kwargs)
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:   File "/usr/lib/python3/dist-packages/keystoneauth1/adapter.py", line 552, in request
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:     resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:   File "/usr/lib/python3/dist-packages/keystoneauth1/adapter.py", line 255, in request
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:     return self.session.request(url, method, **kwargs)
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:   File "/usr/lib/python3/dist-packages/keystoneauth1/session.py", line 985, in request
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]:     raise exceptions.from_response(resp, method, url)
Jan 07 18:42:01 deployment-puppetserver-1 prometheus-openstack-stale-puppet-certs[304962]: keystoneauth1.exceptions.http.NotFound: Could not find project: maps-experiments. (HTTP 404) (Request-ID: req-12f1395c-b64c-4d51-b41e-565912127a17)
Jan 07 18:42:01 deployment-puppetserver-1 systemd[1]: prometheus_openstack_stale_puppet_certs.service: Main process exited, code=exited, status=1/FAILURE
Jan 07 18:42:01 deployment-puppetserver-1 systemd[1]: prometheus_openstack_stale_puppet_certs.service: Failed with result 'exit-code'.
Jan 07 18:42:01 deployment-puppetserver-1 systemd[1]: Failed to start prometheus_openstack_stale_puppet_certs.service - Regular job to collect information about stale Puppet certificates.
Jan 07 18:42:01 deployment-puppetserver-1 systemd[1]: prometheus_openstack_stale_puppet_certs.service: Consumed 3.557s CPU time.
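
The log above shows the job retrying the client lookup for project maps-experiments and then exiting on keystoneauth1.exceptions.http.NotFound. One way such a job could tolerate a project that no longer exists is to record it and keep going instead of aborting. This is only a minimal sketch under stated assumptions: `ProjectNotFound`, `collect_hosts`, and `fake_list_instances` are hypothetical names for illustration and are not taken from the real script or from mwopenstackclients.

```python
class ProjectNotFound(Exception):
    """Hypothetical stand-in for keystoneauth1.exceptions.http.NotFound."""


def collect_hosts(projects, list_instances):
    """Look up instances per project, tolerating projects that no longer exist.

    `list_instances` is any callable taking a project name and returning
    host names; it is assumed to raise ProjectNotFound for deleted projects.
    Returns (known_hosts_by_project, missing_projects).
    """
    known = {}
    missing = []
    for project in sorted(projects):
        try:
            known[project] = set(list_instances(project))
        except ProjectNotFound:
            # Do not abort the whole run; remember the project so its
            # certificates can be reported as stale later.
            missing.append(project)
    return known, missing


def fake_list_instances(project):
    """Fake lookup used only to demonstrate the control flow."""
    if project == "maps-experiments":
        raise ProjectNotFound(f"Could not find project: {project}.")
    return ["deployment-db13"]


known, missing = collect_hosts(
    ["deployment-prep", "maps-experiments"], fake_list_instances
)
```

With the fake lookup, `missing` contains only the deleted project and `known` still holds the hosts of the project that exists, so one deleted project would not fail the whole collection run.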

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2025-01-07T18:50:34Z] <taavi> taavi@deployment-puppetserver-1:~$ sudo puppet node clean geoshapes.maps-experiments.eqiad1.wikimedia.cloud # T383153

Mentioned in SAL (#wikimedia-cloud) [2025-01-07T18:53:45Z] <taavi> taavi@deployment-puppetserver-1:~$ sudo puppetserver ca clean --certname maps-master01.maps-experiments.eqiad1.wikimedia.cloud # T383153

bd808 added a subscriber: taavi.

After @taavi cleaned the certs causing the script to crash:

root@deployment-puppetserver-1:~# /usr/local/sbin/clean-stale-puppet-certs
stray cert deployment-db13.deployment-prep.eqiad1.wikimedia.cloud
stray cert test-restbase-upgrade.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-eventlog08.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-sessionstore05.deployment-prep.eqiad1.wikimedia.cloud
stray cert test-restbase-bullseye.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-poolcounter06.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-echostore02.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-db12.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-push-notifications01.deployment-prep.eqiad.wmflabs
stray cert deployment-docker-cpjobqueue01.deployment-prep.eqiad.wmflabs
stray cert deployment-maps-master01.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-maps-new02.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-mediawiki81.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-ircd02.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-cumin.deployment-prep.eqiad.wmflabs
stray cert deployment-docker-mobileapps01.deployment-prep.eqiad1.wikimedia.cloud
stray cert deployment-docker-proton01.deployment-prep.eqiad.wmflabs

A run of /usr/local/sbin/clean-stale-puppet-certs --clean is in progress now and will clean those up.
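
For reference, the certificate names in this task appear to follow a `<host>.<project>.<domain>` layout, with both the current eqiad1.wikimedia.cloud domain and the older eqiad.wmflabs one. A small sketch of splitting a name on that assumption follows; `split_cert_name` is a hypothetical helper, not a function from clean-stale-puppet-certs.

```python
def split_cert_name(certname):
    """Split a cert name into (host, project, domain).

    Assumes the '<host>.<project>.<domain>' layout seen in this task;
    names with fewer than three dot-separated parts raise ValueError.
    """
    host, project, domain = certname.split(".", 2)
    return host, project, domain


# Sample names copied from the task; not the full list of stray certs.
sample = [
    "geoshapes.maps-experiments.eqiad1.wikimedia.cloud",
    "deployment-db13.deployment-prep.eqiad1.wikimedia.cloud",
    "deployment-cumin.deployment-prep.eqiad.wmflabs",
]

# Collect the distinct projects that the sample certificates refer to.
projects = sorted({split_cert_name(name)[1] for name in sample})
```

Here `projects` includes maps-experiments, the project named in the NotFound errors in the description.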

Mentioned in SAL (#wikimedia-cloud) [2025-01-07T19:00:49Z] <bd808> /usr/local/sbin/clean-stale-puppet-certs --clean (T383153)