
slapd fails to restart sometimes
Open, Medium, Public

Description

A restart of slapd on serpens was requested in response to certificate rotation on 2020-12-03 14:20:12.

Dec 03 14:20:42 serpens systemd[1]: slapd.service: Found left-over process 6300 (slapd) in control group while starting unit. Ignoring.

The old slapd process (PID 6300) did not exit after the SIGTERM. It stayed stuck for 7 hours, until ops responded to alerting. The remedy was to send it SIGKILL and then restart slapd manually.

This behavior and remedy resemble the wikitech outage caused by LDAP on seaborgium a few weeks ago.
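The manual remedy (SIGTERM, wait, then SIGKILL) can be expressed as a small escalation helper. The Python sketch below is only an illustration: it is not what the slapd init script or systemd actually runs, and `stop_with_escalation`, `spawn_sigterm_ignorer` and the timeout values are hypothetical. For a real slapd the PID would come from its pidfile; the demo uses a child process that ignores SIGTERM as a stand-in for the stuck PID 6300.

```python
import os
import signal
import subprocess
import sys
import time


def _alive(pid: int) -> bool:
    """True while pid exists; a finished child of ours is reaped and counts as gone."""
    try:
        # If pid is our own child, reap it so an exited child (zombie) is not reported alive.
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # Not our child (e.g. a daemon started by init); fall back to kill(pid, 0).
    try:
        os.kill(pid, 0)  # Signal 0 delivers nothing; it only checks that pid exists.
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # Exists but belongs to another user.
    return True


def _wait_gone(pid: int, timeout: float, interval: float = 0.1) -> bool:
    """Poll until pid disappears or timeout seconds elapse; return True if it is gone."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not _alive(pid):
            return True
        time.sleep(interval)
    return not _alive(pid)


def stop_with_escalation(pid: int, term_timeout: float = 30.0, kill_timeout: float = 5.0) -> str:
    """Hypothetical helper: SIGTERM, wait term_timeout seconds, then SIGKILL.

    Returns "terminated" if pid exited after SIGTERM, "killed" if SIGKILL was needed.
    Raises TimeoutError if pid is still present kill_timeout seconds after SIGKILL.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "terminated"  # Already gone before we signalled it.
    if _wait_gone(pid, term_timeout):
        return "terminated"
    try:
        os.kill(pid, signal.SIGKILL)  # SIGKILL cannot be caught, so the target's shutdown code never runs.
    except ProcessLookupError:
        return "terminated"  # Exited between the last poll and SIGKILL.
    if _wait_gone(pid, kill_timeout):
        return "killed"
    raise TimeoutError(f"pid {pid} still present after SIGKILL (uninterruptible sleep?)")


def spawn_sigterm_ignorer() -> subprocess.Popen:
    """Demo stand-in for the stuck slapd: a child process that ignores SIGTERM."""
    code = ("import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
            "print('ready', flush=True); time.sleep(60)")
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # Block until the child has installed SIG_IGN.
    return proc


if __name__ == "__main__":
    child = spawn_sigterm_ignorer()
    print(stop_with_escalation(child.pid, term_timeout=1.0))  # prints "killed"
    child.stdout.close()
```

The trade-off is the same one ops made by hand: escalating to SIGKILL gives up on slapd finishing its pending operations in exchange for getting a single, fresh instance running.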

Event Timeline

jbond triaged this task as Medium priority. Dec 9 2020, 12:10 PM
jbond subscribed.

Adding more logs before they get rotated:

Dec  3 14:20:32 serpens puppet-agent[4040]: Computing checksum on file /etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/ec-prime256v1.ocsp
Dec  3 14:20:32 serpens puppet-agent[4040]: (/Stage[main]/Profile::Openldap/Acme_chief::Cert[ldap]/File[/etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/ec-prime256v1.ocsp]) Filebucketed /etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/ec-prime256v1.ocsp to puppet with sum aceeec69f2775e228c10c7582740a536
Dec  3 14:20:32 serpens puppet-agent[4040]: (/Stage[main]/Profile::Openldap/Acme_chief::Cert[ldap]/File[/etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/ec-prime256v1.ocsp]/content) content changed '{md5}aceeec69f2775e228c10c7582740a536' to '{md5}7fd7bc9b9587dfdeecdb4f8d82b598e1'
Dec  3 14:20:32 serpens puppet-agent[4040]: Computing checksum on file /etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/rsa-2048.ocsp
Dec  3 14:20:32 serpens puppet-agent[4040]: (/Stage[main]/Profile::Openldap/Acme_chief::Cert[ldap]/File[/etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/rsa-2048.ocsp]) Filebucketed /etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/rsa-2048.ocsp to puppet with sum c3a35b52ee94e5f1e0c605686160b306
Dec  3 14:20:32 serpens puppet-agent[4040]: (/Stage[main]/Profile::Openldap/Acme_chief::Cert[ldap]/File[/etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/rsa-2048.ocsp]/content) content changed '{md5}c3a35b52ee94e5f1e0c605686160b306' to '{md5}1c8fbc524abe6d9a67db1bd94368f120'
Dec  3 14:20:32 serpens puppet-agent[4040]: (/etc/acmecerts/ldap) Scheduling refresh of Service[slapd]
Dec  3 14:20:32 serpens systemd[1]: Stopping LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol)...
Dec  3 14:20:32 serpens slapd[6300]: daemon: shutdown requested and initiated.
Dec  3 14:20:32 serpens slapd[6300]: slapd shutdown: waiting for 6 operations/tasks to finish
Dec  3 14:20:42 serpens slapd[4363]: Stopping OpenLDAP: slapd failed!
Dec  3 14:20:42 serpens systemd[1]: slapd.service: Control process exited, code=exited, status=2/INVALIDARGUMENT
Dec  3 14:20:42 serpens systemd[1]: slapd.service: Failed with result 'exit-code'.
Dec  3 14:20:42 serpens systemd[1]: Stopped LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol).
Dec  3 14:20:42 serpens systemd[1]: slapd.service: Found left-over process 6300 (slapd) in control group while starting unit. Ignoring.
Dec  3 14:20:42 serpens systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Dec  3 14:20:42 serpens systemd[1]: Starting LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol)...
Dec  3 14:20:42 serpens slapd[4381]: Starting OpenLDAP: slapd.
Dec  3 14:20:42 serpens systemd[1]: Started LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol).
Dec  3 14:20:42 serpens puppet-agent[4040]: (/Stage[main]/Openldap/Service[slapd]) Triggered 'refresh' from 1 event
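In the logs above, the init script reported "slapd failed!" 10 seconds after slapd 6300 began its shutdown (while it was still waiting for 6 operations), and systemd then started a new instance while noting 6300 as a left-over process. The Python sketch below shows one way to spot that pattern mechanically; it is not existing tooling, and `find_leftovers`, `stop_gap_seconds` and the regexes are hypothetical. It assumes the syslog-style prefix shown here, and because syslog timestamps omit the year, the year (2020) is an assumption passed in by the caller.

```python
import re
from datetime import datetime
from typing import Iterator, List, Optional, Tuple

# Syslog-style prefix as in the excerpt: "Dec  3 14:20:32 serpens slapd[6300]: message".
# Also accepts the zero-padded "Dec 03" form used in the task description.
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<prog>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)
LEFTOVER_RE = re.compile(r"Found left-over process (?P<pid>\d+) \((?P<comm>[^)]+)\)")


def parse_ts(ts: str, year: int) -> datetime:
    # Collapse "Dec  3" to "Dec 3" and prepend the assumed year, since syslog omits it.
    return datetime.strptime(f"{year} {' '.join(ts.split())}", "%Y %b %d %H:%M:%S")


def parse(lines: List[str]) -> Iterator[re.Match]:
    # Skip anything that does not look like a syslog line (e.g. wrapped continuations).
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m


def find_leftovers(lines: List[str], year: int = 2020) -> List[Tuple[datetime, int]]:
    """(time, pid) for every systemd 'Found left-over process' message."""
    found = []
    for m in parse(lines):
        if m["prog"] == "systemd":
            hit = LEFTOVER_RE.search(m["msg"])
            if hit:
                found.append((parse_ts(m["ts"], year), int(hit["pid"])))
    return found


def stop_gap_seconds(lines: List[str], year: int = 2020) -> Optional[float]:
    """Seconds from slapd's 'shutdown requested' to the init script's 'slapd failed!'."""
    requested = None
    for m in parse(lines):
        if requested is None and "shutdown requested" in m["msg"]:
            requested = parse_ts(m["ts"], year)
        elif requested is not None and "slapd failed!" in m["msg"]:
            return (parse_ts(m["ts"], year) - requested).total_seconds()
    return None


SAMPLE = [
    "Dec  3 14:20:32 serpens slapd[6300]: daemon: shutdown requested and initiated.",
    "Dec  3 14:20:32 serpens slapd[6300]: slapd shutdown: waiting for 6 operations/tasks to finish",
    "Dec  3 14:20:42 serpens slapd[4363]: Stopping OpenLDAP: slapd failed!",
    "Dec  3 14:20:42 serpens systemd[1]: slapd.service: Found left-over process 6300 (slapd) "
    "in control group while starting unit. Ignoring.",
    "Dec  3 14:20:42 serpens slapd[4381]: Starting OpenLDAP: slapd.",
]

if __name__ == "__main__":
    print(find_leftovers(SAMPLE))    # [(datetime.datetime(2020, 12, 3, 14, 20, 42), 6300)]
    print(stop_gap_seconds(SAMPLE))  # 10.0
```

A non-empty `find_leftovers` result right after a Puppet-triggered refresh is the condition that, in this incident, went unnoticed until alerting fired hours later.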