I tried to SSH to integration-agent-docker-1029 but it didn't work. So I added myself to the root authorized_keys and investigated.
getent passwd tstarling returns nothing.
2022-12-04 16:26:46: start time of sssd_be
/var/log/sssd/sssd_nss.log.1:
(2022-12-04 16:26:47): [nss] [sbus_dbus_connect_address] (0x0020): Unable to connect to unix:path=/var/lib/sss/pipes/private/sbus-dp_wikimedia.org [org.freedesktop.DBus.Error.NoServer]: Failed to connect to socket /var/lib/sss/pipes/private/sbus-dp_wikimedia.org: Connection refused
(2022-12-04 16:26:48): [nss] [sbus_reconnect_attempt] (0x0020): Unable to connect to D-Bus
(2022-12-04 16:26:51): [nss] [sbus_dbus_connect_address] (0x0020): Unable to connect to unix:path=/var/lib/sss/pipes/private/sbus-dp_wikimedia.org [org.freedesktop.DBus.Error.NoServer]: Failed to connect to socket /var/lib/sss/pipes/private/sbus-dp_wikimedia.org: Connection refused
(2022-12-04 16:26:51): [nss] [sbus_reconnect_attempt] (0x0020): Unable to connect to D-Bus
(2022-12-04 16:27:01): [nss] [sbus_dbus_connect_address] (0x0020): Unable to connect to unix:path=/var/lib/sss/pipes/private/sbus-dp_wikimedia.org [org.freedesktop.DBus.Error.NoServer]: Failed to connect to socket /var/lib/sss/pipes/private/sbus-dp_wikimedia.org: Connection refused
(2022-12-04 16:27:02): [nss] [sbus_reconnect_attempt] (0x0020): Unable to connect to D-Bus
(2022-12-04 16:27:02): [nss] [sbus_reconnect] (0x0020): Unable to reconnect: maximum retries exceeded.
(2022-12-04 16:27:02): [nss] [sss_dp_on_reconnect] (0x0010): Could not reconnect to wikimedia.org provider.
(2022-12-04 16:30:53): [nss] [cache_req_common_process_dp_reply] (0x0040): CR #205723: Could not get account info [1432158212]: SSSD is offline
The last message, "SSSD is offline", is repeated up to the present (2022-12-12 6:06:00).
The backend log, sssd_wikimedia.org.log.1, is quiet at the time of the initial failure, until:
(2022-12-04 16:30:51): [be[wikimedia.org]] [server_setup] (0x0040): Starting with debug level = 0x0070
syslog:
Dec 4 16:30:51 integration-agent-docker-1029 kernel: [3544302.900910] docker invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
...
Dec 4 16:30:51 integration-agent-docker-1029 kernel: [3544302.901530] Out of memory: Killed process 3715275 (php) total-vm:23878800kB, anon-rss:22873048kB, file-rss:0kB, shmem-rss:0kB, UID:65534 pgtables:46708kB oom_score_adj:0
...
Dec 4 16:30:51 integration-agent-docker-1029 sssd[613]: Child [664] ('wikimedia.org':'%BE_wikimedia.org') was terminated by own WATCHDOG. Consult corresponding logs to figure out the reason.
I restarted sssd, which fixed the problem.
I suggest monitoring for this kind of sssd failure and/or submitting a patch upstream.
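As a rough illustration of what such monitoring could look like, here is a minimal Python sketch. It combines the two symptoms seen above: "SSSD is offline" lines in sssd_nss.log, and a user that no longer resolves through NSS (the same thing getent passwd tests). The function names and the idea of a "canary" user are my own assumptions for illustration, not part of any existing check.

```python
import pwd
import re
from datetime import datetime

# Matches the "(YYYY-MM-DD HH:MM:SS)" prefix seen in the sssd_nss.log excerpt.
TIMESTAMP_RE = re.compile(r"^\((\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})\)")
OFFLINE_MARKER = "SSSD is offline"


def offline_timestamps(lines):
    """Return the timestamps of log lines that report SSSD as offline."""
    stamps = []
    for line in lines:
        if OFFLINE_MARKER not in line:
            continue
        match = TIMESTAMP_RE.match(line.strip())
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return stamps


def user_resolves(username):
    """Return True if NSS can resolve the user, like getent passwd does."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


def check(log_lines, canary_user):
    """Return (ok, message).

    canary_user is a hypothetical account that only exists in the
    directory served by sssd, so a failed lookup points at sssd.
    """
    stamps = offline_timestamps(log_lines)
    if stamps:
        return False, f"sssd reported offline {len(stamps)} time(s), last at {stamps[-1]}"
    if not user_resolves(canary_user):
        return False, f"user {canary_user} does not resolve"
    return True, "sssd looks healthy"
```

A real check would probably only look at log entries newer than the last sssd restart, otherwise old "SSSD is offline" lines would keep it alerting after the service has recovered.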