
Puppet agent failure detected on instance deployment-shellbox01 in project deployment-prep
Closed, Resolved, Public

Description

Common information

  • summary: Puppet agent failure detected on instance deployment-shellbox01 in project deployment-prep
  • alertname: PuppetAgentFailure
  • instance: deployment-shellbox01
  • job: node
  • project: deployment-prep
  • severity: warning

Firing alerts


  • summary: Puppet agent failure detected on instance deployment-shellbox01 in project deployment-prep
  • alertname: PuppetAgentFailure
  • instance: deployment-shellbox01
  • job: node
  • project: deployment-prep
  • severity: warning

Event Timeline

bd808 subscribed.
bd808@deployment-shellbox01:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-shellbox01.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(eaf9c7d5b5) gitpuppet - MediaWiki: Only proxy existing .php files, otherwise return nice 404'
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Service[shellbox]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Service[shellbox]: Unscheduling refresh on Service[shellbox]
Notice: Applied catalog in 8.90 seconds
bd808@deployment-shellbox01:~$ echo $?
2
bd808@deployment-shellbox01:~$ sudo systemctl status shellbox --no-pager
● shellbox.service - Systemd runner for shellbox
     Loaded: loaded (/lib/systemd/system/shellbox.service; enabled; preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Mon 2026-03-09 21:23:59 UTC; 2s ago
    Process: 3326455 ExecStartPre=/usr/bin/docker stop shellbox.service (code=exited, status=1/FAILURE)
    Process: 3326461 ExecStartPre=/usr/bin/docker rm shellbox.service (code=exited, status=1/FAILURE)
    Process: 3326467 ExecStart=/usr/bin/docker run --rm=true --env-file /etc/shellbox/env -p 8081:8081 -v shellbox:/etc/shellbox -v /run/shared:/run/shared -v /srv/shellbox/config/:/srv/app/config -v /srv/shellbox/src:/srv/app/src --name shellbox.service docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:2024-06-13-133425-video --nodaemonize (code=exited, status=78)
   Main PID: 3326467 (code=exited, status=78)
        CPU: 123ms
bd808@deployment-shellbox01:~$ sudo journalctl -u shellbox --no-pager --since "1 minute ago"
...
Mar 09 21:36:51 deployment-shellbox01 systemd[1]: shellbox.service: Scheduled restart job, restart counter is at 2905376.
Mar 09 21:36:51 deployment-shellbox01 systemd[1]: Stopped shellbox.service - Systemd runner for shellbox.
Mar 09 21:36:51 deployment-shellbox01 systemd[1]: Starting shellbox.service - Systemd runner for shellbox...
Mar 09 21:36:51 deployment-shellbox01 docker-shellbox[3334759]: Error response from daemon: No such container: shellbox.service
Mar 09 21:36:51 deployment-shellbox01 docker-shellbox[3334765]: Error: No such container: shellbox.service
Mar 09 21:36:51 deployment-shellbox01 systemd[1]: Started shellbox.service - Systemd runner for shellbox.
Mar 09 21:36:52 deployment-shellbox01 docker-shellbox[3334771]: [09-Mar-2026 21:36:52] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 21:36:52 deployment-shellbox01 docker-shellbox[3334771]: [09-Mar-2026 21:36:52] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 21:36:52 deployment-shellbox01 docker-shellbox[3334771]: [09-Mar-2026 21:36:52] ERROR: FPM initialization failed
Mar 09 21:36:52 deployment-shellbox01 docker-shellbox[3334771]: [09-Mar-2026 21:36:52] ERROR: FPM initialization failed
Mar 09 21:36:52 deployment-shellbox01 systemd[1]: shellbox.service: Main process exited, code=exited, status=78/CONFIG
Mar 09 21:36:52 deployment-shellbox01 systemd[1]: shellbox.service: Failed with result 'exit-code'.

The service is still trying to run docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:2024-06-13-133425-video, which seems like part of the problem here, but the more immediate issue is:

ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
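
Binding a Unix socket creates a file, so the process needs write and search permission on the containing directory. One way to narrow this down is to list the owner and mode of each component of the socket path on the host side of the bind mount. This is only a sketch: the helper name show_path_perms is made up for illustration, it relies on GNU stat, and it assumes (not confirmed in this task) that the container process runs as a non-root user.

```shell
# Print mode, owner:group and name for every component of a path, deepest
# first, so a directory that blocks socket creation stands out.
# show_path_perms is a hypothetical helper, not an existing tool.
show_path_perms() {
    p="$1"
    while [ -n "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        if [ -e "$p" ]; then
            # GNU coreutils stat: %A = mode string, %U:%G = owner:group, %n = name
            stat -c '%A %U:%G %n' "$p"
        else
            echo "missing $p"
        fi
        p=$(dirname "$p")
    done
}
```

Running `show_path_perms /run/shared/fpm-www.sock` on the instance would show whether /run/shared is writable by whichever user the image runs php-fpm as.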

Mentioned in SAL (#wikimedia-releng) [2026-03-09T21:53:28Z] <bd808> Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away (T419440)

The reboot did not help. I'm not seeing obvious changes in ops/puppet.git from the last 5-7 hours that look like they would have changed how this instance is provisioned, but I may not be doing a great job of searching yet.

I'm not sure why there isn't a ticket for deployment-shellbox-video, but it seems to have the same problem.

bd808@deployment-shellbox-video:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-shellbox-video.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(b70cd49b21) gitpuppet - MediaWiki: Only proxy existing .php files, otherwise return nice 404'
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Service[shellbox]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Service[shellbox]: Unscheduling refresh on Service[shellbox]
Notice: Applied catalog in 17.66 seconds
bd808@deployment-shellbox-video:~$ echo $?
2
bd808@deployment-shellbox-video:~$ sudo systemctl status shellbox --no-pager
● shellbox.service - Systemd runner for shellbox
     Loaded: loaded (/lib/systemd/system/shellbox.service; enabled; preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Mon 2026-03-09 23:21:47 UTC; 3s ago
    Process: 2445148 ExecStartPre=/usr/bin/docker stop shellbox.service (code=exited, status=1/FAILURE)
    Process: 2445156 ExecStartPre=/usr/bin/docker rm shellbox.service (code=exited, status=1/FAILURE)
    Process: 2445164 ExecStart=/usr/bin/docker run --rm=true --env-file /etc/shellbox/env -p 8081:8081 -v shellbox:/etc/shellbox -v /run/shared:/run/shared -v /srv/shellbox/config/:/srv/app/config -v /srv/shellbox/src:/srv/app/src --name shellbox.service docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:2024-06-13-133425-video --nodaemonize (code=exited, status=78)
   Main PID: 2445164 (code=exited, status=78)
        CPU: 195ms
bd808@deployment-shellbox-video:~$ sudo journalctl -u shellbox --no-pager --since "1 minute ago"
Mar 09 23:21:00 deployment-shellbox-video systemd[1]: shellbox.service: Main process exited, code=exited, status=78/CONFIG
Mar 09 23:21:00 deployment-shellbox-video systemd[1]: shellbox.service: Failed with result 'exit-code'.
Mar 09 23:21:10 deployment-shellbox-video systemd[1]: shellbox.service: Scheduled restart job, restart counter is at 2863361.
Mar 09 23:21:10 deployment-shellbox-video systemd[1]: Stopped shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:10 deployment-shellbox-video systemd[1]: Starting shellbox.service - Systemd runner for shellbox...
Mar 09 23:21:10 deployment-shellbox-video docker-shellbox[2444513]: Error response from daemon: No such container: shellbox.service
Mar 09 23:21:10 deployment-shellbox-video docker-shellbox[2444522]: Error: No such container: shellbox.service
Mar 09 23:21:10 deployment-shellbox-video systemd[1]: Started shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:11 deployment-shellbox-video docker-shellbox[2444530]: [09-Mar-2026 23:21:11] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:11 deployment-shellbox-video docker-shellbox[2444530]: [09-Mar-2026 23:21:11] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:11 deployment-shellbox-video docker-shellbox[2444530]: [09-Mar-2026 23:21:11] ERROR: FPM initialization failed
Mar 09 23:21:11 deployment-shellbox-video docker-shellbox[2444530]: [09-Mar-2026 23:21:11] ERROR: FPM initialization failed
Mar 09 23:21:11 deployment-shellbox-video systemd[1]: shellbox.service: Main process exited, code=exited, status=78/CONFIG
Mar 09 23:21:11 deployment-shellbox-video systemd[1]: shellbox.service: Failed with result 'exit-code'.
Mar 09 23:21:21 deployment-shellbox-video systemd[1]: shellbox.service: Scheduled restart job, restart counter is at 2863362.
Mar 09 23:21:21 deployment-shellbox-video systemd[1]: Stopped shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:21 deployment-shellbox-video systemd[1]: Starting shellbox.service - Systemd runner for shellbox...
Mar 09 23:21:21 deployment-shellbox-video docker-shellbox[2444728]: Error response from daemon: No such container: shellbox.service
Mar 09 23:21:22 deployment-shellbox-video docker-shellbox[2444735]: Error: No such container: shellbox.service
Mar 09 23:21:22 deployment-shellbox-video systemd[1]: Started shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:22 deployment-shellbox-video docker-shellbox[2444742]: [09-Mar-2026 23:21:22] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:22 deployment-shellbox-video docker-shellbox[2444742]: [09-Mar-2026 23:21:22] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:22 deployment-shellbox-video docker-shellbox[2444742]: [09-Mar-2026 23:21:22] ERROR: FPM initialization failed
Mar 09 23:21:22 deployment-shellbox-video docker-shellbox[2444742]: [09-Mar-2026 23:21:22] ERROR: FPM initialization failed
Mar 09 23:21:23 deployment-shellbox-video systemd[1]: shellbox.service: Main process exited, code=exited, status=78/CONFIG
Mar 09 23:21:23 deployment-shellbox-video systemd[1]: shellbox.service: Failed with result 'exit-code'.
Mar 09 23:21:33 deployment-shellbox-video systemd[1]: shellbox.service: Scheduled restart job, restart counter is at 2863363.
Mar 09 23:21:33 deployment-shellbox-video systemd[1]: Stopped shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:33 deployment-shellbox-video systemd[1]: Starting shellbox.service - Systemd runner for shellbox...
Mar 09 23:21:33 deployment-shellbox-video docker-shellbox[2444991]: Error response from daemon: No such container: shellbox.service
Mar 09 23:21:33 deployment-shellbox-video docker-shellbox[2444999]: Error: No such container: shellbox.service
Mar 09 23:21:33 deployment-shellbox-video systemd[1]: Started shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:34 deployment-shellbox-video docker-shellbox[2445007]: [09-Mar-2026 23:21:34] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:34 deployment-shellbox-video docker-shellbox[2445007]: [09-Mar-2026 23:21:34] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:34 deployment-shellbox-video docker-shellbox[2445007]: [09-Mar-2026 23:21:34] ERROR: FPM initialization failed
Mar 09 23:21:34 deployment-shellbox-video docker-shellbox[2445007]: [09-Mar-2026 23:21:34] ERROR: FPM initialization failed
Mar 09 23:21:35 deployment-shellbox-video systemd[1]: shellbox.service: Main process exited, code=exited, status=78/CONFIG
Mar 09 23:21:35 deployment-shellbox-video systemd[1]: shellbox.service: Failed with result 'exit-code'.
Mar 09 23:21:45 deployment-shellbox-video systemd[1]: shellbox.service: Scheduled restart job, restart counter is at 2863364.
Mar 09 23:21:45 deployment-shellbox-video systemd[1]: Stopped shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:45 deployment-shellbox-video systemd[1]: Starting shellbox.service - Systemd runner for shellbox...
Mar 09 23:21:45 deployment-shellbox-video docker-shellbox[2445148]: Error response from daemon: No such container: shellbox.service
Mar 09 23:21:45 deployment-shellbox-video docker-shellbox[2445156]: Error: No such container: shellbox.service
Mar 09 23:21:45 deployment-shellbox-video systemd[1]: Started shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:46 deployment-shellbox-video docker-shellbox[2445164]: [09-Mar-2026 23:21:46] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:46 deployment-shellbox-video docker-shellbox[2445164]: [09-Mar-2026 23:21:46] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:46 deployment-shellbox-video docker-shellbox[2445164]: [09-Mar-2026 23:21:46] ERROR: FPM initialization failed
Mar 09 23:21:46 deployment-shellbox-video docker-shellbox[2445164]: [09-Mar-2026 23:21:46] ERROR: FPM initialization failed
Mar 09 23:21:47 deployment-shellbox-video systemd[1]: shellbox.service: Main process exited, code=exited, status=78/CONFIG
Mar 09 23:21:47 deployment-shellbox-video systemd[1]: shellbox.service: Failed with result 'exit-code'.
Mar 09 23:21:57 deployment-shellbox-video systemd[1]: shellbox.service: Scheduled restart job, restart counter is at 2863365.
Mar 09 23:21:57 deployment-shellbox-video systemd[1]: Stopped shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:57 deployment-shellbox-video systemd[1]: Starting shellbox.service - Systemd runner for shellbox...
Mar 09 23:21:57 deployment-shellbox-video docker-shellbox[2445290]: Error response from daemon: No such container: shellbox.service
Mar 09 23:21:57 deployment-shellbox-video docker-shellbox[2445298]: Error: No such container: shellbox.service
Mar 09 23:21:57 deployment-shellbox-video systemd[1]: Started shellbox.service - Systemd runner for shellbox.
Mar 09 23:21:58 deployment-shellbox-video docker-shellbox[2445306]: [09-Mar-2026 23:21:58] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:58 deployment-shellbox-video docker-shellbox[2445306]: [09-Mar-2026 23:21:58] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
Mar 09 23:21:58 deployment-shellbox-video docker-shellbox[2445306]: [09-Mar-2026 23:21:58] ERROR: FPM initialization failed
Mar 09 23:21:58 deployment-shellbox-video docker-shellbox[2445306]: [09-Mar-2026 23:21:58] ERROR: FPM initialization failed
Mar 09 23:21:59 deployment-shellbox-video systemd[1]: shellbox.service: Main process exited, code=exited, status=78/CONFIG
Mar 09 23:21:59 deployment-shellbox-video systemd[1]: shellbox.service: Failed with result 'exit-code'.

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/1eb1e98e55e9116c20021a58c6d351b91d43c518%5E%21/#F0

diff --git a/deployment-prep/deployment-shellbox.yaml b/deployment-prep/deployment-shellbox.yaml
index 40cd474..c3f1e94 100644
--- a/deployment-prep/deployment-shellbox.yaml
+++ b/deployment-prep/deployment-shellbox.yaml

@@ -2,8 +2,7 @@
 profile::docker::engine::version: 20.10.5+dfsg1-1+deb11u2
 profile::docker::runner::service_defs:
   httpd:
-    bind_mounts:
-      /run/shared: /run/shared
+    bind_mounts: {}
     environment:
       FCGI_MODE: FCGI_UNIX
     image_name: httpd-fcgi
@@ -11,7 +10,6 @@
     version: latest
   shellbox:
     bind_mounts:
-      /run/shared: /run/shared
       /srv/shellbox/config/: /srv/app/config
       /srv/shellbox/src: /srv/app/src
     config: {}
bd808@deployment-shellbox01:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-shellbox01.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(b70cd49b21) gitpuppet - MediaWiki: Only proxy existing .php files, otherwise return nice 404'
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[httpd]/Systemd::Service[httpd]/Systemd::Unit[httpd]/File[/lib/systemd/system/httpd.service]/content:
--- /lib/systemd/system/httpd.service   2024-07-20 14:38:37.962125636 +0000
+++ /tmp/puppet-file20260309-67987-kgam1x       2026-03-09 23:30:27.549274616 +0000
@@ -8,7 +8,7 @@
 ExecStartPre=-/usr/bin/docker stop %n
 ExecStartPre=-/usr/bin/docker rm %n
 ExecStartPre=/usr/bin/docker pull docker-registry.wikimedia.org/httpd-fcgi:latest
-ExecStart=/usr/bin/docker run --rm=true  --env-file /etc/httpd/env -p 8080:8080 -v /etc/httpd/:/etc/httpd  -v /run/shared:/run/shared --name %n docker-registry.wikimedia.org/httpd-fcgi:latest
+ExecStart=/usr/bin/docker run --rm=true  --env-file /etc/httpd/env -p 8080:8080 -v /etc/httpd/:/etc/httpd --name %n docker-registry.wikimedia.org/httpd-fcgi:latest
 Restart=always
 RestartSec=10s
 NotifyAccess=all

Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[httpd]/Systemd::Service[httpd]/Systemd::Unit[httpd]/File[/lib/systemd/system/httpd.service]/content: content changed '{sha256}f70d2242bcb420918dd4635fb36ce044b3a5b12862c97859cc13084fff0dfbb7' to '{sha256}42a1a62761e8e8498be214f5fb0b7c413e68d4ecf89f02ad14e12a0caae1811e'
Info: /Stage[main]/Profile::Docker::Runner/Service::Docker[httpd]/Systemd::Service[httpd]/Systemd::Unit[httpd]/File[/lib/systemd/system/httpd.service]: Scheduling refresh of Exec[systemd daemon-reload for httpd.service (httpd)]
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[httpd]/Systemd::Service[httpd]/Systemd::Unit[httpd]/Exec[systemd daemon-reload for httpd.service (httpd)]: Triggered 'refresh' from 1 event
Info: /Stage[main]/Profile::Docker::Runner/Service::Docker[httpd]/Systemd::Service[httpd]/Systemd::Unit[httpd]/Exec[systemd daemon-reload for httpd.service (httpd)]: Scheduling refresh of Service[httpd]
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[httpd]/Systemd::Service[httpd]/Service[httpd]: Triggered 'refresh' from 1 event
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Systemd::Unit[shellbox]/File[/lib/systemd/system/shellbox.service]/content:
--- /lib/systemd/system/shellbox.service        2024-07-20 14:38:40.298169125 +0000
+++ /tmp/puppet-file20260309-67987-9myeyo       2026-03-09 23:30:29.073284919 +0000
@@ -7,7 +7,7 @@
 [Service]
 ExecStartPre=-/usr/bin/docker stop %n
 ExecStartPre=-/usr/bin/docker rm %n
-ExecStart=/usr/bin/docker run --rm=true  --env-file /etc/shellbox/env -p 8081:8081 -v shellbox:/etc/shellbox  -v /run/shared:/run/shared  -v /srv/shellbox/config/:/srv/app/config  -v /srv/shellbox/src:/srv/app/src --name %n docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:2024-06-13-133425-video --nodaemonize
+ExecStart=/usr/bin/docker run --rm=true  --env-file /etc/shellbox/env -p 8081:8081 -v shellbox:/etc/shellbox  -v /srv/shellbox/config/:/srv/app/config  -v /srv/shellbox/src:/srv/app/src --name %n docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:2024-06-13-133425-video --nodaemonize
 Restart=always
 RestartSec=10s
 NotifyAccess=all

Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Systemd::Unit[shellbox]/File[/lib/systemd/system/shellbox.service]/content: content changed '{sha256}2567effe62a0f53701c7412a60b0bf449d1d9aabccfd82445230df0107f094d5' to '{sha256}88e6e04e9b6d21909e1a4993e300a98fb1a203a78e7e647feb27b9f627d2201c'
Info: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Systemd::Unit[shellbox]/File[/lib/systemd/system/shellbox.service]: Scheduling refresh of Exec[systemd daemon-reload for shellbox.service (shellbox)]
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Systemd::Unit[shellbox]/Exec[systemd daemon-reload for shellbox.service (shellbox)]: Triggered 'refresh' from 1 event
Info: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Systemd::Unit[shellbox]/Exec[systemd daemon-reload for shellbox.service (shellbox)]: Scheduling refresh of Service[shellbox]
Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Service[shellbox]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Profile::Docker::Runner/Service::Docker[shellbox]/Systemd::Service[shellbox]/Service[shellbox]: Unscheduling refresh on Service[shellbox]
Notice: Applied catalog in 17.47 seconds
bd808@deployment-shellbox01:~$ echo $?
2
bd808@deployment-shellbox01:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-shellbox01.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(b70cd49b21) gitpuppet - MediaWiki: Only proxy existing .php files, otherwise return nice 404'
Notice: Applied catalog in 7.61 seconds
bd808@deployment-shellbox01:~$ echo $?
0
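
A note on the exit statuses in these transcripts: `puppet agent -t` (`--test`) turns on `--detailed-exitcodes`, so a status of 2 means the run applied changes, not that it failed. A small helper along these lines can make that explicit when reading a run. The function name is made up for illustration; the code meanings follow the puppet agent documentation.

```shell
# Describe a `puppet agent --test` exit status.
# describe_puppet_exit is a hypothetical helper name.
describe_puppet_exit() {
    case "$1" in
        0) echo "no changes" ;;
        1) echo "run failed or was not attempted" ;;
        2) echo "changes applied" ;;
        4) echo "some resources failed" ;;
        6) echo "changes applied and some resources failed" ;;
        *) echo "unknown exit code: $1" ;;
    esac
}
```

For example, `sudo -i puppet agent -tv; describe_puppet_exit $?` would report "changes applied" for the first run above and "no changes" for the second.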

Removing the bind mount of /run/shared from the Docker config in hiera seems to have made things happier. I have no explanation at all for why this would have suddenly become a problem. git blame shows that the setting has been in the config since profile::docker::runner::service_defs was first added on 2024-07-20.
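
To confirm on an instance that the rendered unit no longer carries the mount, something like the following could be used. It is a minimal sketch: unit_mounts_path is a hypothetical helper, and it only inspects the ExecStart line of the unit file that Puppet writes.

```shell
# Succeed (exit 0) if the unit file's ExecStart line still bind-mounts the
# given host path with `-v <path>:`, fail otherwise.
# unit_mounts_path is a hypothetical helper name.
unit_mounts_path() {
    unit_file="$1"
    host_path="$2"
    # `--` stops option parsing so the pattern starting with "-v" is not
    # mistaken for a grep flag.
    grep -E '^ExecStart=' "$unit_file" | grep -qF -- "-v ${host_path}:"
}
```

For example, `unit_mounts_path /lib/systemd/system/shellbox.service /run/shared && echo "still mounted"` should print nothing once the hiera change has been applied.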

bd808 claimed this task.

I am going to mark this as resolved because I am just not curious enough to keep it open and try to find a root cause for it suddenly being broken. I do wonder whether the only sudden change was that monitoring picked the problem up: while things were still actively messed up I was not seeing an alert at https://prometheus-alerts.wmcloud.org/?q=project%3Ddeployment-prep, nor was I seeing a ticket for the mirrored symptoms on deployment-shellbox-video.
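
The restart counters in the journal output (2905376 on deployment-shellbox01, 2863361 on deployment-shellbox-video) combined with RestartSec=10s from the unit files give a rough lower bound on how long the service had been crash-looping, which bears on whether the breakage or only the alerting was new. This is back-of-the-envelope arithmetic only: it assumes the counter was never reset while the loop ran and that each cycle took at least the 10 second restart delay. The function name is made up for illustration.

```shell
# Convert a systemd restart counter into a minimum number of whole days,
# given the delay in seconds between restarts.
# restart_loop_days is a hypothetical helper name.
restart_loop_days() {
    restarts="$1"
    interval_s="$2"
    echo $(( restarts * interval_s / 86400 ))
}

restart_loop_days 2905376 10   # deployment-shellbox01: prints 336
restart_loop_days 2863361 10   # deployment-shellbox-video: prints 331
```

If those assumptions hold, both services had been failing for most of a year before the alert fired.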