Selenium fails with "waiting for container: unexpected EOF"
Closed, Resolved, Public

Description

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1030999
https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php74/6839/console

23:36:01 Execution of 2 workers started at 2024-05-14T21:36:01.673Z
23:36:01 
23:36:02 [0-1] RUNNING in chrome - /tests/selenium/specs/toolbar.js
23:36:02 [0-0] RUNNING in chrome - /tests/selenium/specs/content_editable.js
23:36:06 time="2024-05-14T21:36:06Z" level=error msg="error waiting for container: unexpected EOF"
23:36:06 Build step 'Execute shell' marked build as failure

The Docker package is version 20.10.18~3-0~debian-bullseye from bullseye-wikimedia/thirdparty/ci.

Event Timeline

hashar added subscribers: MoritzMuehlenhoff, hashar.

I strongly suspect the Docker auto restart, which was deployed yesterday (May 14th, around 10am) with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1028795 for T135991.

The build ran on integration-agent-docker-1042.integration.eqiad1.wikimedia.cloud and failed on May 14th at 21:36:06 UTC. The host journal for the docker.service unit shows:

-- Journal begins at Thu 2024-04-25 10:36:03 UTC, ends at Wed 2024-05-15 12:12:30 UTC. --
May 14 21:36:05 integration-agent-docker-1042 systemd[1]: Stopping Docker Application Container Engine...
May 15 07:11:01 integration-agent-docker-1042 systemd[1]: Stopping Docker Application Container Engine...

The first Docker restart began one second before the Docker client reported that it had lost its connection to the daemon.
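
To make the timing argument explicit, here is a minimal Python sketch that computes the gap between the journal entry and the client error quoted above. The function names are made up for illustration, and the year has to be supplied by hand because the journal line does not carry one. It also assumes English month abbreviations (`%b`) and that the journal timestamps are UTC, as the journal header states.

```python
import re
from datetime import datetime, timezone

# Lines copied from the console output and the journal quoted in this task.
CLIENT_LINE = (
    '23:36:06 time="2024-05-14T21:36:06Z" level=error '
    'msg="error waiting for container: unexpected EOF"'
)
JOURNAL_LINE = (
    "May 14 21:36:05 integration-agent-docker-1042 systemd[1]: "
    "Stopping Docker Application Container Engine..."
)


def client_error_time(line):
    """Return the UTC time from the time="..." field of the Docker client error."""
    match = re.search(r'time="([^"]+)"', line)
    if match is None:
        raise ValueError("no time= field in line")
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def journal_time(line, year):
    """Return the UTC time of a journal line; the year is an assumption passed in."""
    stamp = " ".join(line.split()[:3])  # e.g. "May 14 21:36:05"
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


gap = client_error_time(CLIENT_LINE) - journal_time(JOURNAL_LINE, 2024)
print(f"client error came {gap.total_seconds():.0f}s after docker.service began stopping")
```

The leading `23:36:06` on the console line is ignored here because the embedded `time=` field is already in UTC.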

And the matching entries from /var/log/syslog:

May 14 21:36:05 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-14 21:36:05,867 : Detected necessary restart for service docker (6172)
May 14 21:36:09 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-14 21:36:09,861 : Restarted service docker
May 15 07:11:01 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-15 07:11:01,711 : Detected necessary restart for service containerd (644)
May 15 07:11:02 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-15 07:11:02,413 : Restarted service containerd
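
For checking other agents, the syslog entries above can be paired up with a short script. This is only a sketch: the regular expression is inferred from the four lines quoted here, not from any documented wmf-auto-restart log format, and `restart_windows` is a hypothetical helper name. It matches each "Detected necessary restart" entry with the following "Restarted service" entry for the same service, giving the window during which a running container could lose its daemon.

```python
import re
from datetime import datetime

# Syslog lines copied from this task.
SYSLOG = """\
May 14 21:36:05 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-14 21:36:05,867 : Detected necessary restart for service docker (6172)
May 14 21:36:09 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-14 21:36:09,861 : Restarted service docker
May 15 07:11:01 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-15 07:11:01,711 : Detected necessary restart for service containerd (644)
May 15 07:11:02 integration-agent-docker-1042 wmf-auto-restart: INFO: 2024-05-15 07:11:02,413 : Restarted service containerd
"""

# Pattern inferred from the lines above (an assumption, not a specification).
ENTRY = re.compile(
    r"wmf-auto-restart: INFO: "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) : "
    r"(?P<event>Detected necessary restart for|Restarted) service (?P<service>\w+)"
)


def restart_windows(text):
    """Return (service, start, end) for each detected-then-restarted pair."""
    pending = {}  # service name -> time the restart was detected
    windows = []
    for line in text.splitlines():
        match = ENTRY.search(line)
        if match is None:
            continue
        when = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        service = match["service"]
        if match["event"].startswith("Detected"):
            pending[service] = when
        elif service in pending:
            windows.append((service, pending.pop(service), when))
    return windows


windows = restart_windows(SYSLOG)
for service, start, end in windows:
    seconds = (end - start).total_seconds()
    print(f"{service}: restart window {start} to {end} ({seconds:.3f}s)")
```

A build whose failure time falls inside or just after one of these windows is a likely victim of the same restart.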

Change #1031899 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove auto restarts for containerd/docker

https://gerrit.wikimedia.org/r/1031899

Change #1031899 merged by Muehlenhoff:

[operations/puppet@production] Remove auto restarts for containerd/docker

https://gerrit.wikimedia.org/r/1031899

hashar claimed this task.

This was almost certainly caused by Docker being restarted automatically to pick up package updates. Auto restarts have been disabled for containerd and Docker, which should prevent the issue from occurring again.