
integration-agent-docker-1032 out of disk space (was: Frequent Selenium failures)
Closed, ResolvedPublicBUG REPORT

Description

Selenium tests for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/924978 failed eight times in a row. Every error is different and unrelated to the patch (often even to the extension).

https://integration.wikimedia.org/ci/job/mediawiki-quibble-apitests-vendor-php74-docker/27694/console says "no space left on device", not sure if that's related.

Event Timeline

hashar renamed this task from Frequent Selenium failures to integration-agent-docker-1032 out of disk space (was: Frequent Selenium failures).Jun 12 2023, 12:33 PM
hashar added a subscriber: hashar.

"no space left on device", not sure if that's related.

That is definitely the root cause! :]

The Quibble image is started with several volumes mounted:

--tmpfs /workspace/db:size=320M (for the MariaDB database)
--volume <job workspace on instance>/src:/workspace/src (MediaWiki source + patch(es))
--volume <job workspace on instance>/cache:/cache (Composer/npm cache)
--volume <job workspace on instance>/log:/workspace/log (build output, exposed as LOG_DIR inside the container)
--volume /srv/git:/srv/git:ro (read-only mirror of some git repositories)

Where <job workspace on instance> above is /srv/jenkins/workspace + the job name.
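Put together, the invocation would look roughly like the sketch below. This is a hedged reconstruction from the mount list above, not the exact command CI assembles; the job name and image reference are placeholders:

```shell
# Hypothetical shape of the container start (image name and job name are placeholders):
docker run --rm \
  --tmpfs /workspace/db:size=320M \
  --volume /srv/jenkins/workspace/<job-name>/src:/workspace/src \
  --volume /srv/jenkins/workspace/<job-name>/cache:/cache \
  --volume /srv/jenkins/workspace/<job-name>/log:/workspace/log \
  --volume /srv/git:/srv/git:ro \
  <quibble-image>
```

Note that only the database lives on a tmpfs; the source tree, caches, and logs all land on the instance's /srv partition, which is why heavy builds fill it up.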

Thus the instance integration-agent-docker-1032 had a full /srv partition.

However, looking at disk usage on https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=integration&var-instance=integration-agent-docker-1032, the full disk does not show up at 12:13pm (/srv is the green line in the screenshot below)

agent-docker-1032_disk_usage.png (845×1 px, 86 KB)

The issue is that the /srv partition, which holds the builds, is only 36 GB.

1600 MB are taken by the git mirror.

On that same host there are two builds running:

11936 MB ./jenkins/workspace/wmf-quibble-selenium-php81-docker
2703 MB ./jenkins/workspace/quibble-vendor-mysql-php74-noselenium-docker
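Figures like the ones above can be gathered with a du over the workspace root. A minimal sketch, assuming the /srv/jenkins/workspace layout described earlier (the function name is illustrative):

```shell
# Report per-build disk usage in MB under a Jenkins workspace root, largest first.
# Defaults to the agent layout from this task; pass another root for testing.
workspace_usage() {
    root="${1:-/srv/jenkins/workspace}"
    du -sm "$root"/* 2>/dev/null | sort -rn
}
```

Running it on the agent would show at a glance which concurrent builds are eating /srv.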

So I guess it sometimes overflows, notably when three heavy builds run concurrently.

That is a known issue; we need more disk space (both for /srv and for /var/lib/docker, which holds the Docker images and caches). I plan to have all instances rebuilt with a larger flavor.
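Until the instances are rebuilt, a simple threshold check could catch a filling partition before builds start failing. A sketch (the function name and 90% threshold are my own choices, not anything CI currently runs):

```shell
# Warn when a mounted partition's usage reaches a threshold percentage.
# Usage: check_partition /srv 90
check_partition() {
    mount="$1"
    limit="${2:-90}"
    # df -P guarantees one-line-per-filesystem POSIX output; field 5 is "Use%".
    pct=$(df -P "$mount" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $mount at ${pct}% (limit ${limit}%)"
    fi
}
```

A cron job or monitoring probe calling check_partition /srv would have flagged the 12:13pm incident even though the Grafana graph missed it.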

hashar claimed this task.