
wbregistry-01 is out of disk space
Open, Needs Triage, Public

Description

The 20 GB disk of the wbregistry-01 instance of wikibase-registry has filled up completely. The vast majority of the usage is in /var/lib/docker, followed by /var/cache/apt/, which makes up 10% (~2 GB).
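For anyone wanting to reproduce these numbers, here is a minimal sketch of how the usage could be checked. The helper name disk_use_percent is made up for illustration and is not something installed on the instance; reading /var/lib/docker in full typically needs root, so the totals may be incomplete without sudo.

```shell
# Hypothetical helper (an assumption, not part of the instance's tooling):
# print the use percentage of the filesystem that holds the given path.
disk_use_percent() {
  # -P forces the POSIX single-line format; column 5 is the capacity, e.g. "89%".
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Overall usage of the root filesystem, as a bare number.
disk_use_percent /

# Per-directory totals for the two locations named in the description.
# Without root, du may skip unreadable subdirectories, hence the error handling.
for dir in /var/lib/docker /var/cache/apt; do
  if [ -d "$dir" ]; then
    du -sh "$dir" 2>/dev/null || true
  fi
done
```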

The containers running:

image.png (258×1 px, 346 KB)

There are several seemingly duplicate containers.

It seems to have been so far out of disk space that it couldn't even send error emails :/

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2021-07-27T16:22:02Z] <wm-bot> sudo docker kill 36a5b061acfe stopping duplicate old docker container T287492

Mentioned in SAL (#wikimedia-cloud) [2021-07-27T16:22:37Z] <wm-bot> sudo docker kill 9e0309ff1120 stopping duplicate old docker container T287492

Mentioned in SAL (#wikimedia-cloud) [2021-07-27T16:22:43Z] <wm-bot> sudo docker kill dcd0b3d98c51 stopping duplicate old docker container T287492

Mentioned in SAL (#wikimedia-cloud) [2021-07-27T16:23:15Z] <wm-bot> sudo docker container prune removing stopped containers to free up space T287492

Mentioned in SAL (#wikimedia-cloud) [2021-07-27T16:23:23Z] <wm-bot> sudo docker image prune removing dangling docker images to free up space T287492

Was it that it couldn't send emails because it was out of space, or out of space because of all the emails it couldn't send? It has been reported that there was a general problem with email delivery for some time.

Apparently, most of the disk usage was concentrated in the Docker directory, due to old jobs somehow left dangling there. We will look into this further. We will also check on the emails and try to find a solution that keeps this from happening again.

For now, we're back to 11% (~2GB) free space.

Mentioned in SAL (#wikimedia-cloud) [2021-07-28T13:50:28Z] <wm-bot> sudo docker kill wikibase-registry_wikibase-update_run_1 kill container not in /root/wikibase-registry/docker-compose.yml - no idea where that came from. T287492

Mentioned in SAL (#wikimedia-cloud) [2021-07-28T13:53:04Z] <wm-bot> sudo docker container prune remove dangling container meta data. T287492

Mentioned in SAL (#wikimedia-cloud) [2021-07-28T13:53:32Z] <wm-bot> sudo docker image prune remove dangling images. Freed up 117 MB. T287492

After deleting the duplicate containers (taking /root/wikibase-registry/docker-compose.yml as the source of truth for which containers are supposed to run) and then running sudo docker container prune and sudo docker image prune, we're back to 11% free space:

image.png (186×538 px, 15 KB)
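The "compose file as source of truth" check could be sketched roughly as below. Both function names are hypothetical. The sketch assumes that docker-compose ps -q prints full container IDs for the project, and whether one-off containers started via docker-compose run appear in that list may depend on the Compose version. The output is best treated as a list to review by hand, not as input for an automatic kill.

```shell
# Print every line of the first file that does not appear in the second file.
# Both files are expected to hold one container ID per line.
unexpected_ids() {
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
  grep -Fvxf "$2" "$1" || true
}

# Hypothetical wrapper (an assumption, not taken from the task): compare all
# running containers with those docker-compose reports for the project.
list_unexpected_containers() {
  running=$(mktemp)
  expected=$(mktemp)
  sudo docker ps -q --no-trunc > "$running"
  (cd /root/wikibase-registry && sudo docker-compose ps -q) > "$expected"
  unexpected_ids "$running" "$expected"
  rm -f "$running" "$expected"
}
```

Calling list_unexpected_containers on the instance would print the IDs of running containers that the compose project does not account for.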

This isn't great yet. Follow-up work will happen in sub-tasks.

Mentioned in SAL (#wikimedia-cloud) [2021-07-28T14:57:13Z] <wm-bot> sudo docker image prune --all removing all unused docker images. Freed up 5.14 GB. T287492
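The gap between the 117 MB and 5.14 GB results is consistent with how the two commands differ: docker image prune only removes dangling images, while docker image prune --all also removes images not used by any container. A small sketch for checking how much is reclaimable before pruning follows; the function name is made up for illustration.

```shell
# Hypothetical helper: show Docker's own disk usage summary, which includes a
# reclaimable column for images, containers, local volumes and build cache.
# Prints a note and returns 0 when the docker CLI is not installed.
docker_disk_report() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found"
    return 0
  fi
  sudo docker system df
}
```

Running docker_disk_report before and after a prune gives a rough before/after comparison without relying on df alone.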