
releases1003 file system over 90% full
Open, Medium, Public

Description

File system usage on releases1003 has been growing since March 2024. It triggered an alert yesterday and warrants a review and possibly a cleanup.

File system usage since the beginning of the year: https://w.wiki/AUFK

[Image: releases1003_root_partition.png, root partition usage graph]

Details

Title | Reference | Author | Source Branch | Dest Branch
branch-cut-test-patches: clean up MW checkouts | repos/releng/release!85 | jnuche | T368239 | main
scap clean: perform l10n cleanup only when l10n files can be found | repos/releng/scap!365 | jnuche | T368239 | master
branch-cut-test-patches: clean up MW checkouts | repos/releng/release!83 | jnuche | T368239 | main

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2024-06-24T08:39:19Z] <hashar> releases1003: deleting left over temporary files from the MediaWiki branching (rm -fR /tmp/mw-branching-*) | T368239

hashar added subscribers: jnuche, dancy, hashar.

It is the / partition that is filling up, although the applications/services should write to a standalone partition mounted somewhere under /srv. /tmp is also on the root partition. I guess some repartitioning is needed?
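A quick way to confirm which paths actually sit on the root file system, sketched in Python under the assumption that it runs on the host itself: walk up from a path until `os.path.ismount()` is true, and compare device IDs with `/`. The paths checked are just the ones mentioned in this task; `mount_point` and `on_root_filesystem` are hypothetical helper names.

```python
import os


def mount_point(path: str) -> str:
    """Return the mount point of the file system that contains *path*."""
    current = os.path.realpath(path)
    # "/" is always a mount point, so this loop terminates.
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


def on_root_filesystem(path: str) -> bool:
    """True if *path* is on the same device as "/"."""
    return os.stat(path).st_dev == os.stat("/").st_dev


if __name__ == "__main__":
    # Paths discussed in this task; ones that don't exist are skipped.
    for p in ("/tmp", "/srv", "/srv/jenkins-agent/workspace", "/srv/docker"):
        if os.path.exists(p):
            print(f"{p}: mount point {mount_point(p)}, "
                  f"on root fs: {on_root_filesystem(p)}")
```

On a host laid out as described, /srv/docker would report its own mount point while /tmp and /srv/jenkins-agent/workspace would report `/`.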

There are three leftover directories in /tmp, about 3.2 GB each (sizes in MB):

3198	/tmp/mw-branching-01_qo309
3220	/tmp/mw-branching-x7h64ixp
3218	/tmp/mw-branching-iah3hkx7

These come from https://gitlab.wikimedia.org/repos/releng/release.git, which creates a temporary directory, but the cleanup might not occur in case of a failure or interruption.
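A minimal sketch of how a branching script could make sure its temporary directory is removed on failure, assuming it is a Python script using `tempfile` (the actual release tooling may be structured differently; `run_branching` and `work` are hypothetical names). `tempfile.TemporaryDirectory` cleans up when the `with` block exits through an exception, including the `KeyboardInterrupt` raised on SIGINT. SIGTERM needs a handler to get the same behaviour, and SIGKILL can never be handled, so a periodic sweep is still useful as a backstop.

```python
import signal
import sys
import tempfile


def _exit_on_sigterm(signum, frame):
    # Convert SIGTERM (e.g. an aborted job) into SystemExit so that the
    # `with` block below still runs its cleanup. SIGINT already raises
    # KeyboardInterrupt; SIGKILL cannot be caught at all.
    sys.exit(128 + signum)


def run_branching(work) -> None:
    """Hypothetical wrapper: call work(tmpdir) and always remove tmpdir."""
    previous = signal.signal(signal.SIGTERM, _exit_on_sigterm)
    try:
        with tempfile.TemporaryDirectory(prefix="mw-branching-") as tmpdir:
            work(tmpdir)  # exceptions propagate after the directory is removed
    finally:
        if previous is not None:  # None if the old handler wasn't set from Python
            signal.signal(signal.SIGTERM, previous)
```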


There is over 3 GB in /home/dancy; I haven't touched it, though.


/srv/org/wikimedia/releases uses 32G, which is expected: it holds the tarballs we have released over time.
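To reproduce per-directory figures like these without shelling out to `du`, a rough Python equivalent of `du -sm` could look like the following. `dir_size_mb` is a hypothetical helper: it sums allocated 512-byte blocks rather than apparent sizes, counts each hard-linked inode once, does not follow symlinks, and rounds up to whole MiB, which is meant to approximate `du`'s defaults rather than match it byte for byte.

```python
import os
from typing import Iterator

MIB = 1024 * 1024


def _entries(path: str) -> Iterator[str]:
    """Yield *path* and everything below it (symlinked dirs are not descended)."""
    yield path
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            yield os.path.join(root, name)


def dir_size_mb(path: str) -> int:
    """Rough equivalent of `du -sm PATH`, in MiB, rounded up."""
    seen = set()
    total_bytes = 0
    for p in _entries(path):
        try:
            st = os.lstat(p)
        except FileNotFoundError:  # removed while we were scanning
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:  # hard link to an inode already counted
            continue
        seen.add(key)
        total_bytes += st.st_blocks * 512  # st_blocks is in 512-byte units
    return (total_bytes + MIB - 1) // MIB
```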


/srv/jenkins-agent/workspace uses ~53G, which is entirely due to the branch-cut-test-patches Jenkins job. The job should remove some of that material on build completion, notably in work/mediawiki, which has kept copies of every wmf branch since March 26th (php-1.42.0-wmf.24). I have deleted a bunch of them, which addresses the immediate disk usage.
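A sketch of the kind of pruning the job could do on completion, assuming each immediate subdirectory of work/mediawiki is a disposable checkout of one wmf branch and that keeping the newest few (by directory mtime) is enough. `prune_checkouts` and its `keep` default are hypothetical; the actual change is in the branch-cut-test-patches merge requests listed under Details. It defaults to a dry run so the selection can be reviewed first.

```python
import os
import shutil
from typing import List


def prune_checkouts(parent: str, keep: int = 2, dry_run: bool = True) -> List[str]:
    """Hypothetical helper: keep only the *keep* newest checkouts in *parent*.

    Assumes every immediate subdirectory of *parent* is a disposable
    checkout; age is judged by the directory's own mtime. Returns the
    paths selected for removal, deleting them only when dry_run=False.
    """
    with os.scandir(parent) as it:
        checkouts = [
            (e.stat(follow_symlinks=False).st_mtime, e.path)
            for e in it
            if e.is_dir(follow_symlinks=False)
        ]
    checkouts.sort(reverse=True)  # newest first
    doomed = [path for _, path in checkouts[keep:]]
    if not dry_run:
        for path in doomed:
            shutil.rmtree(path)
    return doomed
```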


$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       125G   67G   52G  57% /
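The Use% column can be reproduced from the other two: df divides Used by Used + Avail and rounds up, since blocks reserved for root (typically why Size is larger than Used + Avail) count as neither. With the figures above, 67 / (67 + 52) ≈ 56.3%, reported as 57%. A small Python sketch; the 90% threshold is taken from this task's title and is an assumption about the actual alerting rule, which is not shown here.

```python
import shutil

ALERT_THRESHOLD = 90  # assumption: mirrors the "over 90% full" alert in the task title


def df_use_percent(used: int, avail: int) -> int:
    """Use% the way df reports it: used / (used + avail), rounded up."""
    denominator = used + avail
    if denominator == 0:
        return 0
    return (100 * used + denominator - 1) // denominator  # integer ceiling


def root_usage_percent(path: str = "/") -> int:
    # shutil.disk_usage's `free` is the space available to unprivileged
    # users, i.e. the same quantity as df's Avail column.
    usage = shutil.disk_usage(path)
    return df_use_percent(usage.used, usage.free)


if __name__ == "__main__":
    pct = root_usage_percent("/")
    flag = " (above alert threshold)" if pct > ALERT_THRESHOLD else ""
    print(f"/ is {pct}% full{flag}")
```

For example, `df_use_percent(67, 52)` returns 57, matching the df output above. Integer arithmetic is used for the ceiling so that an exact percentage is never bumped up by floating-point error.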

Thanks for cleaning up, hashar!

I've created a patch to make branch-cut-test-patches clean up the MW checkouts. The stuff in /tmp/mw-branching-* is already being cleared regularly here: https://gitlab.wikimedia.org/repos/releng/release/-/blob/main/make-release/automatic-branch-cut?ref_type=heads#L24
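The linked script's exact cleanup logic isn't reproduced here. As an illustration only, an age-based sweep of leftover /tmp/mw-branching-* directories could look like the following; `sweep_stale` and the 24-hour cutoff are assumptions. The age check exists so that the directory of a branch cut that is still running is not removed.

```python
import glob
import os
import shutil
import time
from typing import List, Optional


def sweep_stale(pattern: str = "/tmp/mw-branching-*",
                max_age_hours: float = 24.0,
                now: Optional[float] = None) -> List[str]:
    """Remove directories matching *pattern* older than *max_age_hours*.

    Uses the top-level directory's mtime as a rough activity signal; a
    long-running job that only writes into subdirectories would not bump
    it, so the cutoff should be well above the longest expected run.
    """
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    removed = []
    for path in sorted(glob.glob(pattern)):
        if os.path.islink(path) or not os.path.isdir(path):
            continue  # never follow symlinks out of /tmp
        if os.path.getmtime(path) < cutoff:
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```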

As discussed on IRC, the release VMs should probably have a separate disk mounted at /srv (similar to the 150GB disk mounted at /srv/docker). However, I'm not sure /srv/docker needs all that space: the Docker partition is using only 5%, with no significant change over the past month. Therefore, we could use this larger disk for /srv and mount a smaller one for Docker, or just use the 150GB disk for /srv (including /srv/docker).

I don't think there's an expected increase in disk usage any time soon; mounting the 150G disk directly at /srv makes sense to me.

> There is over 3 GB in /home/dancy; I haven't touched it, though.

Cleaned up.

Changing the partition layout properly would mean reimaging the hosts, and if we're doing that, we should also upgrade them to Bookworm.

LSobanski triaged this task as Medium priority. Mon, Jun 24, 3:18 PM
LSobanski moved this task from Incoming to Backlog on the collaboration-services board.

jnuche closed https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/365

scap clean: perform l10n cleanup only when l10n files can be found