File system usage on releases1003 has been growing since March 2024. It triggered an alert yesterday and warrants a review and possibly a cleanup.
File system usage since the beginning of the year: https://w.wiki/AUFK
Title | Reference | Author | Source Branch | Dest Branch | |
---|---|---|---|---|---|
branch-cut-test-patches: clean up MW checkouts | repos/releng/release!85 | jnuche | T368239 | main | |
scap clean: perform l10n cleanup only when l10n files can be found | repos/releng/scap!365 | jnuche | T368239 | master | |
branch-cut-test-patches: clean up MW checkouts | repos/releng/release!83 | jnuche | T368239 | main | |
Mentioned in SAL (#wikimedia-releng) [2024-06-24T08:39:19Z] <hashar> releases1003: deleting left over temporary files from the MediaWiki branching (rm -fR /tmp/mw-branching-*) | T368239
That is the / partition being filled, though the applications/services should write to a standalone partition mounted somewhere under /srv. /tmp is on the root partition as well. I guess some repartitioning is needed?
There are three leftover directories of about 3.2 GB each in /tmp:
3198 /tmp/mw-branching-01_qo309
3220 /tmp/mw-branching-x7h64ixp
3218 /tmp/mw-branching-iah3hkx7
Those come from https://gitlab.wikimedia.org/repos/releng/release.git, which does create a temp directory, but the cleanup might not occur in case of failure or interrupt.
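To illustrate the failure mode described above, here is a minimal Python sketch of a temporary work directory that is removed even when the job fails or is interrupted. It is not the code from the release repository; the helper names `branching_workdir` and `install_sigterm_handler` are hypothetical, and only the `mw-branching-` prefix is taken from the directories listed above.

```python
import shutil
import signal
import tempfile
from contextlib import contextmanager


@contextmanager
def branching_workdir(prefix="mw-branching-", base=None):
    """Create a temp directory and always remove it on exit.

    The `finally` clause runs on normal completion, on exceptions,
    and on KeyboardInterrupt (Ctrl-C / SIGINT).
    """
    path = tempfile.mkdtemp(prefix=prefix, dir=base)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


def install_sigterm_handler():
    """Turn SIGTERM into SystemExit so `finally` blocks get a chance to run.

    Hypothetical helper: call it once from the main thread at startup.
    SIGKILL cannot be caught, so a periodic sweep of /tmp is still useful.
    """
    def _handler(signum, frame):
        raise SystemExit(128 + signum)

    signal.signal(signal.SIGTERM, _handler)
```

A job would wrap its work in `with branching_workdir() as workdir:`. This only covers catchable interruptions, so it complements rather than replaces a regular cleanup of stale directories.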
There are 3G+ of files in /home/dancy; I haven't touched them though.
/srv/org/wikimedia/releases has 32G, which is to be expected: those are the tarballs we have released over time.
/srv/jenkins-agent/workspace has ~53G, which is entirely due to the Branch cut test patches Jenkins job. It should remove some material on build completion. Notably, work/mediawiki kept copies of every wmf branch since March 26th (php-1.42.0-wmf.24). I have deleted a bunch of them, which resolves the cause of the alert.
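The kind of cleanup done by hand here could be sketched as follows: keep only the newest few wmf checkouts and remove the rest. This is an illustration under assumptions, not the patch that was later merged. The function names, the `keep` parameter, and the assumption that checkouts sit as `php-<version>-wmf.<n>` directories directly under one root are all hypothetical; only the directory name format comes from this task.

```python
import re
import shutil
from pathlib import Path

# Matches names such as php-1.42.0-wmf.24
WMF_DIR = re.compile(r"^php-(\d+)\.(\d+)\.(\d+)-wmf\.(\d+)$")


def stale_checkouts(root, keep=2):
    """Return wmf checkout directories under `root`, minus the `keep` newest."""
    versioned = []
    for entry in Path(root).iterdir():
        match = WMF_DIR.match(entry.name)
        if match and entry.is_dir():
            # Compare numerically so that wmf.10 sorts after wmf.9.
            version = tuple(int(part) for part in match.groups())
            versioned.append((version, entry))
    versioned.sort(key=lambda item: item[0])
    paths = [path for _, path in versioned]
    return paths[:-keep] if keep > 0 else paths


def prune(root, keep=2, dry_run=True):
    """Delete stale checkouts; with dry_run=True only report their names."""
    removed = []
    for path in stale_checkouts(root, keep):
        if not dry_run:
            shutil.rmtree(path)
        removed.append(path.name)
    return removed
```

Running `prune(root)` with the default `dry_run=True` lists what would be removed without deleting anything, which seems a safer default on a shared host.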
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 125G 67G 52G 57% /
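As a side note on reading that output: 67G of 125G is about 54%, yet df reports 57%. To my understanding, df computes Use% as used / (used + available), rounded up, which leaves out blocks reserved for root. The sketch below reproduces that arithmetic; the function names are hypothetical and the rounding rule is an assumption about df rather than something confirmed in this task.

```python
import shutil


def df_use_percent(used, avail):
    """Approximate df's Use%: used / (used + avail), rounded up."""
    total = used + avail
    if total == 0:
        return 0
    # Integer ceiling division avoids floating point rounding surprises.
    return -(-used * 100 // total)


def usage_percent(path="/"):
    """Use% for the file system holding `path`, based on shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return df_use_percent(usage.used, usage.free)
```

With the rounded figures from the output above, `df_use_percent(67, 52)` gives 57, matching the reported Use%.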
jnuche opened https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/83
branch-cut-test-patches: clean up MW checkouts
Thanks for cleaning up hashar!
I've created a patch to make branch-cut-test-patches clean up the MW checkouts. The stuff in /tmp/mw-branching-* is already being cleared regularly here: https://gitlab.wikimedia.org/repos/releng/release/-/blob/main/make-release/automatic-branch-cut?ref_type=heads#L24
As discussed in IRC, the release VMs should probably have a separate disk mounted at /srv (similar to the 150GB disk mounted at /srv/docker). However, I'm not sure if /srv/docker needs all that space. The Docker partition is using only 5%, with no significant change over the past month. Therefore, we could use this larger disk at /srv and mount a smaller one for Docker, or just use the 150GB disk for /srv (including /srv/docker).
I don't think there's an expected increase in disk usage any time soon, so mounting the 150G disk directly at /srv makes sense to me.
Changing the partition layout in the correct way would mean reimaging the hosts, and if we're doing that we should also upgrade them to Bookworm.
jnuche merged https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/83
branch-cut-test-patches: clean up MW checkouts
jnuche opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/365
scap clean: perform l10n cleanup only when l10n files can be found
jnuche opened https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/85
branch-cut-test-patches: clean up MW checkouts
jnuche closed https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/365
scap clean: perform l10n cleanup only when l10n files can be found
jnuche merged https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/85
branch-cut-test-patches: clean up MW checkouts