
releases1003 file system over 90% full
Closed, ResolvedPublic

Description

File system usage on releases1003 has been growing since March 2024; it triggered an alert yesterday and warrants a review and possibly a cleanup.

File system usage since the beginning of the year: https://w.wiki/AUFK

releases1003_root_partition.png (764×918 px, 56 KB)

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2024-06-24T08:39:19Z] <hashar> releases1003: deleting left over temporary files from the MediaWiki branching (rm -fR /tmp/mw-branching-*) | T368239

hashar added subscribers: jnuche, dancy, hashar.

That is the / partition filling up, though the applications/services should write to a standalone partition mounted somewhere under /srv. Also /tmp is on the root partition as well. I guess a bit of repartitioning is needed?

There are three leftover directories of ~3.2 GB each in /tmp (du sizes in MB):

3198	/tmp/mw-branching-01_qo309
3220	/tmp/mw-branching-x7h64ixp
3218	/tmp/mw-branching-iah3hkx7

That comes from https://gitlab.wikimedia.org/repos/releng/release.git, which creates a temp directory, but the cleanup might not occur in case of failure/interrupt.
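For what it's worth, the usual guard against such leftovers is an EXIT trap registered right after the temp directory is created. A minimal sketch (not the actual release.git code; the function name and echo are illustrative):

```shell
#!/bin/sh
# Sketch only, not the real release.git script: the EXIT trap also fires
# on failure or interrupt (INT/TERM), so the scratch directory is always
# removed when the subshell exits.
branch_cut() (
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/mw-branching-XXXXXX")
    trap 'rm -rf "$tmpdir"' EXIT INT TERM
    # ... the real script would clone and branch MediaWiki here ...
    echo "$tmpdir"
)
dir=$(branch_cut)   # the subshell has exited here, so the trap has fired
echo "removed: $dir"
```

With this pattern an aborted branch cut cannot leave a /tmp/mw-branching-* directory behind.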


There is 3 GB+ in /home/dancy; I haven't touched it though.


/srv/org/wikimedia/releases has 32G, which is to be expected: those are the tarballs we have released over time.


/srv/jenkins-agent/workspace has ~53G, which is entirely due to the Branch cut test patches Jenkins job. It should remove some material on build completion. Notably work/mediawiki kept copies of wmf branches since March 26th (php-1.42.0-wmf.24). I have deleted a bunch of them, which resolves the root cause.


$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       125G   67G   52G  57% /

Thanks for cleaning up, hashar!

I've created a patch to make branch-cut-test-patches clean up the MW checkouts. The stuff in /tmp/mw-branching-* is already being cleared regularly here: https://gitlab.wikimedia.org/repos/releng/release/-/blob/main/make-release/automatic-branch-cut?ref_type=heads#L24
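The regular clearing of /tmp/mw-branching-* amounts to an age-based sweep; a sketch of that kind of step (the path pattern matches the real directories, but the one-day threshold is illustrative, not copied from automatic-branch-cut):

```shell
#!/bin/sh
# Illustrative age-based sweep, not the actual automatic-branch-cut code:
# delete mw-branching scratch directories in /tmp whose mtime is more
# than one day old. -maxdepth 1 keeps the search at the top of /tmp.
find /tmp -maxdepth 1 -type d -name 'mw-branching-*' -mtime +1 -exec rm -rf {} +
```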

As discussed in IRC, the release VMs should probably have a separate disk mounted at /srv (similar to the 150GB disk mounted at /srv/docker). However, I'm not sure if /srv/docker needs all that space. The Docker partition is using only 5%, with no significant change over the past month. Therefore, we could use this larger disk at /srv and mount a smaller one for Docker, or just use the 150GB disk for /srv (including /srv/docker).

I don't think there's an expected increase in disk usage any time soon; mounting the 150G disk directly at /srv makes sense to me.

There is 3 GB+ in /home/dancy; I haven't touched it though.

Cleaned up.

Changing the partition layout in the correct way would mean reimaging the hosts and if we're doing that we should also upgrade them to Bookworm.

LSobanski triaged this task as Medium priority.Jun 24 2024, 3:18 PM
LSobanski moved this task from Incoming to Backlog on the collaboration-services board.

jnuche closed https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/365

scap clean: perform l10n cleanup only when l10n files can be found

The immediate issue has been fixed by cleaning up old files and leftover temporary files. The Jenkins job now cleans up the old MediaWiki checkouts, so I guess we are set.

A potential follow-up is to change the partitioning of releases1003 to use /srv for data instead of using the root partition. Though I am not sure it is worth the effort.

Cool! I would say that gets us back to "not super urgent but when we do the next distro version upgrade we just need to remember this and do it while at it".

We could start to ask the question "what keeps us from upgrading those machines to Bookworm?" (any missing packages, adding support in Puppet).

releases1003 has the following partitions:

# lsblk 
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    254:0    0  128G  0 disk 
├─vda1 254:1    0  127G  0 part /
└─vda2 254:2    0  976M  0 part [SWAP]
vdb    254:16   0  150G  0 disk 
└─vdb1 254:17   0  150G  0 part /srv/docker

Can a new partition be added to the Ganeti VM for /srv or does that require recreating the VM from scratch?

There is a 150G /srv/docker which appears unused. Looking at https://gitlab.wikimedia.org/repos/releng/jenkins-deploy.git/ there is no job or code making use of Docker besides Pipelinelib. It looks like Pipelinelib and Docker were added in order to build container images (9f11af61021b72d1d4e9d0226d29c242d29a11d1), which are no longer needed. @jnuche mentioned that docpub uses Docker though, so we should keep it.


Maybe /srv/docker can be shrunk to say 30G and the remaining 120G could be used for /srv?

That saves us from having to conduct a full reimage and/or an OS upgrade (which is not trivial).

It's possible to add a new virtual disk to the ganeti VM. It does not require recreating the VM from scratch.

It does require a reboot though, and maybe a short scheduled downtime. It's possible that the disk names are shuffled when doing this and the machine doesn't come back until we fix /etc/fstab via the console.

What is also possible is to create a new VM with the same OS as before, with the right amount of disk space in a single disk, and copy the data over.

Do you already know what you see as the main blocker for going to Bookworm, though?

It's possible to add a new virtual disk to the ganeti VM. It does not require recreating the VM from scratch.

It does require a reboot though, and maybe a short scheduled downtime.

Revisiting: given Docker is barely used here, it is unlikely to fill the partition. The partition used for /srv/docker could be remounted at /srv, and with 125G I think it is large enough. That would prevent the application from filling the root partition.

The data from /srv in the root partition would have to be copied to the new partition though, and the /srv/docker content can be deleted.
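A rough outline of that migration (device names taken from the lsblk output in this task; the helper name and step ordering are mine, not an agreed plan, and this would run in a maintenance window):

```shell
#!/bin/sh
# Hypothetical migration outline, not a tested runbook.
# copy_tree preserves permissions, ownership and timestamps via cp -a;
# the trailing /. copies the directory contents, including dotfiles.
copy_tree() {
    cp -a "$1"/. "$2"/
}
# 1. stop the services that write under /srv
# 2. rm -rf /srv/docker/*          # discard the unused Docker data
# 3. umount /srv/docker
# 4. mount /dev/vdb1 /mnt && copy_tree /srv /mnt && umount /mnt
# 5. update /etc/fstab to mount /dev/vdb1 (by UUID) at /srv, then mount /srv
```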

It's possible that the disk names are shuffled when doing this and the machine doesn't come back until we fix /etc/fstab via the console.

The partitions have a unique ID (UUID) which can be used instead of the device name. On releases1003 that is already the case for the root and swap partitions:

$ egrep -v '^#' /etc/fstab
UUID=f2218b51-03e3-46e9-a199-0efb07a71740 /               ext4    errors=remount-ro 0       1
UUID=e07f4247-eb4d-4659-9cfd-712822e005c2 none            swap    sw              0       0
/dev/sr0        /media/cdrom0   udf,iso9660 user,noauto     0       0
/dev/vdb1       /srv/docker ext4 errors=remount-ro 0 2

The UUID can be obtained via lsblk:

$ lsblk --fs
NAME   FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINT
vda                                                                           
├─vda1 ext4   1.0         f2218b51-03e3-46e9-a199-0efb07a71740   67.6G    41% /
└─vda2 swap   1           e07f4247-eb4d-4659-9cfd-712822e005c2                [SWAP]
vdb                                                                           
└─vdb1 ext4   1.0         65903bdc-cc67-49a9-95a8-0929c79124d7  139.1G     0% /srv/docker

That would ensure they don't get mixed up. Then again, the boot device is hopefully always vda, so it might not be a concern.
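Concretely, the /srv/docker line could be switched to the UUID lsblk reported above (same filesystem, same options; only the first field changes — the mount point would of course change if the partition is repurposed for /srv):

```
# /etc/fstab: reference vdb1 by UUID instead of device name
UUID=65903bdc-cc67-49a9-95a8-0929c79124d7 /srv/docker ext4 errors=remount-ro 0 2
```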

What is also possible is to create a new VM with the same OS as before, with the right amount of disk space in a single disk, and copy the data over.

Do you already know what you see as the main blocker for going to Bookworm, though?

Either recreating a VM or upgrading the OS is a lot more work. As per my previous comment, I'd like the application to not write to the root partition (T368239#9980371).

I will soon respond to the other points, but for now let me just say: currently we only use 46% of the disk space so we have some time to do this.

I will be out for a week but will get back to this afterwards.

There are three leftover directories of ~3.2 GB each in /tmp (du sizes in MB):

3198	/tmp/mw-branching-01_qo309
3220	/tmp/mw-branching-x7h64ixp
3218	/tmp/mw-branching-iah3hkx7

That comes from https://gitlab.wikimedia.org/repos/releng/release.git, which creates a temp directory, but the cleanup might not occur in case of failure/interrupt.

These are gone: there are no more /tmp/mw-branching* directories on either of the releases servers as of today. So something or someone already fixed that.

There is 3 GB+ in /home/dancy; I haven't touched it though.

This is also already fixed: /home/dancy is just a few megabytes and no user home is larger than 1 GB. If that becomes an issue again, we can use the same notification method tested on people hosts, which warns us about very large user homes.
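A minimal version of such a check (a sketch in the spirit of the people-hosts notification, not its actual code; the function name, paths and threshold are illustrative):

```shell
#!/bin/sh
# Hypothetical large-home check: print every directory under $1 whose
# disk usage exceeds the threshold in KiB given as $2.
large_homes() {
    base=${1:-/home}
    threshold_kb=${2:-1048576}   # default threshold: 1 GiB
    for d in "$base"/*/; do
        [ -d "$d" ] || continue
        kb=$(du -sk "$d" | cut -f1)
        [ "$kb" -gt "$threshold_kb" ] && echo "${d%/}: ${kb} KiB"
    done
    true
}
```

Running `large_homes /home` from cron and mailing any output would give the same kind of early warning.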

/srv/org/wikimedia/releases has 32G, which is to be expected: those are the tarballs we have released over time.

ACK, that's at 35G now. Nothing to do here.

/srv/jenkins-agent/workspace has ~53G, which is entirely due to the Branch cut test patches Jenkins job. It should remove some material on build completion. Notably work/mediawiki kept copies of wmf branches since March 26th (php-1.42.0-wmf.24). I have deleted a bunch of them, which resolves the root cause.

This is down to 8.4G and is empty on releases2003. You already fixed it too.

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       125G   67G   52G  57% /

now even lower:

/dev/vda1       125G   53G   66G  45% /

So.. not sure there is much left to do here.

Except to say "when we create the next releases machine, maybe pick a different partman recipe". The only actionable thing, it seems to me, is to somehow have a reminder for that.

Dzahn lowered the priority of this task from Medium to Low.Oct 2 2024, 7:00 PM

To me this comes back to questions like:

"how realistic are releases servers on Bookworm?", "when do we want to try upgrading?", "what are the blockers?"

If the answer is anything under "a couple of years", I would think the best path forward is to create new VMs and decide about the partition sizes while doing that.

Then we can close this ticket, create a new one for the upgrade, leave a comment there to remember the disk size discussion, and be done for now.

If the answer is more like "that is a major problem and not planned for a while" AND we are really concerned that all the fixes above were only temporary and the issue will repeat itself (which I'm not, really), then we can still create new virtual disks and mount them on the existing VMs.

I wouldn't rate that very high effort, though not zero.

Regardless, I would say creating a ticket for "releases servers to Bookworm" is correct, even if stalled at first.

hashar reassigned this task from Dzahn to jnuche.

The root partition filled up because the services hosted on the hosts filled it. My previous messages suggested making /srv a standalone partition, which could be done by reusing the barely used /srv/docker partition. That saves one from having to recreate the VM (which is a lot more work) and addresses the issue of the root partition filling up due to one of the hosted services. I guess my messages weren't too clear.

Anyway, that was a one-off error which I cleaned up back in June, and @jnuche fixed the job to have it clean leftover files (https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/83). Feel free to file a new task to upgrade the VMs to Bookworm with a proper partition scheme.

re: @LSobanski

File system usage since the beginning of the year: https://w.wiki/AUFK

This is how that looks now, fwiw:

Screenshot from 2024-10-02 12-36-25.png (789×1 px, 102 KB)

re: @hashar ACK, thanks for closing it, agreed. If you ever feel like you still want additional disks, we can do it.