Automatically clean up unused wmfXX versions
Closed, Resolved (Public)

Description

Old versions take up space. Not only code, but also l10n files. E.g., in the 1.23wmf17 version checkout, the full l10n cache (CDB and JSON files) consumes 1.6G of disk space.

Remembering to do this is annoying and error-prone. It should be automated.

Bryan started the work with https://gerrit.wikimedia.org/r/#/c/118337/ "Add script to cleanup l10n cache for an inactive MediaWiki version"


Version: wmf-deployment
Severity: normal

Details

Reference
bz71313

Event Timeline

bzimport raised the priority of this task to Normal. Nov 22 2014, 3:57 AM
bzimport added a project: Deployments.
bzimport set Reference to bz71313.
bzimport added a subscriber: Unknown Object (MLST).
greg created this task. Sep 25 2014, 4:54 PM
greg moved this task from To Triage to Backlog (Tech) on the Deployments board. Nov 25 2014, 10:55 PM
greg edited projects, added scap2; removed Deployments. Feb 9 2016, 11:34 PM
Restricted Application added a subscriber: Aklapper. Feb 9 2016, 11:34 PM
Krinkle added a subscriber: Krinkle.
demon added a subscriber: demon. Dec 12 2016, 6:34 PM

This is easier now than before, eg: scap clean 1.28.0-wmf.9

demon added a comment. Feb 7 2017, 5:34 AM

I'm inclined to actually close this resolved. This probably shouldn't be fully automated and with the logic properly hidden it's trivial to do the work.

It does seem like things are getting cleaned up reasonably well right now. The .5 and .6 branches could probably be killed (our anon cache window is ~30 days now, right?), but there isn't a gross amount of old versions hanging out.

tin:~
bd808$ ls -ldthr /srv/mediawiki-staging/php-*
drwxrwxr-x 16 demon           wikidev 4.0K Dec 15 14:15 /srv/mediawiki-staging/php-1.29.0-wmf.5/
drwxrwxr-x 16 twentyafterfour wikidev 4.0K Jan  5 00:33 /srv/mediawiki-staging/php-1.29.0-wmf.6/
drwxrwsr-x 16 thcipriani      wikidev 4.0K Jan 18 15:08 /srv/mediawiki-staging/php-1.29.0-wmf.7/
drwxrwsr-x 16 demon           wikidev 4.0K Jan 21 19:58 /srv/mediawiki-staging/php-1.29.0-wmf.8/
drwxrwsr-x 16 twentyafterfour wikidev 4.0K Feb  1 14:21 /srv/mediawiki-staging/php-1.29.0-wmf.9/
drwxrwsr-x 16 twentyafterfour wikidev 4.0K Feb  6 20:34 /srv/mediawiki-staging/php-1.29.0-wmf.10/

Back in the olden days the big space hog that caused issues on hosts with smaller disks was the l10n cache files. It looks like those are still hanging out much longer than needed:

tin:~
bd808$ ls -ldthr /srv/mediawiki-staging/php-*/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.2M Dec 15 01:31 /srv/mediawiki-staging/php-1.29.0-wmf.5/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Jan  5 02:18 /srv/mediawiki-staging/php-1.29.0-wmf.6/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Jan 18 20:13 /srv/mediawiki-staging/php-1.29.0-wmf.7/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Jan 23 02:44 /srv/mediawiki-staging/php-1.29.0-wmf.8/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Feb  2 02:20 /srv/mediawiki-staging/php-1.29.0-wmf.9/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Feb  7 02:18 /srv/mediawiki-staging/php-1.29.0-wmf.10/cache/l10n/upstream/l10n_cache-en.cdb.json

A full scap checks all of these files in the rsync. Getting rid of them sooner rather than later takes a small bit of load off all of the MW hosts during a scap and can measurably speed things up. "Soon" we will have git transport for all of this, which should make that mostly moot by using pre-computed diffs.

I agree that automatic cleanup is tricky though, and probably harder to get right at this point than is worthwhile. Deciding if a branch purge is safe takes knowledge of the last time each branch would have been associated with an anon page view that was added to Varnish, and some idea of the reasonable maximum time that Varnish should be holding on to HTML that has a branch-versioned reference to static content in it.
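The date-based check described above could be sketched roughly as follows. This is only an illustration: the function name `is_purge_safe`, its parameters, and the example dates and 30-day TTL are assumptions made for the sketch, not part of scap or of the Varnish configuration.

```python
from datetime import datetime, timedelta


def is_purge_safe(last_cached, max_cache_ttl, now):
    """Return True once any cached HTML that references the branch must have expired.

    last_cached   -- last time an anon page view for this branch could have
                     been added to the cache (hypothetical input; finding
                     this value is the hard part in practice)
    max_cache_ttl -- longest time the cache may hold on to that HTML
    now           -- the time at which the purge is being considered
    """
    return now - last_cached > max_cache_ttl


# Hypothetical values: a 30-day cache window, evaluated on Feb 7 2017.
ttl = timedelta(days=30)
now = datetime(2017, 2, 7)
print(is_purge_safe(datetime(2017, 1, 5), ttl, now))   # branch last served 33 days ago
print(is_purge_safe(datetime(2017, 1, 18), ttl, now))  # branch last served 20 days ago
```

The comparison itself is trivial; the difficulty noted in the comment is obtaining trustworthy values for the two inputs.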

demon added a comment. Feb 7 2017, 4:41 PM

I'm inclined to actually close this resolved. This probably shouldn't be fully automated and with the logic properly hidden it's trivial to do the work.

It does seem like things are getting cleaned up reasonably well right now. The .5 and .6 branches could probably be killed (our anon cache window is ~30 days now, right?), but there isn't a gross amount of old versions hanging out.

The current practice is to retain the previous 5 branches. That's technically 35 days of going back. Keeping only 4 branches leaves us with 28 days. It's probably fine, but I've been paranoid up to now. Lowering that cache TTL to 28 days would make it an even 4 weeks (which is easier to count than 30 days tbh). Cf. T140921: Static asset time on disk
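The arithmetic behind those numbers can be written out as a small sketch. It assumes one new branch per week, which is what the 35-day and 28-day figures imply; the function names are hypothetical.

```python
DAYS_PER_BRANCH = 7  # assumption: one new branch is cut per week


def retention_days(branches_kept, days_per_branch=DAYS_PER_BRANCH):
    """Days of history covered by keeping this many previous branches."""
    return branches_kept * days_per_branch


def branches_needed(cache_ttl_days, days_per_branch=DAYS_PER_BRANCH):
    """Smallest number of previous branches that covers the cache TTL."""
    # Ceiling division without importing math.
    return -(-cache_ttl_days // days_per_branch)


print(retention_days(5))    # 35
print(retention_days(4))    # 28
print(branches_needed(30))  # 5
print(branches_needed(28))  # 4
```

Under that assumption, a 30-day TTL needs 5 retained branches, while a 28-day TTL would be covered by 4.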

Back in the olden days the big space hog that caused issues on hosts with smaller disks was the l10n cache files. It looks like those are still hanging out much longer than needed:

tin:~
bd808$ ls -ldthr /srv/mediawiki-staging/php-*/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.2M Dec 15 01:31 /srv/mediawiki-staging/php-1.29.0-wmf.5/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Jan  5 02:18 /srv/mediawiki-staging/php-1.29.0-wmf.6/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Jan 18 20:13 /srv/mediawiki-staging/php-1.29.0-wmf.7/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Jan 23 02:44 /srv/mediawiki-staging/php-1.29.0-wmf.8/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Feb  2 02:20 /srv/mediawiki-staging/php-1.29.0-wmf.9/cache/l10n/upstream/l10n_cache-en.cdb.json
-rw-r--r-- 1 l10nupdate l10nupdate 3.3M Feb  7 02:18 /srv/mediawiki-staging/php-1.29.0-wmf.10/cache/l10n/upstream/l10n_cache-en.cdb.json

A full scap checks all of these files in the rsync. Getting rid of them sooner rather than later takes a small bit of load off all of the MW hosts during a scap and can measurably speed things up. "Soon" we will have git transport for all of this, which should make that mostly moot by using pre-computed diffs.

So there's an option in scap clean called scap clean --l10n-only. This should be documented; when I trimmed down the instructions I ended up removing a bit more than I should have. I'll fix that up in a bit.

I agree that automatic cleanup is tricky though, and probably harder to get right at this point than is worthwhile. Deciding if a branch purge is safe takes knowledge of the last time each branch would have been associated with an anon page view that was added to Varnish, and some idea of the reasonable maximum time that Varnish should be holding on to HTML that has a branch-versioned reference to static content in it.

So, my thought was to encode this logic in the clean plugin but err on the side of caution: do it based on the number of branches, not dates. "If the branch is older than $current - 5, delete; if the branch is older than the last one, delete i18n." This would be correct most of the time, but would still keep us from breaking when we keep a branch for 2 (or more) weeks. This would also mean a deployer just types scap clean and it DWIM.
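A minimal sketch of that branch-count rule, under stated assumptions: `plan_cleanup`, `version_key`, and the action labels are hypothetical names for illustration and are not scap's actual implementation. It also assumes "the last one" means the branch immediately before the current one, so the current and previous branches are left untouched.

```python
import re


def version_key(branch):
    """Sort key so that 1.29.0-wmf.10 sorts after 1.29.0-wmf.9."""
    return tuple(int(part) for part in re.findall(r"\d+", branch))


def plan_cleanup(branches, current, keep=5):
    """Map each branch to 'keep', 'delete-l10n', or 'delete'.

    The decision counts branches rather than days, so a branch that stayed
    current for two or more weeks does not cause older branches to be
    removed early.
    """
    ordered = sorted(branches, key=version_key)
    current_index = ordered.index(current)
    plan = {}
    for index, branch in enumerate(ordered):
        age = current_index - index  # how many branches behind the current one
        if age > keep:
            plan[branch] = "delete"       # older than $current - 5
        elif age > 1:
            plan[branch] = "delete-l10n"  # older than the previous branch
        else:
            plan[branch] = "keep"         # current, previous, or newer
    return plan


branches = ["1.29.0-wmf.%d" % n for n in range(3, 11)]
for name, action in plan_cleanup(branches, "1.29.0-wmf.10").items():
    print(name, action)
```

Sorting on the numeric parts of the version string matters here, because a plain string sort would place wmf.10 before wmf.9.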

Change 336730 had a related patch set uploaded (by Chad):
Scap clean: Rework --l10n-only into --keep-static

https://gerrit.wikimedia.org/r/336730

Change 336730 merged by jenkins-bot:
Scap clean: Rework --l10n-only into --keep-static

https://gerrit.wikimedia.org/r/336730

demon added a comment. Apr 4 2017, 6:55 PM

scap clean does all this. New bugs should be opened if there are issues with it.

demon closed this task as Resolved. Apr 4 2017, 6:55 PM
demon claimed this task.