
gallium and lanthanum disks full (tracking)
Closed, Resolved, Public

Description

For a while now (since July 2014), free disk space on gallium has been going steadily downhill. Since January it has been getting more dangerous.

The root disk is 75% full (about 108 GB of 452 GB available) and the SSD (used by Jenkins workspaces and Gerrit replication) is also 75% full (a mere 39 GB of 149 GB available).

[19:42 UTC] krinkle at gallium.wikimedia.org in ~
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        452G  321G  108G  75% /
udev            3.9G  4.0K  3.9G   1% /dev
tmpfs           798M   72M  726M  10% /run
none            5.0M     0  5.0M   0% /run/lock
none            3.9G     0  3.9G   0% /run/shm
/dev/sdb1       149G  111G   39G  75% /srv/ssd
tmpfs           512M     0  512M   0% /var/lib/jenkins-slave/tmpfs
[19:43 UTC] krinkle at gallium.wikimedia.org in /srv/ssd
$ du -sh *
7.6G    gerrit
0       jenkins
96G     jenkins-slave
6.9G    zuul
[19:45 UTC] krinkle at gallium.wikimedia.org in /srv/ssd/jenkins-slave
$ du -sh *
12K     maven..-interceptor.jar
432K    slave.jar
12M     tools
96G     workspace
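A note on the percentages in the df output above: GNU df computes Use% as used / (used + available), rounded up, so blocks reserved for root (here roughly 452 - 321 - 108 = 23 GB on /) are left out of the denominator. That is why / shows 75% rather than 321/452, which is about 71%. A minimal sketch of that arithmetic, assuming whole-number inputs in the same unit; `df_use_pct` is a hypothetical name used only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reproduce df's Use% from its "Used" and "Avail" columns.
# Assumes both arguments are non-negative integers in the same unit (e.g. GB).
# The percentage is taken against used + avail rather than the raw size, which
# excludes blocks reserved for root, and is rounded up as GNU df does.
df_use_pct() {
  local used="$1" avail="$2"
  local total=$(( used + avail ))
  if [ "$total" -eq 0 ]; then
    echo 0            # avoid a division by zero on an empty filesystem
    return
  fi
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (used * 100 + total - 1) / total ))
}

df_use_pct 321 108    # root filesystem from the output above; prints 75
```

The rounded sizes printed by `df -h` are too coarse to reproduce every percentage exactly (the SSD line is one example), so for precise figures pass the 1K-block Used and Available columns from `df -k` instead.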


Related Objects

Status     Assigned
Resolved   Krinkle
Resolved   hashar
Declined   hashar
Open       None
Declined   hashar
Declined   hashar
Declined   hashar
Resolved   Andrew
Resolved   hashar
Declined   hashar
Resolved   Krinkle
Resolved   hashar
Resolved   Dzahn
Resolved   hashar
Resolved   hashar
Resolved   Andrew
Resolved   hashar
Resolved   Krinkle
Resolved   Krinkle
Resolved   Krinkle
Resolved   hashar
Resolved   Krinkle
Resolved   hashar
Resolved   hashar
Resolved   hashar
Resolved   hashar

Event Timeline

Krinkle created this task. Mar 1 2015, 8:08 PM
Krinkle raised the priority of this task to High.
Krinkle updated the task description.
Krinkle added a subscriber: Krinkle.
Restricted Application added a subscriber: Aklapper. Mar 1 2015, 8:08 PM
hashar added a subscriber: hashar. Mar 3 2015, 12:01 PM

Running in a screen session:

hashar@gallium:/var/lib/jenkins/jobs$ du -sm *|sort -n

The build history of some jobs probably needs to be logrotated automatically.
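As a starting point for that clean-up, a dry-run sketch that lists build directories older than a given number of days. It assumes the `<jobs root>/<job>/builds/<build>` layout visible in the paths later in this task; the function name `old_builds` and the 30-day threshold are illustrative choices, and nothing is deleted. The durable fix would still be enabling Jenkins' own "Discard old builds" setting per job.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print build directories older than N days.
# Assumed layout: <jobs_root>/<job>/builds/<build-dir>. Deletes nothing.
old_builds() {
  local jobs_root="${1%/}" days="$2"
  # -mindepth/-maxdepth 3 restrict matches to the <build-dir> level.
  # -type d skips symlinks, because find does not follow them by default.
  # -mtime +N matches entries whose age, truncated to whole days, exceeds N.
  find "$jobs_root" -mindepth 3 -maxdepth 3 -type d \
    -path "$jobs_root/*/builds/*" -mtime "+$days" -print | sort
}

# Example: candidates older than 30 days, summarised per job (largest first).
# old_builds /var/lib/jenkins/jobs 30 | awk -F/ '{print $(NF-2)}' | uniq -c | sort -rn
```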

hashar claimed this task. Mar 3 2015, 12:02 PM
hashar moved this task from Next to In-progress on the Continuous-Integration-Infrastructure board.
hashar added a comment. Mar 3 2015, 2:35 PM

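I have cancelled the command. Instead, we can use stat to print the hard link count of each builds directory. A directory's link count is 2 plus the number of its subdirectories, so it approximates how many builds each job keeps. The top 30 offenders by number of hard links: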

$ cd /var/lib/jenkins/jobs
$ stat --format '%h:%n' */builds|sort -rn|head -n30
13371:operations-puppet-tox-data_admin_lint/builds
12284:mwext-Wikibase-lint/builds
10051:mwext-MobileFrontend-lint/builds
8989:mwext-MobileFrontend-qunit/builds
8983:mwext-MobileFrontend-qunit-mobile/builds
8821:mwext-Flow-lint/builds
8713:operations-apache-config-lint/builds
7512:mwext-VisualEditor-lint/builds
7003:mwext-VisualEditor-npm/builds
7002:mwext-VisualEditor-qunit/builds
6991:mwext-VisualEditor-doc-test/builds
6932:operations-mw-config-tests/builds
5672:mwext-Wikibase-qunit/builds
5656:mwext-Wikibase-repo-tests/builds
5653:mwext-Wikibase-repo-api-tests/builds
5642:mwext-Wikibase-client-tests/builds
5618:mediawiki-core-doxygen-publish/builds
5463:mediawiki-gate/builds
5353:VisualEditor-npm/builds
5096:pywikibot-core-tox-flake8/builds
5040:mediawiki-core-phplint/builds
4863:operations-puppet-validate/builds
4777:VisualEditor-jsduck/builds
4638:mediawiki-core-bundle-rubocop/builds
4422:mwext-Flow-qunit/builds
4384:mwext-MobileFrontend-phpcs-HEAD/builds
4263:mediawiki-core-regression-phpcs-HEAD/builds
3953:pywikibot-core-tox-nose/builds
3920:mwext-MobileFrontend-jslint/builds
3808:operations-puppet-puppetlint-strict/builds

Those jobs are most probably not logrotated.
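The link count is only a proxy: it counts subdirectories but not symlinks or plain files, and on some filesystems (btrfs, for example) directories always report a link count of 1. A sketch that counts the entries directly and prints them in the same `count:job/builds` format, assuming the same jobs root; `count_builds` is a hypothetical name. It reads one directory listing per job, so it is slower than a single stat call but far cheaper than walking whole trees with du.

```shell
#!/bin/sh
# Hypothetical helper: count the entries in each <job>/builds directory and
# print "count:job/builds", largest first, like the stat output above.
count_builds() {
  local jobs_root="${1%/}" dir n
  for dir in "$jobs_root"/*/builds; do
    # Skip jobs without a builds directory (and an unmatched glob).
    [ -d "$dir" ] || continue
    # -mindepth 1 excludes the directory itself; one output line per entry.
    n=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
    printf '%d:%s\n' "$n" "${dir#"$jobs_root"/}"
  done | sort -t: -k1,1 -rn
}

# Usage: count_builds /var/lib/jenkins/jobs | head -n30
```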

Related:

Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        452G  223G  206G  52% /

Root disk usage is down to 52% (206 GB available). That looks good to me, doesn't it?

hashar closed this task as Resolved. Mar 17 2015, 9:15 AM

Resolved for now. Work is in progress to reduce the number of jobs being run, which will help keep disk usage at a sane level.

Krinkle renamed this task from "gallium.wikimedia.org disk space running low" to "gallium and lanthanum disks full (tracking)". Mar 24 2015, 5:50 AM
Krinkle reopened this task as Open.
Krinkle removed hashar as the assignee of this task.
Krinkle removed a project: acl*sre-team.
Krinkle set Security to None.
Krinkle added a subscriber: Legoktm.
Krinkle closed this task as Resolved. Apr 5 2015, 10:01 AM
Krinkle claimed this task.

The ongoing efforts on T86659 and T91396 have finally brought disk usage on gallium and lanthanum down to stable levels.