integration-slave-jessie-1001 and integration-slave-jessie-1002 out of disk space
Closed, Duplicate · Public

Description

This job started failing today: https://integration.wikimedia.org/ci/job/phabricator-jessie-commits/

And the failure points to a full disk:

13:08:46 Exception:
13:08:46 Traceback (most recent call last):
13:08:46   File "/srv/jenkins-workspace/workspace/phabricator-jessie-commits/source/.tox/py27/local/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
13:08:46     status = self.run(options, args)
13:08:46   File "/srv/jenkins-works
tee: log/stdout.log: No space left on device
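The `No space left on device` text in the log is the standard message for the POSIX `ENOSPC` error. As a minimal sketch (Python 3; the helpers `is_disk_full` and `run_step` are hypothetical and not part of the CI tooling), a job wrapper could recognise this error explicitly and report it as an infrastructure problem rather than letting it surface as a confusing traceback:

```python
import errno
import os


def is_disk_full(exc: BaseException) -> bool:
    """Return True if exc is an OSError caused by a full filesystem (ENOSPC)."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


def run_step(step):
    """Run a build step (hypothetical callable) and relabel ENOSPC failures."""
    try:
        return step()
    except OSError as exc:
        if is_disk_full(exc):
            # Report the real cause: the agent's disk is full, not the code under test.
            raise RuntimeError("CI agent out of disk space: " + os.strerror(exc.errno)) from exc
        raise


if __name__ == "__main__":
    # On Linux this prints the same message seen in the log above.
    print(os.strerror(errno.ENOSPC))
```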

Event Timeline

Gilles created this task. Mar 13 2018, 1:13 PM
Restricted Application added a subscriber: Aklapper. Mar 13 2018, 1:13 PM

This is apparently happening on integration-slave-jessie-1002 too.

Paladox triaged this task as Unbreak Now! priority. Mar 13 2018, 2:54 PM
Paladox added a subscriber: Paladox.
Restricted Application added subscribers: Liuxinyu970226, TerraCodes. Mar 13 2018, 2:55 PM

Just hit it on integration-slave-jessie-1001 too.

Mentioned in SAL (#wikimedia-operations) [2018-03-13T17:06:37Z] <godog> cleanup integration-slave-jessie-1001:/srv/pbuilder/build - T189587

It would be helpful to have a cleanup policy for build artifacts, so that operators don't have to manually track down and delete files to recover disk space.
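One simple form such a policy could take is an age-based sweep. The sketch below is hypothetical: the 7-day threshold, judging age by each top-level directory's own modification time, and the dry-run default are all assumptions, and a real policy would also have to skip workspaces belonging to jobs that are currently running.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def stale_dirs(root: str, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Return top-level directories under root whose own mtime is older than max_age_days.

    Caveat: a directory's mtime only changes when entries directly inside it
    are added, removed or renamed, so this is a rough proxy for "last used".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in sorted(Path(root).iterdir()):
        # Skip symlinks so the sweep never follows links out of root.
        if entry.is_dir() and not entry.is_symlink() and entry.stat().st_mtime < cutoff:
            stale.append(entry)
    return stale


def sweep(root: str, max_age_days: float = 7, dry_run: bool = True) -> List[Path]:
    """Delete stale directories under root, or only print them when dry_run is True."""
    victims = stale_dirs(root, max_age_days)
    for path in victims:
        if dry_run:
            print("would remove", path)
        else:
            shutil.rmtree(path)
    return victims


if __name__ == "__main__":
    # Hypothetical 7-day policy, run in dry-run mode first.
    sweep("/srv/jenkins-workspace/workspace", max_age_days=7, dry_run=True)
```

Running in dry-run mode first makes it possible to review what would be deleted before turning the sweep into a scheduled job.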

Paladox renamed this task from integration-slave-jessie-1001 out of disk space to integration-slave-jessie-1001 and integration-slave-jessie-1002 out of disk space.
Ejegg added a subscriber: Ejegg.Mar 13 2018, 9:51 PM

The biggest directory in /srv/jenkins-workspace/workspace (apps-android-wikipedia-test) took almost 5G, while the whole /srv disk is only 21G.
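Finding the biggest consumer is usually done with `du`. As a minimal Python sketch of the same measurement (the function names are hypothetical), the code below sums the apparent sizes of regular files per top-level workspace directory and lists the largest. Apparent size can differ from the allocated blocks that `du` reports by default, so the numbers may not match exactly.

```python
import os
import stat
from pathlib import Path
from typing import List, Tuple


def tree_size(path: Path) -> int:
    """Sum st_size of all regular files under path, without following symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished mid-scan or is unreadable
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def largest_dirs(root: str, top: int = 5) -> List[Tuple[str, int]]:
    """Return (name, bytes) for the `top` largest directories directly under root."""
    sizes = [
        (entry.name, tree_size(entry))
        for entry in Path(root).iterdir()
        if entry.is_dir() and not entry.is_symlink()
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    for name, size in largest_dirs("/srv/jenkins-workspace/workspace"):
        print(f"{size / 1024**3:6.2f} GiB  {name}")
```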

Ejegg added a comment. Mar 13 2018, 9:53 PM

OK, jessie-1002 and jessie-1001 are cleaned up. I was also getting some untar errors on docker-1002, so I'll see if I can clean that up too.

docker-1002 is now cleaned up as well, but it also had the 5G Android test directory on a 21G disk. The out-of-space errors also seemed to kick in with about 3G still free, so maybe we could lower the reserved space from 15%.
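The numbers are consistent with that explanation: 15% of a 21G filesystem is about 3.15G, roughly the free space left when the errors started. Assuming /srv is an ext2/3/4 filesystem (not confirmed in this task), those blocks are reserved for root and the percentage is normally changed with `tune2fs -m`. The sketch below (hypothetical helper names; `os.statvfs` is Unix-only) shows the arithmetic and how to observe the reserve on a live system: `f_bfree` counts all free blocks, while `f_bavail` counts only those available to unprivileged users, so their difference is the reserved portion.

```python
import os


def reserved_bytes(disk_bytes: float, reserved_pct: float) -> float:
    """Space held back for root when reserved_pct percent of the disk is reserved."""
    return disk_bytes * reserved_pct / 100


def observed_reserve(path: str) -> int:
    """Free bytes on path's filesystem that non-root users cannot use."""
    st = os.statvfs(path)
    return (st.f_bfree - st.f_bavail) * st.f_frsize


if __name__ == "__main__":
    GIB = 1024 ** 3
    # A 21G disk with the 15% reserve discussed above.
    print(round(reserved_bytes(21 * GIB, 15) / GIB, 2))  # -> 3.15
    # Reserve actually in effect on the filesystem holding /srv.
    print(observed_reserve("/srv") / GIB)
```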

Restricted Application removed a subscriber: Liuxinyu970226. Mar 14 2018, 7:53 AM

Change 419361 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate phpunit-coverage-patch jobs to Nodepool

https://gerrit.wikimedia.org/r/419361

Change 419361 merged by jenkins-bot:
[integration/config@master] Migrate phpunit-coverage-patch jobs to Nodepool

https://gerrit.wikimedia.org/r/419361