
tmp file leak from tools.wsexport
Closed, Resolved (Public)

Description

The wsexport tool is leaving loads of temp files on tools-webgrid-lighttpd-1422 (xvfb and calibre files).

I am cleaning up the older ones, which I hope is safe. Can anything be done to prevent this, or is the tool malfunctioning? Usage reads 11GB at the moment.

Event Timeline

Bstorm triaged this task as Medium priority. (Mar 20 2018, 5:09 PM)

Executed sudo find /tmp -type f -user 'tools.wsexport' -mtime +10 -delete

It is now 2.7G.
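
A one-off cleanup like the one above can be made a little safer by listing the matches before deleting them. The following is a minimal sketch, assuming a find that supports -delete (GNU or BSD); the function names tmp_leak_count and tmp_leak_clean are hypothetical and not part of any Toolforge tooling.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Toolforge): inspect, then remove,
# regular files under DIR owned by OWNER and older than DAYS days.
# Arguments for both functions: DIR OWNER DAYS

# Print how many files match. This counts lines, so a file name that
# contains a newline would be counted more than once.
tmp_leak_count() {
    find "$1" -type f -user "$2" -mtime +"$3" -print | wc -l | tr -d ' '
}

# List matches by default; delete only when the 4th argument is "delete".
tmp_leak_clean() {
    if [ "${4:-list}" = "delete" ]; then
        find "$1" -type f -user "$2" -mtime +"$3" -delete
    else
        find "$1" -type f -user "$2" -mtime +"$3" -print
    fi
}
```

For example, `tmp_leak_clean /tmp tools.wsexport 10` would only print the candidates, and appending `delete` as a fourth argument would remove them.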

This is pretty out of control:

$ clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -user tools.wsexport -mtime +1|wc -l'
tools-exec-[1401-1442].tools.eqiad.wmflabs,tools-exec-gift-trusty-01.tools.eqiad.wmflabs,tools-webgrid-generic-[1401-1404].tools.eqiad.wmflabs,tools-webgrid-lighttpd-1424.tools.eqiad.wmflabs (48)
---------------
0
---------------
tools-webgrid-lighttpd-1401.tools.eqiad.wmflabs
---------------
27551
---------------
tools-webgrid-lighttpd-1402.tools.eqiad.wmflabs
---------------
868
---------------
tools-webgrid-lighttpd-1403.tools.eqiad.wmflabs
---------------
65710
---------------
tools-webgrid-lighttpd-1404.tools.eqiad.wmflabs
---------------
144642
---------------
tools-webgrid-lighttpd-1405.tools.eqiad.wmflabs
---------------
61555
---------------
tools-webgrid-lighttpd-1406.tools.eqiad.wmflabs
---------------
52285
---------------
tools-webgrid-lighttpd-1407.tools.eqiad.wmflabs
---------------
74092
---------------
tools-webgrid-lighttpd-1408.tools.eqiad.wmflabs
---------------
59978
---------------
tools-webgrid-lighttpd-1409.tools.eqiad.wmflabs
---------------
125882
---------------
tools-webgrid-lighttpd-1410.tools.eqiad.wmflabs
---------------
88648
---------------
tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs
---------------
209145
---------------
tools-webgrid-lighttpd-1412.tools.eqiad.wmflabs
---------------
239406
---------------
tools-webgrid-lighttpd-1413.tools.eqiad.wmflabs
---------------
103669
---------------
tools-webgrid-lighttpd-1414.tools.eqiad.wmflabs
---------------
128941
---------------
tools-webgrid-lighttpd-1415.tools.eqiad.wmflabs
---------------
36681
---------------
tools-webgrid-lighttpd-1416.tools.eqiad.wmflabs
---------------
229033
---------------
tools-webgrid-lighttpd-1417.tools.eqiad.wmflabs
---------------
204364
---------------
tools-webgrid-lighttpd-1418.tools.eqiad.wmflabs
---------------
227864
---------------
tools-webgrid-lighttpd-1419.tools.eqiad.wmflabs
---------------
88651
---------------
tools-webgrid-lighttpd-1420.tools.eqiad.wmflabs
---------------
31018
---------------
tools-webgrid-lighttpd-1421.tools.eqiad.wmflabs
---------------
42255
---------------
tools-webgrid-lighttpd-1422.tools.eqiad.wmflabs
---------------
60167
---------------
tools-webgrid-lighttpd-1425.tools.eqiad.wmflabs
---------------
152293
---------------
tools-webgrid-lighttpd-1426.tools.eqiad.wmflabs
---------------
189075
---------------
tools-webgrid-lighttpd-1427.tools.eqiad.wmflabs
---------------
173637
---------------
tools-webgrid-lighttpd-1428.tools.eqiad.wmflabs
---------------
183686
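
To get a single grid-wide total from output like the above, the per-node counts can be summed with awk. This is a rough sketch that assumes the `clush -b` layout shown here (a node-set line, a dashed rule, then one numeric count) and that a trailing "(N)" on the node-set line means N nodes reported the same count; sum_clush_counts is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `clush -b` output on stdin and print the
# total count across all nodes. A node-set line ending in "(N)" is
# treated as N nodes that each reported the count that follows.
sum_clush_counts() {
    awk '
        /^-+$/     { next }                          # skip dashed rules
        /^[0-9]+$/ { total += $1 * nodes; next }     # a count line
        {
            nodes = 1                                # a single node by default
            if (match($0, /\([0-9]+\)$/))            # folded group, e.g. "(48)"
                nodes = substr($0, RSTART + 1, RLENGTH - 2) + 0
        }
        END { print total + 0 }
    '
}
```

The clush command from above could then be piped straight into `sum_clush_counts`.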

Mentioned in SAL (#wikimedia-cloud) [2018-03-21T01:09:50Z] <bd808> Deleting /tmp files owned by tools.wsexport with -mtime +2 across grid (T190185)

Better, but still far too many leaked files.

$ clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -user tools.wsexport -mtime +1|wc -l'
tools-exec-[1401-1442].tools.eqiad.wmflabs,tools-exec-gift-trusty-01.tools.eqiad.wmflabs,tools-webgrid-generic-[1401-1404].tools.eqiad.wmflabs,tools-webgrid-lighttpd-[1401-1409,1412,1417-1419,1424].tools.eqiad.wmflabs (61)
---------------
0
---------------
tools-webgrid-lighttpd-1410.tools.eqiad.wmflabs
---------------
639
---------------
tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs
---------------
7701
---------------
tools-webgrid-lighttpd-1413.tools.eqiad.wmflabs
---------------
1754
---------------
tools-webgrid-lighttpd-1414.tools.eqiad.wmflabs
---------------
2641
---------------
tools-webgrid-lighttpd-1415.tools.eqiad.wmflabs
---------------
9270
---------------
tools-webgrid-lighttpd-1416.tools.eqiad.wmflabs
---------------
7481
---------------
tools-webgrid-lighttpd-1420.tools.eqiad.wmflabs
---------------
16390
---------------
tools-webgrid-lighttpd-1421.tools.eqiad.wmflabs
---------------
9292
---------------
tools-webgrid-lighttpd-1422.tools.eqiad.wmflabs
---------------
1757
---------------
tools-webgrid-lighttpd-1425.tools.eqiad.wmflabs
---------------
1279
---------------
tools-webgrid-lighttpd-1426.tools.eqiad.wmflabs
---------------
7432
---------------
tools-webgrid-lighttpd-1427.tools.eqiad.wmflabs
---------------
8925
---------------
tools-webgrid-lighttpd-1428.tools.eqiad.wmflabs
---------------
6316

Mentioned in SAL (#wikimedia-cloud) [2018-03-21T17:23:40Z] <bd808> clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -user tools.wsexport -mtime +1 -delete' (T190185)

We could use the tmpreaper tool fairly easily (since there is already a puppet module), I think. It feels band-aid/duct-tape-ish, but it would do the thing.
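
For illustration only, a Debian-style /etc/tmpreaper.conf for this purpose might look like the sketch below. The values are assumptions chosen to roughly mirror the manual cleanup of /tmp; they are not taken from the puppet module or from any patch on this task.

```shell
# /etc/tmpreaper.conf (sketch; every value here is an assumption)
# Debian's daily tmpreaper cron job sources this file as shell variables.

# Maximum age of files to keep; older files are removed.
TMPREAPER_TIME=2d

# Directories to clean.
TMPREAPER_DIRS='/tmp/.'

# Extra shell patterns to protect from deletion (none here).
TMPREAPER_PROTECT_EXTRA=''

# tmpreaper judges age by access time unless told otherwise; --mtime
# is assumed here to match the find -mtime cleanups done by hand.
TMPREAPER_ADDITIONALOPTIONS='--mtime'
```

Running tmpreaper with its --test option first shows what would be removed without deleting anything.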

Change 422186 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: Add tmpreaper with a custom config to web nodes

https://gerrit.wikimedia.org/r/422186

Change 422186 merged by Bstorm:
[operations/puppet@production] toolforge: Add tmpreaper with a custom config to web nodes

https://gerrit.wikimedia.org/r/422186

Confirmed that old tmp files are now being cleaned up automatically on the web hosts.

Bstorm removed a project: Patch-For-Review.