Today this paged:
NFS Share Volume Space /srv/tools on labstore1004 is CRITICAL: DISK CRITICAL - free space: /srv/tools 1263267 MB (15% inode=81%):
Similar to T247315
Status | Assigned | Task
---|---|---
Duplicate | None | T272247 2021-01-17: tools NFS share cleanup
Resolved | Magnus | T272430 mixnmatch microsync process has a large error file
Resolved | Jc86035 | T272434 Toolforge tool 'archive-things-4' using very high disk space
Resolved | RichSmith | T272435 cluebotng using a high storage on NFS
Resolved | RichSmith | T284966 cluebotng using high storage on NFS
Resolved | wmr | T272436 wmr-bot home directory using high NFS storage
Mentioned in SAL (#wikimedia-cloud) [2021-01-17T16:53:53Z] <arturo> icinga downtime labstore1004 /srv/tools space check for 3 days (T272247)
It's nice to see the alert being accurate these days.
/dev/drbd4 8.0T 6.3T 1.4T 83% /srv/tools
Running ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20210119.txt
The bigger files:
19749772 KB /srv/tools/shared/tools/project/request/error.log
21072788 KB /srv/tools/shared/tools/project/mediawiki-feeds/error.log
22473872 KB /srv/tools/shared/tools/project/wikidata-primary-sources/error.log
22900348 KB /srv/tools/shared/tools/project/khanamalumat/qaus.err
23343528 KB /srv/tools/shared/tools/project/cluebotng/logs/relay_irc.log
24260512 KB /srv/tools/shared/tools/project/fiwiki-tools/logs/seulojabot2.log
24343364 KB /srv/tools/shared/tools/project/ifttt/www/python/src/ifttt.log
26970304 KB /srv/tools/shared/tools/project/mix-n-match/error.log
27890700 KB /srv/tools/shared/tools/project/img-usage/public_html/wikidata-20170130-all.json
31437236 KB /srv/tools/shared/tools/project/freebase/freebase-rdf-latest.gz
31811904 KB /srv/tools/shared/tools/project/wdumps/dumpfiles/generated/wdump-1107.nt.gz
31811908 KB /srv/tools/shared/tools/project/wdumps/dumpfiles/generated/wdump-1104.nt.gz
32818048 KB /srv/tools/shared/tools/project/khanamalumat/purawiki.err
34621292 KB /srv/tools/shared/tools/project/verification-pages/verification-pages/log/production.log.1
34792852 KB /srv/tools/shared/tools/project/geohack/error.log
35880272 KB /srv/tools/shared/tools/project/wdumps/dumpfiles/generated/wdump-1097.nt.gz
36023964 KB /srv/tools/shared/tools/project/ping08bot/mybot.out
36285016 KB /srv/tools/shared/tools/project/wiki2prop/prediction_ranked_Wiki2PropDEPLOY_year2018_embedding300LG_DEPLOY.h5
49303704 KB /srv/tools/shared/tools/project/splinetools/dumps/enwiki-20141106-pages-articles.xml
64778744 KB /srv/tools/shared/tools/project/wikidata-analysis/public_html_tmp/dumpfiles/json-20191125/20191125.json.gz
78643272 KB /srv/tools/shared/tools/project/robokobot/virgule.err
89133980 KB /srv/tools/shared/tools/project/.shared/dumps/20201221.json.gz
89481636 KB /srv/tools/shared/tools/project/.shared/dumps/20210104.json.gz
101857128 KB /srv/tools/shared/tools/project/magnus-toolserver/error.log
107005676 KB /srv/tools/shared/tools/project/meetbot/meetbot.out
107035912 KB /srv/tools/shared/tools/project/meetbot/logs/messages.log
194101748 KB /srv/tools/shared/tools/project/mix-n-match/mnm-microsync.err
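To see which tools account for the most space overall, rather than which single files are largest, the find output can be summed per tool directory. The following is a minimal sketch, not something that was run for this task: it assumes the `<size> KB <path>` line format produced by the `-printf "%k KB %p\n"` argument above, and the function names and the `prefix` default are hypothetical choices for illustration.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one '<size> KB <path>' line into (size_kb, path).

    The path is taken as everything after the second space, so paths
    that themselves contain spaces are kept intact.
    """
    size, unit, path = line.rstrip("\n").split(" ", 2)
    if unit != "KB":
        raise ValueError(f"unexpected unit in line: {line!r}")
    return int(size), path


def usage_by_tool(lines, prefix="/srv/tools/shared/tools/project/"):
    """Sum sizes per tool directory, largest first.

    'prefix' is an assumption based on the paths in the listing above;
    lines outside that prefix are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        size_kb, path = parse_line(line)
        if not path.startswith(prefix):
            continue
        tool = path[len(prefix):].split("/", 1)[0]
        totals[tool] += size_kb
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Small sample taken from the listing above.
sample = [
    "107005676 KB /srv/tools/shared/tools/project/meetbot/meetbot.out",
    "107035912 KB /srv/tools/shared/tools/project/meetbot/logs/messages.log",
    "194101748 KB /srv/tools/shared/tools/project/mix-n-match/mnm-microsync.err",
    "26970304 KB /srv/tools/shared/tools/project/mix-n-match/error.log",
]
for tool, size_kb in usage_by_tool(sample):
    print(f"{size_kb / 1024 / 1024:8.1f} GB  {tool}")
```

For the full report, the same function could be fed the lines of tools_large_files_20210119.txt. Note that only files over 100 MB are in that report, so the per-tool totals would be a lower bound.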
A few of those are easy enough to just clean up myself.
Mentioned in SAL (#wikimedia-cloud) [2021-01-19T22:34:50Z] <bstorm> truncating 194 GB error log '/data/project/mix-n-match/mnm-microsync.err' T272247
Mentioned in SAL (#wikimedia-cloud) [2021-01-19T22:43:03Z] <bstorm> truncated 107GB log '/data/project/meetbot/logs/messages.log' T272247
Mentioned in SAL (#wikimedia-cloud) [2021-01-19T22:48:30Z] <bstorm> truncated 100GB error log /data/project/magnus-toolserver/error.log T272247
Mentioned in SAL (#wikimedia-cloud) [2021-01-19T22:57:43Z] <bstorm> truncated 75GB error log /data/project/robokobot/virgule.err T272247
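The log entries above do not say which command was used for the truncation, so the following is only a sketch of the general idea. Truncating a log in place, rather than deleting it, frees the space immediately even if a running tool still holds the file open; a deleted file that is still open keeps its blocks allocated until the process closes it. The function name here is hypothetical.

```python
import os


def truncate_in_place(path):
    """Truncate the file at 'path' to zero bytes and return its previous size.

    The file itself (and its ownership and permissions) stays in place,
    so a process that has it open can keep writing to it.
    """
    size_before = os.path.getsize(path)
    os.truncate(path, 0)
    return size_before
```

One caveat: if the writing process did not open the file in append mode, its next write lands at the old offset and the file becomes sparse, so the apparent size reported by `ls` can look large again even though little disk space is used.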
That was enough to get a recovery. However, it seems like a good idea to see what users can clean up, since some projects are taking up a significant amount of space.
Mentioned in SAL (#wikimedia-cloud) [2021-01-19T23:30:37Z] <bstorm> truncated 36GB mybot.out file T272247
Mentioned in SAL (#wikimedia-cloud) [2021-01-19T23:32:30Z] <bstorm> truncated 34GB error log file that was full of warnings like "Only variables should be passed by reference in /data/project/geohack/public_html/geohack.php on line 192" T272247
That brings us down to /dev/drbd4 8.0T 5.6T 2.1T 73% /srv/tools. The user tickets should bring things well into the safe zone when their cleanups are done (one is already done).
We got paged again today:
PROBLEM - NFS Share Volume Space /srv/tools on labstore1004 is CRITICAL: DISK CRITICAL - free space: /srv/tools 1259132 MB (15% inode=80%):
server check:
/dev/drbd4 8.0T 6.4T 1.3T 85% /srv/tools
Mentioned in SAL (#wikimedia-cloud) [2021-02-05T10:59:02Z] <arturo> icinga-downtime labstore1004 tools share space check for 1 week (T272247)
Does this ticket still need to be open? It's been superseded by T273961: 2021-02-05: tools NFS share cleanup, v2 and T276525: 2021-03-05: tools nfs share cleanup.