
2020-03-10: tools and misc nfs share cleanup
Closed, ResolvedPublic

Description

Once again, in the tradition of T156982: Cleanup tools nfs share on labstore1004/5 and T183920: 2018-01-02: labstore Tools and Misc share very full, we must clean up the NFS shares to prevent collapse.

/dev/drbd4      8.0T  7.1T  462G  95% /srv/tools
/dev/drbd3      5.0T  3.8T 1001G  80% /srv/misc

Event Timeline

Bstorm triaged this task as High priority. Mar 10 2020, 3:20 PM
Bstorm created this task.
Restricted Application added a subscriber: Aklapper. Mar 10 2020, 3:20 PM
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.
Bstorm added a project: Data-Services.
Bstorm moved this task from Backlog to Shared Storage on the Data-Services board.

Running ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20200310_2.txt to list files over 100M with the size printed first, so the output can be sorted.

Ran cat tools_large_files_20200310_2.txt | sort -h > tools_large_files_20200310_2_sorted.txt to see what we can clean up. Misc will need cleaning up as well this round.

root@labstore1004:~# tail tools_large_files_20200310_2_sorted.txt
51368984 KB /srv/tools/shared/tools/project/fountain/logs/fastcgi.log
54437988 KB /srv/tools/shared/tools/project/toolserver-home-archive/archive-2014-06-05.tar.xz
64778744 KB /srv/tools/shared/tools/project/wikidata-analysis/public_html_tmp/dumpfiles/json-20191125/20191125.json.gz
71593400 KB /srv/tools/shared/tools/project/.shared/dumps/20200203.json.gz
72773724 KB /srv/tools/shared/tools/project/.shared/dumps/20200217.json.gz
74225580 KB /srv/tools/shared/tools/project/esfichataxon/latest-all.json.gz
74225580 KB /srv/tools/shared/tools/project/.shared/dumps/20200302.json.gz
78428056 KB /srv/tools/shared/tools/project/oar/repository_text_2014-06-13.tar.gz
79069936 KB /srv/tools/shared/tools/project/robokobot/virgule.err
204805540 KB /srv/tools/shared/tools/project/jembot/error.log
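
For anyone repeating this, the two steps above (find, then sort) can be combined into one helper. This is only a sketch: the function name list_large_files and its arguments are made up for illustration, it needs GNU find for -printf, and it leaves out the ionice -c 3 wrapper from the original command to stay portable. It uses sort -n rather than sort -h, which is enough here because every line starts with a plain number of KB.

```shell
# list_large_files DIR [MIN_SIZE]
# Hypothetical helper: print "<KB> KB <path>" for every regular file under DIR
# whose size exceeds MIN_SIZE (a find -size expression, default +100M),
# smallest first, so that `tail` shows the largest files.
list_large_files() {
    dir=$1
    min_size=${2:-+100M}
    # %k is the disk usage in 1 KB blocks, %p is the path (GNU find only).
    # nice -n 19 lowers CPU priority so the scan does not compete with users.
    nice -n 19 find "$dir" -type f -size "$min_size" -printf '%k KB %p\n' |
        sort -n
}
```

Something like list_large_files /srv/tools | tail would then give the same kind of listing as the tail output above.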

It's pretty clear that some things get around https://gerrit.wikimedia.org/r/c/operations/puppet/+/496082

Mentioned in SAL (#wikimedia-cloud) [2020-03-10T17:34:04Z] <bstorm_> truncated the error.log file T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-10T17:39:12Z] <bstorm_> truncated virgule.err for robokobot T247315
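
The SAL entries do not record which command was used for the truncation, so the following is an assumption rather than what was actually run. A minimal sketch with coreutils truncate, where truncate_log is a hypothetical name:

```shell
# truncate_log FILE
# Hypothetical helper: empty FILE in place. Unlike rm, this keeps the inode,
# owner and mode, so a tool that still has the log open keeps writing to the
# same file and the space is released immediately.
truncate_log() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "not a regular file: $file" >&2
        return 1
    fi
    truncate -s 0 -- "$file"
}
```

One caveat: if the writer did not open the log in append mode, its next write lands at the old offset and the file becomes sparse, so the apparent size reported by ls can look large again while the disk usage stays small.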

An entirely expected side effect of limiting file sizes is that hunting for large files no longer frees space as quickly as it used to. That said, truncating 2 files made significant progress: available space on /srv/tools went from 462G to 727G.

/dev/drbd4      8.0T  6.9T  727G  91% /srv/tools
/dev/drbd3      5.0T  3.8T 1001G  80% /srv/misc

From here, it might be useful to look at which directories are using the most data instead of files.
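
A per-directory view could be gathered along these lines. This is a sketch under assumptions, not something that was run for this task: top_dirs is a hypothetical name, and it relies on GNU find and du. It goes through find rather than a shell glob so that hidden directories such as .shared are included.

```shell
# top_dirs DIR [N]
# Hypothetical helper: print the N (default 10) largest immediate
# subdirectories of DIR as "<KB><TAB><path>", largest first.
top_dirs() {
    dir=$1
    n=${2:-10}
    # du -s summarizes each subdirectory, -k reports 1 KB units,
    # -x keeps du from crossing into other filesystems.
    nice -n 19 find "$dir" -mindepth 1 -maxdepth 1 -type d \
        -exec du -x -k -s -- {} + 2>/dev/null |
        sort -rn | head -n "$n"
}
```

For example, top_dirs /srv/tools/shared/tools/project 20 would list the 20 tool directories using the most space. On a share this size the du walk is slow and I/O heavy, so it would be worth running under the same I/O priority limits as the find commands above.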

Running ionice -c 3 nice -19 find /srv/misc -type f -size +100M -printf "%k KB %p\n" > misc_large_files_20200310_2.txt to gather the same per-file information for misc.

Also ran cat misc_large_files_20200310_2.txt | sort -h > misc_large_files_20200310_2_sorted.txt

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:09:53Z] <jeh> truncated very large fastcgi.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:11:37Z] <jeh> truncated uwsgi.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:12:42Z] <jeh> truncated access.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:13:36Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:15:17Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:15:51Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:17:30Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:19:39Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:23:55Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:25:20Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:27:44Z] <jeh> truncated error.log T247315

Mentioned in SAL (#wikimedia-cloud) [2020-03-11T19:28:57Z] <jeh> truncated access.log T247315

JHedden renamed this task from 2019-03-10: tools and misc nfs share cleanup to 2020-03-10: tools and misc nfs share cleanup. Mar 12 2020, 3:51 PM

Mentioned in SAL (#wikimedia-cloud) [2020-03-20T16:04:51Z] <jeh> truncate jembot/error.log T247315

JHedden closed this task as Resolved. Mar 30 2020, 2:50 PM

Tools project and misc NFS storage usage is down to 79%.