2021-06-15: Tools NFS share cleanup
Open, High, Public

Description

In the tradition of T276525

We are back to do more share cleanup. This time, I'm making a task to retrofit quotas on the NFS share now that other problems are largely fixed, because this cleanup procedure is happening too often.

Icinga has alerted, and I must answer.

Current df output:
/dev/drbd4 8.0T 6.4T 1.3T 85% /srv/tools

Docs https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#NFS_volume_cleanup
Most useful dashboard https://grafana.wikimedia.org/d/50z0i4XWz/tools-overall-nfs-storage-utilization?orgId=1

Event Timeline

Bstorm moved this task from Backlog to Shared Storage on the Data-Services board.
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.
Bstorm updated Other Assignee, added: nskaggs.
Bstorm added a subscriber: nskaggs.

Adding @nskaggs in case he wants to pursue some of the subtasks as part of clinic duty.

Bstorm triaged this task as High priority. Jun 14 2021, 11:53 PM
Bstorm updated the task description.

Running the large-files find in a screen session.
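
For readers unfamiliar with this step, a large-files search of this kind could look roughly like the following. This is a minimal sketch, not the command from the admin docs: the helper name `find_large_files`, the 1 GiB threshold, the result limit, and the use of Python rather than `find` are all assumptions for illustration.

```python
import heapq
import os


def find_large_files(root, min_bytes=1024**3, limit=20):
    """Return up to `limit` (size, path) pairs under `root`, largest first.

    Hypothetical helper: the threshold and limit are illustrative defaults.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the link itself, so symlinks are not followed
                size = os.lstat(path).st_size
            except OSError:
                # Files can vanish mid-scan on a busy share; skip them
                continue
            if size >= min_bytes:
                found.append((size, path))
    return heapq.nlargest(limit, found)
```

Calling `find_large_files("/srv/tools")` would list candidates for review. A scan like this over a multi-terabyte share takes a long time, which is presumably why the real search runs in a screen session.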

Mentioned in SAL (#wikimedia-cloud) [2021-06-15T01:18:13Z] <bstorm> running a modified version of the prometheus dir size cron in screen T284964

Mentioned in SAL (#wikimedia-cloud) [2021-06-15T15:54:43Z] <bstorm> truncated 42GB virgule.err file T284964

Mentioned in SAL (#wikimedia-cloud) [2021-06-15T16:08:08Z] <bstorm> truncated 28GB person_bkl2.out T284964

Mentioned in SAL (#wikimedia-cloud) [2021-06-15T16:31:20Z] <bstorm> truncated 26GB error.log T284964
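
The truncations logged above empty a file without deleting it. The commands actually run are not recorded in this task, so the following is only a minimal sketch with a hypothetical helper name, `truncate_log`. Truncating in place keeps the same inode and releases the blocks immediately, whereas deleting a file that a running job still holds open would not free its space until the job closes it.

```python
import os


def truncate_log(path):
    """Empty `path` in place and return the number of bytes it held.

    Hypothetical helper for illustration only.
    """
    size_before = os.stat(path).st_size
    # Same inode afterwards: a process that has the file open keeps a
    # valid handle, and the space is reclaimed right away.
    os.truncate(path, 0)
    return size_before
```

One caveat: a writer that did not open the file in append mode will continue at its old offset, which leaves a sparse file rather than a small one.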

Change 699973 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] nfs prometheus: change to strings for dir sizes

https://gerrit.wikimedia.org/r/699973

I ran a modified version of our Prometheus directory-size script to find out which tools are actually the biggest users, because it seemed like there was no way the ones identified so far could account for all of this usage. Unfortunately, that was correct. Here are the largest projects:
templatehoard 93.52 GB
phetools 101.71 GB
panoviewer 110.16 GB
wikidata-analysis 113.03 GB
yifeibot 128.91 GB
videoconvert 140.03 GB
zoomviewer 174.70 GB
glamtools 182.25 GB
ia-upload 290.00 GB
wdumps 310.95 GB
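
To put the list in context against the df output, the ten figures can be summed. This is a minimal sketch with illustrative variable names; the comparison assumes that the GB figures and df's "6.4T" use compatible units, which the task does not state.

```python
# Sizes in GB for the ten largest tools, copied from the list above
largest = {
    "templatehoard": 93.52,
    "phetools": 101.71,
    "panoviewer": 110.16,
    "wikidata-analysis": 113.03,
    "yifeibot": 128.91,
    "videoconvert": 140.03,
    "zoomviewer": 174.70,
    "glamtools": 182.25,
    "ia-upload": 290.00,
    "wdumps": 310.95,
}

total_gb = sum(largest.values())
print(f"top {len(largest)} tools: {total_gb:.2f} GB")

# Assumption: treat df's "6.4T" as 6.4 * 1024 GB. The unit convention of
# the size report is not stated in the task.
used_gb = 6.4 * 1024
print(f"share of used space: {total_gb / used_gb:.0%}")
```

Under that assumption the ten largest tools together hold roughly 1.6 TB, about a quarter of the space in use, so most of the usage is spread across many smaller tools.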

Change 699973 merged by Bstorm:

[operations/puppet@production] nfs prometheus: change to strings for dir sizes

https://gerrit.wikimedia.org/r/699973