High usage on nfs-tools-project.svc.eqiad.wmnet
Open, Low, Public

Description

zoranzoki21@tools-sgebastion-07:/home$ df -h
Filesystem                                                Size  Used Avail Use% Mounted on
udev                                                      7.9G     0  7.9G   0% /dev
tmpfs                                                     1.6G  180M  1.4G  12% /run
/dev/vda3                                                  19G   15G  3.1G  83% /
tmpfs                                                     7.9G  300K  7.9G   1% /dev/shm
tmpfs                                                     5.0M     0  5.0M   0% /run/lock
tmpfs                                                     7.9G     0  7.9G   0% /sys/fs/cgroup
nfs-tools-project.svc.eqiad.wmnet:/project/tools/project  8.0T  6.2T  1.4T  82% /mnt/nfs/labstore-secondary-tools-project
labstore1006.wikimedia.org:/dumps                          98T   51T   43T  54% /mnt/nfs/dumps-labstore1006.wikimedia.org
nfs-tools-project.svc.eqiad.wmnet:/project/tools/home     8.0T  6.2T  1.4T  82% /mnt/nfs/labstore-secondary-tools-home
tmpfs                                                     1.6G     0  1.6G   0% /run/user/0
tmpfs                                                     1.6G     0  1.6G   0% /run/user/14222
tmpfs                                                     1.6G     0  1.6G   0% /run/user/12662
tmpfs                                                     1.6G     0  1.6G   0% /run/user/2818
tmpfs                                                     1.6G     0  1.6G   0% /run/user/16388
tmpfs                                                     1.6G     0  1.6G   0% /run/user/17969
tmpfs                                                     1.6G     0  1.6G   0% /run/user/12290
tmpfs                                                     1.6G     0  1.6G   0% /run/user/2671
tmpfs                                                     1.6G     0  1.6G   0% /run/user/1237
tmpfs                                                     1.6G     0  1.6G   0% /run/user/20622
tmpfs                                                     1.6G     0  1.6G   0% /run/user/13679
tmpfs                                                     1.6G     0  1.6G   0% /run/user/13778
tmpfs                                                     1.6G     0  1.6G   0% /run/user/2133
tmpfs                                                     1.6G     0  1.6G   0% /run/user/545
tmpfs                                                     1.6G     0  1.6G   0% /run/user/2179
tmpfs                                                     1.6G     0  1.6G   0% /run/user/11650
tmpfs                                                     1.6G     0  1.6G   0% /run/user/4270
tmpfs                                                     1.6G     0  1.6G   0% /run/user/3617
tmpfs                                                     1.6G     0  1.6G   0% /run/user/3522
tmpfs                                                     1.6G     0  1.6G   0% /run/user/3275
cloudstore1009.wikimedia.org:/scratch                     4.0T  2.5T  1.4T  64% /mnt/nfs/secondary-cloudstore1009.wikimedia.org-scratch
tmpfs                                                     1.6G     0  1.6G   0% /run/user/11785
tmpfs                                                     1.6G     0  1.6G   0% /run/user/21804
tmpfs                                                     1.6G     0  1.6G   0% /run/user/10394
tmpfs                                                     1.6G     0  1.6G   0% /run/user/12080
tmpfs                                                     1.6G     0  1.6G   0% /run/user/2484
tmpfs                                                     1.6G     0  1.6G   0% /run/user/18256
tmpfs                                                     1.6G     0  1.6G   0% /run/user/16088
tmpfs                                                     1.6G     0  1.6G   0% /run/user/4448
tmpfs                                                     1.6G     0  1.6G   0% /run/user/11396
tmpfs                                                     1.6G     0  1.6G   0% /run/user/10548
tmpfs                                                     1.6G     0  1.6G   0% /run/user/3067
tmpfs                                                     1.6G     0  1.6G   0% /run/user/17980
cloudstore1008.wikimedia.org:/scratch                     4.0T  2.5T  1.4T  64% /mnt/nfs/secondary-cloudstore1008.wikimedia.org-scratch
tmpfs                                                     1.6G     0  1.6G   0% /run/user/14826
tmpfs                                                     1.6G     0  1.6G   0% /run/user/13367
tmpfs                                                     1.6G     0  1.6G   0% /run/user/20061
tmpfs                                                     1.6G     0  1.6G   0% /run/user/2909
tmpfs                                                     1.6G     0  1.6G   0% /run/user/17234
tmpfs                                                     1.6G     0  1.6G   0% /run/user/3794
tmpfs                                                     1.6G     0  1.6G   0% /run/user/4010
tmpfs                                                     1.6G     0  1.6G   0% /run/user/13559
tmpfs                                                     1.6G     0  1.6G   0% /run/user/2111
tmpfs                                                     1.6G     0  1.6G   0% /run/user/17142
zoranzoki21@tools-sgebastion-07:/home$

I don't know whether 82% usage is OK for nfs-tools-project.svc.eqiad.wmnet:/project/tools/project and nfs-tools-project.svc.eqiad.wmnet:/project/tools/home.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Thu, Sep 26, 6:53 PM
aborrero triaged this task as Low priority. · Fri, Sep 27, 9:15 AM
aborrero added a subscriber: aborrero.

80% usage is fine in the sense that we still have 1.4T of room for storage there.
We periodically try to do some cleanup on the NFS shared servers, especially for big files.

Thanks for the report! I just realized we may not have proper Grafana dashboards for monitoring this.

See T233120: 2019-09-17: tools share cleanup (high usage) for the cleanup effort led by @Phamhi that very recently brought usage down from 93% to 80% on this set of shares. We get automated alerts from Icinga when utilization rises above 90%. https://grafana.wikimedia.org/d/000000338/labstore-nfs-directory-sizes?orgId=1 used to show utilization over time, but it looks like that dashboard has not been updated to follow the metrics, which moved from Graphite to Prometheus.
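
For anyone who wants to compare a pasted `df` listing against the 90% alert level mentioned above, here is a minimal sketch. It is not the Icinga check itself; the function name `mounts_over_threshold` and the `SAMPLE` text are illustrative assumptions, with the sample rows copied from the task description (whitespace collapsed).

```python
def mounts_over_threshold(df_output: str, threshold_pct: int = 90) -> list[tuple[str, int]]:
    """Return (mount point, use%) pairs whose Use% is above threshold_pct.

    Hypothetical helper: expects text in the column layout printed by `df -h`
    (Filesystem, Size, Used, Avail, Use%, Mounted on).
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete df row
        try:
            use_pct = int(fields[4].rstrip("%"))
        except ValueError:
            continue  # some pseudo-filesystems report "-" instead of a percentage
        mount = " ".join(fields[5:])  # re-join in case the mount point contains spaces
        if use_pct > threshold_pct:
            flagged.append((mount, use_pct))
    return flagged


# Rows taken from the df output in the task description.
SAMPLE = """\
Filesystem Size Used Avail Use% Mounted on
/dev/vda3 19G 15G 3.1G 83% /
nfs-tools-project.svc.eqiad.wmnet:/project/tools/project 8.0T 6.2T 1.4T 82% /mnt/nfs/labstore-secondary-tools-project
cloudstore1009.wikimedia.org:/scratch 4.0T 2.5T 1.4T 64% /mnt/nfs/secondary-cloudstore1009.wikimedia.org-scratch
"""

if __name__ == "__main__":
    print(mounts_over_threshold(SAMPLE))      # nothing is above 90%
    print(mounts_over_threshold(SAMPLE, 80))  # lists the mounts above 80%
```

With the default threshold none of the sample mounts are flagged, which matches the point that 82% is still below the alerting level.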

I made a new dashboard at https://grafana.wikimedia.org/d/PAUSVCtWk/cloud-nfs-utilization?orgId=1 that will show the trends in NFS share utilization. We are not exporting data to Prometheus at the granularity that we were for Graphite, so I was not yet able to figure out how to make a "top sub-directory" report like the labstore-nfs-directory-sizes dashboard once showed.
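
As a rough stand-in for a "top sub-directory" report, the sketch below totals file sizes under each immediate sub-directory of a path and sorts them largest first. This is only an assumption about what such a report could look like, not how the old labstore-nfs-directory-sizes dashboard gathered its data; `subdir_sizes` is a hypothetical name, and walking a multi-terabyte NFS share this way would be slow and would add load on the server.

```python
import os


def subdir_sizes(root: str) -> list[tuple[str, int]]:
    """Return (sub-directory name, total bytes) for each immediate
    sub-directory of root, largest first.

    Sizes are apparent file sizes (st_size), not blocks actually allocated,
    so the totals can differ from what `du` or `df` report.
    """
    totals = []
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # only real directories, not files or symlinks
            total = 0
            # os.walk does not follow symlinked directories by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            totals.append((entry.name, total))
    return sorted(totals, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Print the ten largest sub-directories of the current directory.
    for name, size in subdir_sizes(".")[:10]:
        print(f"{size:>15,d}  {name}")
```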