So people can't accidentally fill up tools by filling up PAWS
Patch: nfs monitoring: fix the broken paths for the directory size monitor (operations/puppet, production, +10 -16)
- T246122 Upgrade the Toolforge Kubernetes cluster to v1.16 (Resolved, Bstorm)
- T211096 PAWS: Rebuild and upgrade Kubernetes (Resolved, Bstorm)
- T167086 Consider moving PAWS to its own Cloud VPS project, rather than using instances inside Toolforge (Resolved, Bstorm)
- T160113 Move PAWS nfs onto its own share (Resolved, Bstorm)
- T255628 VPS Project dumps is using 2.4 TB at /data/project on NFS (Resolved, Bstorm)
Apr 5 19:12:11 labstore1005 nfs-exportd: exportfs: Failed to stat /exp/project/paws: No such file or directory
Just a note that the mountpoint isn't on the standby yet, until I get around to creating it :)
This is now fixed by virtue of the bindmounts no longer existing since https://gerrit.wikimedia.org/r/c/operations/puppet/+/571821
We need to sync the paws-user-homes and related materials to the paws project when things are ready to cut over.
Beyond the user homes, actually, it should be noted that we have:
paws/ paws-beta/ paws-dev/ paws-public/ paws-published/ paws-stats/ paws-status/ paws-support/
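Syncing those trees at cutover could be sketched with rsync. This is a sketch only: both paths below are placeholders, not the verified mounts, and the destination share is hypothetical.

```shell
#!/bin/sh
# Hypothetical cutover sync of the PAWS user homes. SRC and DST are
# placeholder paths, not the real labstore mounts.
SRC=/srv/tools/shared/tools/project/paws/userhomes/
DST=/srv/paws/project/paws/userhomes/   # assumed new paws-project share

# --archive keeps ownership, permissions and mtimes; --dry-run first so
# the copy can be reviewed before the real run. No --delete, so a re-run
# never removes files that already landed on the new share.
if [ -d "$SRC" ]; then
    rsync --archive --hard-links --numeric-ids --dry-run "$SRC" "$DST"
fi
```

The same invocation without `--dry-run` would do the actual copy, and it can be re-run close to the cutover to pick up only the files that changed since the first pass.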
User homes are at labstore100:/srv/tools/shared/tools/project/paws/userhomes/
The failsafe jupyterhub sqlite file is in labstore100:/srv/tools/shared/tools/project/paws/db/ and needs a persistent volume created in the cluster for it.
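For that sqlite file, a persistent volume claim in the new cluster might look roughly like this. Every name here, the namespace, and the 1Gi size are assumptions for illustration, not the real cluster config:

```yaml
# Hypothetical PVC for the jupyterhub failsafe sqlite DB.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hub-db-dir        # placeholder name
  namespace: prod         # placeholder namespace
spec:
  accessModes:
    - ReadWriteOnce       # sqlite wants a single writer anyway
  resources:
    requests:
      storage: 1Gi        # placeholder size
```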
paws-beta isn't used, apparently. paws-dev appears to be historical. paws-published seems aspirational. paws-public is the home directory of the paws-public tool; it mostly holds the YAML manifest and README required to launch paws-public, and that needs to go in the GitHub repo.
paws-status seems to have been a planned app that was never built. paws-support, on the other hand, actually does host something: https://tools.wmflabs.org/paws-support/. From what I can tell, nothing about it needs to move with PAWS; it's just there to get a package online.
So right now, paws/userhomes is 218G.
/srv/misc has 996G available (good), but that means it is at 80% use (not so good). Copying them over shouldn't move that needle much, but I'm now wondering if there is anything we can clean up first.
Sadly, I think this is the same classic problem as the Toolforge NFS share: no hard quotas and no automatic purge means that files "leak" and are never recovered. There are currently 2823 distinct user directories, and within them 1.3M files with an mtime of 365 days or more. By directory size, Matias zapata seems to be the 'winner' with 58G in their home, all of which appears to be a copy of the 2020-02-20 enwiki article dump, stored as both a 16G compressed file and a 42G expansion of that same file.
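The stale-file count and the per-directory ranking above come from a plain find/du pass. A sketch of that audit, run here against a throwaway fixture rather than the real share (the real run would point ROOT at the userhomes mount):

```shell
#!/bin/sh
# Demo of the stale-file audit on a temp fixture; in production ROOT
# would be the userhomes directory on the NFS share.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/alice" "$ROOT/bob"
touch "$ROOT/alice/fresh.txt"
# Backdate two files so they look stale (mtime > 365 days).
touch -d '2 years ago' "$ROOT/alice/old-dump.bz2" "$ROOT/bob/old.log"

# Files untouched for more than a year (the query behind the 1.3M figure):
stale=$(find "$ROOT" -type f -mtime +365 | wc -l)
echo "stale files: $stale"    # prints "stale files: 2" on this fixture

# Per-user directory usage, largest first (how the 58G home was spotted):
du -s "$ROOT"/*/ | sort -rn | head -n 5

rm -rf "$ROOT"
```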
After fixing and running the prometheus monitor, the top projects are:
dumps project has 2.4 TB
That is wildly more than any other project on misc. I seem to recall that project has some kind of cached files or something that need regular cleanup.
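If the dumps bloat really is cached files, the cleanup candidates could be surveyed with something like the following. The path and the 30-day cutoff are assumptions, not a verified policy, and nothing is deleted here:

```shell
#!/bin/sh
# List the biggest files older than 30 days under the dumps project tree
# as cleanup candidates. DUMPS is a placeholder path. Read-only: prints
# size<TAB>path, largest first; deletion would be a separate, reviewed step.
DUMPS=/srv/misc/shared/dumps/project
find "$DUMPS" -type f -mtime +30 -printf '%s\t%p\n' 2>/dev/null \
  | sort -rn | head -n 20
```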