Description

With T159930 resolved, I imagine this can be cleaned up. The dumps project is consuming an order of magnitude more space than any other project on its NFS volume. Last I understood, a cleanup process there was broken, but the process was supposed to be moved to instance storage anyway.
Status | Subtype | Assigned | Task
---|---|---|---
Restricted Task | | |
Resolved | | Bstorm | T246122 Upgrade the Toolforge Kubernetes cluster to v1.16
Resolved | | Bstorm | T211096 PAWS: Rebuild and upgrade Kubernetes
Resolved | | Bstorm | T167086 Consider moving PAWS to its own Cloud VPS project, rather than using instances inside Toolforge
Resolved | | Bstorm | T160113 Move PAWS nfs onto its own share
Resolved | | yuvipanda | T105720 Labs team reliability goal for Q1 2015/16
Resolved | | Andrew | T102240 Audit projects' use of NFS, and remove it where not necessary
Declined | | Andrew | T208402 Check whether dumps project requires NFS or not
Resolved | | Bstorm | T183920 2018-01-02: labstore Tools and Misc share very full
Resolved | | madhuvishy | T174468 VPS Project dumps is using 1.7T at /data/project on NFS
Resolved | | Bstorm | T255628 VPS Project dumps is using 2.4 TB at /data/project on NFS
Event Timeline
Yes, I am still looking into this. Ideally I would be able to give an update sometime in the fourth quarter of this year.
@Hydriz is there anything I can delete? This is using up nearly half the entire volume for VPS project NFS; the whole volume is around 5 TB.
If this means October–December 2020, that is way too long to hold on to these scarce shared resources, @Hydriz. 2.3 TB of the NFS usage is in a directory named "temp". Within it:
$ sudo du -sh * | sort -h
4.0K    mediacounts
4.0K    translation
45M     globalblocks
32G     categoriesrdf
1014G   cirrussearch
1.3T    wikidata
The cirrussearch directory contains 4408 dump files, all from early 2018. The wikidata directory contains dump files from late 2018 and early 2019. I am not here to judge the intrinsic value of your project or to try to make you feel bad, but I am here to judge the opportunity cost of this shared storage issue for the Cloud VPS project. Your own documentation on sending things to the Internet Archive says that for them, "[...] gigabytes fine, tens of gigabytes problematic, hundreds of gigabytes bad." The same really applies to Cloud VPS. You are beyond "hundreds of gigabytes bad" and into thousands of gigabytes that are causing real operational issues.
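For anyone triaging usage like this, one way to see what is stale before deleting anything is to list files by modification date. A minimal sketch, assuming GNU find and running from the temp directory (the 2019-01-01 cutoff is illustrative, picked from the dump dates above):

$ # list date, size in bytes, and path for files last modified before the cutoff, oldest first
$ find . -type f ! -newermt '2019-01-01' -printf '%TY-%Tm-%Td %12s %p\n' | sort

Sorting on the leading date field surfaces the oldest bulky dumps immediately, so they can be reviewed before removal.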
Unfortunately, my time as a volunteer is scarce too. I am already rushing to clear this backlog of files as fast as I can and would have appreciated a grace period to get the project back in order. If there is really nothing else that can be done, please feel free to delete the files and we can close this task.
Mentioned in SAL (#wikimedia-cloud) [2020-06-24T18:34:19Z] <bstorm> removing files from /data/project/dumps/temp/wikidata and /data/project/dumps/temp/cirrussearch T255628
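The SAL entry does not record the exact commands used; as a hedged sketch, a removal like this is often done with find so that only files older than a chosen cutoff are touched (the paths are from the entry above; the cutoff and dry run are illustrative):

$ # dry run: count the files that would be removed
$ find /data/project/dumps/temp/cirrussearch /data/project/dumps/temp/wikidata -type f ! -newermt '2019-06-01' | wc -l
$ # irreversible: delete the matched files, leaving the directory tree in place
$ sudo find /data/project/dumps/temp/cirrussearch /data/project/dumps/temp/wikidata -type f ! -newermt '2019-06-01' -delete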
I removed the majority of files from those two directories. The project is now 81G, which still places it high in the list of NFS usage. However, it is no longer an operational issue. I hope this doesn't cause trouble for your project, but the resource will now be in much better shape for all projects.