
VPS Project dumps is using 2.4 TB at /data/project on NFS
Closed, Resolved (Public)

Description

With T159930 resolved, I imagine this can be cleaned up. The dumps project is consuming an order of magnitude more than any other project on its NFS volume. As I understand it, a cleanup process was broken last time, but that process was supposed to be moved to instance storage anyway.

Event Timeline

Bstorm triaged this task as High priority. Jun 16 2020, 10:13 PM
Bstorm created this task.

Yes, I am still looking into this. Ideally I would be able to give an update somewhere in the fourth quarter of this year.

@Hydriz is there anything I can delete? This is using up nearly half the entire volume for VPS project NFS. The whole volume is around 5 TB.

bd808 added a subscriber: bd808. Jun 17 2020, 9:10 PM

Yes, I am still looking into this. Ideally I would be able to give an update somewhere in the fourth quarter of this year.

If this means October - December 2020, that is way too long to hold on to these scarce shared resources, @Hydriz. 2.3 TB of the NFS usage is in a directory named "temp". Within this:

$ sudo du -sh *|sort -h
4.0K    mediacounts
4.0K    translation
45M     globalblocks
32G     categoriesrdf
1014G   cirrussearch
1.3T    wikidata

The cirrussearch directory contains 4408 dump files which are all from early 2018. The wikidata directory contains dump files from late 2018 and early 2019. I am not here to judge the intrinsic value of your project or to try and make you feel bad, but I am here to judge the opportunity cost for the Cloud VPS project of this shared storage issue. Your own documentation on sending things to Internet Archive says of them "[...] gigabytes fine, tens of gigabytes problematic, hundreds of gigabytes bad." The same really applies to Cloud VPS. You are beyond the "hundreds of gigabytes bad" and into the "thousands of gigabytes causing real operational issues" territory.
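For anyone wanting to reproduce the age breakdown described above, here is a minimal sketch that counts files per modification year. It is not the command that was actually run in this task; the function name count_files_by_year is hypothetical, and it assumes GNU find (for -printf) and that file modification times reflect when each dump was written.

```shell
# Hypothetical helper: count regular files under a directory,
# grouped by the year of their last modification.
# Assumes GNU find (the -printf action is not POSIX).
count_files_by_year() {
    dir="$1"
    # %TY prints the four-digit year of each file's modification time.
    find "$dir" -type f -printf '%TY\n' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'   # normalize to "YEAR COUNT"
}
```

Run as, for example, count_files_by_year /data/project/dumps/temp/cirrussearch; each output line is a year followed by the number of files last modified in that year.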

Unfortunately my time as a volunteer is scarce too. I am already rushing to clear this backlog of files as fast as I can and would have appreciated this grace period for me to get the project back in order. If there is really nothing that can be done, please feel free to delete the files and we can close this task.

Mentioned in SAL (#wikimedia-cloud) [2020-06-24T18:34:19Z] <bstorm> removing files from /data/project/dumps/temp/wikidata and /data/project/dumps/temp/cirrussearch T255628

Bstorm closed this task as Resolved. Jun 24 2020, 6:40 PM

I removed the majority of files from those two directories. The project is now 81G, which still places it high in the list of NFS usage. However, it is no longer an operational issue. I hope this doesn't cause trouble for your project, but the resource will now be in much better shape for all projects.