| Status | Assignee | Task |
| --- | --- | --- |
| Resolved | yuvipanda | T105720 Labs team reliability goal for Q1 2015/16 |
| Resolved | Andrew | T102240 Audit projects' use of NFS, and remove it where not necessary |
| Declined | Andrew | T208402 Check whether dumps project requires NFS or not |
| Resolved | Bstorm | T255628 VPS Project dumps is using 2.4 TB at /data/project on NFS |
Thank you, Andrew, for looking into this. To be clear, we still need 500 GB of space on some mount. The reason is that, every now and then, we copy material from various places and assemble it into one large dataset, which is then uploaded as a whole to the Internet Archive.
We usually try to avoid uploading items bigger than 400-500 GB, which is why this size was requested. It's not always possible to break datasets down into smaller chunks, or at least not without downloading and processing them as a whole locally, which still requires a copy of that size. In an ideal world, we could make other tasks easier by having even more space, for instance 1 TB: some wikis have XML dumps that easily run into the hundreds of GB. We can always try to be smarter about disk usage, but that requires investing time in coding (and testing) that is worth more than the resources saved.
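For illustration, here is a minimal sketch of the kind of pre-upload size check this implies, assuming the `internetarchive` Python library is installed and configured; the paths, identifier, and metadata are hypothetical examples, not our actual job.

```python
#!/usr/bin/env python3
"""Rough sketch: refuse to upload an Internet Archive item above ~500 GB.

Paths, the item identifier, and metadata below are illustrative only;
the `internetarchive` library is assumed to be installed and configured
(`ia configure`).
"""
from pathlib import Path

from internetarchive import upload  # pip install internetarchive

MAX_ITEM_BYTES = 500 * 1024**3  # keep items under roughly 500 GB


def assemble_and_upload(dataset_dir: str, identifier: str) -> None:
    files = sorted(p for p in Path(dataset_dir).rglob("*") if p.is_file())
    total = sum(p.stat().st_size for p in files)
    if total > MAX_ITEM_BYTES:
        # Splitting is not always possible without reprocessing locally,
        # which is exactly why a large staging area is needed.
        raise SystemExit(
            f"{total / 1024**3:.0f} GiB exceeds the item size limit; "
            "split the dataset or request more space."
        )
    upload(
        identifier,
        files=[str(p) for p in files],
        metadata={"collection": "opensource", "mediatype": "data"},
    )


if __name__ == "__main__":
    assemble_and_upload("/data/project/dumps/staging/example-dataset",
                        "example-dataset-2020")
```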
In practice, when we didn't have this kind of space, or when we had it but the I/O performance was so poor it was effectively unusable, we stopped doing some jobs altogether. Some tasks are backlogged by several years chiefly for this reason. I sometimes end up paying out of my own pocket for a VPS with a few TB of disk from a commercial provider, because it saves me so much time compared to working around Labs' limited and/or slow disks or rewriting my code.
Your description sounds like a pretty good use case for 'scratch' -- is that what you're using now, or are you doing your work in /data/project? (It may be that scratch is too slow for this purpose, but it might be worth a try.)
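As a rough way to answer the "too slow" question, one could time a large sequential write on each mount before committing to scratch. The sketch below is only an assumption-laden example (the mount paths and sizes are illustrative, and /data/scratch is assumed to be where scratch is mounted); a real benchmark such as fio would give more trustworthy numbers.

```python
#!/usr/bin/env python3
"""Crude sequential-write throughput check for NFS mounts.

Mount paths below are examples; adjust them to the mounts available
in the project. This is not a substitute for a proper benchmark.
"""
import os
import time

CHUNK = b"\0" * (8 * 1024 * 1024)   # 8 MiB per write
TOTAL_BYTES = 2 * 1024**3           # write 2 GiB per mount


def write_throughput(directory: str) -> float:
    """Return sequential write speed in MiB/s for a file in `directory`."""
    path = os.path.join(directory, "io-test.tmp")
    start = time.monotonic()
    with open(path, "wb") as f:
        written = 0
        while written < TOTAL_BYTES:
            f.write(CHUNK)
            written += len(CHUNK)
        f.flush()
        os.fsync(f.fileno())        # make sure the data actually hit the server
    elapsed = time.monotonic() - start
    os.remove(path)
    return TOTAL_BYTES / elapsed / 1024**2


if __name__ == "__main__":
    for mount in ("/data/scratch", "/data/project/dumps"):
        print(f"{mount}: {write_throughput(mount):.0f} MiB/s")
```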