
Check whether dumps project requires NFS or not
Open, Low, Public

Description

This project has an entry in modules/labstore/files/nfs-mounts.yaml but no task under T102240.

Event Timeline

Krenair created this task. Oct 31 2018, 1:10 PM

    dumps: true
    home: true
    project: true
    scratch: true

I assume it needs dumps, what about the others?

Hydriz added a comment. Nov 1 2018, 3:24 PM

Yes, and I foresee that we will need NFS at least until T159930 (Create custom instance flavor for Dumps project) is fixed, which I suppose won't be soon.

Well, the migrations are in progress. I don't know when the dumps project is scheduled, but it sounds like that will be unblocked soon.

Andrew claimed this task. Feb 11 2020, 5:11 PM

Thank you, Andrew, for looking into this. To be clear, we still need 500 GB of space on some mount. The reason is that, every now and then, we copy data from various places and assemble it into a large dataset, which is then uploaded as a whole to the Internet Archive.

We usually try to avoid uploading items larger than 400-500 GB, which is why we chose this size. It's not always possible to break datasets down into smaller chunks, at least not without downloading and processing them as a whole locally, which still requires a copy of that size. In an ideal world, we could make other tasks easier by having even more space, for instance 1 TB: some wikis have XML files that easily run into the hundreds of GB. We can always try to be smarter about disk usage, but that requires investing time in coding (and testing), and that time is worth more than the resources saved.

In practice, when we didn't have this kind of space, or when we had it but the I/O performance was so poor that it was effectively unusable, we stopped doing some jobs altogether. Some tasks have been backlogged for several years chiefly for this reason. I sometimes end up paying for a VPS with a few TB of disk from a commercial provider, with my own money, because it saves me so much time compared to struggling with Labs' limited and/or slow disks and rewriting my code.

Your description sounds like a pretty good use case for 'scratch'. Is that what you're using now, or are you doing your work in /data/project? (Scratch may be too slow for this purpose, but it might be worth a try.)

Andrew triaged this task as Low priority. Feb 19 2020, 4:43 AM

@Nemo_bis *bump* Can you respond to my most recent question?