
Create custom instance flavor for Dumps project
Closed, ResolvedPublic

Description

I am going to have to reduce the usage of NFS (see T134148 for related issues) on the Dumps project. For that to happen, I will need to have a custom instance flavor that provides more storage space on /srv. Here are the specifications:

  • Storage: 500 GB
  • RAM: 2 GB
  • CPU: 1

The space used on the storage should be less than 10 GB most of the time, but when datasets are actually being archived, it can go up to 300 GB or even more. Having 500 GB storage space would provide us with sufficient room for holding the files temporarily.

Event Timeline

Hydriz created this task. · Mar 8 2017, 10:48 AM
Restricted Application added a subscriber: Aklapper. · Mar 8 2017, 10:48 AM
bd808 moved this task from Triage to Backlog on the Cloud-Services board. · Mar 26 2017, 9:00 PM
Hydriz added a subscriber: Andrew. · May 14 2017, 10:35 AM
Harej added a subscriber: Harej. · Mar 5 2018, 11:25 PM

I am treating this as a quota increase request. I am not sure that we want to have a general-purpose 500 GB instance that can be deployed on demand. (And as with T95731, I think the long-term solution is going to be detachable block storage.)

Related - T174468.

I have managed to reduce the disk usage to less than 500G. However, the original problem still stands: the dumps project may have very high disk utilization during certain periods, which may negatively affect other CloudVPS projects. Is it possible for a separate labstore volume to be created just for the dumps project?

If the higher usage is periodic, I want to encourage setting up automatic cleanup jobs after the dumps are processed. I'm inclined to decline this request since high disk usage is sporadic, and it's been over a year since it was filed.
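A cleanup job of the kind suggested here might be as simple as a retention sweep; the directory layout and the 7-day retention period below are hypothetical, not the project's actual configuration:

```shell
# Minimal sketch of an automatic cleanup job: delete archived dump
# files older than a given number of days. Directory and retention
# period are illustrative assumptions only.
cleanup_dumps() {  # cleanup_dumps DIR DAYS
  find "$1" -type f -mtime "+$2" -delete
}

# Hypothetical cron entry running the sweep nightly:
# 0 3 * * * cleanup_dumps /srv/dumps 7
```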

If the higher usage is periodic, I want to encourage setting up automatic cleanup jobs after the dumps are processed

I might have missed something, but this is not something that cleanup would help with. It's about transferring bigger datasets which are only produced occasionally. They can clearly be split into smaller pieces, but doing so might end up increasing resource usage (download something, write it, read it, process and split it, write it again, move it elsewhere, etc.).

Hi @Nemo_bis, I'm not sure I fully understand what you mean by "transferring bigger datasets" and "split into smaller pieces". Could you elaborate on what your flow looks like, how big the datasets being downloaded/generated are, and how long you need to store them in the dumps project?

Nemo_bis added a comment (edited). · Mar 13 2018, 11:19 PM

Hi @Nemo_bis, I'm not sure I fully understand what you mean by "transferring bigger datasets" and "split into smaller pieces". Could you elaborate on what your flow looks like, how big the datasets being downloaded/generated are, and how long you need to store them in the dumps project?

Hi, I'm not currently running any archival project, but most of them involve: a phase where data is downloaded or copied from the source; a phase where the data is repackaged, e.g. compressed into archives of suitable sizes; and a phase where it's uploaded to the Internet Archive. Individual Internet Archive items should generally be under ~400 GB.

Of course, the slower each phase is (e.g. due to I/O or bandwidth limits), the longer the disk space is needed. Additionally, when an individual "package" of data was bigger than the available disk space, I often needed to download files to one location, then move them to another, then perhaps uncompress and recompress things in different places to avoid running out of disk space, then redo from scratch downloads that had failed for lack of disk space, and so on. This often ended up creating a lot of additional I/O compared to just writing everything in one place and then uploading in one go.
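The repackaging phase described above can be sketched as a streaming pipeline that never needs more than one copy of the data on disk; the tool choice and paths below are illustrative assumptions, not the project's actual scripts:

```shell
# Sketch of the repackaging phase: stream a directory through tar and
# gzip, splitting the output into pieces no larger than a given size so
# each Internet Archive item stays under the size limit. All paths and
# the size argument are hypothetical.
repack() {  # repack SRC_DIR OUT_PREFIX MAX_PIECE_SIZE
  tar -C "$1" -cf - . | gzip | split -b "$3" - "$2"
}

# e.g. repack /srv/raw /srv/packed/dataset.tar.gz. 400G
```

Because tar and gzip stream directly into split, no intermediate archive file is written, which avoids the extra I/O of compressing first and splitting afterwards.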

Due to ongoing restructuring of our cloud setup, I'm reluctant to allocate new giant VMs right before I have to move them :) Can you ping me on this ticket (maybe in late August/early September) and/or when you have an urgent need to process a big job?

Thanks, and sorry for the delay.

bd808 closed this task as Declined. · May 21 2019, 11:53 PM
bd808 added a subscriber: bd808.

Marking as declined because it has been two years since this was first requested and nearly a year since we asked to be told about an actual active need rather than a speculative one. Please reopen if and when there is an active need for a large local disk in the project.

Hydriz reopened this task as Open. · May 22 2019, 12:52 AM

Something’s not right here, there has always been a need for large disks.

bd808 added a comment. · May 22 2019, 1:24 AM

Something’s not right here, there has always been a need for large disks.

Due to ongoing restructuring of our cloud setup, I'm reluctant to allocate new giant VMs right before I have to move them :) Can you ping me on this ticket (maybe in late August/early September) and/or when you have an urgent need to process a big job?

Thanks, and sorry for the delay.

That comment by Andrew was written on 2018-07-10, and there was never a ping about an urgent need to process a "big" job. This is why I closed the ticket for inactivity.

I think part of the problem here is that the Cloud Services team does not clearly understand the dumps project and its needs. This isn't helped by the project name, which is not really descriptive of its purpose. There is, however, a reasonable amount of description at https://wikitech.wikimedia.org/wiki/Nova_Resource:Dumps, which is great. As I understand it, you use this project to upload various Wikimedia datasets to the Internet Archive for archival purposes.

Your current ask is to have a custom instance flavor created for your project that gives you 3.125 times as much local disk space as a normal m1.xlarge instance gets. The more descriptive notes on T134148: Dumps instances occasionally hammer NFS for temporary storage seem to indicate that you would use this large disk instance to host your own NFS server. Is that correct? Would that instance be in addition to the 6 instances (with cryptic names) that are currently in use by the project or would it replace one or more of them? What will the impact to your project be if this instance type is granted, you put it into use, and then due to hardware failure or server maintenance issues the instance goes offline? Would you need to ensure that the contents of the 500G instance can be recovered, or would it typically be possible for you to replace it with a brand new instance with no disk contents?

bd808 triaged this task as Medium priority. · May 22 2019, 1:31 AM

Currently, the storage space on this new instance flavor is for temporarily storing files as they are uploaded to the Internet Archive. For certain datasets that are not currently available via NFS, we will download them, store them on the instance itself, and then upload them to the Internet Archive. If we can do this in parallel (i.e. downloading and uploading simultaneously), it would be even better, though it might cause quite significant disk I/O, which we can fine-tune at a later stage if necessary.
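The parallel download/upload idea mentioned here could be sketched as overlapping the two phases, uploading each finished file in the background while the next download proceeds; `FETCH_CMD` and `UPLOAD_CMD` are hypothetical stand-ins for the real transfer tools:

```shell
# Sketch of overlapping downloads and uploads: each finished download
# is handed to a background upload while the loop moves on to the next
# dataset. The fetch/upload commands are assumptions, not actual tools
# used by the project.
process_datasets() {  # process_datasets FETCH_CMD UPLOAD_CMD NAME...
  fetch=$1; upload=$2; shift 2
  for name in "$@"; do
    "$fetch" "$name"      # download the next dataset synchronously
    "$upload" "$name" &   # upload it in the background meanwhile
  done
  wait                    # let all background uploads finish
}
```

This keeps at most one download and several uploads in flight, which is where the anticipated extra disk I/O would come from.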

I do not intend to use this new type of instance as an NFS server. Instead, it should replace most of the existing instances in the project, since we no longer need so much RAM/CPU; those resources can then be freed up for other projects to use. All the processing (downloading/uploading) will only affect the instance itself and no other instances.

Hardware failures or server maintenance are perfectly fine. The contents of this instance do not need to be backed up, since they are just copies of data that already exists elsewhere. This will make the current setup more resistant to failures and other disasters.

bd808 added a comment. · May 22 2019, 4:29 PM

Thank you for the clarifications in T159930#5203036 @Hydriz. The cloud-services-team will review this request again in our 2019-05-28 team meeting.

Andrew added a comment. · Jun 4 2019, 5:52 PM

Hello!

I've added a flavor to your project named 'dumps-temporary-file-storage' which has the specs you requested. A couple of caveats:

  1. Our copy-on-write filesystem is good at growing things as needed, but not so good at shrinking things afterwards. For that reason, it would be moderately useful if you delete/recreate your VMs that use this flavor when you have the opportunity.
  2. When it comes time to migrate VMs from host to host, we'll probably want to delete on the old host and recreate on the new one rather than actually copy them, due to the large size.

I hope this gets you what you want! Sorry it took so long to get things moving.
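For reference, a project-scoped flavor like the one described here could be created with the OpenStack CLI roughly as follows; this is a hypothetical reconstruction, as the actual command and project identifier used by the admins are not shown in this task:

```shell
# Hypothetical admin-side command creating a private flavor matching the
# requested specs (1 vCPU, 2 GB RAM, 500 GB disk) and granting it to a
# project. Flavor and project names are taken from the discussion.
openstack flavor create dumps-temporary-file-storage \
    --vcpus 1 --ram 2048 --disk 500 \
    --private --project dumps
```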

Envlh added a subscriber: Envlh. · Jun 10 2019, 8:21 PM
bd808 closed this task as Resolved. · Jul 9 2019, 2:57 PM
bd808 assigned this task to Andrew.