
VPS Project dumps is using 1.7T at /data/project on NFS
Closed, ResolvedPublic

Description

(cc @Hydriz @Nemo_bis)

The dumps project is using 1.7T / 5T available shared NFS storage for Cloud VPS projects. This is really high, please consider cleaning up old files, and instituting crons or other ongoing processes to clean up old logs, generated data files, etc. Thank you!

(Subscribing project admins as listed in https://tools.wmflabs.org/openstack-browser/project/dumps)

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 29 2017, 5:47 PM
madhuvishy triaged this task as Medium priority. · Aug 29 2017, 5:48 PM

Thank you for the notice. This issue was caused by an archiving job for the CirrusSearch dumps which did not exit cleanly. I am in the midst of resolving this issue and will update once the storage space goes down.

Is there something we could do to make it clearer whether the dumps are successful or not, perhaps emitting some sort of tag file next to the dump when it's successful?

@EBernhardson Having a status.txt file containing a simple "done" or "in progress" would be ideal for us to know whether a dump has completed successfully. If you think this is a good idea, I would suggest working on it in a separate task. :)
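The tag-file idea above could be sketched as a wrapper around the dump job. This is a hypothetical illustration, not the actual dump infrastructure: the directory path and the `run_dump` function are placeholders for the real job.

```shell
#!/bin/sh
# Hypothetical sketch: keep a status.txt next to the dump output that
# reflects whether the job finished cleanly.
set -eu

DUMP_DIR="${1:-/tmp/dump-demo}"     # placeholder; would be the real dump directory
STATUS="$DUMP_DIR/status.txt"

run_dump() {
    # Stand-in for the real dump/archive job.
    sleep 0
}

mkdir -p "$DUMP_DIR"
echo "in progress" > "$STATUS"      # mark the dump as running

if run_dump; then
    echo "done" > "$STATUS"         # consumers can trust the dump is complete
else
    echo "failed" > "$STATUS"       # a stuck "in progress" also signals trouble
fi
```

A consumer (or the archiving job) would then only pick up dumps whose status.txt reads "done".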

The issue mentioned in this task happened due to problems with connectivity between Wikimedia and the Internet Archive during the archiving process, and my archiving infrastructure is not yet able to handle such interruptions, hence the piling up of files and the space usage. I can only increase monitoring of storage space usage until I manage to solve this issue in the software itself.
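The interim monitoring mentioned above could be as simple as a cron job that warns when usage crosses a threshold. A minimal sketch, assuming a Linux instance with GNU/POSIX `du`; the path and threshold are made-up examples, not the project's real configuration:

```shell
#!/bin/sh
# Hypothetical sketch: warn when a directory's disk usage exceeds a
# threshold. Intended to be run periodically, e.g. hourly from cron.
DIR="${1:-/tmp/usage-demo}"   # placeholder; would be /data/project on the instance
THRESHOLD_GB=1000             # example threshold: warn above 1 TB

mkdir -p "$DIR"

# du -sk reports total usage in kilobytes; convert to whole gigabytes.
used_kb=$(du -sk "$DIR" | cut -f1)
used_gb=$((used_kb / 1024 / 1024))

echo "$DIR is using ${used_gb}G"
if [ "$used_gb" -gt "$THRESHOLD_GB" ]; then
    echo "WARNING: $DIR exceeds ${THRESHOLD_GB}G" >&2
fi
```

Cron could then mail the warning to the project admins whenever the threshold is crossed.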

Thanks madhuvishy for the notification. Hydriz, how long will it take to recover the backlog? If bandwidth is the bottleneck, do we need our bandwidth cap to be temporarily lifted?

I should be able to free up at least 1 TB by the end of this week, which is about 3 of the CirrusSearch dumps. The remaining 2 dumps will take a while to complete, so the storage space will be used for a while longer.

Bandwidth is indeed the bottleneck, but based on my observation, the instances don't seem to be able to sustain any higher bandwidth, so I'm okay with not having the bandwidth cap lifted (I'm surprised there was one in the first place).

Thanks @Hydriz @Nemo_bis! Do keep the ticket updated as the storage space gets freed up.

madhuvishy added a subscriber: chasemp.

We are at high utilization by the dumps project again, 2T of 5T available storage. Please clean up excess files and data soon, thank you!

I have managed to reduce the disk usage to less than 500G. However, the original problem remains: the dumps project can have very high disk utilization during certain periods, which may negatively affect other Cloud VPS projects. Is it possible for a separate labstore volume to be created just for the dumps project?

Expect slightly higher-than-usual storage usage over the next couple of weeks as I am trying to clear the backlog of CirrusSearch dumps to archive. If all goes well, the maximum utilization should be less than 1 TB.

madhuvishy closed this task as Resolved. · Mar 13 2018, 4:31 PM
madhuvishy claimed this task.

Resolving this for now. This project still has high utilization, albeit less than before. We can discuss strategies to mitigate in T159930.