
clean up old shorturl dumps
Closed, Resolved | Public

Description

Right now we keep them forever, and we shouldn't.

Event Timeline

ArielGlenn triaged this task as Medium priority. Jul 13 2020, 7:23 AM
ArielGlenn created this task.

This is blocked until Legoktm can archive url counts from the older dumps, for some stats he generates.

Everything is done on my side for https://shorturls.toolforge.org/. Still waiting to hear how long https://github.com/Hydriz/Balchivist/issues/8 is expected to take... If it's going to be a while, I'll set up a manual cronjob in the meantime.

Hydriz subscribed.

The actual system will take a while but let me try to manually get the old files uploaded first, which should take about 2-3 weeks. Approximately how many old copies of this dump will we be keeping, if we are not keeping all of them here?

Unsure, and it may vary over time. I'm going to arbitrarily say 20 for now; that seems like a lot.

Just an update that I have got the files archived to the Internet Archive: https://archive.org/search.php?query=subject%3A%22shorturls%22%20AND%20subject%3A%22wikimedia%22

Next step for me is to probably get a cronjob running, but I will be mainly focusing on getting the new system up since it is designed for such cases.

@ArielGlenn You can go ahead and keep the latest 20 dumps (or fewer, depending on your requirements).

Funnily enough we are already configured to keep only 7 shorturl dumps, so it is just "lucky" that the script did not work for the specific directory layout. I'll fix that now though :-)
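
For illustration only, here is a minimal sketch of the "keep the newest N dumps" idea for files that sit in one flat directory rather than in per-date subdirectories. It is not the actual cleanup script or the puppet change; the filename pattern (an 8-digit date embedded in the name), the function names `files_to_remove` and `cleanup`, and the dry-run default are all assumptions made for the example.

```python
import re
from pathlib import Path

# Assumption: each dump filename embeds its date as YYYYMMDD,
# e.g. "shorturls-20200713.gz". The real naming scheme may differ.
DATE_RE = re.compile(r"(\d{8})")


def files_to_remove(names, keep=7):
    """Return the dated filenames that fall outside the newest `keep`."""
    dated = []
    for name in names:
        match = DATE_RE.search(name)
        if match:
            # Files without a recognizable date are never selected.
            dated.append((match.group(1), name))
    # YYYYMMDD strings sort chronologically, so newest comes first here.
    dated.sort(reverse=True)
    return [name for _, name in dated[keep:]]


def cleanup(directory, keep=7, dry_run=True):
    """List (and, unless dry_run, delete) dumps older than the newest `keep`."""
    directory = Path(directory)
    names = [p.name for p in directory.iterdir() if p.is_file()]
    doomed = files_to_remove(names, keep)
    if not dry_run:
        for name in doomed:
            (directory / name).unlink()
    return doomed
```

Calling `cleanup(some_dir, keep=7)` with the default `dry_run=True` only reports which files would go, which is a cheap way to confirm the script really matches the directory layout before anything is deleted.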

Change 619571 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] cleanup misc dumps that aren't stored in per-date urls

https://gerrit.wikimedia.org/r/619571

Change 619571 merged by ArielGlenn:
[operations/puppet@production] cleanup misc dumps that aren't stored in per-date urls

https://gerrit.wikimedia.org/r/619571

The above patch is now deployed; I'll check tomorrow to make sure that the older files are actually cleaned up on the labstore hosts before closing the task.

To-morrow, and to-morrow, and to-morrow,
Creeps in this petty pace from day to day,
with nothing to remind me that I should check those files or close this bug.

So, almost a month later... yep, the files are cleaned up, closing this task!