
staged dumps: use the "cutoff" option as little as possible
Closed, Resolved, Public

Description

We want to be able to rerun the full dump scheduler from the beginning with no problems, having it skip over all jobs already completed.

The "cutoff" option doesn't fit this mold. When it is given to the bash worker script that dumps across all wikis, the script checks whether each wiki has had its status updated on or after the cutoff date; if not, it runs the job for that wiki.

That cutoff date is set to "today" at the beginning of a run. Since runs take multiple days, a rerun started on a later day gets a different cutoff date, foiling our plan.
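To make the problem concrete, here is a small Python sketch contrasting the cutoff check with a check that simply skips jobs already marked complete. The worker script itself is bash; the wiki names, dates, completion flags, and function names below are hypothetical illustrations. With the cutoff equal to the run's start date nothing finished is repeated, but a rerun a few days later recomputes "today" and sends finished wikis back through the job.

```python
from datetime import date

# Hypothetical data: date each wiki's status was last updated during a run
# that started on 2015-09-01 (None means the wiki has not been dumped yet).
LAST_STATUS_UPDATE = {
    "enwiki": date(2015, 9, 3),
    "dewiki": date(2015, 9, 1),
    "frwiki": None,
}

# Hypothetical completion flags for the same job and run.
COMPLETED = {"enwiki": True, "dewiki": True, "frwiki": False}


def to_run_by_cutoff(last_update, cutoff):
    """Cutoff-style check: run a wiki unless its status was updated on or after cutoff."""
    return sorted(
        wiki for wiki, updated in last_update.items()
        if updated is None or updated < cutoff
    )


def to_run_by_completion(completed):
    """Restart-friendly check: run only the wikis whose job is not marked complete."""
    return sorted(wiki for wiki, done in completed.items() if not done)


# Cutoff equal to the run's start date: only the unfinished wiki is picked up.
print(to_run_by_cutoff(LAST_STATUS_UPDATE, date(2015, 9, 1)))  # ['frwiki']
# Rerun started on 2015-09-04: "today" moved, so finished wikis are redone too.
print(to_run_by_cutoff(LAST_STATUS_UPDATE, date(2015, 9, 4)))  # ['dewiki', 'enwiki', 'frwiki']
# The completion-based check gives the same answer no matter when it runs.
print(to_run_by_completion(COMPLETED))  # ['frwiki']
```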

Event Timeline

ArielGlenn claimed this task.
ArielGlenn raised the priority of this task to Medium.
ArielGlenn updated the task description.
ArielGlenn added a project: acl*sre-team.
ArielGlenn subscribed.

Worker scripts now support a new job, "createdirs", which just creates the new dump directory, sets up the status and index.html files, and exits. This can now be used as the first step of the staged run. Such a stage is very unlikely to break partway through, and cleanup would be easy if it did.
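A createdirs step could be sketched roughly as follows. This is a hypothetical Python rendering, not the actual dump script: the directory layout (root/wiki/date), the placeholder file contents, and the `create_dirs` name are assumptions, and `status.html` stands in for whatever status file the real dumps write. Writing each file only when it is missing keeps the step safe to rerun.

```python
import os
import tempfile

# Hypothetical placeholder contents; the real dump script writes its own formats.
PLACEHOLDERS = {
    "status.html": "<li>Dump in progress</li>\n",
    "index.html": "<html><body><p>Dump in progress</p></body></html>\n",
}


def create_dirs(dump_root, wiki, run_date):
    """Create the run directory for one wiki, set up status and index files, and return.

    Existing files are left untouched, so rerunning this step cannot clobber
    the status of a run that has already made progress.
    """
    run_dir = os.path.join(dump_root, wiki, run_date)
    os.makedirs(run_dir, exist_ok=True)
    for name, content in PLACEHOLDERS.items():
        path = os.path.join(run_dir, name)
        if not os.path.exists(path):
            with open(path, "w") as f:
                f.write(content)
    return run_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(sorted(os.listdir(create_dirs(root, "enwiki", "20150901"))))
```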

Later, this first job should be pulled out into its own entry in a bash script when this all gets cronified. That way, when dumps break, rerunning the later stages just means invoking the dump scheduler with a config file: one command and done.
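Rerunning the scheduler from the top with one command amounts to a stage runner that records which stages finished and skips them next time. The sketch below is a hypothetical Python illustration: the stage names other than createdirs, the JSON state file, and the `run_stages` function are assumptions, not the real scheduler's config format.

```python
import json
import os
import tempfile

# Hypothetical stage list; only "createdirs" comes from this task.
STAGES = ["createdirs", "stubs", "tables", "articles", "history"]


def load_done(state_file):
    """Return the set of stages recorded as finished (empty if there is no state yet)."""
    if not os.path.exists(state_file):
        return set()
    with open(state_file) as f:
        return set(json.load(f))


def run_stages(stages, state_file, run_job):
    """Run stages in order, skipping any already recorded as finished.

    run_job(stage) should raise on failure; the exception propagates and the
    failed stage is not recorded, so the next invocation resumes there.
    """
    done = load_done(state_file)
    for stage in stages:
        if stage in done:
            continue
        run_job(stage)
        done.add(stage)
        # Persist after every stage so a crash loses at most the current one.
        with open(state_file, "w") as f:
            json.dump(sorted(done), f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = os.path.join(tmp, "state.json")
        run_stages(STAGES, state, lambda stage: print("running", stage))
        # A second invocation finds every stage recorded and does nothing.
        run_stages(STAGES, state, lambda stage: print("running", stage))
```

Persisting the finished set after each stage, rather than comparing dates, is what lets the same command be reissued at any point without redoing completed work.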

changesets:
https://gerrit.wikimedia.org/r/#/c/233418/ which adds the createdirs job to all the stage lists, and rearranges the stages a bit
https://gerrit.wikimedia.org/r/#/c/233683/ which adds the createdirs job functionality to the dump script

This setup is running now.

Issues:
The md5sums file in each wiki's 'latest' directory now gets removed. We don't want that.
The bash script that runs across all wikis waits 30 seconds between wikis, which is unnecessary for the createdirs job.

https://gerrit.wikimedia.org/r/233953 and https://gerrit.wikimedia.org/r/233954 preserve the md5sums files from the previous run and make the sleep time a user-definable option.
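One way a cleanup of the 'latest' directory could spare the checksum file is to skip anything matching a preserve list. This Python sketch is hypothetical: the `-md5sums.txt` suffix and the `clean_latest_dir` helper are assumptions for illustration, not what the linked changesets implement.

```python
import os
import tempfile

# Hypothetical filename suffix for the per-wiki checksum file.
PRESERVE_SUFFIXES = ("-md5sums.txt",)


def clean_latest_dir(latest_dir, preserve_suffixes=PRESERVE_SUFFIXES):
    """Delete regular files in latest_dir except those ending with a preserved suffix.

    Returns the sorted names of the files that were removed.
    """
    removed = []
    for name in sorted(os.listdir(latest_dir)):
        path = os.path.join(latest_dir, name)
        # str.endswith accepts a tuple, so any preserved suffix matches.
        if not os.path.isfile(path) or name.endswith(preserve_suffixes):
            continue
        os.remove(path)
        removed.append(name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as latest:
        for name in ("enwiki-latest-md5sums.txt", "enwiki-latest-stub-articles.xml.gz"):
            open(os.path.join(latest, name), "w").close()
        print(clean_latest_dir(latest))
        print(os.listdir(latest))
```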

https://gerrit.wikimedia.org/r/#/c/233961/ actually changes the sleep time in the stages. There's no need to deploy it for this run, but it should go out before the next full run.
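Making the pause between wikis configurable per job might look like the following. The Python rendering, the per-job override table, and the names `run_across_wikis` and `sleep_for` are hypothetical; the actual change is to the bash worker script and its stage lists. The 30-second default matches the behavior described above, and createdirs gets no pause.

```python
import time

DEFAULT_SLEEP_SECS = 30  # the previous fixed pause between wikis
# Hypothetical per-job overrides: createdirs is quick, so don't pause at all.
SLEEP_OVERRIDES = {"createdirs": 0}


def sleep_for(job):
    """Return the pause in seconds to use between wikis for this job."""
    return SLEEP_OVERRIDES.get(job, DEFAULT_SLEEP_SECS)


def run_across_wikis(wikis, job, run_job, sleep_secs, sleep=time.sleep):
    """Run job on each wiki in turn, pausing sleep_secs between wikis (not after the last)."""
    for i, wiki in enumerate(wikis):
        run_job(wiki, job)
        if sleep_secs > 0 and i < len(wikis) - 1:
            sleep(sleep_secs)


if __name__ == "__main__":
    wikis = ["enwiki", "dewiki", "frwiki"]
    run_across_wikis(wikis, "createdirs",
                     lambda wiki, job: print(job, wiki), sleep_for("createdirs"))
```

Passing the sleep function in as a parameter is only there so the pause can be observed or stubbed out without actually waiting.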

This worked fine for the September run; closing.