
copy partial dumps from dataset host to labs
Closed, Resolved · Public

Description

Now that we run staged dumps, i.e., stubs for all wikis, then tables for all wikis, etc., it takes a while for a full dump to complete. Folks will want the dump files available sooner rather than later, so we should copy them over sooner. This may require rethinking the space available in labs, so that per wiki we keep the last known good full dump, possibly a more recent partial dump, plus the current files being copied over.
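
A minimal sketch of that retention policy, assuming a per-wiki directory of dated runs with a simple status marker (the root path, the status.txt file, and the status values here are illustrative assumptions, not the actual dumps layout):

```python
import os

DUMPS_ROOT = "/public/dumps"  # hypothetical mount point on the labs copy

def runs_for_wiki(wiki):
    """Return run directories for a wiki, newest first (dirs named YYYYMMDD)."""
    wikidir = os.path.join(DUMPS_ROOT, wiki)
    return sorted(os.listdir(wikidir), reverse=True)

def status_of(wiki, run):
    """Read a per-run status marker: 'done', 'partial', or 'in-progress'."""
    path = os.path.join(DUMPS_ROOT, wiki, run, "status.txt")
    try:
        with open(path) as marker:
            return marker.read().strip()
    except OSError:
        return "in-progress"

def runs_to_keep(wiki):
    """Pick the runs we must not delete for one wiki."""
    keep = set()
    last_good = last_partial = None
    for run in runs_for_wiki(wiki):
        status = status_of(wiki, run)
        if status == "in-progress":
            keep.add(run)                  # files currently being copied over
        elif status == "partial" and last_partial is None:
            last_partial = run             # most recent partial dump
        elif status == "done" and last_good is None:
            last_good = run                # last known good full dump
    keep.update(run for run in (last_good, last_partial) if run)
    return keep
```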

Event Timeline

ArielGlenn claimed this task.
ArielGlenn raised the priority of this task to Medium.
ArielGlenn updated the task description.
ArielGlenn added a project: acl*sre-team.
ArielGlenn added subscribers: ArielGlenn, coren.

Coren, I've added you on this so we can chat about space available in labs for the dumps copy.

Do you already have a ballpark of how much space you'd need?

About three times one full run, to be on the safe side. One run these days takes (guesstimate) 2.5T, so we're looking at 8T. I forget what the last round of negotiations landed us with; what have we got allocated now?
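
For concreteness, the arithmetic behind that 8T figure, using the guesstimate from the comment above:

```python
# Back-of-the-envelope check of the space estimate.
runs_kept = 3        # last good full run + recent partial + copy in flight
tb_per_run = 2.5     # guesstimated size of one full run, in terabytes
print(runs_kept * tb_per_run)  # 7.5 -> round up to 8T for headroom
```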

That's... not an issue. :-) Since we moved to labstore1003, there is some 40T available for dumps (with the caveat that this lives on media that is not otherwise backed up or very redundant, under the presumption that it holds only copies of data).

Changes to list-last-n-good-dumps coming up, yet to be tested; see https://gerrit.wikimedia.org/r/234973.

Tested and merged. https://gerrit.wikimedia.org/r/#/c/234982/ is the change to generate the list of the last three good dumps and use that for the rsync; also merged. We should see the new behavior tomorrow as the new dump run starts.
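
A hedged sketch of what those two changes amount to, assuming a per-run status marker and illustrative paths (the real scripts' internals may differ): build a list of the last three good runs per wiki, then hand it to rsync as a --files-from include list so only those runs are copied.

```python
import os
import subprocess

DUMPS_ROOT = "/data/xmldatadumps/public"   # hypothetical source tree
LIST_FILE = "/tmp/last-n-good-dumps.txt"   # hypothetical list location

def last_n_good(wiki, n=3):
    """Newest n runs for a wiki whose status marker says 'done'."""
    wikidir = os.path.join(DUMPS_ROOT, wiki)
    good = []
    for run in sorted(os.listdir(wikidir), reverse=True):
        marker = os.path.join(wikidir, run, "status.txt")
        try:
            with open(marker) as f:
                done = f.read().strip() == "done"
        except OSError:
            done = False
        if done:
            good.append(os.path.join(wiki, run))
        if len(good) == n:
            break
    return good

def rsync_good_runs(wikis, dest):
    """Write the include list, then copy only those runs to labs."""
    with open(LIST_FILE, "w") as out:
        for wiki in wikis:
            out.writelines(run + "\n" for run in last_n_good(wiki))
    # note: with --files-from, -a does not imply -r, so pass it explicitly
    subprocess.run(
        ["rsync", "-a", "-r", f"--files-from={LIST_FILE}", DUMPS_ROOT, dest],
        check=True,
    )
```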

This is working now; closing.