Description

Now that we run staged dumps (i.e. stubs for all wikis, then tables for all wikis, and so on), it takes a while for a full dump run to complete. Folks will want the dump files available sooner rather than later, so we should copy them over sooner instead of waiting for the full run to finish. This may require rethinking the space available in labs, so that per wiki we can keep the last known good full dump, possibly a more recent partial dump, plus the current files being copied over.
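To make the retention scheme concrete, here is a minimal sketch of the "keep the last known good full dump plus anything newer" selection for one wiki. It is illustrative only: the dated run directories, the status.txt file name, and the 'done' marker are assumptions for this sketch, not the actual dumps layout.

```python
import os

def runs_to_keep(wiki_dir, status_name="status.txt"):
    """Pick which dump run directories to keep for one wiki.

    Assumes (hypothetically) that each run lives in a dated subdirectory
    containing a status file whose first line is 'done' for a complete run,
    or something else ('partial', 'in-progress') otherwise.
    """
    runs = sorted((d for d in os.listdir(wiki_dir) if d.isdigit()), reverse=True)
    keep = []
    for run in runs:  # newest first
        try:
            with open(os.path.join(wiki_dir, run, status_name)) as fh:
                state = fh.readline().strip()
        except OSError:
            continue  # no status file yet, skip it
        keep.append(run)  # newer partials and the run currently being copied
        if state == "done":
            break  # last known good full dump: keep it and stop
    return keep
```

Anything older than the last good run could then be pruned to free space in labs.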
| Status | Subtype | Assigned | Task |
| --- | --- | --- | --- |
| Resolved | | ArielGlenn | T107750 Make dumps run via cron on each snapshot host |
| Resolved | | ArielGlenn | T107757 staged dumps implementation |
| Resolved | | ArielGlenn | T108077 copy partial dumps from dataset host to labs |
Event Timeline
Coren, I've added you on this so we can chat about space available in labs for the dumps copy.
About 3x one full run, to be on the safe side. One run these days takes (guesstimate) 2.5T, so 3 × 2.5T ≈ 7.5T; call it 8T to be safe. I forget what the last round of negotiations landed us with; what have we got allocated now?
That's... not an issue. :-) Since we moved to labstore1003, there is some 40T available for dumps (with the caveat that this lives on media that is not otherwise backed up or very redundant, under the presumption that it holds only copies of data).
Changes to list-last-n-good-dumps coming up, yet to be tested. See https://gerrit.wikimedia.org/r/234973
Tested and merged. https://gerrit.wikimedia.org/r/#/c/234982/ is the change to generate the list of the last three good dumps and use that for rsync; also merged. We should see the new behavior tomorrow when the new dump run starts.
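For illustration only, a rough Python sketch of the idea behind generating a "last N good dumps" list and feeding it to rsync. The directory layout, status file, wiki list, destination, and function names here are all hypothetical; the actual behavior is whatever the merged gerrit changes above implement.

```python
import os
import subprocess

def last_n_good(dump_root, wiki, n=3, status_name="status.txt"):
    """Newest n run directories for a wiki whose (hypothetical) status
    file reports the run as 'done'."""
    wiki_dir = os.path.join(dump_root, wiki)
    runs = sorted((d for d in os.listdir(wiki_dir) if d.isdigit()), reverse=True)
    good = []
    for run in runs:
        try:
            with open(os.path.join(wiki_dir, run, status_name)) as fh:
                if fh.readline().strip() == "done":
                    good.append(os.path.join(wiki, run))
        except OSError:
            continue
        if len(good) == n:
            break
    return good

def rsync_good_runs(dump_root, wikis, dest, listfile="/tmp/dumps-to-sync.txt"):
    """Write one combined file list and hand it to rsync --files-from."""
    with open(listfile, "w") as out:
        for wiki in wikis:
            for path in last_n_good(dump_root, wiki):
                out.write(path + "\n")
    # -r is needed because --files-from disables the recursion implied by -a
    subprocess.check_call(
        ["rsync", "-a", "-r", "--files-from=" + listfile, dump_root, dest]
    )
```

Invocation would look something like `rsync_good_runs("/path/to/public/dumps", ["enwiki", "dewiki"], "labstore-host::dumps")`, with all of those names made up for the example.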