
Setup periodic rsync jobs from dumps generation hosts to labstore1006|7
Closed, Resolved, Public

Description

Details of this task are yet to be fully worked out. Work is in progress on the Dumps server side to make rsyncs incremental, so that we can run them more often, on the order of every few minutes. Once that is done, dumpsdata1002 will be able to push new data to labstore1006 and labstore1007 alternately, every 10 minutes or so.

In the meantime, there may be intermediate steps in which we rsync data over from dataset1001, or perhaps from labstore1003.
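One reading of "push new data to labstore1006 and labstore1007 alternately, every 10 minutes or so" is a round-robin schedule in which each cycle sends data to the next labstore host in turn. The sketch below assumes exactly that; the host order, the fixed 10-minute interval and the function name are illustrative assumptions, not taken from the actual dumps tooling.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Assumed parameters: the description only says "every 10 minutes or so".
DESTINATIONS = ("labstore1006", "labstore1007")
INTERVAL = timedelta(minutes=10)


def push_schedule(start: datetime, cycles: int,
                  destinations: Sequence[str] = DESTINATIONS,
                  interval: timedelta = INTERVAL) -> List[Tuple[datetime, str]]:
    """Return (start_time, host) pairs, rotating through the hosts each cycle."""
    return [(start + i * interval, destinations[i % len(destinations)])
            for i in range(cycles)]


if __name__ == "__main__":
    # Prints four cycles, alternating labstore1006 and labstore1007.
    for when, host in push_schedule(datetime(2017, 11, 1), 4):
        print(when.strftime("%H:%M"), host)
```

Under these assumptions, with two hosts and a 10-minute interval each individual host is refreshed only every 20 minutes, which trades per-host freshness for less load on the dumpsdata host.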

Related Objects

Status      Assigned
Resolved    bd808
Resolved    ArielGlenn
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    ArielGlenn
Resolved    ArielGlenn
Resolved    ezachte
Resolved    ArielGlenn
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy
Resolved    madhuvishy

Event Timeline

My thoughts on this task:

  • If you need the data to be available right away so you can start other work, you could rsync from labstore1003. Doing so more often than every few hours won't gain you much, because we're not doing rolling rsyncs to it right now.
  • If you don't mind the data being a bit older, we could rsync to the labstore host(s) of your choice from ms1001 until the dumpsdata hosts are ready.
  • Once the dumpsdata hosts are ready, we can rsync to the labstore host of your choice from the fallback dumpsdata host.

Rsyncs may take longer than ten minutes, depending on how many dump files have been completed since the last rsync; however, no new rsync will start until the current one completes. Depending on how long they take, it may make more sense to rsync the labstore fallback host from the currently active host, rather than waiting for the dumpsdata host to do serial rsyncs. We'll see in practice how it works out.
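The rule that no new rsync starts until the current one completes can be enforced with an exclusive, non-blocking file lock, with the next start time pushed back whenever a run overruns its slot. This is a minimal sketch assuming a Linux host and Python's `fcntl.flock`; the lock path, the 600-second interval and the helper names are hypothetical.

```python
import fcntl
import os
import subprocess
import time
from typing import Callable, List

INTERVAL_SECONDS = 600.0  # hypothetical: the "every 10 minutes or so" cadence


def next_start(started: float, finished: float,
               interval: float = INTERVAL_SECONDS) -> float:
    """When the next run may begin: one interval after the last start,
    but never before the last run finished, so runs cannot overlap."""
    return max(started + interval, finished)


def run_exclusively(lock_path: str, job: Callable[[], object]) -> bool:
    """Run `job` only if no other holder has `lock_path` locked; return whether it ran."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # a previous rsync still holds the lock; skip this round
        try:
            job()
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


def rsync_loop(argv: List[str], lock_path: str, rounds: int) -> None:
    """Run the given rsync command `rounds` times, strictly one run at a time."""
    for _ in range(rounds):
        started = time.time()
        run_exclusively(lock_path, lambda: subprocess.run(argv, check=False))
        finished = time.time()
        # Sleep until the next slot; if the run overran, start again immediately.
        time.sleep(max(0.0, next_start(started, finished) - finished))
```

A cron job wrapped in `flock -n` from util-linux gives the same skip-if-busy behaviour; the sketch just makes the scheduling rule explicit.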

bd808 moved this task from Backlog to Dumps on the Data-Services board. Sep 11 2017, 9:59 PM

I will be ready this week to rsync xml/sql data from the dumpsdata1001 host to labstore1006, if you are in a position to accept it :-)

Before that, you should already have a full copy of the existing data in place. We can arrange for it to come from ms1001 if that works for you, or from labstore1003 as described earlier.

I don't want to push to labstore1007, at least for the moment: that would mean 4 rsyncs from the host where generated dumps are written, some of them overlapping so that the rsyncs to my fallback server finish in a timely fashion, and I want to avoid that overlap.

How do things look on your end for this?

Change 390075 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] allow labstore1006 to rsync from dumps servers

https://gerrit.wikimedia.org/r/390075

Change 390075 merged by ArielGlenn:
[operations/puppet@production] allow labstore1006 to rsync from dumps servers

https://gerrit.wikimedia.org/r/390075

Change 391222 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] move hardcoded paths out of dump rsync server manifests

https://gerrit.wikimedia.org/r/391222

Change 391222 merged by ArielGlenn:
[operations/puppet@production] move hardcoded paths out of dump rsync server manifests

https://gerrit.wikimedia.org/r/391222

Change 391242 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] add dumps rsync user and path variables to hiera

https://gerrit.wikimedia.org/r/391242

Change 391242 merged by ArielGlenn:
[operations/puppet@production] add dumps rsync user and path variables to hiera

https://gerrit.wikimedia.org/r/391242

Change 391263 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] enable dump rsyncs to/from labstore1006

https://gerrit.wikimedia.org/r/391263

I have a few questions, left on the patchset https://gerrit.wikimedia.org/r/#/c/391263/3

OK, I declare the patch ready to merge as soon as the following are done on labstore1006:

  • create a new directory /srv/dumps/xmldatadumps with owner/group root and 755 permissions
  • move all directories and files under /srv/dumps into /srv/dumps/xmldatadumps
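The two preparation steps on labstore1006 can be expressed as a short script. This is a sketch only, assuming it runs as root while nothing is writing to /srv/dumps; the function name is hypothetical, and the root directory is a parameter so it can be tried on a scratch directory first.

```python
import os
import shutil
from typing import List


def move_into_subdir(root: str, subdir: str = "xmldatadumps") -> List[str]:
    """Create root/subdir (mode 755) and move every other entry of root into it.

    Returns the names that were moved, in sorted order."""
    target = os.path.join(root, subdir)
    os.makedirs(target, exist_ok=True)
    os.chmod(target, 0o755)  # makedirs' mode is filtered by the umask, so set it explicitly
    if os.geteuid() == 0:
        os.chown(target, 0, 0)  # owner and group root, as requested in the task
    moved = []
    for name in sorted(os.listdir(root)):
        if name == subdir:
            continue  # never move the destination into itself
        shutil.move(os.path.join(root, name), os.path.join(target, name))
        moved.append(name)
    return moved
```

On the real host this would be called as `move_into_subdir("/srv/dumps")`. Since source and target share a filesystem, `shutil.move` performs a plain rename, so even large dump trees move quickly.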

Then I will merge this, run rsync from dumpsdata1001 to labstore1006 by hand to make sure it runs, and finally add the labstore1006 host and path to the corresponding rsync job in puppet. This will send over only xml/sql dumps for now.
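The follow-up steps, a by-hand run and then adding the labstore1006 host and path to the rsync job, can be pictured as a list of destinations that each get their own rsync command. The sketch assumes rsync over a remote shell (host:path) and a placeholder source directory; the real puppet-managed job may well use an rsync daemon module and other options, and all names here are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Destination(NamedTuple):
    host: str
    path: str


# Hypothetical job definition: only labstore1006 and /srv/dumps/xmldatadumps
# come from this task; SOURCE_DIR is a placeholder.
SOURCE_DIR = "/path/to/xmldatadumps/public"
DESTINATIONS = [Destination("labstore1006", "/srv/dumps/xmldatadumps")]


def rsync_argv(source_dir: str, dest: Destination, dry_run: bool = False) -> List[str]:
    """Build the rsync command line for one destination."""
    argv = ["rsync", "-a"]  # archive mode: recursive, keeps links, perms, times, owner, group
    if dry_run:
        argv.append("--dry-run")  # show what would change without transferring anything
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    argv.append(source_dir.rstrip("/") + "/")
    argv.append(f"{dest.host}:{dest.path.rstrip('/')}/")
    return argv


def job_commands(source_dir: str, destinations: Sequence[Destination],
                 dry_run: bool = False) -> List[List[str]]:
    """One command per destination, intended to run one after another."""
    return [rsync_argv(source_dir, d, dry_run) for d in destinations]


if __name__ == "__main__":
    for cmd in job_commands(SOURCE_DIR, DESTINATIONS, dry_run=True):
        print(" ".join(cmd))
```

In this model, adding another host later, as in the labstore1007 change further down, amounts to appending one more `Destination` to the list.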

Change 391892 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] public_dumps: Define directory for xmldatadumps

https://gerrit.wikimedia.org/r/391892

Change 391892 merged by Madhuvishy:
[operations/puppet@production] public_dumps: Define directory for xmldatadumps

https://gerrit.wikimedia.org/r/391892

This is all done now :)

Change 391263 merged by ArielGlenn:
[operations/puppet@production] enable dump rsyncs to/from labstore1006

https://gerrit.wikimedia.org/r/391263

Change 391905 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] add labstore1006 to list of hosts for rolling rsync of xml/sql dumps

https://gerrit.wikimedia.org/r/391905

Change 391905 merged by ArielGlenn:
[operations/puppet@production] add labstore1006 to list of hosts for rolling rsync of xml/sql dumps

https://gerrit.wikimedia.org/r/391905

These have been running for a while now. The only things not synced over on a regular basis are the various datasets pulled or pushed onto dataset1001 from kiwix, the mwlog hosts, etc. Instead of setting up an additional sync job for those, we ought to enable those syncs to happen on labstore1006 and then sync from there to labstore1007.

  • profile::dumps::fetcher, with appropriate hiera settings and permissions on stat1005, will take care of the incoming datasets
  • profile/manifests/phabricator/main.pp has a stanza for the push to dataset1001, so either a new stanza should be added or the push should be converted to a pull
  • role/manifests/logging/mediawiki/udp2log.pp has a stanza for the push to dumps.wikimedia.org, so either a new stanza should be added or the push should be converted to a pull

Then this task could be closed.

madhuvishy renamed this task from Setup periodic rsync jobs from dataset1001/dumpsdata1001|2 to labstore1006|7 to Setup periodic rsync jobs from dumps generation hosts to labstore1006|7. Mar 5 2018, 5:57 PM
madhuvishy closed this task as Resolved.
madhuvishy assigned this task to ArielGlenn.

Renaming this task and calling it done; I'm opening a new task for the remaining datasets described above.

See T188726 for the new task covering the datasets in other/.

Change 423354 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] dumps: Add labstore1007 to list of hosts for rolling rsync

https://gerrit.wikimedia.org/r/423354

Change 423354 merged by ArielGlenn:
[operations/puppet@production] dumps: Add labstore1007 to list of hosts for rolling rsync

https://gerrit.wikimedia.org/r/423354