
Restructure rsync of XML/SQL dumps and dumpsdata space/network/disk use
Open, Medium, Public


We have two new dumpsdata hosts to be deployed, and we're tight on space on the dumpsdata fallback host we have, since it gets rsyncs of XML/SQL and misc dumps. As more hosts do wikidata jobs after they are done with the regular jobs, we can expect pressure on disks and network to go up too. Additionally, the procedure for swapping a dumpsdata primary and secondary is very fragile. What can we do to fix all of this?

An earlier version of this task was dropped when the issue it was trying to address (super slow rsyncs to public) was fixed: T254856

Event Timeline

ArielGlenn created this task.

Adding a complete problem description here, since one of the WMCS team has kindly consented to look at this problem with me.

The current setup:

Worker hosts generate dumps output files, reading and writing from an NFS share on a "dumpsdata" host. There is a secondary NFS dumpsdata host that can be swapped in if the primary dies. The secondary NFS host gets a copy of all output files from the primary NFS host via rsync. The primary NFS host also rsyncs all output files to the public web server (labstore box) and the WMCS NFS server for cloud instances (labstore box).

dumps-rsync-mess.png (432×757 px, 39 KB)

Some constraints:

  • We can't use all the bandwidth on the NFS server for the rsyncs, because we want the workers to read and write their data. Those reads and writes should take priority over anything else.
  • The NFS server(s) have disk arrays with RAID controllers, but the arrays can't deliver enough iops to make full use of 10G nics.
  • The NFS primary has a 10G nic but the secondary only has a 1G (and the port it's on is 1G also). The two new hosts have a shared 10G port between them.
  • We want the rsync to the NFS secondary to be very fast and frequent so that if there is a failure, we don't lose much as far as dump jobs go.
  • Index.html files and such are stashed before the rsync and copied over after the rsync of all dumps output files, so that downloaders never see a link to a file that's not there yet.
  • Space is getting tight on the secondary NFS server, because it also gets copies of "misc" dumps (everything not XML/SQL), since it is the fallback host for that as well.
  • The NFS secondary has 32G RAM; the primary and the misc dumps NFS server both have 64G. The new boxes also have 64G, and they all have a single quad core cpu.
  • The fallback host with the 32G RAM and the 1G nic was racked in May 2017, so this time next year it will have been refreshed.
  • The labstore boxes getting these rsyncs don't have tons of extra bandwidth either. See T191491
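
To put the 1G versus 10G constraint in rough numbers, here is a back-of-the-envelope sketch. The 2 TB payload and the 50% share of the link left over for rsync are made-up illustrative values, not measurements from these hosts, and the function name is hypothetical.

```python
def transfer_hours(size_tb: float, link_gbit: float, rsync_share: float) -> float:
    """Hours needed to copy size_tb terabytes over a link of link_gbit Gbit/s
    when rsync is allowed only rsync_share (0..1] of the link."""
    if not 0 < rsync_share <= 1:
        raise ValueError("rsync_share must be in (0, 1]")
    total_bits = size_tb * 1e12 * 8
    usable_bits_per_sec = link_gbit * 1e9 * rsync_share
    return total_bits / usable_bits_per_sec / 3600


# Assumed values: 2 TB to copy, half of the link reserved for worker reads/writes.
for nic_gbit in (1, 10):
    hours = transfer_hours(2.0, nic_gbit, 0.5)
    print(f"{nic_gbit}G nic: {hours:.1f} hours")
```

This only models the network. As noted in the list above, the disk arrays may not deliver enough iops to fill a 10G nic, so the real figure for the 10G case could be worse than the estimate.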

Given that we want to minimize rsync bandwidth on the NFS primary while updating the NFS secondary as often and fast as possible, I would prefer a model like

dumps-rsync-mess-2.png (318×760 px, 35 KB)

We still need to avoid rsyncing index.html and similar files until the end, from the NFS secondary to the public servers.
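
One way to get the "index files last" behaviour is a two-pass rsync: the first pass excludes the index files, the second pass copies only them. The sketch below just builds the two command lines. It is not how the current scripts work (they stash the files and copy them over afterwards); the paths, the bandwidth cap and the pattern list are all hypothetical. The rsync options used (`-a`, `--bwlimit`, `--include`, `--exclude`) are standard.

```python
import shlex
from typing import List, Sequence

# Hypothetical list of files that must only appear once the data is in place.
DEFERRED = ("index.html",)


def two_pass_rsync(src: str, dest: str, bwlimit_kib: int,
                   deferred: Sequence[str] = DEFERRED) -> List[List[str]]:
    """Return two rsync command lines: dump output first, index files last."""
    base = ["rsync", "-a", f"--bwlimit={bwlimit_kib}"]
    # Pass 1: everything except the deferred files.
    first = base + [f"--exclude={p}" for p in deferred] + [src, dest]
    # Pass 2: descend into all directories, copy only the deferred files.
    second = (base + ["--include=*/"]
              + [f"--include={p}" for p in deferred]
              + ["--exclude=*", src, dest])
    return [first, second]


# Assumed source path, destination module and cap (in KiB/s), for illustration only.
for cmd in two_pass_rsync("/data/dumps/public/", "labstore.example::dumps/", 40000):
    print(shlex.join(cmd))
```

The `--bwlimit` cap is there so that worker reads and writes on the sending host keep priority over the rsync.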

Now we have space issues, and these two new boxes. The two wikis that take the most space and time are enwiki and wikidatawiki. We could have two pairs of NFS servers for XML/SQL dumps, where the enwiki and wikidatawiki jobs get written to the NFS primary in one pair and the rest of the wiki output files get written to the NFS primary in the other pair. Because a given worker may run either sort of job, it will need to have mounts for both pairs... This means separate config files or extra settings or whatever: more complication.

dumps-rsync-mess-3.png (504×760 px, 66 KB)
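
The "extra settings" for the two-pair layout could be as small as a lookup from wiki name to mount point. A minimal sketch, assuming hypothetical mount paths and a hypothetical function name; the real dumps configuration may be organized quite differently.

```python
# The two wikis named above as taking the most space and time.
BIG_WIKIS = frozenset({"enwiki", "wikidatawiki"})

# Hypothetical mount points, one per NFS pair, both mounted on every worker.
MOUNTS = {
    "big": "/mnt/dumpsdata-big",
    "rest": "/mnt/dumpsdata-rest",
}


def output_root(wiki: str) -> str:
    """Return the NFS mount a job for the given wiki should write to."""
    return MOUNTS["big"] if wiki in BIG_WIKIS else MOUNTS["rest"]


for wiki in ("enwiki", "wikidatawiki", "frwiki"):
    print(wiki, "->", output_root(wiki))
```

Swapping a primary and secondary within a pair would still need the mount itself to be repointed; this lookup only covers which pair a job writes to.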

One of the secondary NFS servers will get the rsync of the "misc" dumps as well, in case the misc dumpsdata NFS server dies. But if that happens, the given secondary NFS server will have to be pulled out of its pair to become the misc dumps NFS primary. And then we have to redo the rsync.

Lots of moving parts. Ugh.

Adding one more comment and then stopping before this gets too long for anyone to read.

One more option is the following.

To understand it, you need to know that the dumpsdata1001,2 hosts will be refreshed with hosts with more space, sometime in this fiscal year.
Dumpsdata1002 is currently the nfs share for misc dumps. Dumpsdata1003 is the nfs share for xml/sql dumps, and dumpsdata1001 is the xml/sql dumps
nfs fallback. The new undeployed hosts are dumpsdata1004 and 5; they have twice the space of the current hosts.

We currently have

XML workers <----------> dumpsdata1003 >------>            ---------- dumpsdata1001 (xml/sql nfs fallback)
                                               \           |                           |
                                                ---------> |--------- labstore1006 (web or nfs)
                                               /           |
misc dumps workers <---> dumpsdata1002 >------>            ---------- labstore1007 (web or nfs)

that is, the nfs primary servers rsync out to all three hosts on the right, one at a time. If any dumpsdata host dies, we can swap dumpsdata1001 in
for it.

We could have

XML workers <----------> dumpsdata1005 >------>                                  ---------- dumpsdata1001 (xml/sql nfs fallback)
                                               \                                 |
                                                ---------> dumpsdata1004 ------> |--------- labstore1006 (web or nfs)
                                               /                                 |
misc dumps workers <---> dumpsdata1002 >------>                                  ---------- labstore1007 (web or nfs)

Dumpsdata1004 will have the space, and so will dumpsdata1005. The spare would be dumpsdata1001. It would not have enough space if one of the
new dumpsdata hosts (1004,5) had a problem, but we can hope that won't happen until 1001 is refreshed with a host with more space.

This has the virtue of having a true spare, as well as one nfs mount for all xml/sql dumps.

In the meantime we have two more workers to deploy, intending for them to go into the XML/SQL dumps pool. How many more can the disk arrays

The misc dumps don't take nearly as much space as the XML/SQL dumps. We're not using that space well in this or any other setup.

Note that we use NFS v3 with cache off because there were race conditions with caching on a few years ago. Things could be better now; it's hard to know.