Investigate and reduce resource use by rsync of dumps between peers, labs, mirrors
Open, Medium, Public

Description

I recall folks being concerned that rsyncs might contribute to memory pressure on dataset1001, increasing the possibility of NFS lockup discussed in T169680.
In any case, if there are easy things to do that can reduce resource use, we should do them.

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 12 2017, 9:10 AM
ArielGlenn triaged this task as Medium priority. Oct 12 2017, 9:10 AM

One obvious fix is to avoid copying over files that are still being written. We can now easily tell which ones those are, at least for the regular xml/sql dump runs. This patch implements it for that case: https://gerrit.wikimedia.org/r/#/c/385203/
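
As a rough illustration of the idea (not the code in the patch above), the sketch below keeps only files that look complete and builds an rsync command that skips the rest. The `.inprog` suffix, the function names, and the paths in the usage note are assumptions made for the example; the marker the dump runs actually use may differ.

```python
# Assumption for illustration: files still being written carry this suffix.
IN_PROGRESS_SUFFIX = ".inprog"


def complete_files(filenames, suffix=IN_PROGRESS_SUFFIX):
    """Return, sorted, the names that do not look like in-progress output."""
    return sorted(name for name in filenames if not name.endswith(suffix))


def rsync_args(source_dir, dest, suffix=IN_PROGRESS_SUFFIX):
    """Build an rsync argument list that excludes in-progress files.

    -a (archive mode) and --exclude=PATTERN are standard rsync options.
    """
    source = source_dir.rstrip("/") + "/"  # trailing slash: copy contents
    return ["rsync", "-a", "--exclude=*" + suffix, source, dest]
```

For example, `rsync_args("/data/dumps", "host::dumps/")` yields a command that could be passed to `subprocess.run`; the source path and module name here are placeholders.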

I did some research earlier and looked at the rsync code; versions 3.0.0 and greater create the file list incrementally, which uses much less memory than older versions. Anything running precise or later will have at least 3.0.0, so we can practically rule out use of older versions by our mirrors. I tried checking the filecount lookahead and some other details, but the tl;dr is that I still wonder whether doing smaller subdirs at a time would be less resource-intensive. Needs some testing.
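
A quick way to confirm which side of the 3.0.0 line a given host falls on is to parse the first line of `rsync --version`. The helper names below are hypothetical and this is only a sketch. It also only checks one end, and incremental recursion needs both ends of the transfer to be at 3.0.0 or later.

```python
import re


def parse_rsync_version(version_output):
    """Extract (major, minor, patch) from the text printed by `rsync --version`."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find an rsync version number")
    return tuple(int(part) for part in match.groups())


def has_incremental_file_list(version_output):
    """True if this rsync is new enough (>= 3.0.0) to build file lists incrementally."""
    return parse_rsync_version(version_output) >= (3, 0, 0)
```

The text to parse could come from something like `subprocess.run(["rsync", "--version"], capture_output=True, text=True).stdout`.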

Setup time before file list transmission seems nearly the same for top-level directories and subdirs, tested on dataset1001, which has a very large filesystem. Things yet to be tested: making the include/exclude list less complex or shorter; using separate rsync stanzas for subdirectories to see if that's faster.
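
For the per-subdirectory test, one possible shape is to generate one rsync invocation per subdirectory and time each separately. The sketch below only builds the commands; the function name, paths, and module name are placeholders, not the real configuration.

```python
def per_subdir_commands(source_root, dest_root, subdirs):
    """Build one rsync command per subdirectory, in sorted order.

    Each command copies the contents of <source_root>/<subdir>/ into
    <dest_root>/<subdir>/ using archive mode (-a).
    """
    src = source_root.rstrip("/")
    dst = dest_root.rstrip("/")
    return [
        ["rsync", "-a", f"{src}/{sub}/", f"{dst}/{sub}/"]
        for sub in sorted(subdirs)
    ]
```

Each command can then be run under a timer (for instance with `time.perf_counter()` around `subprocess.run`) and compared against a single top-level run.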

We're in pretty good shape now, rsyncing only complete files, as soon as they are produced, and with less-populated filesystems on the dumpsdata server side. Subdirectory rsyncing didn't help any. Steps forward should now be coordinated with @Bstorm to see what might be done on the labstore side of these rsyncs.

Aklapper removed ArielGlenn as the assignee of this task. Jun 19 2020, 4:20 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)