Sync data for tools-project from labstore1001 to labstore1004/5
Closed, Resolved (Public)

Description

As part of setting up labstore1004/5 as a HA setup for labstore, we need to sync over the data for tools-project from labstore1001. The data is available in the nfs share mounted on /srv/project/tools.

Event Timeline

madhuvishy renamed this task from "Migrate tools-project and others(Labs) data from labstore1001 to labstore1004/5" to "Sync data for tools-project from labstore1001 to labstore1004/5". Oct 28 2016, 3:07 PM
madhuvishy updated the task description.
chasemp added a comment. Edited Oct 28 2016, 3:19 PM

We have had some false starts, with rsync choking on huge numbers of files in certain directories and also on a few seemingly corrupt files. We could run this with a flag to ignore read errors, but that seems fraught with peril.

I decided to wipe out the tools data on the active node labstore1004 and restart in a consistent fashion, working through anomalies as they come up.

Command in use:

rsync --rsh 'ssh -i /root/.ssh/id_labstore'   \
    --quiet \
    --archive   \
    --compress   \
    --progress   \
    --human-readable \
    --hard-links   \
    --delete-during \
    --force \
    --max-size=10G \
    --bwlimit=250000   \
    --exclude-from=/root/rsync_tools_exclude.txt \
    /srv/backup-tools/*   \
    root@labstore1004.eqiad.wmnet:/srv/tools/shared/tools/

where /root/rsync_tools_exclude.txt is:

project/ifttt/www/python/src/cache/*

Currently excluding files larger than 10G, which can be listed with:

find /srv/project/tools/ -type f -size +10G

https://phabricator.wikimedia.org/P4320
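
For reference, a minimal sketch of how a size-sorted listing of large files could be produced. The thread does not record how /tmp/gt10g was actually generated, so the function name `list_large_files` and the use of `du -h` and `sort -h` (GNU coreutils) are assumptions for illustration only:

```shell
#!/bin/bash
# Hypothetical helper: list regular files under DIR that are larger than
# MIN_SIZE (a find(1) size spec such as 10G), with human-readable sizes,
# sorted from smallest to largest.
list_large_files() {
    local dir=$1 min_size=$2
    # -exec ... {} + batches many paths into each du invocation;
    # sort -h understands the K/M/G suffixes that du -h prints.
    find "$dir" -type f -size "+$min_size" -exec du -h {} + | sort -h
}
```

Usage would look something like `list_large_files /srv/project/tools 10G > /tmp/gt10g`. Note that GNU find rounds file sizes up to whole units, so `+10G` matches files strictly larger than 10 GiB.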

And of these some look disposable:

grep -e \.log -e \.error -e \.err -e debug\.txt -e \.out$ /tmp/gt10g

11G        /srv/project/tools/project/merlbot/AuszeichnungsKategorieFehlt_weekly.out
11G        /srv/project/tools/project/osm4wiki/error.log
12G        /srv/project/tools/project/wiwosm/access.log
14G        /srv/project/tools/project/kenrick95bot/kenrick95bot-welcome.err
15G        /srv/project/tools/project/rubinbot2/refs.err
15G        /srv/project/tools/project/shuaib-bot/zumranew.err
19G        /srv/project/tools/project/geohack/access.log
19G        /srv/project/tools/project/ifttt/www/python/src/ifttt.log
25G        /srv/project/tools/project/whymbot/enwikt.err
35G        /srv/project/tools/project/wikivoyage/access.log
38G        /srv/project/tools/project/persondata/templatedata/debug.txt
42G        /srv/project/tools/project/ifttt/uwsgi.log
42G        /srv/project/tools/project/osm/access.log
46G        /srv/project/tools/project/wiwosm/error.log
chasemp added a comment. Edited Oct 28 2016, 3:23 PM

@madhuvishy thoughts on truncating the disposable >10G files and kicking off an updated rsync over the weekend, with the three largest excluded for now:

48G /srv/project/tools/project/splinetools/dumps/enwiki-20141106-pages-articles.xml
52G /srv/project/tools/project/toolserver-home-archive/archive-2014-06-05.tar.xz
75G /srv/project/tools/project/oar/repository_text_2014-06-13.tar.gz
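
A rough sketch of what truncating the disposable files might look like, assuming a list file in the same "SIZE PATH" layout as the grep output in the earlier comment. The function name `truncate_listed` is hypothetical and this is not necessarily the procedure that was used:

```shell
#!/bin/bash
# Hypothetical helper: truncate every file named in a du-style list
# ("SIZE<whitespace>PATH" per line) to zero bytes.
truncate_listed() {
    local list_file=$1 size path
    # The first field goes to $size; the rest of the line (spaces included)
    # goes to $path.
    while read -r size path; do
        # Skip blank lines and anything that is not a regular file.
        [ -n "$path" ] && [ -f "$path" ] || continue
        # Truncating in place keeps the file, its owner and its mode,
        # and only discards the contents.
        truncate --size=0 -- "$path"
    done < "$list_file"
}
```

Truncating rather than deleting leaves the log files in place for the tools that write to them, while removing their contents from the next sync.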

Started another sync now after truncating the >10G error/access log files from the above comment. New command (no >10G exclusion):

rsync --rsh 'ssh -i /root/.ssh/id_labstore'   \
    --quiet \
    --archive   \
    --compress   \
    --progress   \
    --human-readable \
    --hard-links   \
    --delete-during \
    --force \
    --bwlimit=250000   \
    --exclude-from=/root/rsync_tools_exclude.txt \
    /srv/backup-tools/*   \
    root@labstore1004.eqiad.wmnet:/srv/tools/shared/tools/

rsync_tools_exclude.txt is now:

project/ifttt/www/python/src/cache/*
project/splinetools/dumps/enwiki-20141106-pages-articles.xml
project/toolserver-home-archive/archive-2014-06-05.tar.xz
project/oar/repository_text_2014-06-13.tar.gz
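
The exclude entries above are relative paths, while the find listings earlier in this task use absolute paths under /srv/project/tools/. A small sketch of converting one into the other; the function name `to_exclude_entries` is hypothetical, and choosing /srv/project/tools as the prefix to strip is an assumption based on the paths quoted in this thread:

```shell
#!/bin/bash
# Hypothetical helper: read absolute paths on stdin (one per line) and print
# them relative to ROOT, suitable for appending to an rsync --exclude-from file.
to_exclude_entries() {
    local root=${1%/}/ path    # normalise ROOT to end in exactly one slash
    while IFS= read -r path; do
        case $path in
            "$root"*) printf '%s\n' "${path#"$root"}" ;;
            *) printf 'skipping path outside %s: %s\n' "$root" "$path" >&2 ;;
        esac
    done
}
```

For example, `find /srv/project/tools/ -type f -size +40G | to_exclude_entries /srv/project/tools >> /root/rsync_tools_exclude.txt` would append relative entries in the same style as the file shown above (the 40G threshold here is purely illustrative).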
chasemp closed this task as Resolved. Nov 16 2016, 3:30 PM

This was done on Sunday, so that a sync had completed within 24 hours of the main maintenance for Tools. The sync during the actual outage period took around 5h for the last batch of data.

Final sync options:

rsync --rsh 'ssh -C -i /root/.ssh/id_labstore'   \
    --archive   \
    --progress   \
    --quiet \
    --human-readable \
    --hard-links   \
    --delete-during \
    --inplace \
    --safe-links \
    --executability \
    --timeout=30 \
    --force \
    --bwlimit=250000   \
    --exclude-from=/root/rsync_tools_exclude.txt \
    /srv/backup-tools/*   \
    root@labstore1004.eqiad.wmnet:/srv/tools/shared/tools/