These are currently running jessie:
- dumpsdata1001.eqiad.wmnet
- dumpsdata1002.eqiad.wmnet
Status | Assigned | Task
---|---|---
Resolved | MoritzMuehlenhoff | T224549 Track remaining jessie systems in production
Resolved | ArielGlenn | T224563 Migrate dumpsdata hosts to Stretch/Buster
Resolved | ArielGlenn | T219768 Get a third dumpsdata server
Resolved | ArielGlenn | T234076 (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet
Plan for migration:
At this point we should have three hosts on buster: one doing misc crons, one as fallback for the xml/sql dumps, and one as primary for the xml/sql dumps.
If the misc crons host fails, the xml/sql fallback server can take over for it without issues by applying the right role.
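As a sketch of what applying the right role would look like in practice (the role class name below is hypothetical, not necessarily what's in our puppet repo):

# swap the role assigned to the fallback host in site.pp, e.g.
#   node 'dumpsdata1002.eqiad.wmnet' {
#       role(dumps::generation::server::misccrons)   # hypothetical role name
#   }
# then apply it on the host:
sudo puppet agent -t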
rsync -v labstore1006.wikimedia.org::data/xmldatadumps/public/rsync-inc-last-2.txt .
rsync -av --include '/*wik*/' --include-from=rsync-inc-last-2.txt --exclude='*' labstore1006.wikimedia.org::data/xmldatadumps/public/ /data/xmldatadumps/public
This will bring over some older files from 2007 and 2009 but it's easier to clean those up later than try to get the rsync args right to exclude them.
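A note for anyone repeating this: rsync applies filter rules in order, so the directory includes and the include-from list must come before the final exclude. Adding -n (dry run) to the same command shows what would be transferred without copying anything:

rsync -avn --include '/*wik*/' --include-from=rsync-inc-last-2.txt --exclude='*' labstore1006.wikimedia.org::data/xmldatadumps/public/ /data/xmldatadumps/public | head -n 50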
The above rsync completed; I will be rerunning it from time to time. In the meantime I have now moved onto the 'misc' dumps:
rsync -av labstore1006.wikimedia.org::data/xmldatadumps/public/other/ /data/otherdumps
rsync -av --bwlimit=80000 dumpsdata1002.eqiad.wmnet::data/otherdumps/ /data/otherdumps
I see that I did not bwlimit the labstore rsync, though in my earlier 20 attempts to get the rsync args right, I did have that in there. It will be limited for any catchup runs.
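The catchup runs will look something like the below (same filters as before, with the cap added; --bwlimit is in KiB/s, so 80000 keeps it around 80 MB/s):

rsync -av --bwlimit=80000 --include '/*wik*/' --include-from=rsync-inc-last-2.txt --exclude='*' labstore1006.wikimedia.org::data/xmldatadumps/public/ /data/xmldatadumps/public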
I have extended the rsync of xml/sql dumps to the last three good dumps and have been running a bandwidth-limited pull from labstore1006 to dumpsdata1003 in a screen session on dumpsdata1003. I've periodically been updating the misc/other dumps via pull from dumpsdata1002.
Rsync of both xmldatadumps/public and otherdumps from dumpsdata1002 to dumpsdata1003 is caught up as of earlier this evening. I'll be running these throughout the day tomorrow, waiting for the misc cron dumps to finish.
Change 551035 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make dumpsdata1003 another secondary dumps NFS server along with dumpsdata1002
Change 551035 merged by ArielGlenn:
[operations/puppet@production] make dumpsdata1003 another secondary dumps NFS server along with dumpsdata1002
Change 551038 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] add per-host configs for new dumps fallback NFS server
Change 551038 merged by ArielGlenn:
[operations/puppet@production] add per-host configs for new dumps fallback NFS server
Change 551039 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] buster doesn't have mailx, replace with s-nail
Change 551039 merged by ArielGlenn:
[operations/puppet@production] buster doesn't have mailx, replace with s-nail
Change 551042 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] fix up dump stats script to use either mail or s-nail
Change 551042 merged by ArielGlenn:
[operations/puppet@production] fix up dump stats script to use either mail or s-nail
Change 551173 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make dumpsdata primary nfs server rsync to dumpsdata1003 now
Change 551173 merged by ArielGlenn:
[operations/puppet@production] make dumpsdata primary nfs server rsync to dumpsdata1003 now
dumpsdata1003 is now receiving all files from dumpsdata1001 via rsync. dumpsdata1002 can be turned into a spare and re-imaged with buster as the next step.
Change 551317 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make dumpsdata1002 spare before reimaging
Change 551317 merged by ArielGlenn:
[operations/puppet@production] make dumpsdata1002 spare before reimaging
Change 551319 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make dumpsdata1002 install buster instead of jessie
Change 551319 merged by ArielGlenn:
[operations/puppet@production] make dumpsdata1002 install buster instead of jessie
Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:
['dumpsdata1002.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201911161927_ariel_118105.log.
Completed auto-reimage of hosts:
['dumpsdata1002.eqiad.wmnet']
and were ALL successful.
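For the record, the reimage is kicked off from cumin1001 with something like the following (exact flags are from memory and may be off; see the log path above for the actual run):

sudo -i wmf-auto-reimage dumpsdata1002.eqiad.wmnet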
Expanded /data on dumpsdata1002, rsyncing copies of adds-changes dumps now from dumpsdata1003 in a screen session. After that I'll pick up the categoryrdf dumps, also via rsync from dumpsdata1003.
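Expanding /data was the usual LVM grow plus filesystem resize; a minimal sketch, with placeholder VG/LV names since the real ones may differ on this host:

sudo lvextend -l +100%FREE /dev/data-vg/data   # placeholder VG/LV names
sudo resize2fs /dev/data-vg/data               # assumes ext4 on /data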
Change 551503 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] add new partman recipe that skips format of /data partition for dumps servers
Change 551804 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] move misc crons to dumpsdata1002 nfs server
The schedule is now:
And then of course check that everything is running ok when xml dumps start on Dec 1st.
Change 551879 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] add partman recipe that leaves /data on dump servers alone
Change 551879 abandoned by ArielGlenn:
add partman recipe that leaves /data on dump servers alone
Reason:
did testing without needing the commit.
Change 551503 merged by ArielGlenn:
[operations/puppet@production] add new partman recipe that skips format of /data partition for dumps servers
The patchset for tonight/tomorrow, moving misc cron storage to dumpsdata1002, is ready to go.
Given that the wikidata entity dumps are still finishing up the truthy gz files, and after that there will be bz2 recompression and the Lexemes, I'll be making the switchover tomorrow morning or mid-day EET.
Change 551804 merged by ArielGlenn:
[operations/puppet@production] move misc crons to dumpsdata1002 nfs server
snapshot1008 now uses dumpsdata1002 as its nfs server. I had to manually systemctl stop nfs-mountd.service and start it again for dumpsdata1002 to pick up the values (and especially the port setting) in /etc/default/nfs-kernel-server, so that's poor. Other than that, no problems with puppet's unmounting and remounting of the share.
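The manual workaround, for reference (this is the step puppet ought to handle itself):

sudo systemctl stop nfs-mountd.service
sudo systemctl start nfs-mountd.service
# verify mountd re-registered on the port set in /etc/default/nfs-kernel-server
rpcinfo -p | grep mountd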
The next misc cron dump is already running (pagetitles) so I expect to see the files over on labstore1006,7 in a little while.
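The same rsync module we pull from can be used to confirm the new files landed; an rsync source with no destination argument just lists the directory (the path under other/ is assumed from the usual layout):

rsync labstore1006.wikimedia.org::data/xmldatadumps/public/other/pagetitles/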
Adds-changes dumps did not run properly; when I checked this afternoon, the Nov 23 job was hung indefinitely trying to get a lockfile on the first wiki to be processed (abwiki). I watched snapshot1008 attempt to connect to dumpsdata1002 for (some) nfs request and then try dumpsdata1003 when that failed (!). I rebooted snapshot1008, which no longer does this. Some port was still advertised wrongly on dumpsdata1002, it seems; a reboot took care of that.
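For anyone hitting this later: a stale registration shows up in the portmapper, so a quick check from any client is:

rpcinfo -p dumpsdata1002.eqiad.wmnet   # compare the mountd/statd ports against /etc/default/nfs-kernel-server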
However, locks over nfs in buster either behave differently or there is some other flag someplace I missed.
I've pushed over changes to the adds-changes scripts to skip locking for now, since only one process runs at a time for a given date anyway. However, it needs to be fixed soon. I also need to see if the locking mechanism for xml/sql dumps works in buster as is, since that switchover is coming up very soon.
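A crude way to probe lock behavior over the buster nfs mount, from two shells or two hosts (the lockfile path is a placeholder):

flock -x /data/lock.test -c 'sleep 60' &                              # hold an exclusive lock
flock -n /data/lock.test -c 'echo got lock' || echo 'lock held elsewhere'  # should fail fast while the first holds it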
Changeset for skipping locks not yet merged, that will go tomorrow.
Back-running the Nov 23 adds-changes now so they'll be complete in time for the Nov 24 run, which kicks off around 9 pm UTC.
Change 552658 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] add ability to skip locking for adds-changes dumps
Change 552658 merged by ArielGlenn:
[operations/dumps@master] add ability to skip locking for adds-changes dumps
Change 552659 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] configure adds-changes dumps to skip locking for now
Change 552659 merged by ArielGlenn:
[operations/puppet@production] configure adds-changes dumps to skip locking for now
I have tested on snapshot1008, which mounts only the buster nfs share, that the dump_lock.py script works as it should with multiple instances; this is the locking mechanism for xml/sql dumps. This means that although the adds-changes dumps locking must still be investigated later, I can go ahead and re-image dumpsdata1001 now that the current xml/sql run has completed.
Change 553324 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] dumpsdata1001 will install with buster now
Change 553324 merged by ArielGlenn:
[operations/puppet@production] dumpsdata1001 will install with buster now
Aaaaand dumpsdata1001 is reimaged. All the data is still there, available to snapshot hosts.
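A quick client-side sanity check that the exports came back as expected (standard nfs tooling):

showmount -e dumpsdata1001.eqiad.wmnet   # list the exports offered by the reimaged host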