
Upgrade snapshot hosts to Buster
Closed, Resolved, Public

Description

Dumps (snapshot*) hosts should be migrated to buster around the same time we upgrade our other mediawiki clusters.

Related Objects

Status    Subtype           Assigned
Resolved                    None
Resolved                    Jdforrester-WMF
Resolved                    Jdforrester-WMF
Resolved                    Jdforrester-WMF
Resolved                    Jdforrester-WMF
Resolved                    toan
Resolved                    Lucas_Werkmeister_WMDE
Resolved                    Joe
Resolved                    Jdforrester-WMF
Resolved                    Ladsgroup
Invalid                     None
Resolved                    Reedy
Open                        None
Resolved                    tstarling
Resolved                    Jdforrester-WMF
Stalled                     None
Resolved                    None
Resolved  PRODUCTION ERROR  Legoktm
Resolved                    tstarling
Resolved                    Joe
Resolved                    Krinkle
Resolved                    hashar
Resolved                    Jdforrester-WMF
Resolved                    Dzahn
Resolved                    ArielGlenn

Event Timeline

jbond triaged this task as Medium priority. Dec 9 2020, 12:08 PM

I can do the testbed host first, and then the rest. Do we have a mediawiki server on buster anywhere in the cluster yet?

Yes, mwdebug1003 is running Buster, you can select it with the latest version of the WikimediaDebug browser extension.

Preliminaries:

  • build mwbzutils package for buster and make sure it passes all tests

I've built the package and set up a test instance in deployment-prep, but there are issues with mediawiki scripts there; see T273089 for the details.

Change 659886 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] snapshot1007 (testbed host) install with buster

https://gerrit.wikimedia.org/r/659886

Tests of xml/sql dumps in buster instance in deployment-prep look good. Next step: reimage the snapshot testbed instance in production.

Change 659886 merged by ArielGlenn:
[operations/puppet@production] snapshot1007 (testbed host) install with buster

https://gerrit.wikimedia.org/r/659886

Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:

snapshot1007.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202101291442_ariel_5091_snapshot1007_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['snapshot1007.eqiad.wmnet']

and were ALL successful.
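The log paths reported by wmf-auto-reimage in this task appear to share one layout: a YYYYMMDDHHMM timestamp, the operator's username, a numeric field, and the host FQDN with dots replaced by underscores. The Python sketch below parses that layout. The field split is inferred only from the filenames visible here, the numeric field is treated as an opaque ID because its meaning is not stated (an assumption), and the function name `parse_reimage_log_name` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the log paths in this task:
#   <YYYYMMDDHHMM>_<user>_<numeric id>_<fqdn with dots as underscores>.log
LOG_NAME = re.compile(
    r"^(?P<ts>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>.+)\.log$"
)


def parse_reimage_log_name(path: str) -> dict:
    """Split a wmf-auto-reimage log path into its (assumed) fields."""
    name = path.rsplit("/", 1)[-1]  # drop the directory part
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized log name: {name}")
    return {
        # Naive datetime; the timezone of the timestamp is not stated.
        "started": datetime.strptime(m["ts"], "%Y%m%d%H%M"),
        "user": m["user"],
        "id": int(m["num"]),  # opaque numeric field (meaning assumed unknown)
        # DNS hostnames contain no underscores, so mapping back to dots is safe.
        "host": m["host"].replace("_", "."),
    }
```

For example, parsing the two snapshot1008 log paths later in this task and subtracting their `started` values shows the second reimage attempt began 34 minutes after the first.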

Change 659957 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] dumps: add a config for xml/sql dumps that writes elsewhere than prod dirs

https://gerrit.wikimedia.org/r/659957

Change 659957 merged by ArielGlenn:
[operations/puppet@production] dumps: add a config for xml/sql dumps that writes elsewhere than prod dirs

https://gerrit.wikimedia.org/r/659957

Test run of elwikiquote on reimaged testbed server running buster looks good, but I should do a prefetch run tomorrow morning just to be extra sure. Then I'll be able to switch the testbed with an xml/sql dump runner for regular wikis, in time for the Feb 1 run, and see how it goes.

Change 660634 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make snapshot1007 running buster a dumpsrunner and move testbed to 1005

https://gerrit.wikimedia.org/r/660634

Change 660634 merged by ArielGlenn:
[operations/puppet@production] make snapshot1007 running buster a dumpsrunner and move testbed to 1005

https://gerrit.wikimedia.org/r/660634

The prefetch runs went well. I ran a small wiki on snapshot1007 (buster) and then on snapshot1005 (stretch) on the same hardware. Run times were slightly shorter on buster.

Assuming that all goes well with the production run on buster, which we should know in 6 or 7 days, I'll be able to convert snapshot1010 next.

Change 660779 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] add a proper media section to the deployment-prep dumps config file

https://gerrit.wikimedia.org/r/660779

Change 660779 merged by ArielGlenn:
[operations/puppet@production] add a proper media section to the deployment-prep dumps config file

https://gerrit.wikimedia.org/r/660779

Change 660781 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] Make media lists dump easily runnable in deployment-prep

https://gerrit.wikimedia.org/r/660781

Change 660781 merged by ArielGlenn:
[operations/puppet@production] Make media lists dump easily runnable in deployment-prep

https://gerrit.wikimedia.org/r/660781

I have tested in deployment-prep all of the "other" dumps (not xml/sql) except for the wikidata and adds-changes dumps. Those are next.

Change 660819 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make adds-changes dumps easier to test in deployment-prep

https://gerrit.wikimedia.org/r/660819

Change 660819 merged by ArielGlenn:
[operations/puppet@production] make adds-changes dumps easier to test in deployment-prep

https://gerrit.wikimedia.org/r/660819

Change 660871 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make wikidata rdf dumps easier to test in deployment-prep

https://gerrit.wikimedia.org/r/660871

Change 661170 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] refactor script for wikidata and commons rdf dumps

https://gerrit.wikimedia.org/r/661170

Change 661642 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] prep for re-install of snapshot1009, 1010 with buster

https://gerrit.wikimedia.org/r/661642

Change 661642 merged by ArielGlenn:
[operations/puppet@production] prep for re-install of snapshot1009, 1010 with buster

https://gerrit.wikimedia.org/r/661642

Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:

snapshot1009.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102040754_ariel_21056_snapshot1009_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['snapshot1009.eqiad.wmnet']

and were ALL successful.

snapshot1009 was idle, so I converted it. snapshot1010 should become idle in an hour or two, so I'll be able to do that one later today. I might not do anything about snapshot1005 and 1006, since they are due to be replaced and the replacements should be here any day now. They can simply be installed with buster from the start and the old servers decommissioned.

Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:

snapshot1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102041248_ariel_6167_snapshot1010_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['snapshot1010.eqiad.wmnet']

and were ALL successful.

snapshot1010 is done. I need to do a bunch more testing before I can reimage snapshot1008.

Change 662756 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/core@master] in deployment-prep some groups don't exist, permit scripts that use them to run

https://gerrit.wikimedia.org/r/662756

Change 662756 merged by jenkins-bot:
[mediawiki/core@master] in deployment-prep some groups don't exist, permit scripts that use them to run

https://gerrit.wikimedia.org/r/662756

Change 661170 merged by ArielGlenn:
[operations/puppet@production] refactor script for wikidata and commons rdf dumps

https://gerrit.wikimedia.org/r/661170

While it would be nice to continue making the wikidata entity dumps easier to run in deployment-prep, that can wait a bit while I move on to testing the wikidata json dumps, which are next in line for the move to buster.

Change 663661 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] refactor wikidata json dumps to be easier to test on deployment-prep

https://gerrit.wikimedia.org/r/663661

Change 663661 merged by ArielGlenn:
[operations/puppet@production] refactor wikidata json dumps to be easier to test on deployment-prep

https://gerrit.wikimedia.org/r/663661

Change 664091 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] now that snapshot1005 is testbed host, make snapshot1007 the enwiki dumps runner

https://gerrit.wikimedia.org/r/664091

Change 664091 merged by ArielGlenn:
[operations/puppet@production] now that snapshot1005 is testbed host, make snapshot1007 the enwiki dumps runner

https://gerrit.wikimedia.org/r/664091

Change 664092 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] prep snapshot1005 and 1006 for reinstall with buster

https://gerrit.wikimedia.org/r/664092

Change 664092 merged by ArielGlenn:
[operations/puppet@production] prep snapshot1005 and 1006 for reinstall with buster

https://gerrit.wikimedia.org/r/664092

Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:

snapshot1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102150817_ariel_8905_snapshot1005_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['snapshot1005.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:

snapshot1006.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102150912_ariel_1945_snapshot1006_eqiad_wmnet.log.

Change 664225 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] misc dumps: move commons rdf to later on Sunday and media info to earlier

https://gerrit.wikimedia.org/r/664225

Completed auto-reimage of hosts:

['snapshot1006.eqiad.wmnet']

and were ALL successful.

I was not going to re-image snapshot1005 and 1006 because their replacements were due to have arrived, but the boxes have not come in yet and we still do not have an ETA. So they are done now.

The last server remaining is snapshot1008. All "misc" dumps have been tested on beta in deployment-prep, so the re-imaging of this host can happen next Sunday. I am rearranging the Sunday cron jobs a little so that we have a longer maintenance window going forward; see https://gerrit.wikimedia.org/r/c/operations/puppet/+/664225

Change 664225 merged by ArielGlenn:
[operations/puppet@production] misc dumps: move commons rdf to later on Sunday and media info to earlier

https://gerrit.wikimedia.org/r/664225

Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:

snapshot1008.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102210918_ariel_28889_snapshot1008_eqiad_wmnet.log.

Change 665583 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] snapshot1008 to install from buster image

https://gerrit.wikimedia.org/r/665583

Change 665583 merged by ArielGlenn:
[operations/puppet@production] snapshot1008 to install from buster image

https://gerrit.wikimedia.org/r/665583

Completed auto-reimage of hosts:

['snapshot1008.eqiad.wmnet']

and were ALL successful.

So the reimage completed, but the host was still on stretch. I've updated the install file, and here we go again.
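As this attempt shows, a reimage can report success while the host comes back on the old release if the install configuration was not updated. One way to confirm the result is to check `VERSION_CODENAME` in the host's `/etc/os-release`. The Python sketch below assumes you already have that file's contents (how they are fetched from the host is left out), and the helper names `parse_os_release` and `has_codename` are hypothetical. On Python 3.10+, the standard library's `platform.freedesktop_os_release()` parses the file for the local machine.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines; values may be shell-quoted."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex undoes shell-style quoting: '"10 (buster)"' -> '10 (buster)'.
        fields[key] = " ".join(shlex.split(value))
    return fields


def has_codename(os_release_text: str, expected: str = "buster") -> bool:
    """True if VERSION_CODENAME matches the expected Debian codename."""
    return parse_os_release(os_release_text).get("VERSION_CODENAME") == expected


# Illustrative sample of a Debian 10 os-release file.
BUSTER_SAMPLE = """PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
"""
```

Run against the output of the first snapshot1008 reimage above, a check like `has_codename(text)` would have returned False, since that host still reported stretch.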

Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts:

snapshot1008.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102210952_ariel_3141_snapshot1008_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['snapshot1008.eqiad.wmnet']

and were ALL successful.

ArielGlenn claimed this task.

These are done now. Closing the ticket.