
Move labswiki database to m5
Closed, Resolved (Public)

Description

In the long term we would like to move wikitech's database to s3 or another core MediaWiki cluster. That move has some complications, however, that will require DBA assistance. As an intermediate step, to unblock moving wikitech to the new labsweb* servers and decommissioning silver, we would like to move the database to the m5 misc cluster.

  • Create a new labswiki schema on m5-master.eqiad.wmnet
  • Create wikiadmin and wikiuser accounts
  • Grant needed rights to wikiadmin and wikiuser accounts
  • Make wikitech read-only
  • Make dump of labswiki from silver
  • Import dump of labswiki from silver to m5-master
  • Update wmf-config/db-eqiad.php config to point wikitech at m5
  • Test wikitech read-only vs m5
  • Make wikitech read-write
  • Profit!

Event Timeline

To minimize the number of things changing at once, this switch should be done before migrating wikitech from silver to labsweb*.

Change 413884 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wikitech: grants for the new labswiki db on m5

https://gerrit.wikimedia.org/r/413884

Mentioned in SAL (#wikimedia-operations) [2018-02-26T15:41:24Z] <andrewbogott> marking wikitech read-only (via a local edit to CommonSettings.php) for https://phabricator.wikimedia.org/T188029

Mentioned in SAL (#wikimedia-operations) [2018-02-26T15:42:04Z] <andrewbogott> marking wikitech read-only (via a local edit to CommonSettings.php) for https://phabricator.wikimedia.org/T188029

Change 414733 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/mediawiki-config@master] wikitech: use 'labswiki' database on m5-master

https://gerrit.wikimedia.org/r/414733

Here is what I just did:

  1. Created 'labswiki' on m5-master
  2. Granted access as per https://gerrit.wikimedia.org/r/#/c/413884/
  3. Marked wikitech read-only by adding $wgReadOnly to /srv/mediawiki/wmf-config/CommonSettings.php on Silver
  4. Dumped 'labswiki' database via # mysqldump labswiki > /a/andrewwork/labswikidump.sql
  5. Copied labswikidump.sql to m5-master
  6. Imported via # mysql labswiki < labswikidump.sql
  7. Applied changes to switch wikitech to the new database, as per https://gerrit.wikimedia.org/r/#/c/414733/
  8. Stopped mariadb on silver
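The dump and import in steps 4 and 6 can be sketched as a pair of command builders. This is only an illustration, not what was actually run: the `--single-transaction` flag is an addition (it asks mysqldump for a consistent snapshot and only helps with InnoDB tables), and the helper names are hypothetical.

```python
import shlex
from typing import List

DB_NAME = "labswiki"                          # database named in the task
DUMP_PATH = "/a/andrewwork/labswikidump.sql"  # path used in step 4


def dump_command(db: str = DB_NAME) -> List[str]:
    """Build argv for a logical dump of one database.

    --single-transaction was NOT part of the original command; it is added
    here as an assumption that the tables are InnoDB.
    """
    return ["mysqldump", "--single-transaction", db]


def import_command(db: str = DB_NAME) -> List[str]:
    """Build argv for loading a dump into an already created schema."""
    return ["mysql", db]


def as_shell(argv: List[str], redirect: str, path: str) -> str:
    """Render argv plus a file redirect as a shell line for copy and paste."""
    return f"{shlex.join(argv)} {redirect} {shlex.quote(path)}"


if __name__ == "__main__":
    # Step 4, on silver: write the dump to a file.
    print(as_shell(dump_command(), ">", DUMP_PATH))
    # Step 6, on m5-master: load the copied file.
    print(as_shell(import_command(), "<", "labswikidump.sql"))
```

A dump taken this way is only trustworthy if nothing can write to the source while it runs, which is the part that went wrong on the first attempt.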

That looked pretty good for reading. Somewhere back in steps 5-6, though, CommonSettings.php was reverted (by the train, maybe?), so the new wikitech was read/write. Bryan tried to edit the SAL for a project and the page was suddenly replaced with a totally different SAL; a second edit produced a similarly bad result (a different page's content inserted into the page being edited).

So I reverted step 7 and restarted mariadb on silver, which puts us back in the state we had after step 2.

Pro tip: setting MediaWiki to read-only doesn't actually make it read-only. Go and complain about the extension support, not to me. MySQL on silver needs to be in read-only mode to guarantee that no writes go through. For m5, since we cannot set the server read-only without affecting the other databases, you basically want to remove every grant that is not SELECT (UPDATE, CREATE, INSERT, DELETE, ALTER, etc.).
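As a rough sketch of the two approaches described above, the statements could be generated like this. The account names come from the task; the `%` host pattern and the helper names are assumptions, and the real grants live in the puppet change. Note that `read_only` does not stop accounts holding the SUPER privilege, so it is a guard against application writes rather than an absolute lock.

```python
from typing import Iterable, List

# Write privileges named in the comment above; the list is not exhaustive.
WRITE_PRIVILEGES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER")


def server_read_only_sql() -> str:
    """Statement for the old server (silver): block writes server-wide."""
    return "SET GLOBAL read_only = 1;"


def revoke_writes_sql(db: str, users: Iterable[str], host: str = "%") -> List[str]:
    """Statements for a shared server (m5): drop write grants on one schema.

    The host pattern is an assumption. Names are interpolated without
    escaping, so only pass trusted constants.
    """
    privs = ", ".join(WRITE_PRIVILEGES)
    return [
        f"REVOKE {privs} ON `{db}`.* FROM '{user}'@'{host}';"
        for user in users
    ]


if __name__ == "__main__":
    print(server_read_only_sql())
    for statement in revoke_writes_sql("labswiki", ["wikiadmin", "wikiuser"]):
        print(statement)
```

The revoked privileges would need to be granted again before making the wiki read-write.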

Thanks @jcrespo! This probably went awry because of a combination of that and scap messing with my step 7. So I'm going to try again, with additional steps:

2.5. 'scap lock --all' on silver
2.5.7. make mysql on silver read-only
8.5 'scap unlock' (actually just hitting ctrl-c on the terminal holding the lock)

I don't think we need the new db to be read-only since once wikitech is pointed there we're done...

Mentioned in SAL (#wikimedia-operations) [2018-02-26T22:07:32Z] <andrewbogott> made mysql on silver read-only, hopefully for good. T188029

Change 414733 merged by jenkins-bot:
[operations/mediawiki-config@master] wikitech: use 'labswiki' database on m5-master

https://gerrit.wikimedia.org/r/414733

Mentioned in SAL (#wikimedia-operations) [2018-02-26T23:34:30Z] <bd808@tin> Started scap: wikitech: use 'labswiki' database on m5-master (T188029)

Mentioned in SAL (#wikimedia-operations) [2018-02-26T23:37:51Z] <bd808@tin> Finished scap: wikitech: use 'labswiki' database on m5-master (T188029) (duration: 03m 21s)

Change 413884 merged by Andrew Bogott:
[operations/puppet@production] wikitech: grants for the new labswiki db on m5

https://gerrit.wikimedia.org/r/413884

This was noisy but is done now.

How are you doing wikitech-static? We are already generating backups of m5, including labswiki, so you probably don't want to do them on your own. Could a process like that have contributed to T188589?

Change 415995 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Remove mariadb from silver

https://gerrit.wikimedia.org/r/415995

Change 415995 merged by Andrew Bogott:
[operations/puppet@production] Remove mariadb from silver

https://gerrit.wikimedia.org/r/415995

How are you doing wikitech-static? We are already generating backups of m5, including labswiki, so you probably don't want to do them on your own. Could a process like that have contributed to T188589?

My tentative plan is to keep doing the wikitech-static syncs exactly as I'm doing them now (albeit from labweb1001). The dump used for the wikitech-static sync is only latest revisions of pages, so the process is quite fast and (unless I misunderstand) fairly different from the standard db backup process.

It's worth considering this as a possible cause of the bad behavior we're seeing, though. The dump happens every day at

30 1 * * *

The grafana graph above doesn't look especially bad at 01:30 UTC but it's worth keeping an eye out.
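For readers less used to cron syntax, here is a small sketch that decodes the schedule quoted above (`30 1 * * *`) into a time of day. It only handles the simple "fixed minute and hour, every day" form; the function names are hypothetical, and the time zone is whatever the host's cron uses (UTC according to the comment above).

```python
from typing import Dict

# Order of the five fields in a standard crontab schedule.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr: str) -> Dict[str, str]:
    """Split a five-field cron schedule into named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def daily_run_time(expr: str) -> str:
    """Return HH:MM for a schedule that runs once a day.

    Deliberately limited: minute and hour must be plain numbers and the
    remaining three fields must be '*'.
    """
    fields = parse_cron(expr)
    if not (fields["minute"].isdigit() and fields["hour"].isdigit()):
        raise ValueError("minute and hour must be plain numbers")
    if any(fields[name] != "*" for name in CRON_FIELDS[2:]):
        raise ValueError("not a plain daily schedule")
    return f"{int(fields['hour']):02d}:{int(fields['minute']):02d}"


if __name__ == "__main__":
    print(daily_run_time("30 1 * * *"))  # prints 01:30
```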

Change 416210 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wikitech: remove crons to backup the mediawiki database

https://gerrit.wikimedia.org/r/416210

Change 416210 merged by Andrew Bogott:
[operations/puppet@production] wikitech: remove crons to backup the mediawiki database

https://gerrit.wikimedia.org/r/416210

I would like to have a look at the script: logical backups are very difficult to do fully online in a consistent way, which is why we always take them from non-masters and encourage others to run dumps and statistics queries on replicas. Please coordinate with me next week.