Do the big Quarry migration
Closed, Resolved (Public)

Description

Quarry will be subject to server maintenance on Wednesday, September 12 from 7pm UTC. The site will be read-only for a few hours, but should remain online.

Ordered to-do list:

Event Timeline

Framawiki triaged this task as High priority. (Edited Aug 22 2018, 10:14 PM)

@zhuyifei1999 I propose this ordered migration to-do list; feel free to edit it.
Normally we should only need a read-only period, with no downtime :)

Back up the SQL DB and resultset folder of the legacy live main instance

I'm pretty sure the results live on NFS, so there isn't really a need to back this up (I think).

The current NFS implementation of resultsets is flawed. The runners have the user 'quarry' with a UID of 998, but main has it as 997. Currently it's not too bad: puppet just keeps changing the ownership of /data/project/quarry/results/ back and forth. But if the two runners ever had different UIDs from each other, things would go really badly.
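
A minimal sketch of how the mismatch described above could be spotted, assuming Python 3 is available on the instances. The helper names (`local_uid`, `owner_uid`, `group_hosts_by_uid`) and the host labels are made up for illustration; only the UIDs 997 and 998 and the results path come from this task.

```python
import os
import pwd
from collections import defaultdict


def local_uid(username):
    """UID of `username` on this host; raises KeyError if the user does not exist."""
    return pwd.getpwnam(username).pw_uid


def owner_uid(path):
    """Numeric UID that currently owns `path`."""
    return os.stat(path).st_uid


def group_hosts_by_uid(uids_by_host):
    """Group host labels by the UID they report; more than one key means a mismatch."""
    groups = defaultdict(list)
    for host, uid in sorted(uids_by_host.items()):
        groups[uid].append(host)
    return dict(groups)


# UIDs as reported in this task; the host labels are illustrative.
reported = {"main": 997, "runner-01": 998, "runner-02": 998}
groups = group_hosts_by_uid(reported)
if len(groups) > 1:
    print("UID mismatch for 'quarry':", groups)
```

On each host, comparing `local_uid("quarry")` with `owner_uid("/data/project/quarry/results/")` would show whether that host agrees with whoever last changed the ownership on NFS.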

We should probably do T178520: Find somewhere else (not NFS) to store Quarry's resultsets after this task, if possible.

Change 454481 had a related patch set uploaded (by Dzahn; owner: Zhuyifei1999):
[operations/puppet@production] quarry::database: Use mariadb instead of mysql module

https://gerrit.wikimedia.org/r/454481

Change 458351 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[analytics/quarry/web@master] app.py: Load Redis for session from Connections

https://gerrit.wikimedia.org/r/458351

@Framawiki I think we can proceed with the migration (the test instance at https://quarry-dev.wmflabs.org/ shares NFS with the production instance so I'm too afraid to do any real testing there). When do you have time? I usually have time from 9PM to 3AM UTC (the next day) from Tuesday to Thursday, and 3PM to 3AM UTC (the next day) on weekends. Considering we have a few puppet patches, might be better to merge on a weekday.

Scheduled for 7pm UTC next Wednesday

I've set the following maintenance message:

Quarry will be subject to server maintenance on Wednesday, September 12 from 7pm UTC. The site will be read-only for a few hours, but should remain online.

Mentioned in SAL (#wikimedia-cloud) [2018-09-07T19:56:53Z] <framawiki> deployed 501695f to quarry-main-01 (T202588)

Change 458351 merged by jenkins-bot:
[analytics/quarry/web@master] app.py: Load Redis for session from Connections

https://gerrit.wikimedia.org/r/458351

Change 451698 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[operations/puppet@production] quarry: Move the install into a venv and upgrade to Python 3

https://gerrit.wikimedia.org/r/451698

Mentioned in SAL (#wikimedia-cloud) [2018-09-11T17:14:22Z] <zhuyifei1999_> disabling puppet on quarry-main-01, quarry-runner-0{1,2} T202588

Mentioned in SAL (#wikimedia-cloud) [2018-09-11T17:22:48Z] <zhuyifei1999_> doing another backup of main db: sudo mysqldump quarry | sudo tee /data/project/dump-$(date '+%Y-%m-%d').sql > /dev/null T202588
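
For reference, a rough Python equivalent of the dated backup logged above. This is only a sketch: `dump_path` and `run_backup` are hypothetical names, and it assumes the process already has the privileges that the logged command obtained through sudo.

```python
import datetime
import subprocess


def dump_path(day, directory="/data/project"):
    """Build the dated dump filename, e.g. dump-2018-09-12.sql."""
    return f"{directory}/dump-{day:%Y-%m-%d}.sql"


def run_backup(database="quarry", day=None):
    """Dump `database` with mysqldump into the dated file and return its path.

    Assumes the caller can already read the database and write to the
    target directory (the logged command used sudo for both).
    """
    target = dump_path(day or datetime.date.today())
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if mysqldump exits non-zero.
        subprocess.run(["mysqldump", database], stdout=out, check=True)
    return target
```

Note that a failed dump would leave a partial file behind, so the return code (or the exception) should be checked before relying on the backup.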

Change 454481 merged by Bstorm:
[operations/puppet@production] quarry::database: Use mariadb instead of mysql module

https://gerrit.wikimedia.org/r/454481

Change 451698 merged by Bstorm:
[operations/puppet@production] quarry: Move the install into a venv and upgrade to Python 3

https://gerrit.wikimedia.org/r/451698

Change 428140 had a related patch set uploaded (by Zhuyifei1999; owner: Framawiki):
[analytics/quarry/web@master] Update dependencies

https://gerrit.wikimedia.org/r/428140

Change 440007 had a related patch set uploaded (by Zhuyifei1999; owner: Framawiki):
[analytics/quarry/web@master] Port to Python3

https://gerrit.wikimedia.org/r/440007

Change 428140 merged by jenkins-bot:
[analytics/quarry/web@master] Update dependencies

https://gerrit.wikimedia.org/r/428140

Change 440007 merged by jenkins-bot:
[analytics/quarry/web@master] Port to Python3

https://gerrit.wikimedia.org/r/440007

For the record

New instances:
quarry-web-01.quarry.eqiad.wmflabs

quarry-db-01.quarry.eqiad.wmflabs

quarry-runner-01.quarry.eqiad.wmflabs

quarry-runner-02.quarry.eqiad.wmflabs

Old ones:
quarry-main-01.quarry.eqiad.wmflabs
role::labs::quarry::killer
role::labs::quarry::web
role::labs::quarry::redis
role::labs::lvm::srv
role::labs::quarry::database

quarry-worker-01.quarry.eqiad.wmflabs
role::labs::quarry::celeryrunner

quarry-worker-02.quarry.eqiad.wmflabs
role::labs::quarry::celeryrunner
role::labs::quarry::killer

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T19:45:34Z] <zhuyifei1999_> created new quarry database and user in quarry-db-01.quarry.eqiad.wmflabs T202588

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T20:02:04Z] <zhuyifei1999_> stopped celery-quarry-worker on quarry-runner-0{1,2} T202588

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T20:03:54Z] <zhuyifei1999_> set quarry-main-01 mariadb read-only T202588

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T20:27:11Z] <zhuyifei1999_> backed up old db to /data/project/dump-2018-09-12.sql and restoring to new server T202588

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T20:42:34Z] <zhuyifei1999_> unset read-only on new database T202588

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T20:41] <framawiki> switched quarry.wmflabs.org proxy to new quarry-web-01.quarry.eqiad.wmflabs

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T21:13:27Z] <zhuyifei1999_> set read-only again on new database because new quarry's UID is 498 T202588

Mentioned in SAL (#wikimedia-cloud) [2018-09-12T21:15:40Z] <zhuyifei1999_> sudo chown quarry:quarry /data/project/quarry/ -Rv T202588
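
The chown above rewrites ownership for the whole tree. As a hedged sketch, a dry-run check like the following could list what is owned by a different UID before or after running it. `paths_not_owned_by` is a hypothetical helper, not part of Quarry; the path and the UID 498 are the ones quoted in the log entries above.

```python
import os


def paths_not_owned_by(root, expected_uid):
    """Return paths under `root` (including `root` itself) whose owner UID differs."""
    mismatched = []
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

For example, `paths_not_owned_by("/data/project/quarry/", 498)` returning an empty list would suggest the ownership is consistent with the new 'quarry' user.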

Change 460111 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[analytics/quarry/web@master] connections.py: Explicitly enable MULTI_STATEMENTS for replica

https://gerrit.wikimedia.org/r/460111

Change 460111 merged by jenkins-bot:
[analytics/quarry/web@master] connections.py: Explicitly enable MULTI_STATEMENTS for replica

https://gerrit.wikimedia.org/r/460111

Mentioned in SAL (#wikimedia-cloud) [2018-09-13T19:10:07Z] <framawiki> copy /var/log/nginx from legacy main-01 to /data/project/nginx-logs-legacy-20180913-framawiki for further analysis T202588 T197256

Mentioned in SAL (#wikimedia-cloud) [2018-09-13T19:19:09Z] <framawiki> deleted legacy instances quarry-main-01 and quarry-runner-0{1,2}, migration is over T202588

Framawiki claimed this task.

The migration is complete! \o/

Change 462295 had a related patch set uploaded (by Framawiki; owner: Framawiki):
[analytics/quarry/web@master] killer.py: fix import since py3

https://gerrit.wikimedia.org/r/462295

Change 462295 merged by jenkins-bot:
[analytics/quarry/web@master] killer.py: fix import since py3

https://gerrit.wikimedia.org/r/462295

Mentioned in SAL (#wikimedia-cloud) [2018-09-23T16:56:55Z] <zhuyifei1999_> deployed till e74f575 on -web-01, T202588 T205153