Provide dedicated database resources for wikidata
Closed, Resolved, Public

Description

  • Create new functional group s8, dedicated to wikidata
    • Prepare mediawiki and cloud wiki replica configuration for s8
    • Provision new hardware, serving exclusively the wikidata dataset
  • Refactor puppet/core db to introduce multi-instance for mediawiki hosts (this will be a benefit for the whole db infra)
  • Switchover service to the new functional group s8 (master failover)

Only small follow-ups pending: cloud and dbstore replication and other small things.

  • dbstore1001 done, but s8 got corrupted, needs reload
  • dbstore1002
  • dbstore2001 done, respective drops pending
  • db1095 (sanitarium)

Details

Repo                          Branch       Lines +/-
operations/mediawiki-config   master       +420 -659
operations/software           master       +1 -0
operations/puppet             production   +3 -1
operations/puppet             production   +3 -2
operations/mediawiki-config   master       +2 -2
operations/mediawiki-config   master       +5 -5
operations/mediawiki-config   master       +2 -2
operations/puppet             production   +0 -1
operations/mediawiki-config   master       +32 -11
operations/mediawiki-config   master       +32 -11
operations/mediawiki-config   master       +19 -19
operations/software           master       +8 -8
operations/mediawiki-config   master       +65 -26
operations/dns                master       +2 -1
operations/mediawiki-config   master       +2 -2
operations/puppet             production   +2 -0
operations/mediawiki-config   master       +24 -24
operations/mediawiki-config   master       +23 -22
operations/mediawiki-config   master       +3 -3
operations/puppet             production   +1 -1
operations/puppet             production   +1 -1
operations/puppet             production   +14 -12
operations/puppet             production   +37 -19
operations/puppet             production   +1 -1
operations/puppet             production   +50 -25
operations/software           master       +9 -9
operations/mediawiki-config   master       +9 -14
operations/puppet             production   +2 -3
operations/mediawiki-config   master       +1 -3
operations/mediawiki-config   master       +110 -75
operations/mediawiki-config   master       +8 -4
operations/software           master       +1 -1
operations/puppet             production   +4 -5
operations/mediawiki-config   master       +2 -7
operations/mediawiki-config   master       +1 -1
operations/mediawiki-config   master       +3 -3
operations/puppet             production   +7 -1
operations/puppet             production   +1 -1
operations/mediawiki-config   master       +4 -4
operations/mediawiki-config   master       +6 -6

Event Timeline

Change 393012 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: db1063, db1051 to fully serve dump

https://gerrit.wikimedia.org/r/393012

Mentioned in SAL (#wikimedia-operations) [2017-11-23T06:32:06Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Fully pool db1051 and db1063 in vslow service for s5 to warm them up for the s8 split - T177208 (duration: 00m 46s)

Change 392881 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover s5 codfw master (db2023) to db2052

https://gerrit.wikimedia.org/r/392881

Change 392879 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Switchover codfw s5 master from db2023 to db2052

https://gerrit.wikimedia.org/r/392879

Change 393035 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbhosts: Update s5 hosts and introduce s8 hosts

https://gerrit.wikimedia.org/r/393035

Change 393035 merged by Jcrespo:
[operations/software@master] dbhosts: Update s5 hosts and introduce s8 hosts

https://gerrit.wikimedia.org/r/393035

Change 393065 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] [WIP]mariadb: Move hosts to s8 replica set on codfw

https://gerrit.wikimedia.org/r/393065

Change 393065 merged by Jcrespo:
[operations/puppet@production] mariadb: Move hosts to s8 replica set on codfw

https://gerrit.wikimedia.org/r/393065

Change 393086 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1071 to s8

https://gerrit.wikimedia.org/r/393086

Change 393086 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1071 to s8

https://gerrit.wikimedia.org/r/393086

Change 393094 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] prometheus-mysqld-exporter: Introduce s8 replica set on prometheus

https://gerrit.wikimedia.org/r/393094

Change 393094 merged by Jcrespo:
[operations/puppet@production] prometheus-mysqld-exporter: Introduce s8 replica set on prometheus

https://gerrit.wikimedia.org/r/393094

Change 393102 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move some (only the single-instance) s5 hosts to s8

https://gerrit.wikimedia.org/r/393102

Change 393102 merged by Jcrespo:
[operations/puppet@production] mariadb: Move some (only the single-instance) s5 hosts to s8

https://gerrit.wikimedia.org/r/393102

Change 393185 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1101:3318

https://gerrit.wikimedia.org/r/393185

Change 393185 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1101:3318

https://gerrit.wikimedia.org/r/393185

Mentioned in SAL (#wikimedia-operations) [2017-11-24T08:48:03Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1101:3318 in s5 to warm it up and depool db1092 - T178359 T177208 (duration: 00m 45s)

Change 393203 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: move db2085:s5 to db2085:s8

https://gerrit.wikimedia.org/r/393203

Change 393204 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool all future s8 hosts

https://gerrit.wikimedia.org/r/393204

Change 393205 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db2086:s5 to db2086:s8

https://gerrit.wikimedia.org/r/393205

Change 393204 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool all future s8 hosts

https://gerrit.wikimedia.org/r/393204

Mentioned in SAL (#wikimedia-operations) [2017-11-24T10:33:35Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool all future s8 slaves for a topology change - T177208 (duration: 00m 45s)

Change 393203 merged by Jcrespo:
[operations/puppet@production] mariadb: move db2085:s5 to db2085:s8

https://gerrit.wikimedia.org/r/393203

Change 393205 merged by Marostegui:
[operations/puppet@production] mariadb: Move db2086:s5 to db2086:s8

https://gerrit.wikimedia.org/r/393205

Change 393208 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] [WIP]mariadb: Change db208[56]:3315 to port 3318; repool db2038

https://gerrit.wikimedia.org/r/393208

Change 393208 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Change db208[56]:3315 to port 3318; repool db2038

https://gerrit.wikimedia.org/r/393208
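
The host and port notation in the change above can be unpacked with a small sketch. It assumes, based only on the ports seen in this timeline (3315 for s5 instances, 3318 for s8 instances), that a multi-instance host serves section sN on port 3310 + N. The helper names `expand_hosts`, `port_for_section` and `move_instances` are hypothetical and not part of any Wikimedia tooling.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand a single bracket group, e.g. 'db208[56]' -> ['db2085', 'db2086']."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a plain host name.
        return [pattern]
    prefix, choices, suffix = match.groups()
    return [f"{prefix}{char}{suffix}" for char in choices]


def port_for_section(section: str) -> int:
    """Assumed convention (inferred from this log): 's<N>' listens on 3310 + N."""
    match = re.fullmatch(r"s(\d+)", section)
    if match is None:
        raise ValueError(f"not a section name: {section!r}")
    return 3310 + int(match.group(1))


def move_instances(pattern: str, old_section: str, new_section: str) -> list[tuple[str, str]]:
    """Return (old, new) 'host:port' pairs for every host the pattern expands to."""
    old_port = port_for_section(old_section)
    new_port = port_for_section(new_section)
    return [
        (f"{host}:{old_port}", f"{host}:{new_port}")
        for host in expand_hosts(pattern)
    ]


if __name__ == "__main__":
    for old, new in move_instances("db208[56]", "s5", "s8"):
        print(f"{old} -> {new}")
```

Under that assumption, moving an instance from s5 to s8 is only a port change on the same host, which matches the commit subject.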

Change 393569 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1082 and db1087 to ROW for labsdb filtering

https://gerrit.wikimedia.org/r/393569

Change 393569 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1082 and db1087 to ROW for labsdb filtering

https://gerrit.wikimedia.org/r/393569

Change 393588 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Warm up s8 hosts

https://gerrit.wikimedia.org/r/393588

Change 393588 abandoned by Marostegui:
db-eqiad.php: Warm up s8 hosts

Reason:
this is not needed as we are not doing the failover soon

https://gerrit.wikimedia.org/r/393588

Change 393723 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1082 for maintenance

https://gerrit.wikimedia.org/r/393723

Change 393723 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1082 for maintenance

https://gerrit.wikimedia.org/r/393723

Change 393755 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Update s5-master and add s8-master CNAMEs

https://gerrit.wikimedia.org/r/393755

Change 393755 merged by Jcrespo:
[operations/dns@master] mariadb: Update s5-master and add s8-master CNAMEs

https://gerrit.wikimedia.org/r/393755

Change 393808 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Setup s8 empty on eqiad

https://gerrit.wikimedia.org/r/393808

Change 393808 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Setup s8 empty on eqiad

https://gerrit.wikimedia.org/r/393808

Change 394057 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s8.hosts: Create file with the s8 hosts

https://gerrit.wikimedia.org/r/394057

Change 394057 merged by jenkins-bot:
[operations/software@master] s8.hosts: Add eqiad hosts

https://gerrit.wikimedia.org/r/394057

Mentioned in SAL (#wikimedia-operations) [2017-11-30T08:21:53Z] <marostegui> Enable GTID on s8 eqiad hosts that do not have it enabled (db1109, db1104, db1101, db1092, db1087, db1063) - T177208

Change 401433 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1070.yaml: Update new socket location

https://gerrit.wikimedia.org/r/401433

Change 401434 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Set s5 on read_only

https://gerrit.wikimedia.org/r/401434

Change 401436 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Point wikidatawiki to s8

https://gerrit.wikimedia.org/r/401436

Change 402801 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Warm up future s8 hosts

https://gerrit.wikimedia.org/r/402801

Change 402801 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Warm up future s8 hosts

https://gerrit.wikimedia.org/r/402801

Mentioned in SAL (#wikimedia-operations) [2018-01-08T12:27:55Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Warm up s8 future hosts - T177208 (duration: 00m 52s)

Mentioned in SAL (#wikimedia-operations) [2018-01-08T12:34:52Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: revert warm up s8 future hosts - T177208 (duration: 02m 58s)

Change 402817 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Warm up s8 hosts

https://gerrit.wikimedia.org/r/402817

Change 402817 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Warm up s8 hosts

https://gerrit.wikimedia.org/r/402817

Mentioned in SAL (#wikimedia-operations) [2018-01-08T12:44:39Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Warm up s8 future hosts - T177208 (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2018-01-08T12:49:41Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Warm up s8 future hosts - T177208 (duration: 00m 27s)

Change 401433 merged by Marostegui:
[operations/puppet@production] db1070.yaml: Update new socket location

https://gerrit.wikimedia.org/r/401433

Change 401434 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Set s5 on read_only

https://gerrit.wikimedia.org/r/401434

Change 403104 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Set s5, s8 read only OFF

https://gerrit.wikimedia.org/r/403104

Mentioned in SAL (#wikimedia-operations) [2018-01-09T06:01:48Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Set s5 on read-only to start failover T177208 T181645 (duration: 00m 50s)

Change 401436 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Point wikidatawiki to s8

https://gerrit.wikimedia.org/r/401436

Mentioned in SAL (#wikimedia-operations) [2018-01-09T06:11:55Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Splitting s5 and s8 T177208 T181645 (duration: 00m 50s)

Change 403104 merged by Marostegui:
[operations/mediawiki-config@master] db-eqiad.php: Set s5, s8 read only OFF

https://gerrit.wikimedia.org/r/403104

Mentioned in SAL (#wikimedia-operations) [2018-01-09T06:14:46Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove read_only from s5 and s8 T177208 T181645 (duration: 00m 27s)
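
The three synchronized changes above (set s5 read-only, point wikidatawiki to s8, remove read-only) can be modelled as a toy state machine. This is only a sketch in Python, not the real PHP configuration in db-eqiad.php. The `DbConfig` class, its field names, and the read-only message are hypothetical, and treating s8 as read-only during the middle step is an assumption drawn from the final change turning read-only off for both sections.

```python
from dataclasses import dataclass, field


@dataclass
class DbConfig:
    """Toy stand-in for the wiki-to-section mapping and read-only flags."""

    sections_by_db: dict[str, str]
    read_only: dict[str, str] = field(default_factory=dict)

    def section_for(self, wiki: str) -> str:
        return self.sections_by_db[wiki]

    def is_writable(self, wiki: str) -> bool:
        return self.section_for(wiki) not in self.read_only


def split_wikidata(config: DbConfig) -> list[tuple[str, bool]]:
    """Apply the three steps in order, recording whether wikidatawiki is writable."""
    states = []

    # Step 1: stop writes on s5 so the old and new masters cannot diverge.
    config.read_only["s5"] = "s8 split in progress"
    states.append(("s5 read-only", config.is_writable("wikidatawiki")))

    # Step 2: point wikidatawiki to s8 (assumed to be read-only at this point).
    config.sections_by_db["wikidatawiki"] = "s8"
    config.read_only["s8"] = "s8 split in progress"
    states.append(("wikidatawiki on s8", config.is_writable("wikidatawiki")))

    # Step 3: remove read-only from both sections.
    config.read_only.pop("s5", None)
    config.read_only.pop("s8", None)
    states.append(("read-only off", config.is_writable("wikidatawiki")))

    return states


if __name__ == "__main__":
    cfg = DbConfig({"dewiki": "s5", "wikidatawiki": "s5"})
    for step, writable in split_wikidata(cfg):
        print(f"{step}: wikidatawiki writable = {writable}")
```

The ordering is the point of the sketch: writes stay blocked from the first step until the mapping has changed, so no write can land on the old section after the switch.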

Marostegui added subscribers: aude, mark, Joe.

Failover is done
Read only started: 06:01
Read only finished: 06:14
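
As a rough cross-check of the window above, the duration can be computed from the two SAL timestamps logged for the read-only changes (06:01:48Z and 06:14:46Z). This is a sketch only. Those timestamps record when each sync was logged, not the exact moment writes stopped or resumed, so the result is approximate. The helper name `parse_sal` is hypothetical.

```python
from datetime import datetime, timezone


def parse_sal(timestamp: str) -> datetime:
    """Parse a SAL-style UTC timestamp such as '2018-01-09T06:01:48Z'."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


read_only_on = parse_sal("2018-01-09T06:01:48Z")   # "Set s5 on read-only" sync
read_only_off = parse_sal("2018-01-09T06:14:46Z")  # "Remove read_only" sync

window = read_only_off - read_only_on
minutes, seconds = divmod(int(window.total_seconds()), 60)
print(f"read-only window: about {minutes}m {seconds:02d}s")
```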

Thanks @Ladsgroup @aude @mark @Joe for being online and supporting the DBAs!

There are still some pending tasks:

  • Configure sanitarium to replicate the new channel
  • Configure dbstores to replicate the new channel
  • Drop dewiki tables from s8 master (db1071)
  • Drop wikidata tables from s5 master (db1070)

Change 403110 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Setup s8 on dbstore2001

https://gerrit.wikimedia.org/r/403110

Change 403110 merged by Jcrespo:
[operations/puppet@production] mariadb: Setup s8 on dbstore2001

https://gerrit.wikimedia.org/r/403110

Change 403119 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add s8 to dbstore config and monitoring

https://gerrit.wikimedia.org/r/403119

Change 403119 merged by Marostegui:
[operations/puppet@production] mariadb: Add s8 to dbstore config and monitoring

https://gerrit.wikimedia.org/r/403119

I contaminated s5/s8 on dbstore1001 by mistake, due to the timing of delayed replication. As a consolation, I do not think they were in very good shape anyway. I will try to revert the changes, but I cannot guarantee that I can save it.

From the checksums I ran over the past weeks, it was indeed pretty inconsistent across all the shards, so I don't think it is a big deal to have added a bit more drift.
It needs a full reimport anyway.

I've set up Replicate_Wild_Do_Table: dewiki.%,heartbeat.% on s5 to unbreak it (it will not be affected), but s8 will still not be in a good state. Time to accelerate T159430 or codfw backups for s8.
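
To illustrate what the filter above lets through, here is a simplified model of wildcard table matching, where `%` matches any run of characters and `_` matches a single character in the `db.table` name. It is a sketch under stated assumptions: it ignores backslash escaping, case-sensitivity settings, and the interaction with other replication filter options. The helper names `wild_to_regex` and `is_replicated`, and the table names in the usage lines, are hypothetical.

```python
import re


def wild_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern ('%' and '_') into a compiled regex."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")   # any run of characters, including none
        elif char == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def is_replicated(db: str, table: str, wild_do_table: str) -> bool:
    """Return True if 'db.table' matches any comma-separated pattern."""
    name = f"{db}.{table}"
    patterns = [p.strip() for p in wild_do_table.split(",") if p.strip()]
    return any(wild_to_regex(p).fullmatch(name) for p in patterns)


if __name__ == "__main__":
    filters = "dewiki.%,heartbeat.%"
    for db, table in [("dewiki", "page"), ("heartbeat", "heartbeat"), ("wikidatawiki", "page")]:
        print(f"{db}.{table}: replicated = {is_replicated(db, table, filters)}")
```

With the filter quoted in the comment, events for wikidatawiki tables no longer match on the s5 channel, which is consistent with s5 being unbroken while s8 remains in a bad state.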

I believe we are good to close this task once Bryan has finished the Cloud Team's pending tasks?

jcrespo claimed this task.

Yes, but let's open one for follow-up/cleanup: deleting the old data (which we will want to wait to do, leaving the data there for a few weeks), reassigning servers, etc.

Change 404406 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s8.hosts: Add dbstore2001

https://gerrit.wikimedia.org/r/404406

Change 404406 merged by jenkins-bot:
[operations/software@master] s8.hosts: Add dbstore2001

https://gerrit.wikimedia.org/r/404406

Change 391198 abandoned by Jcrespo:
WIP: setup s8 and db-common.php refactoring

Reason:
We will need to change the style of db-*.php config, but because some parts will be migrated to etcd, we will have to think about that in the future.

https://gerrit.wikimedia.org/r/391198