Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1)
Closed, Resolved, Public

Description

For each codfw section we should have a master and a proper master candidate for failover. The candidate should use STATEMENT-based replication, should not be replicating in ROW format to sanitarium, and should be physically separated from the current master (in a different row).

  • s1
    • master: db2048
    • candidate: db2055
  • s2
    • master: db2035
    • candidate: db2041
  • s3
    • master: db2043
    • candidate: db2057
  • s4
    • master: db2051
    • candidate: db2058
  • s5
    • master: db2052
    • candidate: db2038
  • s6
    • master: db2039
    • candidate: db2046
  • s7
    • master: db2040
    • candidate: db2047
  • s8
    • master: db2045
    • candidate: db2079
  • x1
    • master: db2034
    • candidate: db2069 (T191275#4099560). This is the only available candidate, as db2033 has a faulty BBU, so it will need to be replaced at some point once we start the movements in codfw (T184888).
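The candidate requirements above can be written down as a simple check. This is a hypothetical sketch, not an existing Wikimedia tool: the `Host` fields (`row`, `hw_model`, `binlog_format`, `replicates_row_to_sanitarium`) and the example host names are assumptions standing in for facts an operator would gather by hand. It reports which requirements a candidate fails, and keeps "same hardware as the master" as a separate preference, since the timeline treats it as a reason to choose a host rather than a hard rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical inventory record; the fields are illustrative assumptions,
    # not an existing Wikimedia tool or API.
    name: str
    row: str                            # physical data-centre row, e.g. "A"
    hw_model: str                       # hardware model/generation label
    binlog_format: str                  # "STATEMENT", "ROW" or "MIXED"
    replicates_row_to_sanitarium: bool  # True if it feeds sanitarium in ROW format


def candidate_problems(master: Host, candidate: Host) -> List[str]:
    """List the requirements from the task description that `candidate` fails."""
    problems = []
    if candidate.binlog_format.upper() != "STATEMENT":
        problems.append("binlog_format is not STATEMENT")
    if candidate.replicates_row_to_sanitarium:
        problems.append("replicates ROW to sanitarium")
    if candidate.row == master.row:
        problems.append("same row as the master")
    return problems


def same_hardware(master: Host, candidate: Host) -> bool:
    """A preference used when choosing candidates, not a hard requirement."""
    return master.hw_model == candidate.hw_model


# Hypothetical example hosts (names, rows and models are made up).
master = Host("db-master", "A", "model-x", "STATEMENT", False)
candidate = Host("db-candidate", "A", "model-x", "ROW", False)
print(candidate_problems(master, candidate))
# ['binlog_format is not STATEMENT', 'same row as the master']
```

An empty list means the host satisfies all three requirements; `same_hardware` can then be used to rank several hosts that pass.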

Event Timeline

Change 423862 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2041 is now a candidate master

https://gerrit.wikimedia.org/r/423862

Marostegui updated the task description. Apr 4 2018, 9:23 AM

Mentioned in SAL (#wikimedia-operations) [2018-04-04T09:24:15Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: db2041 is now a candidate master for s2 - T191275 (duration: 01m 16s)

For s3: db2057 is the best candidate: same HW as the current master and in a different row.

Change 423879 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2057.yaml: Change binlog format to STATEMENT

https://gerrit.wikimedia.org/r/423879

Change 423879 merged by Marostegui:
[operations/puppet@production] db2057.yaml: Change binlog format to STATEMENT

https://gerrit.wikimedia.org/r/423879

Change 423883 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2057

https://gerrit.wikimedia.org/r/423883

Change 423883 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2057

https://gerrit.wikimedia.org/r/423883

Change 423887 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: db2057 is now a candidate master

https://gerrit.wikimedia.org/r/423887

Change 423887 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2057 is now a candidate master

https://gerrit.wikimedia.org/r/423887

Marostegui updated the task description. Apr 4 2018, 11:46 AM

Mentioned in SAL (#wikimedia-operations) [2018-04-04T11:47:52Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: db2057 is now a candidate master for s3 - T191275 (duration: 01m 17s)

For s4 I suggest db2058: same hardware as the master, and it is in a different row.

Change 424201 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2058.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424201

Change 424201 merged by Marostegui:
[operations/puppet@production] db2058.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424201

Mentioned in SAL (#wikimedia-operations) [2018-04-05T05:58:44Z] <marostegui> Restart MySQL on db2058 to change its binlog to STATEMENT - T191275
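Options in the MySQL/MariaDB configuration file are read when the server starts, which fits the restart logged above. A quick way to confirm what a rendered config file sets is to parse it. The sketch below assumes a simple my.cnf-style file with a `[mysqld]` section; the sample contents and the helper `configured_binlog_format` are hypothetical, and the puppet-generated file on the hosts may be laid out differently (for example, `!includedir` lines are not handled). It only inspects the file; the value the running server uses can be checked with `SELECT @@GLOBAL.binlog_format;`.

```python
import configparser
from typing import Optional

# Hypothetical excerpt of a rendered my.cnf; the real file will differ.
SAMPLE_MY_CNF = """
[client]
port = 3306

[mysqld]
skip-name-resolve
log-bin
binlog_format = STATEMENT
"""


def configured_binlog_format(text: str) -> Optional[str]:
    """Return the binlog_format set in [mysqld], or None if it is absent."""
    # allow_no_value accepts bare flags such as "skip-name-resolve";
    # strict=False tolerates repeated options, keeping the last one.
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False
    )
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return None
    # MySQL accepts "-" and "_" interchangeably in option names; check both.
    for key in ("binlog_format", "binlog-format"):
        value = parser.get("mysqld", key, fallback=None)
        if value is not None:
            return value.strip().upper()
    return None


print(configured_binlog_format(SAMPLE_MY_CNF))  # STATEMENT
```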

Change 424203 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: db2058 is now a candidate master

https://gerrit.wikimedia.org/r/424203

Change 424203 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2058 is now a candidate master

https://gerrit.wikimedia.org/r/424203

Marostegui updated the task description. Apr 5 2018, 6:07 AM

Mentioned in SAL (#wikimedia-operations) [2018-04-05T06:08:06Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: db2058 is now a candidate master for s4 - T191275 (duration: 01m 16s)

For s5: db2038. Same HW, different row, and it is the old master.

Change 424206 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2038.yaml: Change binlog format and shard

https://gerrit.wikimedia.org/r/424206

Change 424207 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2038

https://gerrit.wikimedia.org/r/424207

Change 424207 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2038

https://gerrit.wikimedia.org/r/424207

Change 424206 merged by Marostegui:
[operations/puppet@production] db2038.yaml: Change binlog format and shard

https://gerrit.wikimedia.org/r/424206

Change 424210 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Repool db2038

https://gerrit.wikimedia.org/r/424210

Change 424210 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Repool db2038

https://gerrit.wikimedia.org/r/424210

Marostegui updated the task description. Apr 5 2018, 6:58 AM

For s6: db2053. Same HW, different row.

Change 424211 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2053.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424211

Change 424212 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: db2053 is candidate master for s6

https://gerrit.wikimedia.org/r/424212

Change 424211 merged by Marostegui:
[operations/puppet@production] db2053.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424211

Change 424212 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2053 is candidate master for s6

https://gerrit.wikimedia.org/r/424212

Marostegui updated the task description. Apr 5 2018, 7:20 AM

For s7: db2054. Same HW, different row.

Change 424299 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2054.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424299

Change 424299 merged by Marostegui:
[operations/puppet@production] db2054.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424299

For s6, db2053 needs to be reverted as candidate master: once the master is moved (T191193), both hosts would end up in the same row. I will look for another candidate.

Change 424302 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: db2053 - no longer candidate master

https://gerrit.wikimedia.org/r/424302

Change 424302 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2053 - no longer candidate master

https://gerrit.wikimedia.org/r/424302

Marostegui updated the task description. Apr 5 2018, 3:34 PM

Change 424522 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2046.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424522

Change 424523 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: db2046 is s6 candidate master

https://gerrit.wikimedia.org/r/424523

Change 424522 merged by Marostegui:
[operations/puppet@production] db2046.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424522

Mentioned in SAL (#wikimedia-operations) [2018-04-06T05:59:53Z] <marostegui> Restart MySQL on db2046 to change its binlog format - T191275

Change 424523 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2046 is s6 candidate master

https://gerrit.wikimedia.org/r/424523

Marostegui updated the task description. Apr 6 2018, 6:05 AM

For s7: db2047. Same HW, and it will be in a different row once db2040 is moved to A3 (T191193).
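The s6 and s7 decisions depend on where the master will be after the rack moves tracked in T191193, not only on where it is today. The sketch below assumes that rack labels such as "A3" encode the row as their leading letter; the function names, host names and rack labels in the example are hypothetical and not taken from the actual inventory.

```python
from typing import Dict


def row_of(rack: str) -> str:
    """Assume a rack label like "A3" encodes the row as its leading letter."""
    if not rack or not rack[0].isalpha():
        raise ValueError(f"unexpected rack label: {rack!r}")
    return rack[0].upper()


def separated_after_moves(
    master: str,
    candidate: str,
    racks: Dict[str, str],
    pending_moves: Dict[str, str],
) -> bool:
    """True if master and candidate are in different rows once pending moves are applied."""
    # A pending move overrides the host's current rack.
    future = {**racks, **pending_moves}
    return row_of(future[master]) != row_of(future[candidate])


# Hypothetical layout: the candidate is fine today but not after the master moves.
racks = {"db-master": "B1", "db-candidate": "A5"}
print(separated_after_moves("db-master", "db-candidate", racks, {}))                   # True
print(separated_after_moves("db-master", "db-candidate", racks, {"db-master": "A3"}))  # False
```

Checking against the post-move layout is what catches a case like db2053 in s6 before it is promoted, rather than after the master has been relocated.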

Change 424524 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2047

https://gerrit.wikimedia.org/r/424524

Change 424525 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2047.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424525

Change 424524 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2047

https://gerrit.wikimedia.org/r/424524

Change 424525 merged by Marostegui:
[operations/puppet@production] db2047.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424525

Change 424526 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: db2047 is now candidate master in s7

https://gerrit.wikimedia.org/r/424526

Change 424526 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2047 is now candidate master in s7

https://gerrit.wikimedia.org/r/424526

Marostegui updated the task description. Apr 6 2018, 7:26 AM

Change 424990 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2079.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424990

Change 424990 merged by Marostegui:
[operations/puppet@production] db2079.yaml: Change binlog format

https://gerrit.wikimedia.org/r/424990

Change 424991 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: db2079 is now candidate master

https://gerrit.wikimedia.org/r/424991

Change 424991 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: db2079 is now candidate master

https://gerrit.wikimedia.org/r/424991

Marostegui updated the task description. Apr 9 2018, 6:18 AM

Change 424994 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db2092 to s1

https://gerrit.wikimedia.org/r/424994

Change 424994 merged by Marostegui:
[operations/puppet@production] mariadb: Move db2092 to s1

https://gerrit.wikimedia.org/r/424994

Change 425013 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s1.hosts: Add db2092 to s1

https://gerrit.wikimedia.org/r/425013

Change 425013 merged by jenkins-bot:
[operations/software@master] s1.hosts: Add db2092 to s1

https://gerrit.wikimedia.org/r/425013

Change 425064 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2069.yaml: Disable notifications

https://gerrit.wikimedia.org/r/425064

Change 425064 merged by Marostegui:
[operations/puppet@production] db2069.yaml: Disable notifications

https://gerrit.wikimedia.org/r/425064

Change 425216 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2069

https://gerrit.wikimedia.org/r/425216

Change 425216 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2069

https://gerrit.wikimedia.org/r/425216

Mentioned in SAL (#wikimedia-operations) [2018-04-10T05:48:05Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db2069 from config - T191275 (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2018-04-10T05:49:12Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db2069 from config - T191275 (duration: 00m 58s)

Change 425468 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db2069 from s1 to x1

https://gerrit.wikimedia.org/r/425468

Change 425468 merged by Marostegui:
[operations/puppet@production] mariadb: Move db2069 from s1 to x1

https://gerrit.wikimedia.org/r/425468

Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db2069.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201804110541_marostegui_16406.log.

Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db2069.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201804110551_marostegui_18046.log.

Change 425473 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db2069 to the config

https://gerrit.wikimedia.org/r/425473

Change 425473 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db2069 to the config

https://gerrit.wikimedia.org/r/425473

Mentioned in SAL (#wikimedia-operations) [2018-04-11T06:15:51Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add db2069 to the config as depooled x1 slave - T191275 (duration: 01m 01s)

Mentioned in SAL (#wikimedia-operations) [2018-04-11T06:17:02Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add db2069 to the config as depooled x1 slave - T191275 (duration: 01m 03s)

Completed auto-reimage of hosts:

['db2069.codfw.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2018-04-11T06:20:31Z] <marostegui> Stop MySQL on db2033 to clone db2069 - T191275

Change 425476 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: notifications enable/disable db2069/2033

https://gerrit.wikimedia.org/r/425476

Change 425476 merged by Marostegui:
[operations/puppet@production] mariadb: notifications enable/disable db2069/2033

https://gerrit.wikimedia.org/r/425476

Mentioned in SAL (#wikimedia-operations) [2018-04-11T07:27:33Z] <marostegui> Stop MySQL on db2033 to copy its data away before reimaging - T191275

Change 425480 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Allow reimage db2033

https://gerrit.wikimedia.org/r/425480

Change 425480 merged by Marostegui:
[operations/puppet@production] install_server: Allow reimage db2033

https://gerrit.wikimedia.org/r/425480

Change 425482 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Repool db2069

https://gerrit.wikimedia.org/r/425482

Marostegui updated the task description. Apr 11 2018, 8:01 AM

Change 425482 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Repool db2069

https://gerrit.wikimedia.org/r/425482

Mentioned in SAL (#wikimedia-operations) [2018-04-11T08:03:28Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Repool db2069 as candidate master for x1 - T191275 (duration: 01m 03s)

Change 425487 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s1,x1.hosts: Move db2069 from s1 to x1

https://gerrit.wikimedia.org/r/425487

Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

['db2033.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201804110827_marostegui_21501.log.

Change 425487 merged by jenkins-bot:
[operations/software@master] s1,x1.hosts: Move db2069 from s1 to x1

https://gerrit.wikimedia.org/r/425487

Completed auto-reimage of hosts:

['db2033.codfw.wmnet']

and were ALL successful.

Marostegui closed this task as Resolved. Apr 11 2018, 9:37 AM
Marostegui claimed this task.

This is all done.