Decommission db1051-db1060 (DBA tracking)
Closed, ResolvedPublic

Description

As part of the regular hardware refresh, and given the aging database hosts, plan the purchase and setup of new database hosts and the removal of the older ones.

Event Timeline

There are a very large number of changes, so older changes are hidden.

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1069.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201804301349_jynus_6466.log.

Completed auto-reimage of hosts:

['db1069.eqiad.wmnet']

and were ALL successful.

Change 429800 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Move db1069 from s7 to x1 (while still fully depooled)

https://gerrit.wikimedia.org/r/429800

Change 429805 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1056

https://gerrit.wikimedia.org/r/429805

Change 429812 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] mariadb: Move db1069 from s7 to x1

https://gerrit.wikimedia.org/r/429812

Change 429812 merged by Jcrespo:
[operations/software@master] mariadb: Move db1069 from s7 to x1

https://gerrit.wikimedia.org/r/429812

Change 429824 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool db1056 and db1069 with low load after maintenance

https://gerrit.wikimedia.org/r/429824

Change 429824 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool db1056 and db1069 with low load after maintenance

https://gerrit.wikimedia.org/r/429824

Change 429826 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Fully pool back db1056 and db1069 as x1 replicas

https://gerrit.wikimedia.org/r/429826

Change 429826 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Fully pool back db1056 and db1069 as x1 replicas

https://gerrit.wikimedia.org/r/429826

Change 430377 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool new vslow,dump host on s4 (db1121), move db1064 to x1

https://gerrit.wikimedia.org/r/430377

Change 430377 merged by Jcrespo:
[operations/mediawiki-config@master] mariadb: Pool new vslow,dump host on s4 (db1121), move db1064 to x1

https://gerrit.wikimedia.org/r/430377

jcrespo removed a subtask: Unknown Object (Task). May 4 2018, 11:54 AM

Change 431566 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Promote db1069 to be x1 master

https://gerrit.wikimedia.org/r/431566

Change 431567 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] x1.hosts: db1069 is the new x1 master

https://gerrit.wikimedia.org/r/431567

Change 431568 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: db1069 is now x1 master

https://gerrit.wikimedia.org/r/431568

Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:28:22Z] <marostegui> Disable gtid on db1069 and db2034 before x1 failover - T186320

Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:29:48Z] <marostegui> Disable puppet on db1055 and db1069 before x1 failover - T186320

Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:36:35Z] <marostegui> Move dbstore1002:x1 under db1069 for x1 failover - T186320

Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:41:22Z] <marostegui> Move db2034 under db1069 for x1 failover - T186320

Change 431568 merged by Marostegui:
[operations/puppet@production] mariadb: db1069 is now x1 master

https://gerrit.wikimedia.org/r/431568

Change 431566 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Promote db1069 to be x1 master

https://gerrit.wikimedia.org/r/431566

Change 431567 merged by jenkins-bot:
[operations/software@master] x1.hosts: db1069 is the new x1 master

https://gerrit.wikimedia.org/r/431567

Change 432358 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Prepare db1072 for stretch reimage and move it to m3

https://gerrit.wikimedia.org/r/432358

Change 432358 merged by Jcrespo:
[operations/puppet@production] mariadb: Prepare db1072 for stretch reimage and move it to m3

https://gerrit.wikimedia.org/r/432358

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1072.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201805101539_jynus_4853.log.

Completed auto-reimage of hosts:

['db1072.eqiad.wmnet']

and were ALL successful.

db1072 is loading m3 from the last logical backup.

Change 432409 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Fully pool db1123, remove db1072

https://gerrit.wikimedia.org/r/432409

Change 432409 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Fully pool db1123, remove db1072

https://gerrit.wikimedia.org/r/432409

Change 432552 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1066 from s1 to an s2 master candidate

https://gerrit.wikimedia.org/r/432552

Change 432554 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbhosts: Move db1066 from s1 to s2

https://gerrit.wikimedia.org/r/432554

Change 432554 merged by Jcrespo:
[operations/software@master] dbhosts: Move db1066 from s1 to s2

https://gerrit.wikimedia.org/r/432554

Change 432552 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1066 from s1 to an s2 master candidate

https://gerrit.wikimedia.org/r/432552

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1066.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201805110916_jynus_29532.log.

Change 432560 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1076 for maintenance

https://gerrit.wikimedia.org/r/432560

Change 432560 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1076 for maintenance

https://gerrit.wikimedia.org/r/432560

Change 432565 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Productionize db1066 after reimage

https://gerrit.wikimedia.org/r/432565

Change 432566 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1066 with low load after reimage

https://gerrit.wikimedia.org/r/432566

Change 432565 merged by Jcrespo:
[operations/puppet@production] mariadb: Productionize db1066 after reimage

https://gerrit.wikimedia.org/r/432565

Change 432566 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Pool db1066 with low load after reimage

https://gerrit.wikimedia.org/r/432566

Change 432575 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Allow reimage of db107* hosts

https://gerrit.wikimedia.org/r/432575

Change 432575 merged by Jcrespo:
[operations/puppet@production] mariadb: Allow reimage of db107* hosts

https://gerrit.wikimedia.org/r/432575

Change 432577 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install_server: Revert db recipe for all databases

https://gerrit.wikimedia.org/r/432577

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1076.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201805111305_jynus_8054.log.

Completed auto-reimage of hosts:

['db1076.eqiad.wmnet']

and were ALL successful.

Change 432577 merged by Jcrespo:
[operations/puppet@production] install_server: Revert db recipe for all databases

https://gerrit.wikimedia.org/r/432577

Change 432581 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1076 with low load, increase db1066 load

https://gerrit.wikimedia.org/r/432581

Change 432581 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Pool db1076 with low load, increase db1066 load

https://gerrit.wikimedia.org/r/432581

Change 432602 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1076 back with full weight

https://gerrit.wikimedia.org/r/432602

Change 432602 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Pool db1076 back with full weight

https://gerrit.wikimedia.org/r/432602

Change 434920 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Remove old references to db105* hosts at dns

https://gerrit.wikimedia.org/r/434920

Change 434920 merged by Jcrespo:
[operations/dns@master] mariadb: Remove old references to db105* and codfw hosts at dns

https://gerrit.wikimedia.org/r/434920

Change 437703 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Failover m2-master to db1065

https://gerrit.wikimedia.org/r/437703

Change 437707 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Failover m3-master to db1072

https://gerrit.wikimedia.org/r/437707

Change 437710 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Update misc replica CNAME for m2 and m3

https://gerrit.wikimedia.org/r/437710

Change 437714 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover m2-master to db1065

https://gerrit.wikimedia.org/r/437714

Change 437715 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover m3-master to db1072

https://gerrit.wikimedia.org/r/437715

Change 437703 merged by Jcrespo:
[operations/puppet@production] mariadb: Failover m2-master to db1065

https://gerrit.wikimedia.org/r/437703

Change 437714 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover m2-master to db1065

https://gerrit.wikimedia.org/r/437714

Change 437714 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover m2-master to db1065

https://gerrit.wikimedia.org/r/437714

Change 437769 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] mariadb: Remove db1051, to be decommissioned, add db1065

https://gerrit.wikimedia.org/r/437769

Change 437769 merged by Jcrespo:
[operations/software@master] mariadb: Remove db1051, to be decommissioned, add db1065

https://gerrit.wikimedia.org/r/437769

Change 437710 merged by Jcrespo:
[operations/dns@master] mariadb: Update misc replica CNAME for m2 and m3

https://gerrit.wikimedia.org/r/437710

Change 437707 merged by Marostegui:
[operations/puppet@production] mariadb: Failover m3-master to db1072

https://gerrit.wikimedia.org/r/437707

Change 437715 merged by Marostegui:
[operations/puppet@production] mariadb: Switchover m3-master to db1072

https://gerrit.wikimedia.org/r/437715

Marostegui added a comment (edited). Jun 8 2018, 3:45 PM

To sum up the pending active hosts:

db1054 (s2 primary master): Failover scheduled T194870
db1052 (s1 primary master): Failover scheduled T197069

db1054 is pending until we confirm that everything is working as expected on s2; the db1052 process has not started yet. All other hosts are ready for robh/dcops to continue, as noted on the subtasks.

db1054 is now handed over to DCOps for decommissioning.
The only pending host is db1052 (s1 primary master), which is scheduled to be failed over on the 18th of July (T197069).

Marostegui moved this task from In progress to Done on the DBA board. Aug 1 2018, 7:24 AM

This is essentially all done - all the hosts have been handed over to DCOps for the last steps of the decommissioning process.

Cmjohnson closed subtask T194634: Decommission db1053 as Resolved.
Cmjohnson closed subtask T197063: Decommission db1054 as Resolved.
Cmjohnson closed subtask T193736: Decommission db1056 as Resolved.
Cmjohnson closed subtask T196606: Decommission db1059 as Resolved.
Cmjohnson closed subtask T193732: Decommission db1060 as Resolved.
Marostegui closed this task as Resolved. Aug 21 2018, 5:37 PM
Marostegui added subscribers: Cmjohnson, RobH.

All these hosts have now been fully decommissioned.
Thanks @RobH and @Cmjohnson for all the help!