Description

As part of the regular refresh, and given the aging database hosts, plan the purchase and setup of new databases and the removal of the older ones.

Details
| Status | Assigned | Task |
|---|---|---|
| Resolved | None | T186320 Decommission db1051-db1060 (DBA tracking) |
| Declined | Marostegui | T186503 Rebuild user_newtalk on db1052 |
| Resolved | Cmjohnson | T193732 Decommission db1060 |
| Resolved | Cmjohnson | T193736 Decommission db1056 |
| Resolved | Cmjohnson | T194118 Decommission db1055 |
| Resolved | Cmjohnson | T194634 Decommission db1053 |
| Resolved | Marostegui | T194870 Failover s2 primary master |
| Resolved | Cmjohnson | T193847 Move db1066 to row A |
| Declined | None | T194867 BBU issues on db1054 (s2 primary master) |
| Resolved | Johan | T195487 Announce read-only time for wikis on s2 for 13th June 2018 |
| Resolved | Cmjohnson | T195484 Decommission db1051 |
| Resolved | Cmjohnson | T196606 Decommission db1059 |
| Resolved | Cmjohnson | T197063 Decommission db1054 |
| Resolved | Marostegui | T197069 Failover db1052 (s1) db primary master |
| Resolved | Johan | T197134 Announce 30 minutes read-only time for enwiki 18th July 06:00AM UTC |
| Resolved | jcrespo | T199224 Test database master switchover script on codfw |
| Resolved | RobH | T199861 Decommission db1052 |
Event Timeline
Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db1069.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201804301349_jynus_6466.log.
Change 429800 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Move db1069 from s7 to x1 (while still fully depooled)
Change 429805 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1056
Change 429812 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] mariadb: Move db1069 from s7 to x1
Change 429812 merged by Jcrespo:
[operations/software@master] mariadb: Move db1069 from s7 to x1
Change 429824 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool db1056 and db1069 with low load after maintenance
Change 429824 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool db1056 and db1069 with low load after maintenance
Change 429826 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Fully pool back db1056 and db1069 as x1 replicas
Change 429826 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Fully pool back db1056 and db1069 as x1 replicas
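The changes above follow the standard staged repool pattern for returning a replica to service: depool for maintenance, repool with low load so the host warms its caches without a latency spike, then restore full weight. A minimal sketch of such a ramp-up (the stage names and weights are illustrative assumptions, not the actual values in db-eqiad.php):

```python
# Hypothetical staged repool schedule for a MariaDB replica.
# Real weights live in mediawiki-config (db-eqiad.php); these
# numbers are illustrative only.
REPOOL_STAGES = [
    ("depooled", 0),     # host removed from rotation for maintenance
    ("low load", 50),    # repooled with a reduced traffic share
    ("full load", 200),  # restored to its normal weight
]

def next_weight(current_stage: str) -> int:
    """Return the weight of the stage that follows current_stage."""
    names = [name for name, _ in REPOOL_STAGES]
    idx = names.index(current_stage)
    if idx + 1 >= len(REPOOL_STAGES):
        raise ValueError("already at full load")
    return REPOOL_STAGES[idx + 1][1]
```

Each weight bump is its own mediawiki-config change, which is why the timeline shows separate "low load" and "fully pool back" commits per host.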
Change 430377 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool new vslow,dump host on s4 (db1121), move db1064 to x1
Change 430377 merged by Jcrespo:
[operations/mediawiki-config@master] mariadb: Pool new vslow,dump host on s4 (db1121), move db1064 to x1
Change 431566 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Promote db1069 to be x1 master
Change 431567 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] x1.hosts: db1069 is the new x1 master
Change 431568 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: db1069 is now x1 master
Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:28:22Z] <marostegui> Disable gtid on db1069 and db2034 before x1 failover - T186320
Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:29:48Z] <marostegui> Disable puppet on db1055 and db1069 before x1 failover - T186320
Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:36:35Z] <marostegui> Move dbstore1002:x1 under db1069 for x1 failover - T186320
Mentioned in SAL (#wikimedia-operations) [2018-05-08T05:41:22Z] <marostegui> Move db2034 under db1069 for x1 failover - T186320
Change 431568 merged by Marostegui:
[operations/puppet@production] mariadb: db1069 is now x1 master
Change 431566 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Promote db1069 to be x1 master
Change 431567 merged by jenkins-bot:
[operations/software@master] x1.hosts: db1069 is the new x1 master
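The SAL entries and merges above trace the x1 master failover sequence: disable GTID on the masters, disable puppet, re-parent the remaining replicas under the new master, then land the puppet and config changes that promote it. A rough ordering sketch (step descriptions are taken from the log above; the runner itself is an illustration, not WMF's actual switchover tooling):

```python
# Illustrative ordering of the x1 master failover steps recorded
# in the SAL entries above; not the real switchover script.
X1_FAILOVER_STEPS = [
    "disable GTID on db1069 and db2034",
    "disable puppet on the affected hosts",
    "move dbstore1002:x1 under db1069",
    "move db2034 under db1069",
    "merge puppet change: db1069 is now x1 master",
    "merge db-eqiad.php change: promote db1069 to x1 master",
    "merge x1.hosts change: db1069 is the new x1 master",
]

def run_in_order(steps, execute):
    """Execute each step strictly in sequence and record completion."""
    done = []
    for step in steps:
        execute(step)  # in production this would invoke real tooling
        done.append(step)
    return done
```

The strict ordering matters: replicas must be re-parented before the config promoting the new master is merged, otherwise writes could land on a master the replicas are not following.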
Change 432358 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Prepare db1072 for stretch reimage and move it to m3
Change 432358 merged by Jcrespo:
[operations/puppet@production] mariadb: Prepare db1072 for stretch reimage and move it to m3
Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db1072.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201805101539_jynus_4853.log.
Change 432409 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Fully pool db1123, remove db1072
Change 432409 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Fully pool db1123, remove db1072
Change 432552 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1066 from s1 to an s2 master candidate
Change 432554 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbhosts: Move db1066 from s1 to s2
Change 432554 merged by Jcrespo:
[operations/software@master] dbhosts: Move db1066 from s1 to s2
Change 432552 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1066 from s1 to an s2 master candidate
Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db1066.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201805110916_jynus_29532.log.
Change 432560 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1076 for maintenance
Change 432560 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1076 for maintenance
Change 432565 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Productionize db1066 after reimage
Change 432566 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1066 with low load after reimage
Change 432565 merged by Jcrespo:
[operations/puppet@production] mariadb: Productionize db1066 after reimage
Change 432566 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Pool db1066 with low load after reimage
Change 432575 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Allow reimage of db107* hosts
Change 432575 merged by Jcrespo:
[operations/puppet@production] mariadb: Allow reimage of db107* hosts
Change 432577 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install_server: Revert db recipe for all databases
Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db1076.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201805111305_jynus_8054.log.
Change 432577 merged by Jcrespo:
[operations/puppet@production] install_server: Revert db recipe for all databases
Change 432581 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1076 with low load, increase db1066 load
Change 432581 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Pool db1076 with low load, increase db1066 load
Change 432602 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1076 back with full weight
Change 432602 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Pool db1076 back with full weight
Change 434920 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Remove old references to db105* hosts at dns
Change 434920 merged by Jcrespo:
[operations/dns@master] mariadb: Remove old references to db105* and codfw hosts at dns
Change 437703 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Failover m2-master to db1065
Change 437707 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Failover m3-master to db1072
Change 437710 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Update misc replica CNAME for m2 and m3
Change 437714 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover m2-master to db1065
Change 437715 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover m3-master to db1072
Change 437703 merged by Jcrespo:
[operations/puppet@production] mariadb: Failover m2-master to db1065
Change 437714 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover m2-master to db1065
Change 437769 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] mariadb: Remove db1051, to be decommissioned, add db1065
Change 437769 merged by Jcrespo:
[operations/software@master] mariadb: Remove db1051, to be decommissioned, add db1065
Change 437710 merged by Jcrespo:
[operations/dns@master] mariadb: Update misc replica CNAME for m2 and m3
Change 437707 merged by Marostegui:
[operations/puppet@production] mariadb: Failover m3-master to db1072
Change 437715 merged by Marostegui:
[operations/puppet@production] mariadb: Switchover m3-master to db1072
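Change 437710 updated the misc replica CNAMEs so that clients resolving the service names follow the switchover to the new hosts (db1065 for m2, db1072 for m3, per the changes above). In zone-file syntax such a change looks roughly like this (record names and TTLs are illustrative assumptions; the real records live in operations/dns):

```
; illustrative only -- actual records are in the operations/dns repo
m2-master    300  IN  CNAME  db1065.eqiad.wmnet.
m3-master    300  IN  CNAME  db1072.eqiad.wmnet.
```

Pointing services at a CNAME rather than a host name means a master switchover needs only a DNS change, not a config change in every client.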
db1054 is pending while we confirm everything is working as expected on s2; the db1052 process has not started yet. All the others are ready for RobH/DCOps to continue, as noted on the subtasks.
db1054 is now handed over to DCOps for decommissioning.
The only pending host is db1052 (s1 primary master), which is scheduled to be failed over on 18th July (T197069).
This is essentially all done - all the hosts have been handed over to DCOps for the last steps of the decommissioning process.
All these hosts have now been fully decommissioned.
Thanks @RobH and @Cmjohnson for all the help!