
Switchover m1 master from db1016 to db1063
Closed, Resolved (Public)

Description

List of databases:

root@neodymium:~$ mysql -BN -h db1016.eqiad.wmnet --skip-ssl -e "SHOW DATABASES"
bacula
etherpadlite
heartbeat
information_schema
librenms
mysql
percona
performance_schema
puppet
racktables
rddmarc
rt
root@neodymium:~$ mysql -BN -h db1001.eqiad.wmnet -e "SHOW DATABASES"             
bacula
etherpadlite
heartbeat
information_schema
librenms
mysql
percona
performance_schema
puppet
racktables
rddmarc
rt
root@neodymium:~$ mysql -BN -h db1063.eqiad.wmnet -e "SHOW DATABASES"
bacula
etherpadlite
heartbeat
information_schema
librenms
mysql
percona
performance_schema
puppet
racktables
rddmarc
rt
root@neodymium:~$ mysql -BN -h db2078.codfw.wmnet -e "SHOW DATABASES"
bacula
etherpadlite
heartbeat
information_schema
librenms
mysql
percona
performance_schema
puppet
racktables
rddmarc
rt
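
The four listings above are identical, which is the property worth confirming before a switchover. A minimal sketch of automating that comparison follows; `list_dbs` and `compare_db_lists` are hypothetical helper names, not part of the original task, and the sketch assumes every host accepts the same `mysql` options (in the listing above, db1016 needed `--skip-ssl`, so that host would need its own flags).

```shell
# list_dbs HOST: print the databases on HOST, one per line, sorted.
# (Hypothetical helper; same mysql flags as the manual commands above.)
list_dbs() {
    mysql -BN -h "$1" -e "SHOW DATABASES" | sort
}

# compare_db_lists REFERENCE HOST...: print OK or DIFF for each HOST and
# return non-zero if any host's list differs from the REFERENCE host's.
compare_db_lists() {
    ref_host="$1"
    shift
    ref_list="$(list_dbs "$ref_host")"
    # An empty reference list usually means the connection failed.
    if [ -z "$ref_list" ]; then
        echo "no databases listed on $ref_host" >&2
        return 2
    fi
    rc=0
    for host in "$@"; do
        if [ "$(list_dbs "$host")" = "$ref_list" ]; then
            echo "OK $host"
        else
            echo "DIFF $host"
            rc=1
        fi
    done
    return "$rc"
}
```

It could be invoked as `compare_db_lists db1063.eqiad.wmnet db1001.eqiad.wmnet db2078.codfw.wmnet`, using the future master as the reference.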

Event Timeline

jcrespo triaged this task as Normal priority. (Mar 14 2018, 9:45 AM)
jcrespo created this task.
Restricted Application removed a project: Patch-For-Review. (Mar 14 2018, 9:45 AM)
Restricted Application added a subscriber: Aklapper.
jcrespo removed Marostegui as the assignee of this task. (Mar 14 2018, 9:46 AM)
jcrespo renamed this task from "Switchover m1 master to a newer host" to "Switchover m1 master from db1016 to db1063". (Mar 14 2018, 10:29 AM)

Change 419392 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbproxy: Reenable firewall on proxies for m1 and m2 with holes

https://gerrit.wikimedia.org/r/419392

Change 419392 merged by Jcrespo:
[operations/puppet@production] dbproxy: Reenable firewall on passive m1 & m2 proxies with holes

https://gerrit.wikimedia.org/r/419392

Marostegui moved this task from Triage to In progress on the DBA board. (Mar 14 2018, 3:18 PM)

Change 419456 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbproxy: Enable firewall on the active m1 and m2 proxies

https://gerrit.wikimedia.org/r/419456

Change 419456 merged by Jcrespo:
[operations/puppet@production] dbproxy: Enable firewall on the active m1 and m2 proxies

https://gerrit.wikimedia.org/r/419456

Change 419685 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbproxy: Allow exim hosts to connect to misc proxies

https://gerrit.wikimedia.org/r/419685

Change 419685 merged by Jcrespo:
[operations/puppet@production] dbproxy: Allow exim hosts to connect to misc proxies

https://gerrit.wikimedia.org/r/419685

@akosiaris, we are planning to propose a time in today's meeting: tomorrow, Tuesday, at 16:00 UTC. Would that work for you?

Yes, that's fine.

Awesome! Thanks!
We will mention it in today's meeting then, and we'll see what we get :)

Change 420317 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbproxy: Update m1 proxies to point to db1063 as the primary host

https://gerrit.wikimedia.org/r/420317

Change 420318 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover m1 master from db1016 to db1063

https://gerrit.wikimedia.org/r/420318

Change 420317 merged by Jcrespo:
[operations/puppet@production] dbproxy: Update m1 proxies to point to db1063 as the primary host

https://gerrit.wikimedia.org/r/420317

Change 420318 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover m1 master from db1016 to db1063

https://gerrit.wikimedia.org/r/420318

Mentioned in SAL (#wikimedia-operations) [2018-03-20T16:14:47Z] <akosiaris> restart etherpad T189655

Mentioned in SAL (#wikimedia-operations) [2018-03-20T16:16:40Z] <akosiaris> restart bacula-dir T189655

Mentioned in SAL (#wikimedia-operations) [2018-03-20T16:38:50Z] <jynus> running reset slave all on db1063 T189655

Change 420765 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Update m1 master tag on prometheus

https://gerrit.wikimedia.org/r/420765

Change 420765 merged by Jcrespo:
[operations/puppet@production] mariadb: Update m1 master tag on prometheus

https://gerrit.wikimedia.org/r/420765

Change 420770 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dblists: Set the new m1 master (db1063) at the botton

https://gerrit.wikimedia.org/r/420770

Change 420770 merged by Jcrespo:
[operations/software@master] dblists: Set the new m1 master (db1063) at the botton

https://gerrit.wikimedia.org/r/420770

jcrespo closed this task as Resolved. (Mar 20 2018, 5:12 PM)
jcrespo claimed this task.

Considered done:

== m1 == 
* Disable GTID on db1063, connect db2078 and db1001 to db1063  DONE
* Disable puppet @db1016, puppet @db1063 DONE
puppet agent --disable "switchover to db1063"
* Merge gerrit: https://gerrit.wikimedia.org/r/420317 and https://gerrit.wikimedia.org/r/420318  DONE
* Run puppet and check config on dbproxy1001 and dbproxy1006 DONE
puppet agent -tv && cat /etc/haproxy/conf.d/db-master.cfg
* Disable heartbeat @db1016 DONE
killall perl
* Set old m1 master to read-only DONE
mysql --skip-ssl -hdb1016 -e "SET GLOBAL read_only=1"
* Confirm new master has caught up DONE
mysql --skip-ssl -hdb1016 -e "select @@hostname; show master status\G show slave status\G"; mysql --skip-ssl -hdb1063 -e "select @@hostname; show master status\G show slave status\G"
* Start puppet on db1063 (for heartbeat)
puppet agent -tv
* Switchover proxy master @dbproxy1001 and dbproxy1006 DONE
systemctl reload haproxy && echo "show stat" | socat /run/haproxy/haproxy.sock stdio
* Kill connections DONE
? Which command was used? It would be nice to document it and put everything on the wiki.
* Run puppet on old master @db1016 DONE
puppet agent -tv
* Set new master as read-write and stop slave DONE
mysql -h db1063.eqiad.wmnet -e "SET GLOBAL read_only=0; STOP SLAVE;"
* Check services affected at https://phabricator.wikimedia.org/T189655 DONE, needs more time
* RESET SLAVE ALL on new master DONE
* Change old master to replicate from new master DONE
* Update tendril master server id for m1 (no need to change dns) DONE
* Patch prometheus, dblists DONE
* Create decommissioning ticket for db1016 - https://phabricator.wikimedia.org/T190179
* Close T166344
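
The "confirm new master has caught up" step in the checklist relies on reading two `\G` outputs by eye. A hedged sketch of the same check in script form follows, assuming GTID is disabled (as in the first step) so binlog file and position are the comparison basis. `field` and `replica_caught_up` are hypothetical helper names; `Relay_Master_Log_File` and `Exec_Master_Log_Pos` are the standard `SHOW SLAVE STATUS` columns. The comparison is only meaningful once the old master is read-only and heartbeat is stopped, so that its position no longer advances.

```shell
# field NAME: read \G-formatted output on stdin and print the value of
# the first "NAME: value" line. (Hypothetical helper.)
field() {
    awk -F': *' -v name="$1" '
        { key = $1; gsub(/^ +/, "", key) }
        key == name && !found { print $2; found = 1 }
    '
}

# replica_caught_up MASTER_STATUS SLAVE_STATUS
# MASTER_STATUS: text of "show master status\G" taken on the old master.
# SLAVE_STATUS:  text of "show slave status\G" taken on the new master.
# Succeeds only if the replica has executed up to the master's position.
replica_caught_up() {
    m_file="$(printf '%s\n' "$1" | field File)"
    m_pos="$(printf '%s\n' "$1" | field Position)"
    r_file="$(printf '%s\n' "$2" | field Relay_Master_Log_File)"
    r_pos="$(printf '%s\n' "$2" | field Exec_Master_Log_Pos)"
    [ -n "$m_file" ] && [ -n "$m_pos" ] &&
        [ "$m_file" = "$r_file" ] && [ "$m_pos" = "$r_pos" ]
}
```

A possible invocation, reusing the commands from the checklist: `replica_caught_up "$(mysql --skip-ssl -hdb1016 -e 'show master status\G')" "$(mysql --skip-ssl -hdb1063 -e 'show slave status\G')"`.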

== m1 coords ==
mysql --skip-ssl -hdb1016 -e "select @@hostname; show master status\G show slave status\G"; mysql --skip-ssl -hdb1063 -e "select @@hostname; show master status\G show slave status\G" 
* db1016
            File: db1016-bin.013142
        Position: 435152054

* db1063
            File: db1063-bin.000076
        Position: 70931215
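
The coordinates above were copied by hand from the `show master status\G` output. As a sketch only, the hypothetical `binlog_coords` helper below extracts the same two values as a single `file:position` string, which may be easier to paste into a task or a log message.

```shell
# binlog_coords: read "show master status\G" output on stdin and print
# "File:Position"; fails if either field is missing. (Hypothetical helper.)
binlog_coords() {
    awk -F': *' '
        $1 ~ /^ *File$/     { file = $2 }
        $1 ~ /^ *Position$/ { pos = $2 }
        END {
            if (file == "" || pos == "") exit 1
            print file ":" pos
        }
    '
}
```

For example: `mysql --skip-ssl -hdb1016 -e 'show master status\G' | binlog_coords`.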