Status | Assigned | Task
---|---|---
Resolved | faidon | T155875 asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues
Resolved | None | T155999 DBA plan to mitigate asw-c2-eqiad reboots
Resolved | Marostegui | T156008 Switchover s1 master db1057 -> db1052
Event Timeline
Change 333970 had a related patch set uploaded (by Jcrespo):
mariadb: Set binlog_format to STATEMENT for db1052
I have upgraded all packages except wmf-mariadb10 and restarted the server for the kernel update.
Change 334008 had a related patch set uploaded (by Jcrespo):
mariadb: Repool db1052 after maintenance
Change 334030 had a related patch set uploaded (by Marostegui):
site.pp: Change active master for enwiki
Change 334242 had a related patch set uploaded (by Marostegui):
db-eqiad.php: Change s1 master
Change 334243 had a related patch set uploaded (by Marostegui):
db-eqiad.php: Depool db1052
Mentioned in SAL (#wikimedia-operations) [2017-01-26T06:49:45Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1052 - T156008 (duration: 00m 31s)
Mentioned in SAL (#wikimedia-operations) [2017-01-26T07:32:55Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Change s1 master to db1057 - T156008 (duration: 00m 20s)
This has happened already.
Times in UTC:
Preparation of all the code, topology changes etc: 06:30-07:30
read only on: 07:30:40
do all the necessary checks to make sure we were good
start to deploy mediawiki config change: 07:32:34
finished deploying mediawiki config change: 07:32:54
Total read only time: 02:24.
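As a cross-check of the timings above, here is a minimal Python sketch that computes the intervals between the listed timestamps. The date is taken from the SAL entries and the variable names are illustrative. The moment read-only was lifted is not listed, so the reported total cannot be derived from these three timestamps alone.

```python
from datetime import datetime

# Timestamps (UTC) copied from the timeline above; the date is from the SAL entries.
FMT = "%Y-%m-%d %H:%M:%S"
read_only_on = datetime.strptime("2017-01-26 07:30:40", FMT)
deploy_start = datetime.strptime("2017-01-26 07:32:34", FMT)
deploy_end = datetime.strptime("2017-01-26 07:32:54", FMT)


def mmss(delta):
    """Format a timedelta as MM:SS."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02d}:{seconds:02d}"


checks_window = mmss(deploy_start - read_only_on)   # read-only on -> deploy start
deploy_window = mmss(deploy_end - deploy_start)     # duration of the config deploy
until_deployed = mmss(deploy_end - read_only_on)    # read-only on -> deploy finished
print(checks_window, deploy_window, until_deployed)
```

The interval from read-only on to the end of the deploy is shorter than the reported total; the difference presumably covers the time taken to lift read-only after the deploy finished.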
We are now going to start all the clean up work.
Thanks @Joe and @Volans for helping out!
If anyone sees anything wrong, please let us know.
Mentioned in SAL (#wikimedia-operations) [2017-01-26T08:48:57Z] <marostegui> Change db1069 to replicate from the new s1 master db1052 - T156008
Mentioned in SAL (#wikimedia-operations) [2017-01-26T08:57:41Z] <marostegui> Change db1047 to replicate from the new s1 master db1052 - T156008
Mentioned in SAL (#wikimedia-operations) [2017-01-26T09:04:31Z] <marostegui> Change dbstore1002 to replicate from the new s1 master db1052 - T156008
Recap of the cleanup work:
dns changed for s1-master.eqiad.wmnet
multisource slaves changed (only dbstore1001 still pending): db1047, db1069, dbstore1002
replication db1057 -> db1052 cleaned up
gtid enabled on db1057
Pending:
change dbstore1001 to replicate from db1052 once it has caught up
enable semisync on db1052
disable db1057 as true on site.pp?
Anything else you can see, @jcrespo?
Change 334254 had a related patch set uploaded (by Marostegui):
s1.hosts: db1052 is the new master
Mentioned in SAL (#wikimedia-operations) [2017-01-26T09:39:33Z] <marostegui> Enable semi-sync replication on db1052 (s1 master) - T156008
Change 334256 had a related patch set uploaded (by Jcrespo):
mariadb: Move db1057 to be a regular slave on config after switch
Change 334256 merged by Jcrespo:
mariadb: Move db1057 to be a regular slave on config after switch
Mentioned in SAL (#wikimedia-operations) [2017-01-26T09:54:59Z] <marostegui> Disable semi-sync on db1057 old s1 master - https://phabricator.wikimedia.org/T156008
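For reference, the semi-sync changes logged above can be sketched as follows. This is only an illustration, assuming the MariaDB semi-sync plugin is already loaded on both hosts; the exact commands that were run are not recorded in this task, and the helper name `semisync_statements` is hypothetical.

```python
def semisync_statements(new_master, old_master):
    """Map each host to a statement that would toggle master-side semi-sync.

    Assumption: the semi-sync plugin is loaded, so the
    rpl_semi_sync_master_enabled variable exists on both hosts.
    """
    return {
        new_master: "SET GLOBAL rpl_semi_sync_master_enabled = ON;",
        old_master: "SET GLOBAL rpl_semi_sync_master_enabled = OFF;",
    }


# Host names are the short names used in this task.
for host, sql in semisync_statements("db1052", "db1057").items():
    print(f"{host}: {sql}")
```

The sketch only builds the statements; running them, and persisting the setting in configuration, is left to whatever tooling is normally used.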
Change 334259 had a related patch set uploaded (by Jcrespo):
prometheus-mysql-exporter: Change db1052 to be s1-master
Change 334259 merged by Jcrespo:
prometheus-mysql-exporter: Change db1052 to be s1-master
I changed the master of dbstore1001. Resolving now, but let's monitor dbstore1001 to make sure nothing broke (because of its delayed replication, it may not alert immediately).
Mentioned in SAL (#wikimedia-operations) [2017-02-01T14:25:28Z] <jynus> dropping and replacing events on db1057 - db1052 T156008