
Switchover es2 master (es1011) to es1015
Closed, Resolved (Public)

Description

The es2 master, es1011, is still running jessie with MariaDB 10.1 and has its socket in the wrong location.

Switch es2 (cluster24) writes to es1015 on row C so that es1011 can be depooled and upgraded.

Event Timeline

jcrespo moved this task from Triage to In progress on the DBA board.

Change 454210 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool cluster24 (es2) from new writes

https://gerrit.wikimedia.org/r/454210

Change 454211 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Promote es1015 to es2 master and repool es2 for writes

https://gerrit.wikimedia.org/r/454211

Change 454214 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Promote es1015 to be the new es2 master instead of es1011

https://gerrit.wikimedia.org/r/454214

Change 454219 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Point es2-master to es1015 after master switchover

https://gerrit.wikimedia.org/r/454219

Change 454254 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool es1015 for maintenance

https://gerrit.wikimedia.org/r/454254

Change 454254 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool es1015 for maintenance

https://gerrit.wikimedia.org/r/454254

Mentioned in SAL (#wikimedia-operations) [2018-08-22T05:53:28Z] <marostegui> Start topology changes for es2 failover - T202364

Change 454214 merged by Jcrespo:
[operations/puppet@production] mariadb: Promote es1015 to be the new es2 master instead of es1011

https://gerrit.wikimedia.org/r/454214

Change 454210 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool cluster24 (es2) from new writes

https://gerrit.wikimedia.org/r/454210

Change 454211 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Promote es1015 to es2 master and repool es2 for writes

https://gerrit.wikimedia.org/r/454211

Change 454219 merged by Jcrespo:
[operations/dns@master] mariadb: Point es2-master to es1015 after master switchover

https://gerrit.wikimedia.org/r/454219

Change 454485 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool es1011 for reimage

https://gerrit.wikimedia.org/r/454485

Change 454487 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install_server: Allow manual reimage of es1011

https://gerrit.wikimedia.org/r/454487

Change 454485 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool es1011 for reimage

https://gerrit.wikimedia.org/r/454485

Change 454487 merged by Jcrespo:
[operations/puppet@production] install_server: Allow manual reimage of es1011

https://gerrit.wikimedia.org/r/454487

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['es1011.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201808220922_jynus_24242.log.

Completed auto-reimage of hosts:

['es1011.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2018-08-23T06:31:11Z] <marostegui> Enable semi-sync on es2 - T202364

All the cleanup tasks are now done.

I have enabled semi-sync on es2:

root@es1015.eqiad.wmnet[(none)]> SHOW GLOBAL STATUS like 'Rpl_semi_sync_master_clients';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Rpl_semi_sync_master_clients | 2     |
+------------------------------+-------+
1 row in set (0.05 sec)
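
For anyone scripting this check after a switchover, a minimal sketch follows. It assumes the `mysql` client output above has been captured as text; the helper names `parse_status_table` and `semi_sync_clients` are hypothetical and not part of any WMF tooling, and the expected client count for a given section has to be supplied by the operator.

```python
def parse_status_table(text):
    """Parse tabular mysql client output into a {variable_name: value} dict."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        # Data and header rows start with '|'; borders start with '+'.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip anything that is not a two-column row, and skip the header.
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue
        rows[cells[0]] = cells[1]
    return rows


def semi_sync_clients(text):
    """Return the number of semi-sync replicas reported by the master."""
    return int(parse_status_table(text)["Rpl_semi_sync_master_clients"])


# Sample input copied from the output pasted in this task.
SAMPLE = """\
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Rpl_semi_sync_master_clients | 2     |
+------------------------------+-------+
1 row in set (0.05 sec)
"""

if __name__ == "__main__":
    print(semi_sync_clients(SAMPLE))
```

A wrapper could compare the returned value against the number of replicas expected to have semi-sync enabled and alert on a mismatch.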

GTID was enabled yesterday.

Also dropped the unused grants on es1011.
Nice work on the very smooth failover. Probably the least stressful failover we've ever done! :)