
switchover es1014 to es1017
Closed, Resolved, Public

Description

Change the active master of es3 shard from es1014 to es1017.

This will allow us to:

  • Upgrade the es3 master to stretch/10.1
  • Finally change the socket location
  • Migrate the SPOF away from DC row B
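Changing the active master of a replicated MariaDB shard usually follows a fixed order. The sketch below is a toy, in-memory model of that order. The `Host` class, its `executed` counter and the `switchover` function are all hypothetical, and this is not the failover script mentioned later in this task. It freezes writes on the old master, lets the candidate and the other replicas apply everything the old master wrote, repoints them (and the old master) under the candidate, and only then makes the candidate writable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Hypothetical in-memory stand-in for one database server in a shard."""
    name: str
    read_only: bool = True
    executed: int = 0               # toy stand-in for a replication position
    master: Optional[Host] = None   # host this one replicates from; None for the master


def switchover(old: Host, new: Host, others: list[Host]) -> None:
    """Move the master role from `old` to `new` without losing writes (toy model)."""
    if new.master is not old:
        raise ValueError(f"{new.name} does not replicate from {old.name}")
    # 1. Freeze writes on the old master so its position stops moving.
    old.read_only = True
    # 2. Wait until the candidate and every other replica have applied all of it.
    #    Real replication is asynchronous; here "waiting" just fast-forwards the counter.
    for h in [new, *others]:
        h.executed = max(h.executed, old.executed)
    # 3. Repoint the other replicas, and the old master itself, under the candidate.
    for h in [*others, old]:
        h.master = new
    # 4. Promote: the candidate stops replicating and starts accepting writes.
    new.master = None
    new.read_only = False


# Example: promote es1017 over es1014, with one hypothetical extra replica.
es1014 = Host("es1014", read_only=False, executed=100)
es1017 = Host("es1017", executed=97, master=es1014)
replica = Host("replica-a", executed=99, master=es1014)
switchover(es1014, es1017, [replica])
```

The order is the point of the sketch: if the candidate became writable before the old master was frozen and drained, both hosts could accept writes and the shard would diverge.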

Event Timeline

Vvjjkkii renamed this task from switchover es1014 to es1017 to 24aaaaaaaa (Jul 1 2018, 1:04 AM).
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.

es1017 has been successfully moved to row C.
Once it has been working fine for a few days (it would not be the first time we have seen old hardware fail after a few days), we should schedule the failover.

We need to start thinking about a date for this failover.

What do you think about doing this next Wednesday?

Change 446551 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Allow reimage of es1019

https://gerrit.wikimedia.org/r/446551

Change 446553 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool es1018 for reimage

https://gerrit.wikimedia.org/r/446553

Change 446551 merged by Jcrespo:
[operations/puppet@production] mariadb: Allow reimage of es1019

https://gerrit.wikimedia.org/r/446551

Change 446553 merged by Jcrespo:
[operations/mediawiki-config@master] mariadb: Depool es1019 for reimage

https://gerrit.wikimedia.org/r/446553

Change 446752 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool es1019 with low load

https://gerrit.wikimedia.org/r/446752

Change 446755 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool es1019 full after maintenance

https://gerrit.wikimedia.org/r/446755

Change 446752 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool es1019 with low load

https://gerrit.wikimedia.org/r/446752

Change 446827 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Productionize db1095 and db1102 into s1-test

https://gerrit.wikimedia.org/r/446827

Change 446827 merged by Jcrespo:
[operations/puppet@production] mariadb: Productionize db1095 and db1102 into test-s1

https://gerrit.wikimedia.org/r/446827

Change 446846 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1099 for cloning and upgrade

https://gerrit.wikimedia.org/r/446846

Change 446846 merged by Jcrespo:
[operations/mediawiki-config@master] mariadb: Depool db1099 for cloning and upgrade

https://gerrit.wikimedia.org/r/446846

Change 446849 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Fully repool db1099, including db1099:s8

https://gerrit.wikimedia.org/r/446849

Change 446849 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Fully depool db1099, including db1099:s8

https://gerrit.wikimedia.org/r/446849

Change 446875 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool db1099 (both instances) with low load

https://gerrit.wikimedia.org/r/446875

Change 446875 merged by Jcrespo:
[operations/mediawiki-config@master] mariadb: Repool db1099 (both instances) with low load

https://gerrit.wikimedia.org/r/446875

Change 446902 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool db1099 fully after warmup

https://gerrit.wikimedia.org/r/446902

Change 446755 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool es1019 fully after maintenance

https://gerrit.wikimedia.org/r/446755

Change 446902 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool db1099 fully after warmup

https://gerrit.wikimedia.org/r/446902
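The es1019 and db1099 changes above follow a two-step repool: first "with low load", then "fully" once the host has warmed up. A minimal sketch of the same idea, generalised to N steps, assuming a hypothetical integer load-balancer weight per host. The real weights live in operations/mediawiki-config and are not shown in this task, and `ramp_weights` is an illustrative name.

```python
from __future__ import annotations


def ramp_weights(full_weight: int, steps: int) -> list[int]:
    """Hypothetical repool schedule: weights rising linearly to full_weight.

    Every step is at least 1, so the host is never left depooled
    (weight 0) partway through the ramp.
    """
    if full_weight < 1 or steps < 1:
        raise ValueError("full_weight and steps must both be >= 1")
    return [max(1, full_weight * i // steps) for i in range(1, steps + 1)]


print(ramp_weights(200, 2))  # [100, 200]: a partial weight, then full
print(ramp_weights(200, 4))  # [50, 100, 150, 200]
```

A linear ramp is only the simplest choice; the "low load" step in the actual changes could be any fraction of the full weight.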

Change 447584 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Promote es1017 as the master of es3-eqiad (instead of es1014)

https://gerrit.wikimedia.org/r/447584

Change 447586 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Promote es1017 as the master of es3-eqiad

https://gerrit.wikimedia.org/r/447586

Change 447587 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] Setup es1017 as the backend for the es3-eqiad master

https://gerrit.wikimedia.org/r/447587
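Promoting es1017 touches three repositories: puppet, mediawiki-config and DNS. A small sketch of a consistency check, assuming each source's view of the es3-eqiad master has already been read into a plain dict. The function name and input shape are hypothetical; nothing in this task says such a check exists.

```python
from __future__ import annotations


def master_consensus(sources: dict[str, str]) -> str:
    """Return the master host name if every source agrees on it, else raise ValueError."""
    hosts = set(sources.values())
    if len(hosts) != 1:
        detail = ", ".join(f"{src}={host}" for src, host in sorted(sources.items()))
        raise ValueError(f"sources disagree on the master: {detail or 'no sources'}")
    return hosts.pop()


print(master_consensus({"puppet": "es1017", "mediawiki-config": "es1017", "dns": "es1017"}))
```

In this task the DNS change (447587) was merged after the other two, so a check like this, run in between, would have raised until the DNS record was updated.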

Change 447584 merged by Marostegui:
[operations/puppet@production] mariadb: Promote es1017 as the master of es3-eqiad (instead of es1014)

https://gerrit.wikimedia.org/r/447584

Change 447586 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Promote es1017 as the master of es3-eqiad

https://gerrit.wikimedia.org/r/447586

Congratulations @jcrespo on doing the first automated failover with the new script!
The errors lasted 38 seconds:
first error at 06:01:27, last error at 06:02:05.

Very impressive! :)
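The length of the error window can be checked with the standard library alone. A sketch, assuming both timestamps fall on the same day (the comment gives no date); `window_seconds` is an illustrative name.

```python
from datetime import datetime


def window_seconds(first: str, last: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(last, fmt) - datetime.strptime(first, fmt)
    return int(delta.total_seconds())


print(window_seconds("06:01:27", "06:02:05"))  # 38
```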

Change 447587 merged by Jcrespo:
[operations/dns@master] Setup es1017 as the backend for the es3-eqiad master

https://gerrit.wikimedia.org/r/447587

jcrespo claimed this task.
jcrespo moved this task from Pending comment to In progress on the DBA board.

This is now done. While some es1014 maintenance work is still pending, the main task T183585 is no longer blocked by any SPOF db hosts.

Change 452637 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Failover db1066 (eqiad s2 master) to db1122

https://gerrit.wikimedia.org/r/452637

Change 452637 abandoned by Jcrespo:
mariadb: Failover db1066 (eqiad s2 master) to db1122

Reason:
Not needed.

https://gerrit.wikimedia.org/r/452637