Failover m5 master from db1009 to db1073
Closed, Resolved, Public

Description

Hello,

We would like to fail over the m5 master from db1009 to db1073.
These are the steps, so far:

Can we have someone from the cloud-services-team around for the failover? Applications might need to be restarted if they don't gracefully start connecting to the new master.
The date is yet to be arranged, but from the DBA side we are ready.

Event Timeline

Marostegui triaged this task as Normal priority. Mar 6 2018, 2:04 PM
Marostegui created this task.
Marostegui moved this task from Triage to In progress on the DBA board.
chasemp assigned this task to Andrew. Mar 6 2018, 4:36 PM
chasemp added a subscriber: chasemp.

@Andrew volunteered to help steer this change :)

@Andrew which day/time would work for you to get this done?

Change 416912 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2037: Enable notifications

https://gerrit.wikimedia.org/r/416912

Change 416912 merged by Marostegui:
[operations/puppet@production] db2037: Enable notifications

https://gerrit.wikimedia.org/r/416912

Andrew added a comment. Mar 7 2018, 4:54 PM

@Marostegui, I could do it tomorrow or Friday anytime after 15:00 UTC. Next week I'm out Monday, Tuesday, Wednesday.

Let's try tomorrow at 15:30 UTC maybe? //cc @jcrespo

Marostegui updated the task description. Mar 8 2018, 1:25 PM
Marostegui updated the task description. Mar 8 2018, 1:31 PM

Change 417254 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1009.yaml: Disable notifications

https://gerrit.wikimedia.org/r/417254

Change 417254 merged by Marostegui:
[operations/puppet@production] db1009.yaml: Disable notifications

https://gerrit.wikimedia.org/r/417254

Marostegui updated the task description. Mar 8 2018, 2:06 PM

Mentioned in SAL (#wikimedia-operations) [2018-03-08T15:00:08Z] <marostegui> Change topology in m5, db2037 to become a slave of db1073 - T189005

Marostegui updated the task description. Mar 8 2018, 3:00 PM

Mentioned in SAL (#wikimedia-operations) [2018-03-08T15:01:21Z] <marostegui> Disable puppet on db1073 - T189005

Marostegui updated the task description. Mar 8 2018, 3:04 PM

Change 417290 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/mediawiki-config@master] wikitech: change the IP address for m5-master

https://gerrit.wikimedia.org/r/417290

Andrew updated the task description. Mar 8 2018, 3:15 PM

Mentioned in SAL (#wikimedia-operations) [2018-03-08T15:17:50Z] <andrewbogott> silencing nova and other openstack alerts in anticipation of service interruptions for https://phabricator.wikimedia.org/T189005

Andrew updated the task description. Mar 8 2018, 3:22 PM
Marostegui updated the task description. Mar 8 2018, 3:27 PM
Marostegui updated the task description.

Mentioned in SAL (#wikimedia-operations) [2018-03-08T15:30:40Z] <marostegui> Set m5 master db1009 read only for the failover - T189005
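
For readers unfamiliar with this kind of switch, the read-only step above is usually the first move in a planned master change. The following is a minimal sketch of a typical ordering, not the checklist from this task's description (which is not preserved here). It assumes db1073 was replicating from db1009 and that standard MariaDB/MySQL statements are used; the `switchover_plan` helper and the `Step` alias are hypothetical names introduced only for illustration.

```python
from typing import List, Tuple

# A step is (host, SQL statement). Hypothetical alias, for illustration only.
Step = Tuple[str, str]


def switchover_plan(old_master: str, new_master: str) -> List[Step]:
    """Return an ordered list of (host, SQL) steps for a planned master switch.

    Assumption: new_master is currently a replica of old_master.
    """
    return [
        # 1. Stop writes on the old master so the data set stops changing.
        (old_master, "SET GLOBAL read_only = 1;"),
        # 2. Record the old master's final binlog coordinates.
        (old_master, "SHOW MASTER STATUS;"),
        # 3. Inspect replication on the new master; an operator compares its
        #    executed position against the coordinates from step 2.
        (new_master, "SHOW SLAVE STATUS;"),
        # 4. Detach the new master from the old one.
        (new_master, "STOP SLAVE;"),
        (new_master, "RESET SLAVE ALL;"),
        # 5. Open the new master for writes.
        (new_master, "SET GLOBAL read_only = 0;"),
    ]


if __name__ == "__main__":
    for host, sql in switchover_plan("db1009", "db1073"):
        print(f"{host}: {sql}")
```

The point of the ordering is that writes are disabled on the old master before they are enabled on the new one, so there is never a moment with two writable masters.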

Andrew updated the task description. Mar 8 2018, 3:30 PM
Marostegui updated the task description. Mar 8 2018, 3:32 PM
Marostegui updated the task description.
Marostegui updated the task description. Mar 8 2018, 3:34 PM

Change 417290 merged by jenkins-bot:
[operations/mediawiki-config@master] wikitech: change the IP address for m5-master

https://gerrit.wikimedia.org/r/417290

Marostegui updated the task description. Mar 8 2018, 3:38 PM
Andrew updated the task description. Mar 8 2018, 3:39 PM

During the failover we realized that the ACL from the labs-hosts VLAN was blocking access to the new m5 backing DB.

commit comment "T189005 nova database has moved to 10.64.16.79"
commit comment "T189005 nova database slave 10.192.32.8"

Set up on both cr1-eqiad and cr2-eqiad.
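
The ACL problem above can be illustrated with a small check. This is only a sketch of the logic, assuming the ACL behaves like an allow list of destination networks; the real rules live on the routers and their syntax is not shown in this task. The two addresses come from the commit comments above, while `is_permitted`, `acl_before`, and the `192.0.2.10/32` entry (a documentation-range placeholder for the old master, whose address is not given here) are hypothetical.

```python
import ipaddress
from typing import Iterable


def is_permitted(dest_ip: str, allowed: Iterable[str]) -> bool:
    """Return True if dest_ip falls inside any network in the allow list."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed)


# Hypothetical allow list before the fix: only a placeholder old-master entry.
acl_before = ["192.0.2.10/32"]

# After the fix: the entries named in the commit comments are added.
acl_after = acl_before + [
    "10.64.16.79/32",  # "nova database has moved to 10.64.16.79"
    "10.192.32.8/32",  # "nova database slave 10.192.32.8"
]

if __name__ == "__main__":
    for name, acl in (("before", acl_before), ("after", acl_after)):
        print(name, is_permitted("10.64.16.79", acl))
```

A check like this, run against the intended new master address ahead of time, is one way to catch a missing ACL entry before clients are repointed.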

Andrew updated the task description. Mar 8 2018, 4:07 PM
Marostegui updated the task description. Mar 8 2018, 5:04 PM
Marostegui closed this task as Resolved. Mar 9 2018, 6:44 AM

I am going to close this as resolved, as nothing has come up.
We will follow up on the decommissioning of db1009 in T189216 in a few days, once we are sure everything is fine.