Move masters away from codfw C6
Closed, ResolvedPublic

Description

The following masters are all in C6 and should be moved away to different racks/rows.
This is a proposal of destination rows.

  • db2048 -> A1
  • db2035 -> B1
  • db2039 -> D1
  • db2040 -> A3
  • db2045 -> B3
  • db2042 (misc m3) -> D3
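
As a rough illustration, the proposal above can be written down as data and sanity-checked: no host stays in C6 and no two masters end up sharing a destination rack. The names `PROPOSED_RACKS` and `check_proposal` are hypothetical and not part of any existing tooling.

```python
# Hypothetical sketch: the proposed destinations from the list above.
PROPOSED_RACKS = {
    "db2048": "A1",
    "db2035": "B1",
    "db2039": "D1",
    "db2040": "A3",
    "db2045": "B3",
    "db2042": "D3",  # misc m3
}


def check_proposal(moves, source_rack="C6"):
    """Return a list of problems found in a host -> rack proposal."""
    problems = []
    seen = {}  # rack -> first host assigned to it
    for host, rack in moves.items():
        if rack == source_rack:
            problems.append(f"{host} would stay in {source_rack}")
        if rack in seen:
            problems.append(f"{host} and {seen[rack]} share rack {rack}")
        seen.setdefault(rack, host)
    return problems


# An empty list means the proposal passes both checks.
print(check_proposal(PROPOSED_RACKS))
```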

@Papaul can you confirm those destination racks can get those servers?
Confirmed by @Papaul

Event Timeline

Let's go for db2035 if that works for you!

Papaul added a comment.Apr 2 2018, 3:25 PM

new IP address 10.192.16.73

Thanks! I will post here as soon as the server is off

Papaul added a comment.Apr 2 2018, 3:26 PM

new switch port information
asw-b1-codfw ge-1/0/15

Mentioned in SAL (#wikimedia-operations) [2018-04-02T15:28:14Z] <marostegui> Stop MySQL and power off db2035 (s2 codfw master - this will stop replication on s2 codfw slaves) for rack change - T191193

@Papaul db2035 is now off!

Change 423484 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2035 IP

https://gerrit.wikimedia.org/r/423484

Change 423485 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add db2035 to private1-b-codfw was in private1-c-codfw

https://gerrit.wikimedia.org/r/423485

Change 423484 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2035 IP

https://gerrit.wikimedia.org/r/423484

Mentioned in SAL (#wikimedia-operations) [2018-04-02T15:40:49Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Change db2035 IP - T191193 (duration: 01m 15s)

Change 423485 merged by Marostegui:
[operations/dns@master] DNS: Add db2035 to private1-b-codfw was in private1-c-codfw

https://gerrit.wikimedia.org/r/423485

Papaul added a comment.Apr 2 2018, 3:41 PM

old switch information
asw-c6-codfw ge-6/0/2

Mentioned in SAL (#wikimedia-operations) [2018-04-02T15:42:09Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Change db2035 IP - T191193 (duration: 01m 15s)

mediawiki config files changed
network/interfaces changed
dns merged and deployed

Papaul added a comment.Apr 2 2018, 3:46 PM

@RobH if switch configuration is not done yet can you please change it from

new switch port information
asw-b1-codfw ge-1/0/15

to
new switch port information
asw-b1-codfw ge-1/0/4

thanks

Papaul added a comment.Apr 2 2018, 3:55 PM

db2035 was on asw-c6-codfw ge-6/0/2 and now will be on asw-b1-codfw ge-1/0/4

Papaul added a comment.Apr 2 2018, 4:48 PM

moved db2035 in racktables from C6 to B1

Marostegui updated the task description. (Show Details)Apr 2 2018, 4:50 PM
Papaul updated the task description. (Show Details)Apr 2 2018, 4:50 PM

db2035's mysql is back and slaves are reconnecting.
I would suggest next server to be db2039.

Papaul added a comment.Apr 2 2018, 5:24 PM

switch port information when ready to move db2039. This is just a note for when we are ready to do the move.

db2039 was on asw-c6-codfw ge-6/0/6 and now will be on asw-d1-codfw ge-1/0/14 when

new ip address will be :
10.192.48.114

Mentioned in SAL (#wikimedia-operations) [2018-04-03T05:18:04Z] <marostegui> Enable back gtid on db2035 - T191193

Thanks @Papaul - let me know a day that works for you! cc @RobH

Marostegui moved this task from Next to In progress on the DBA board.Apr 4 2018, 8:40 AM

@RobH can you let us know when the switch is ready so we can move db2039?
Thanks!

RobH added a comment.Apr 5 2018, 3:00 PM

I've gone ahead and enabled asw-d1-codfw ge-1/0/14, and left asw-c6-codfw ge-6/0/6 online for now.

Once the system is fully moved, we'll remove the port info from the old port.

Change 424329 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove db2039 from private1-c-codfw and place it in private1-d-codfw

https://gerrit.wikimedia.org/r/424329

Mentioned in SAL (#wikimedia-operations) [2018-04-05T15:17:48Z] <jynus> stopping mariadb on db2039 T191193

Change 424335 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2039 IP

https://gerrit.wikimedia.org/r/424335

Change 424329 merged by Marostegui:
[operations/dns@master] DNS: Remove db2039 from private1-c-codfw and place it in private1-d-codfw

https://gerrit.wikimedia.org/r/424329

Change 424335 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2039 IP

https://gerrit.wikimedia.org/r/424335

Mentioned in SAL (#wikimedia-operations) [2018-04-05T15:34:12Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Change db2039 IP as it is being moved to a different rack - T191193 (duration: 01m 17s)

Mentioned in SAL (#wikimedia-operations) [2018-04-05T15:35:38Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Change db2039 IP as it is being moved to a different rack - T191193 (duration: 01m 17s)

Papaul added a comment.Apr 5 2018, 3:39 PM

Racktables update. moved db2039 from C6 to D1

Papaul added a comment.Apr 5 2018, 3:40 PM

Please update the task with the next server to move so I can get the rack ready. Thanks

let's go for db2040 as next host
Thanks!

Papaul updated the task description. (Show Details)Apr 5 2018, 3:46 PM
Papaul added a comment.Apr 5 2018, 3:54 PM

switch port information when ready to move db2040.

db2040 was on asw-c6-codfw ge-6/0/7 and now will be on asw-a3-codfw ge-3/0/27

new ip address will be :
10.192.0.39

Let me know if you want to do this today.

Thanks

Let's wait till next week, I don't want to do many master changes on a single day. Let's go for Tuesday next week? @RobH ?

RobH added a comment.Apr 10 2018, 2:45 PM

[edit interfaces interface-range vlan-private1-a-codfw]
     member xe-2/0/0 { ... }
+    member ge-3/0/27;
[edit interfaces ge-3/0/27]
+   description db2040;
-   disable;
+   enable;

Port now live for the db2040 move. Once it has moved, please update this task so the old port (in row C) can be disabled/deactivated.

Mentioned in SAL (#wikimedia-operations) [2018-04-10T14:46:20Z] <marostegui> Stop MySQL on db2040 for server move - T191193

Change 425279 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: move db2040 from private1-c-codfw to private1-a-codfw

https://gerrit.wikimedia.org/r/425279

Change 425279 merged by Marostegui:
[operations/dns@master] DNS: move db2040 from private1-c-codfw to private1-a-codfw

https://gerrit.wikimedia.org/r/425279

Change 425285 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2040 IP

https://gerrit.wikimedia.org/r/425285

Change 425285 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2040 IP

https://gerrit.wikimedia.org/r/425285

Mentioned in SAL (#wikimedia-operations) [2018-04-10T15:21:42Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Change db2040 IP as it is being moved to another rack - T191193 (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2018-04-10T15:22:52Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Change db2040 IP as it is being moved to another rack - T191193 (duration: 00m 59s)

Move db2040 from C6 to A3 in racktables
Please advise which server is next

Papaul updated the task description. (Show Details)Apr 10 2018, 3:31 PM

switch port information when ready to move db2045.

db2045 was on asw-c6-codfw ge-6/0/14 and now will be on asw-b3-codfw ge-3/0/20

new ip address will be :
10.192.16.74

Change 425298 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2045 IP

https://gerrit.wikimedia.org/r/425298

Mentioned in SAL (#wikimedia-operations) [2018-04-10T16:11:07Z] <marostegui> Stop MySQL on db2045 (s8 codfw master) to move it to another rack, this will break replication on codfw - T191193

Change 425303 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: move db2045 from private1-c-codfw to private1-b-codfw

https://gerrit.wikimedia.org/r/425303

Change 425303 merged by Marostegui:
[operations/dns@master] DNS: move db2045 from private1-c-codfw to private1-b-codfw

https://gerrit.wikimedia.org/r/425303

Change 425298 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2045 IP

https://gerrit.wikimedia.org/r/425298

Mentioned in SAL (#wikimedia-operations) [2018-04-10T16:25:00Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Change db2045 IP as it is being moved to another rack - T191193 (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2018-04-10T16:26:08Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Change db2045 IP as it is being moved to another rack - T191193 (duration: 00m 59s)

Marostegui updated the task description. (Show Details)Apr 10 2018, 5:15 PM

moved db2045 from C6 to B3 in racktables

Please update task with next server we need to move next week.

thanks

@Papaul next one will be db2042
Thanks!

Papaul triaged this task as Normal priority.Apr 12 2018, 2:01 PM

switch port information when ready to move db2042.

db2042 was on asw-c6-codfw ge-6/0/9 and now will be on asw-d3-codfw ge-3/0/10

new ip address will be :
10.192.48.115

@ayounsi can you configure asw-d3-codfw ge-3/0/10 for us?
We want to move db2042 to that port

Thanks!

asw-d-codfw-ge-3/0/10 now in private1-d-codfw.

Let me know when to disable asw-c6-codfw:ge-6/0/9

Change 427136 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Move db2042 fron private1-c-codfw to private1-d-codfw

https://gerrit.wikimedia.org/r/427136

Mentioned in SAL (#wikimedia-operations) [2018-04-17T14:52:59Z] <marostegui> Stop MySQL on db2042 to move it to another rack - https://phabricator.wikimedia.org/T191193

Change 427136 merged by Marostegui:
[operations/dns@master] DNS: Move db2042 fron private1-c-codfw to private1-d-codfw

https://gerrit.wikimedia.org/r/427136

switch port information when ready to move db2048.

db2048 was on asw-c6-codfw ge-6/0/17 and now will be on asw-a1-codfw ge-1/0/0

new ip address will be :
10.192.0.99
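
For reference, the new addresses posted in this task can be cross-checked against the private VLAN of each destination row. The sketch below assumes each private1-<row>-codfw network is a /22 starting at the prefixes shown. Those prefix lengths are an assumption and are not stated anywhere in this task; only the host names, IPs and VLAN names come from the comments. `ASSUMED_SUBNETS`, `NEW_ADDRESSES` and `ip_matches_vlan` are hypothetical names.

```python
import ipaddress

# ASSUMPTION: one /22 per row. The VLAN names come from this task,
# the prefix lengths do not.
ASSUMED_SUBNETS = {
    "private1-a-codfw": ipaddress.ip_network("10.192.0.0/22"),
    "private1-b-codfw": ipaddress.ip_network("10.192.16.0/22"),
    "private1-d-codfw": ipaddress.ip_network("10.192.48.0/22"),
}

# New IPs and destination VLANs as posted in this task.
NEW_ADDRESSES = {
    "db2035": ("10.192.16.73", "private1-b-codfw"),
    "db2039": ("10.192.48.114", "private1-d-codfw"),
    "db2040": ("10.192.0.39", "private1-a-codfw"),
    "db2045": ("10.192.16.74", "private1-b-codfw"),
    "db2042": ("10.192.48.115", "private1-d-codfw"),
    "db2048": ("10.192.0.99", "private1-a-codfw"),
}


def ip_matches_vlan(ip, vlan):
    """True if `ip` falls inside the assumed subnet for `vlan`."""
    return ipaddress.ip_address(ip) in ASSUMED_SUBNETS[vlan]


for host, (ip, vlan) in NEW_ADDRESSES.items():
    print(host, ip, vlan, ip_matches_vlan(ip, vlan))
```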

Papaul updated the task description. (Show Details)Apr 17 2018, 3:19 PM

Moved db2042 from c6 to d3 in racktables

Change 427150 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2048 IP

https://gerrit.wikimedia.org/r/427150

Mentioned in SAL (#wikimedia-operations) [2018-04-17T15:23:37Z] <marostegui> Stop MySQL on db2048 for rack movement - T191193

Change 427151 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Move db2048 from prvate1-c-odfw to private1-a-codfw

https://gerrit.wikimedia.org/r/427151

Change 427151 merged by Marostegui:
[operations/dns@master] DNS: Move db2048 from prvate1-c-odfw to private1-a-codfw

https://gerrit.wikimedia.org/r/427151

Change 427150 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db2048 IP

https://gerrit.wikimedia.org/r/427150

ayounsi added a comment.EditedApr 17 2018, 3:31 PM

asw-a1-codfw ge-1/0/0 enabled and in private1-a-codfw

Let me know when to cleanup the old port.

Mentioned in SAL (#wikimedia-operations) [2018-04-17T15:32:42Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Change db2048 IP - T191193 (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2018-04-17T15:33:46Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Change db2048 IP - T191193 (duration: 00m 58s)

Papaul updated the task description. (Show Details)Apr 17 2018, 3:46 PM
Papaul reassigned this task from Papaul to Marostegui.Apr 17 2018, 3:49 PM

Moved db2048 from C6 to A1 in racktables

@Marostegui assigning the tasks back to you if you think everything looks good you can close.

Thanks

Marostegui reassigned this task from Marostegui to ayounsi.Apr 17 2018, 3:54 PM

Thanks @Papaul!!
I have talked to @ayounsi and he will clean up the ports and close the task when ready

Marostegui moved this task from In progress to Done on the DBA board.Apr 17 2018, 4:15 PM
ayounsi closed this task as Resolved.EditedApr 17 2018, 4:53 PM

asw-a1-codfw ge-1/0/0 cleaned up
asw-c6-codfw ge-6/0/9 cleaned up

EDIT, wrong port:
asw-a1-codfw ge-1/0/0 rolledback
asw-c6-codfw ge-6/0/17 cleaned up

jcrespo reopened this task as Open.Apr 17 2018, 9:39 PM

@jcrespo what do you feel is being missed?

Were the right interfaces disabled after the revert?

Yeah:

asw-c6-codfw ge-6/0/17 cleaned up

That was the right one to clean up

jcrespo closed this task as Resolved.Apr 18 2018, 7:02 AM

Okay, I feel we should check what went wrong: was it the clarity of the communication, was it a one-time mistake that is unlikely to happen again, or was it the extended downtime on icinga that kept the issue from being immediately apparent?

For example, as a procedure, could activity be checked on the port before being disabled to check the host is down/moved away?

I thought that was already done. But maybe it was missed this time.
I think it was a combination of all:

  • Confusing the new port with the old port
  • Extended downtime didn't make the issue obvious
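
The procedure suggested above (check the port before disabling it) might look roughly like the following. This is only a sketch: `PortMove`, `safe_to_disable` and the `link_up` input are hypothetical, and how the link state is actually read from the switch is deliberately left out. The port names are the ones recorded in this task for db2048.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortMove:
    host: str
    old_port: str  # port the host was moved away from
    new_port: str  # port the host uses after the move


def safe_to_disable(move, port, link_up):
    """Return (ok, reason) for disabling `port` as part of `move`."""
    if port == move.new_port:
        return False, f"{port} is the NEW port for {move.host}"
    if port != move.old_port:
        return False, f"{port} is not part of the {move.host} move"
    if link_up:
        return False, f"{port} still has link; is {move.host} really moved?"
    return True, f"{port} is the old port for {move.host} and has no link"


db2048 = PortMove("db2048", "asw-c6-codfw ge-6/0/17", "asw-a1-codfw ge-1/0/0")

# Refused: this is the new port, the mix-up described above.
print(safe_to_disable(db2048, "asw-a1-codfw ge-1/0/0", link_up=True))
# Allowed: old port, no link.
print(safe_to_disable(db2048, "asw-c6-codfw ge-6/0/17", link_up=False))
```

A check like this only addresses confusing the new port with the old one; it would not help with the extended downtime hiding the alert.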