
Move masters away from D1 in eqiad
Closed, Resolved · Public

Description

Hello,

During all the switchovers (T162133) needed to be able to decommission old hosts (T134476), we realised that 4 of the new masters are currently living in D1, which is not ideal.
So we were wondering if we could move some of them out of it on Thursday/Friday.

We are aware of the short notice, sorry for that.

Maybe we can do this move if @Cmjohnson has time and these locations are fine:

db1061 -> C3
db1062 -> D4
db1063 -> C5

Let us know if you can do this and if the racks are fine.
Sorry again for the short notice

Details

Related Gerrit Patches:
operations/mediawiki-config : master : db-eqiad,db-codfw.php: Change db1063 IP and rack
operations/dns : master : Remove db1061 old IP
operations/mediawiki-config : master : db-eqiad.php: Change db1062 location
operations/mediawiki-config : master : db-eqiad,db-codfw.php: Change db1061 IP

Event Timeline

Restricted Application added a project: Operations. Apr 26 2017, 1:12 PM
Restricted Application added a subscriber: Aklapper.

@Marostegui I will be in the data center Friday 4/27 at 0930. Let's get this taken care of right away.

jcrespo added a subscriber: jcrespo. Edited · Apr 27 2017, 8:20 PM

Cmjohnson- we really appreciate the effort- we know these days you have lots and lots of work!

Marostegui added a comment. Edited · Apr 27 2017, 8:41 PM

Thanks Chris!!!
@jcrespo, this means reconfiguring the slaves, as the masters will change IPs...

Never mind, just remembered we replicate from FQDN and not IPs :)

jcrespo moved this task from Triage to Next on the DBA board. Apr 27 2017, 9:34 PM

> Never mind, just remembered we replicate from FQDN and not IPs :)

But mediawiki uses IPs.

Yes, but that only means changing db-eqiad and db-codfw, as we normally do when moving a server, as far as I know.
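
Since replication follows the FQDN while the MediaWiki config pins IPs, one way to catch a stale entry is to compare what a hostname resolves to with the IP written in the config. The following is a minimal sketch under assumptions: `resolve_ipv4` and `ip_matches` are hypothetical helper names, and it relies on `getent ahostsv4` being available (glibc systems). It does not read db-eqiad.php or db-codfw.php itself; the expected IP is passed in by hand.

```shell
# resolve_ipv4 HOST: print the first IPv4 address HOST resolves to.
resolve_ipv4() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# ip_matches HOST EXPECTED_IP [RESOLVER]
# Returns 0 when HOST resolves to EXPECTED_IP, 1 otherwise.
# RESOLVER is any command or function taking a hostname; it defaults to
# resolve_ipv4 and can be swapped out for testing without real DNS.
ip_matches() {
    local host="$1" expected="$2" resolver="${3:-resolve_ipv4}"
    local actual
    actual="$("$resolver" "$host")"
    if [ "$actual" = "$expected" ]; then
        echo "OK $host -> $actual"
    else
        echo "MISMATCH $host: DNS=${actual:-none} config=$expected"
        return 1
    fi
}
```

A call would look like `ip_matches <fqdn-of-master> <ip-from-config>`, run once per moved master after the DNS change and the config sync.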

jcrespo renamed this task from Move masters away from D1 in eqiad? to Move masters away from D1 in eqiad. Apr 28 2017, 10:22 AM

I have downtimed all the slaves in s5, s6 and s7 for 10 hours.

Mentioned in SAL (#wikimedia-operations) [2017-04-28T13:26:29Z] <marostegui> Stop MySQL and shutdown db1062 - T163895

Mentioned in SAL (#wikimedia-operations) [2017-04-28T13:30:11Z] <marostegui> Stop MySQL and shutdown db1061 - T163895

Change 350836 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db1061 IP

https://gerrit.wikimedia.org/r/350836

Change 350837 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Changing production dns for db1061, moving to row C T163895

https://gerrit.wikimedia.org/r/350837

Change 350837 merged by Cmjohnson:
[operations/dns@master] Changing production dns for db1061, moving to row C T163895

https://gerrit.wikimedia.org/r/350837

Change 350836 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db1061 IP

https://gerrit.wikimedia.org/r/350836

Mentioned in SAL (#wikimedia-operations) [2017-04-28T13:44:58Z] <marostegui@naos> Synchronized wmf-config/db-eqiad.php: Change db1061 IP - T163895 (duration: 01m 19s)

Mentioned in SAL (#wikimedia-operations) [2017-04-28T13:46:04Z] <marostegui@naos> Synchronized wmf-config/db-codfw.php: Change db1061 IP - T163895 (duration: 01m 00s)

Marostegui updated the task description. Apr 28 2017, 1:52 PM

Change 350844 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Change db1062 location

https://gerrit.wikimedia.org/r/350844

Change 350844 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Change db1062 location

https://gerrit.wikimedia.org/r/350844

Mentioned in SAL (#wikimedia-operations) [2017-04-28T14:04:00Z] <marostegui@naos> Synchronized wmf-config/db-eqiad.php: Change db1062 rack location - T163895 (duration: 00m 52s)

Mentioned in SAL (#wikimedia-operations) [2017-04-28T14:04:41Z] <marostegui> Stop and shutdown db1063 - T163895

Change 350850 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] Remove db1061 old IP

https://gerrit.wikimedia.org/r/350850

Change 350850 merged by Marostegui:
[operations/dns@master] Remove db1061 old IP

https://gerrit.wikimedia.org/r/350850

Change 350851 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] adding dns entry for db1063 relocating to row C T163895

https://gerrit.wikimedia.org/r/350851

Change 350851 merged by Cmjohnson:
[operations/dns@master] adding dns entry for db1063 relocating to row C T163895

https://gerrit.wikimedia.org/r/350851

Change 350853 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db1063 IP and rack

https://gerrit.wikimedia.org/r/350853

Change 350853 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Change db1063 IP and rack

https://gerrit.wikimedia.org/r/350853

Mentioned in SAL (#wikimedia-operations) [2017-04-28T14:31:21Z] <marostegui@naos> Synchronized wmf-config/db-codfw.php: Change db1063 IP and rack - T163895 (duration: 00m 50s)

Mentioned in SAL (#wikimedia-operations) [2017-04-28T14:32:15Z] <marostegui@naos> Synchronized wmf-config/db-eqiad.php: Change db1063 IP and rack - T163895 (duration: 00m 48s)

Marostegui closed this task as Resolved. Apr 28 2017, 2:45 PM
Marostegui assigned this task to Cmjohnson.

This has all been completed.
The masters now have slaves connected to them again:

root@neodymium:/home/marostegui/git/software/dbtools# for i in db1061 db1062 db1063; do echo $i; mysql --skip-ssl -h$i -e "show slave hosts;";done
db1061
+------------+------+------+-----------+
| Server_id  | Host | Port | Master_id |
+------------+------+------+-----------+
|  171978769 |      | 3306 | 171978766 |
| 4294967295 |      | 3316 | 171978766 |
|  171970663 |      | 3306 | 171978766 |
|  171970572 |      | 3306 | 171978766 |
|  171974770 |      | 3306 | 171978766 |
|  171978770 |      | 3306 | 171978766 |
|  171970579 |      | 3306 | 171978766 |
|  171970705 |      | 3306 | 171978766 |
|  171978904 |      | 3306 | 171978766 |
|  171970586 |      | 3306 | 171978766 |
+------------+------+------+-----------+
db1062
+------------+------+------+-----------+
| Server_id  | Host | Port | Master_id |
+------------+------+------+-----------+
|  171970577 |      | 3306 | 171978767 |
|  171978770 |      | 3306 | 171978767 |
|  171970583 |      | 3306 | 171978767 |
|  171970588 |      | 3306 | 171978767 |
| 4294967295 |      | 3317 | 171978767 |
|  171970590 |      | 3306 | 171978767 |
|  171966555 |      | 3306 | 171978767 |
|  171970664 |      | 3306 | 171978767 |
|  171978769 |      | 3306 | 171978767 |
|  171978905 |      | 3306 | 171978767 |
|  171970582 |      | 3306 | 171978767 |
+------------+------+------+-----------+
db1063
+------------+------+------+-----------+
| Server_id  | Host | Port | Master_id |
+------------+------+------+-----------+
|  171978769 |      | 3306 | 171978768 |
|  171970704 |      | 3306 | 171978768 |
|  171970575 |      | 3306 | 171978768 |
|  171978903 |      | 3306 | 171978768 |
| 4294967295 |      | 3315 | 171978768 |
|  171978770 |      | 3306 | 171978768 |
|  171978778 |      | 3306 | 171978768 |
|  171974769 |      | 3306 | 171978768 |
|  171978777 |      | 3306 | 171978768 |
+------------+------+------+-----------+
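
To compare this kind of output before and after a move without counting rows by eye, the tables can be summarised per master. This is a rough sketch: `replicas_per_master` is a hypothetical helper, and it assumes the exact layout shown above (a bare hostname line followed by the tabular `show slave hosts` output).

```shell
# replicas_per_master: read the loop output on stdin and print
# "<master> <number of rows>" for each master, in input order.
replicas_per_master() {
    awk -F'[|]' '
        # A bare hostname line starts a new section.
        /^[A-Za-z][A-Za-z0-9.-]*$/ {
            host = $0
            order[++k] = host
            count[host] = 0
            next
        }
        # Data rows have a numeric Server_id in the first column;
        # header and border lines do not match.
        $2 ~ /^ *[0-9]+ *$/ { count[host]++ }
        END {
            for (i = 1; i <= k; i++) print order[i], count[order[i]]
        }
    '
}
```

Piping the loop above into `replicas_per_master` would report 10 rows for db1061, 11 for db1062 and 9 for db1063 for the output pasted here.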

The DNS is still being propagated and tendril, for instance, has not updated all the IPs yet.

Thanks a lot @Cmjohnson for attending and helping out with this last-minute request. You have saved us a lot of future pain that would have come from having all of them in the same rack!