
codfw: relocate servers in rack D6
Closed, Resolved, Public

Description

I would like to relocate some servers in rack D6 so that I can install blanking panels for better airflow within the rack. Right now there are many empty U spaces between servers, left behind by servers we removed during decommissioning. Please see the table below for details on the relocation. I am planning to do this next week, on the 7th at 10:00 AM CT. If this works for you, please check the confirmation box.

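| complete | server | Old U position | Old switch port | New U position | New switch port |
| --- | --- | --- | --- | --- | --- |
| yes | db2074 | 27 | ge-6/0/10 | 8 | ge-6/0/7 |
| yes | db2084 | 28 | ge-6/0/21 | 9 | ge-6/0/8 |
| yes | db2101 | 31 | ge-6/0/12 | 10 | ge-6/0/9 |
| yes | db2130 | 32 | ge-6/0/18 | 11 | ge-6/0/10 |
| yes | dbproxy2004 | 43 | ge-6/0/20 | 12 | ge-6/0/11 |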

Event Timeline

Marostegui triaged this task as Medium priority (Dec 2 2021, 1:30 PM).
Marostegui moved this task from Triage to Refine on the DBA board.
Marostegui added subscribers: Kormat, jcrespo, Marostegui.

Roles:

db2074 -> replica (sanitarium master)
db2078 -> replica (m1, m2, m3, m5)
db2101 -> replica (backup source)
db2130 -> replica
dbproxy2004 -> m5 proxy (m5 in codfw isn't in use)

@Papaul from the DB side of things, we are good. I can leave the hosts powered off for you on the 7th. It looks like 10 AM CT is 16:00 UTC, so I will definitely not be online after this operation is finished. @Kormat, could you take care of starting MySQL and replication again the following day?
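A quick sketch for double-checking the CT to UTC conversion of the maintenance window, assuming "CT" means US Central Time (IANA zone America/Chicago), which is on CST (UTC-6) in December. The variable names are illustrative only.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "CT" means US Central Time (IANA zone America/Chicago).
# On 7 December 2021 daylight saving time is not in effect, so CT is CST (UTC-6).
local_start = datetime(2021, 12, 7, 10, 0, tzinfo=ZoneInfo("America/Chicago"))

# Convert the planned start of the maintenance window to UTC.
utc_start = local_start.astimezone(timezone.utc)

print(utc_start.isoformat())  # 2021-12-07T16:00:00+00:00
```

Using an IANA zone instead of a fixed -6 offset keeps the check correct if the work is ever rescheduled into the daylight saving period, when CT becomes CDT (UTC-5).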
@jcrespo regarding the backup replica, does this day/time work for you?

Sadly, I won't be around on the 7th. There is no issue with the move itself (backups should have finished by that time, and IP changes should not affect backups), but either the date has to be moved or someone will have to stop the servers for me. I can bring them back up on the 9th when I return (it is no big issue for them to be down for an extended time), but I won't be able to shut them down on the 7th.

I can stop it, no issue. But @Kormat will need to bring it back up the following day (or wait till 9th for you).

> But @Kormat will need to bring it back up the following day (or wait till 9th for you).

Both will work. For shutdown, the usual DB procedure will work (minus the need for the MediaWiki depool).

> I can stop it, no issue. But @Kormat will need to bring it back up the following day (or wait till 9th for you).

Can do.

\o/

Cool, so @Papaul let's go ahead as you've initially planned it.

Mentioned in SAL (#wikimedia-operations) [2021-12-07T05:58:09Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2074 and db2130 T296930', diff saved to https://phabricator.wikimedia.org/P18033 and previous config saved to /var/cache/conftool/dbconfig/20211207-055808-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-12-07T07:16:03Z] <marostegui> power off db2074, db2078, db2101, db2130, dbproxy2004 T296930

All hosts are now down and powered off.
@Papaul you can proceed as needed.
@Kormat I have upgraded mysql on all hosts, so please run mysql_upgrade once you bring them back up (some of them were already running the latest version, so mysql_upgrade won't do anything). dbproxy2004 only requires starting and checking haproxy.

Mentioned in SAL (#wikimedia-operations) [2021-12-07T15:52:06Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2084.codfw.wmnet with reason: Reracking T296930

Mentioned in SAL (#wikimedia-operations) [2021-12-07T15:52:10Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2084.codfw.wmnet with reason: Reracking T296930

Papaul updated the task description.

@Marostegui @Kormat all the servers are back online on my end.

Thanks for helping

I have repooled db2130 and db2074, as they had not been pooled back yet.