Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw
Closed, Resolved (Public)

Description

Currently scheduled for Wednesday Feb 21st 16:00 UTC

The following server uplink moves need to be completed as part of the wider migration from our old top-of-rack switches in codfw to their new replacements. The work is just to move the cable, so we expect an interruption of 60 seconds or less per host. Moves will be sequential, so only one host will be disconnected at any given moment.

| Team | Host type | asw-a8-codfw int | lsw1-a8-codfw int |
| --- | --- | --- | --- |
| Data persistence | db2146 | ge-8/0/1 | move |
| Data persistence | db2106 | ge-8/0/3 | move |
| Service Ops | kubernetes2026 | ge-8/0/29 | drain |
| Service Ops | kubernetes2025 | ge-8/0/30 | drain |
| Service Ops | parse2004 | ge-8/0/4 | depool |
| Service Ops | parse2005 | ge-8/0/5 | depool |
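
As a rough sketch of the plan in the table above, the moves can be modelled as a sequential list grouped by preparation step. The `Move` dataclass, the `MAX_OUTAGE_SECONDS` constant, and both function names are hypothetical and introduced only for illustration. The sketch also assumes that the last column of the table (move, drain, depool) is the preparation needed before the cable is moved. The 60-second figure is the expected upper bound from the description, not a measured value.

```python
from collections import defaultdict
from dataclasses import dataclass

# Expected upper bound per host, taken from the task description (assumption:
# treated here as a hard worst case purely for the arithmetic).
MAX_OUTAGE_SECONDS = 60


@dataclass(frozen=True)
class Move:
    team: str
    host: str
    old_port: str  # interface on asw-a8-codfw
    prep: str      # assumed meaning of the last table column


MOVES = [
    Move("Data persistence", "db2146", "ge-8/0/1", "move"),
    Move("Data persistence", "db2106", "ge-8/0/3", "move"),
    Move("Service Ops", "kubernetes2026", "ge-8/0/29", "drain"),
    Move("Service Ops", "kubernetes2025", "ge-8/0/30", "drain"),
    Move("Service Ops", "parse2004", "ge-8/0/4", "depool"),
    Move("Service Ops", "parse2005", "ge-8/0/5", "depool"),
]


def hosts_by_prep(moves):
    """Group hostnames by the preparation step listed for them."""
    grouped = defaultdict(list)
    for move in moves:
        grouped[move.prep].append(move.host)
    return dict(grouped)


def worst_case_window_seconds(moves, per_host=MAX_OUTAGE_SECONDS):
    """Upper bound on cumulative disconnect time when hosts move one at a time."""
    return len(moves) * per_host


if __name__ == "__main__":
    for prep, hosts in hosts_by_prep(MOVES).items():
        print(f"{prep}: {', '.join(hosts)}")
    print(f"worst case: {worst_case_window_seconds(MOVES)} s")
```

Because the moves are sequential, the bound grows linearly with the number of hosts: six hosts at up to 60 seconds each gives at most 360 seconds of cumulative disconnect time, with never more than one host offline at once.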

We can track the details of the moves, and what needs to be done to prepare, in the Google sheet here. If no specific action is needed for a given type of host, just state that on the first tab.

https://docs.google.com/spreadsheets/d/1PlGGLclKFYR9XaqjOLibhiwwny0fOD8gLMwsNhIzGRo

Event Timeline

cmooney triaged this task as Medium priority. Jan 25 2024, 12:00 PM
cmooney created this task.

Mentioned in SAL (#wikimedia-operations) [2024-02-21T15:40:57Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'T355874 - depooling db2146 db2106', diff saved to https://phabricator.wikimedia.org/P57579 and previous config saved to /var/cache/conftool/dbconfig/20240221-154056-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-21T15:41:13Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:25:00 on db2146.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-21T15:41:27Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on db2146.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-21T15:41:43Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:25:00 on db2106.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-21T15:42:02Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on db2106.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Icinga downtime and Alertmanager silence (ID=c42ddc7f-d7d7-4ebc-9852-d3a5c7882e71) set by cmooney@cumin1002 for 1:00:00 on 4 host(s) and their services with reason: prepping for server uplink migration codfw rack a8

asw-a-codfw,cr[1-2]-codfw,lsw1-a8-codfw.mgmt

Icinga downtime and Alertmanager silence (ID=da675508-2cc3-4974-a4ca-677deefc2dff) set by cmooney@cumin1002 for 0:30:00 on 6 host(s) and their services with reason: Migrating servers in codfw rack A7 to lsw1-a7-codfw

db[2106,2146].codfw.wmnet,kubernetes[2025-2026].codfw.wmnet,parse[2004-2005].codfw.wmnet
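
The host lists in the two downtime entries above use a compact bracket notation, where `[a,b]` lists values and `[a-b]` is an inclusive range. The following is a minimal sketch of how such an expression can be expanded to check it against the stated host counts. The function names are hypothetical, and this is not the implementation used by the tooling that produced the log lines; it only handles a single bracket group per hostname.

```python
import re


def split_top_level(expr):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_group(group):
    """Expand one 'prefix[a,b-c]suffix' group; plain hostnames pass through."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", group)
    if match is None:
        return [group]
    prefix, body, suffix = match.groups()
    hosts = []
    for item in body.split(","):
        start, _, end = item.partition("-")
        end = end or start  # a single value is a range of length one
        width = len(start)  # preserve zero padding, if any
        for n in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


def expand(expr):
    """Expand a comma-separated host expression into individual hostnames."""
    return [host for group in split_top_level(expr) for host in expand_group(group)]


if __name__ == "__main__":
    servers = "db[2106,2146].codfw.wmnet,kubernetes[2025-2026].codfw.wmnet,parse[2004-2005].codfw.wmnet"
    for host in expand(servers):
        print(host)
```

Expanding the two expressions from the log yields four network devices and six servers, which agrees with the "4 host(s)" and "6 host(s)" counts in the downtime messages.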

Mentioned in SAL (#wikimedia-operations) [2024-02-21T16:02:00Z] <topranks> Commencing network maintenance migrating servers to new switch codfw rack A8 T355874

Mentioned in SAL (#wikimedia-operations) [2024-02-21T16:24:40Z] <claime> Repooling parse2004.codfw.wmnet parse2005.codfw.wmnet following codfw A8 network migration - T355874

Mentioned in SAL (#wikimedia-operations) [2024-02-21T16:25:18Z] <claime> Uncordoning kubernetes2025.codfw.wmnet kubernetes2026.codfw.wmnet following codfw A8 network migration - T355874

All hosts moved without issue, thanks Jenn!

cmooney claimed this task.

Closing this, thanks all for the help!