Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw
Closed, Resolved (Public)

Description

Currently scheduled for Feb 20th 16:00 UTC

The following server uplink moves need to be completed as part of the wider migration from our old top-of-rack switches in codfw to their new replacements. The work is just to move the cable, so we expect an interruption of 60 seconds or less per host. Moves will be sequential, so only one host will be disconnected at any given moment. A sketch of one way to verify the per-host outage length follows the table.

| Team | Host | asw-a7-codfw int | lsw1-a7-codfw int |
| --- | --- | --- | --- |
| Data Engineering | cephosd2001 | xe-7/0/8 | cephosd2001 |
| Data Persistence | ms-be2045 | xe-7/0/0 | ms-be2045 |
| Data Persistence | ms-be2052 | xe-7/0/9 | ms-be2052 |
| Data Persistence | thanos-be2001 | xe-7/0/3 | thanos-be2001 |
| Infra Foundations | ganeti2028 | xe-7/0/20 | hosts |
| Infra Foundations | ganeti-test2001 | xe-7/0/5 | needed |
| Infra Foundations | ganeti-test2002 | xe-7/0/6 | needed |
| Infra Foundations | ganeti-test2003 | xe-7/0/7 | needed |
| Search Platform | elastic2039 | xe-7/0/1 | elastic2039 |
| Search Platform | elastic2040 | xe-7/0/2 | elastic2040 |
| Search Platform | elastic2056 | xe-7/0/10 | elastic2056 |
| Search Platform | elastic2069 | xe-7/0/27 | elastic2069 |
| Search Platform | elastic2075 | xe-7/0/30 | elastic2075 |
| Search Platform | elastic2076 | xe-7/0/31 | elastic2076 |
| Search Platform | elastic2090 | xe-7/0/33 | elastic2090 |
| Search Platform | elastic2091 | xe-7/0/34 | elastic2091 |
| Search Platform | wdqs2009 | xe-7/0/26 | wdqs2009 |
| Search Platform | wdqs2020 | xe-7/0/32 | wdqs2020 |
| Service Ops | mc2040 | xe-7/0/28 | mc2040 |
| Service Ops | mc2041 | xe-7/0/29 | mc2041 |
| Traffic | cp2029 | xe-7/0/11 | cp2029 |
| Traffic | cp2030 | xe-7/0/12 | cp2030 |
| WMCS | cloudbackup2001 | xe-7/0/4 | cloudbackup2001 |
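
For anyone watching a host during its move, a minimal sketch like the following (not a WMF tool; the host name is just one from the table above) can measure how long the host is actually unreachable, to sanity-check the 60-second expectation:

```
host="cephosd2001.codfw.wmnet"   # any host from the table above
down_at=""
while sleep 1; do
    # One ICMP echo per second with a 1-second timeout (Linux iputils ping).
    if ping -c1 -W1 "$host" >/dev/null 2>&1; then
        if [ -n "$down_at" ]; then
            echo "$host back after $(( $(date +%s) - down_at ))s"
            down_at=""
        fi
    elif [ -z "$down_at" ]; then
        down_at="$(date +%s)"
        echo "$host went unreachable"
    fi
done
```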

We can track the details of the moves, and what needs to be done to prepare, in the Google sheet linked below. If no specific action is needed for a given type of host, just state that on the first tab:

https://docs.google.com/spreadsheets/d/1PlGGLclKFYR9XaqjOLibhiwwny0fOD8gLMwsNhIzGRo

Event Timeline

cmooney triaged this task as Medium priority. Jan 25 2024, 11:45 AM
cmooney created this task.
MatthewVernon subscribed.

Once complete I'll want to check the backends, but this shouldn't be an issue.

There's no need to coordinate with us for cloudbackup2001; it might cause us to get a transient alert, but that service isn't the most stable anyway :)

> Once complete I'll want to check the backends, but this shouldn't be an issue.

> There's no need to coordinate with us for cloudbackup2001; it might cause us to get a transient alert, but that service isn't the most stable anyway :)

Noted, thanks.

Draining ganeti2028.codfw.wmnet of running VMs
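
For reference, a hedged sketch of the standard Ganeti CLI steps behind draining a node, run on the cluster master; the actual WMF procedure may use a cookbook instead:

```
# Stop the cluster from scheduling new instances on the node.
sudo gnt-node modify --drained=yes ganeti2028.codfw.wmnet
# Live-migrate all primary instances off the node; -f skips the confirmation prompt.
sudo gnt-node migrate -f ganeti2028.codfw.wmnet
```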

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:49:13Z] <brett@cumin2002> START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2029-2030].codfw.wmnet with reason: T355867

Mentioned in SAL (#wikimedia-operations) [2024-02-20T14:49:30Z] <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2029-2030].codfw.wmnet with reason: T355867
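
The downtime above was set with the sre.hosts.downtime cookbook, per the SAL entries; a hedged reconstruction of the invocation, with flag names that may differ between cookbook versions:

```
# From a cumin host: 3 hours of Icinga downtime with the task ID as the reason.
sudo cookbook sre.hosts.downtime --hours 3 -r "T355867" 'cp[2029-2030].codfw.wmnet'
```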

Mentioned in SAL (#wikimedia-operations) [2024-02-20T15:16:40Z] <dcausse> depooled wdqs2009 & wdqs2020 (T355867)
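
Depooling here means taking the hosts out of load-balancer rotation via conftool; a hedged sketch, assuming the standard depool/pool wrappers available on production hosts:

```
# On each host, before its cable is moved:
sudo depool
# ...and once the move is done and the service checks out:
sudo pool
```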

Icinga downtime and Alertmanager silence (ID=343ed6db-68dd-4330-8851-9631da7da8d5) set by cmooney@cumin1002 for 1:00:00 on 4 host(s) and their services with reason: prepping for server uplink migration codfw rack a7

asw-a-codfw,cr[1-2]-codfw,lsw1-a7-codfw.mgmt
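
For comparison, a roughly equivalent silence could be set directly with amtool; this is a hedged sketch, and the matcher label (instance) is an assumption about the local Alertmanager setup:

```
# Assumes amtool is already configured with the Alertmanager URL.
amtool silence add 'instance=~"(asw-a|cr[12]|lsw1-a7)-codfw.*"' \
    --duration=1h --comment="prepping for server uplink migration codfw rack a7"
```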

Icinga downtime and Alertmanager silence (ID=47f3a57d-6476-4782-ba82-9c2dc99042c9) set by cmooney@cumin1002 for 0:30:00 on 22 host(s) and their services with reason: Migrating servers in codfw rack A7 to lsw1-a7-codfw

cephosd2001.codfw.wmnet,cloudbackup2001.codfw.wmnet,cp[2029-2030].codfw.wmnet,elastic[2039-2040,2056,2069,2075-2076,2090-2091].codfw.wmnet,ganeti2028.codfw.wmnet,ganeti-test[2001-2003].codfw.wmnet,mc[2040-2041].codfw.wmnet,ms-be2052.codfw.wmnet,thanos-be2001.codfw.wmnet,wdqs[2009,2020].codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-02-20T16:07:32Z] <topranks> Commencing network maintenance migrating servers to new switch codfw rack A7 T355867

All links moved successfully and all hosts responding to ping as before.

ms and thanos swift both OK post-move.
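
For anyone curious what a Swift backend check can look like, a hedged example using the standard swift-recon tooling; the actual checks run here may have differed:

```
# Run from a host with swift-recon configured for the cluster.
swift-recon --md5          # verify ring and config consistency across storage nodes
swift-recon -r             # check replication status
swift-dispersion-report    # object/container dispersion health
```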

cmooney claimed this task.

Thanks all for the help on this one!