Migrate servers in codfw rack B7 from asw-b7-codfw to lsw1-b7-codfw
Closed, Resolved (Public)

Description

Currently scheduled for Feb 29 16:00 UTC

The following server uplink moves need to be completed as part of the wider migration from our old top-of-rack switches in codfw to their new replacements. The work is just to move the cable, so we expect an interruption of 60 seconds or less per host. Moves will be sequential, so only one host will be disconnected at any given moment.

| Team | Host type | asw-b7-codfw int | lsw1-b7-codfw int |
| --- | --- | --- | --- |
| Data Persistence | ms-be2047 | xe-7/0/0 | ms-be2047 |
| Data Persistence | thanos-be2002 | xe-7/0/3 | thanos-be2002 |
| Infra Foundations | ganeti2032 | xe-7/0/5 | hosts |
| Observability | logstash2036 | xe-7/0/7 | logstash2036 |
| Search Platform | elastic2044 | xe-7/0/2 | elastic2044 |
| Search Platform | elastic2043 | xe-7/0/31 | elastic2043 |
| Search Platform | elastic2079 | xe-7/0/33 | elastic2079 |
| Search Platform | elastic2080 | xe-7/0/34 | elastic2080 |
| Service Ops | mc2046 | xe-7/0/32 | mc2046 |

We can track the details of the moves, and what needs to be done to prepare, in the Google sheet linked below. If no specific action is needed for a given type of host, just state that on the first tab.

https://docs.google.com/spreadsheets/d/1PlGGLclKFYR9XaqjOLibhiwwny0fOD8gLMwsNhIzGRo

Event Timeline

cmooney triaged this task as Medium priority. Jan 25 2024, 11:55 AM
cmooney created this task.
MatthewVernon subscribed.

I'll want to check the backends once this work is complete, but it shouldn't be an issue.

Mentioned in SAL (#wikimedia-operations) [2024-02-28T22:13:57Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2043*,2044*,2079*,2080* for switch maintenance - bking@cumin2002 - T355872

Mentioned in SAL (#wikimedia-operations) [2024-02-28T22:14:03Z] <bking@cumin2002> END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2043*,2044*,2079*,2080* for switch maintenance - bking@cumin2002 - T355872

Draining ganeti2032.codfw.wmnet of running VMs

Mentioned in SAL (#wikimedia-operations) [2024-02-29T15:59:45Z] <topranks> configuring lsw1-b7-codfw in advance of server migration T355872

Icinga downtime and Alertmanager silence (ID=ab1a9b14-3187-4d52-a4f6-be3c445a8081) set by cmooney@cumin1002 for 1:00:00 on 4 host(s) and their services with reason: prepping for server uplink migration codfw rack b7

asw-b-codfw,cr[1-2]-codfw,lsw1-b7-codfw

Icinga downtime and Alertmanager silence (ID=12cd3c2a-9d8e-4ba6-a42e-1faa167de80d) set by cmooney@cumin1002 for 0:30:00 on 9 host(s) and their services with reason: Migrating servers in codfw rack B7 to lsw1-b7-codfw

elastic[2043-2044,2079-2080].codfw.wmnet,ganeti[2032-2033].codfw.wmnet,logstash2036.codfw.wmnet,mc2046.codfw.wmnet,thanos-be2002.codfw.wmnet
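
The host list in the downtime entry above is written in a folded bracket notation (it appears to be the node-set style used by the cumin hosts). As a rough illustration only, the following Python sketch expands such an expression into individual host names so the "9 host(s)" count can be checked by hand. The function names `split_top_level` and `expand` are hypothetical, this is not the tooling's own implementation, and it assumes at most one bracket group per name.

```python
import re


def split_top_level(expr: str) -> list[str]:
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def expand(expr: str) -> list[str]:
    """Expand e.g. 'cr[1-2]-codfw' into ['cr1-codfw', 'cr2-codfw'].

    Assumption: each name contains at most one bracket group.
    """
    hosts = []
    for part in split_top_level(expr):
        m = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", part)
        if m is None:
            # Plain host name, nothing to expand.
            hosts.append(part)
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo  # a single number such as '[7]'
            for n in range(int(lo), int(hi) + 1):
                # Keep the zero padding of the lower bound.
                hosts.append(f"{prefix}{n:0{len(lo)}d}{suffix}")
    return hosts


downtimed = expand(
    "elastic[2043-2044,2079-2080].codfw.wmnet,ganeti[2032-2033].codfw.wmnet,"
    "logstash2036.codfw.wmnet,mc2046.codfw.wmnet,thanos-be2002.codfw.wmnet"
)
print(len(downtimed))
for host in downtimed:
    print(host)
```

Run against the expression above, this yields nine names, matching the "9 host(s)" in the downtime message. The same helper applied to `asw-b-codfw,cr[1-2]-codfw,lsw1-b7-codfw` gives the four network devices from the earlier downtime entry.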

Mentioned in SAL (#wikimedia-operations) [2024-02-29T16:05:22Z] <topranks> Commencing network maintenance migrating servers to new switch codfw rack B7 T355872

All hosts moved successfully. They are showing up on the switch, MACs are learnt, and all are responding to ping again.

thanos and ms swift clusters OK post-move, thank you!

cmooney claimed this task.

Closing task - thanks all for the assistance!