
Migrate servers in codfw rack B3 from asw-b3-codfw to lsw1-b3-codfw
Closed, Resolved, Public

Description

Currently scheduled for Feb 27 16:00 UTC

The following server uplink moves need to be completed as part of the wider migration from our old top-of-rack switches in codfw to their new replacements. The work only involves moving the cable, so we expect an interruption of 60 seconds or less per host. Moves will be sequential, so only one host will be disconnected at any given moment.
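A minimal sketch of that sequential plan, assuming the 36 hosts in the table each take the full 60-second worst case and each move starts as soon as the previous one finishes (both are assumptions, not measurements). It checks that no more than one host is ever down at once and computes the worst-case cumulative disconnection time; the host names are placeholders.

```python
from datetime import datetime, timedelta, timezone

# Assumptions (illustrative, not from the task): every move uses the full
# 60-second worst case and the next move starts as soon as the previous ends.
WORST_CASE_PER_HOST = timedelta(seconds=60)
START = datetime(2024, 2, 27, 16, 0, tzinfo=timezone.utc)  # scheduled start


def sequential_schedule(hosts, start=START, per_host=WORST_CASE_PER_HOST):
    """Return (host, down_from, down_until) slots for strictly sequential moves."""
    slots = []
    t = start
    for host in hosts:
        slots.append((host, t, t + per_host))
        t += per_host
    return slots


def max_concurrent_down(slots):
    """Largest number of hosts disconnected at the same instant."""
    events = []
    for _, down_from, down_until in slots:
        events.append((down_from, 1))
        events.append((down_until, -1))
    # Process "back up" (-1) before "down" (+1) at equal timestamps, since one
    # slot ends exactly when the next one begins.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


hosts = [f"host{i:02d}" for i in range(36)]  # placeholder names for the 36 hosts
slots = sequential_schedule(hosts)
print(max_concurrent_down(slots))  # 1
print(slots[-1][2] - START)        # 0:36:00
```

Any gaps between moves stretch the elapsed time but not the total disconnection, which stays at or below 36 minutes under these assumptions.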

| Team | Host | asw-b3-codfw int | lsw1-b3-codfw int |
| --- | --- | --- | --- |
| Core Platform / Data Persistence | restbase2028 | ge-3/0/5 | disruption |
| Core Platform / Data Persistence | restbase2021 | ge-3/0/21 | disruption |
| Data Persistence | db2108 | ge-3/0/24 | move |
| Data Persistence | db2123 | ge-3/0/25 | move |
| Data Persistence | es2021 | ge-3/0/20 | es2021 |
| Service Ops | kubernetes2030 | ge-3/0/3 | drain |
| Service Ops | kubernetes2029 | ge-3/0/4 | drain |
| Service Ops | kubernetes2057 | ge-3/0/6 | drain |
| Service Ops | conf2004 | ge-3/0/2 | conf2004 |
| Service Ops | mw2324 | ge-3/0/0 | depool |
| Service Ops | mw2323 | ge-3/0/1 | depool |
| Service Ops | mw2259 | ge-3/0/8 | depool |
| Service Ops | mw2260 | ge-3/0/9 | depool |
| Service Ops | mw2261 | ge-3/0/10 | depool |
| Service Ops | mw2262 | ge-3/0/11 | depool |
| Service Ops | mw2263 | ge-3/0/12 | depool |
| Service Ops | mw2264 | ge-3/0/13 | depool |
| Service Ops | mw2265 | ge-3/0/14 | depool |
| Service Ops | mw2266 | ge-3/0/15 | depool |
| Service Ops | mw2267 | ge-3/0/16 | depool |
| Service Ops | mw2268 | ge-3/0/17 | depool |
| Service Ops | mw2269 | ge-3/0/18 | depool |
| Service Ops | mw2270 | ge-3/0/19 | depool |
| Service Ops | mw2310 | ge-3/0/26 | depool |
| Service Ops | mw2311 | ge-3/0/27 | depool |
| Service Ops | mw2312 | ge-3/0/28 | depool |
| Service Ops | mw2313 | ge-3/0/29 | depool |
| Service Ops | mw2314 | ge-3/0/30 | depool |
| Service Ops | mw2315 | ge-3/0/31 | depool |
| Service Ops | mw2316 | ge-3/0/32 | depool |
| Service Ops | mw2317 | ge-3/0/33 | depool |
| Service Ops | mw2318 | ge-3/0/34 | depool |
| Service Ops | mw2319 | ge-3/0/35 | depool |
| Service Ops | mw2320 | ge-3/0/36 | depool |
| Service Ops | mw2321 | ge-3/0/37 | depool |
| Service Ops | mw2322 | ge-3/0/38 | depool |

We can track the details of the moves, and what needs to be done to prepare for them, in the Google sheet below. If no specific action is needed for a given host type, just state that on the first tab.

https://docs.google.com/spreadsheets/d/1PlGGLclKFYR9XaqjOLibhiwwny0fOD8gLMwsNhIzGRo

Event Timeline

cmooney triaged this task as Medium priority. Jan 25 2024, 11:53 AM
cmooney created this task.

db2108 - slave
db2123 - slave
es2021 - es4 master

Once T355862 is done, es2021 needs to be switched back to being an es4 slave (reverting the changes made in T356064).
This is tracked in T356372.

Change 998431 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] codfw lvs::balancer: Switch config_host to conf2006

https://gerrit.wikimedia.org/r/998431

Mentioned in SAL (#wikimedia-operations) [2024-02-08T15:48:29Z] <claime> Draining mw2377.codfw.wmnet mw2378.codfw.wmnet mw2381.codfw.wmnet mw2395.codfw.wmnet mw2291.codfw.wmnet mw2292.codfw.wmnet mw2293.codfw.wmnet mw2294.codfw.wmnet mw2295.codfw.wmnet mw2296.codfw.wmnet mw2297.codfw.wmnet - T355870

es2021 is no longer a master, so it just needs normal depooling. cc @ABran-WMF

Change 998431 merged by Fabfur:

[operations/puppet@production] codfw lvs::balancer: Switch config_host to conf2006

https://gerrit.wikimedia.org/r/998431

Mentioned in SAL (#wikimedia-operations) [2024-02-27T14:47:05Z] <claime> Depooling mw2324.codfw.wmnet,mw2323.codfw.wmnet,mw2259.codfw.wmnet,mw2261.codfw.wmnet,mw2262.codfw.wmnet,mw2263.codfw.wmnet,mw2264.codfw.wmnet,mw2265.codfw.wmnet,mw2266.codfw.wmnet,mw2268.codfw.wmnet,mw2269.codfw.wmnet,mw2270.codfw.wmnet,mw2314.codfw.wmnet,mw2315.codfw.wmnet,mw2316.codfw.wmnet,mw2320.codfw.wmnet,mw2321.codfw.wmnet,mw2322.codfw.wmnet for T355870

Mentioned in SAL (#wikimedia-operations) [2024-02-27T14:52:44Z] <claime> Draining mw2260.codfw.wmnet mw2267.codfw.wmnet mw2310.codfw.wmnet mw2311.codfw.wmnet mw2312.codfw.wmnet mw2313.codfw.wmnet mw2317.codfw.wmnet mw2318.codfw.wmnet mw2319.codfw.wmnet kubernetes2030.codfw.wmnet kubernetes2029.codfw.wmnet kubernetes2057.codfw.wmnet for T355870
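The two SAL entries above split the 27 mw hosts from the table between a depool and a drain. A small sketch like the following, using host names copied from the table and from those two log lines, can confirm that the split is complete and non-overlapping. The helper `mw_range` is hypothetical and only builds the expected host names.

```python
def mw_range(*spans):
    """Expand inclusive (first, last) numeric spans into mwNNNN host names."""
    return {f"mw{n}" for first, last in spans for n in range(first, last + 1)}


# mw hosts listed in the task's table
table_mw = mw_range((2259, 2270), (2310, 2324))

# Hosts named in the 14:47 "Depooling" SAL entry
depooled = {
    "mw2324", "mw2323", "mw2259", "mw2261", "mw2262", "mw2263", "mw2264",
    "mw2265", "mw2266", "mw2268", "mw2269", "mw2270", "mw2314", "mw2315",
    "mw2316", "mw2320", "mw2321", "mw2322",
}

# mw hosts named in the 14:52 "Draining" SAL entry (kubernetes nodes omitted)
drained = {
    "mw2260", "mw2267", "mw2310", "mw2311", "mw2312", "mw2313",
    "mw2317", "mw2318", "mw2319",
}

print(depooled.isdisjoint(drained))               # True
print(sorted(table_mw - (depooled | drained)))    # []
print(len(depooled), len(drained), len(table_mw))  # 18 9 27
```

An empty difference and a disjoint pair mean every mw host in the rack was taken out of service exactly once before its cable was moved.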

Mentioned in SAL (#wikimedia-operations) [2024-02-27T15:39:26Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:40:00 on db[2108,2123].codfw.wmnet,es2021.codfw.wmnet with reason: Silence for network maintenance T355870

Mentioned in SAL (#wikimedia-operations) [2024-02-27T15:39:42Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db[2108,2123].codfw.wmnet,es2021.codfw.wmnet with reason: Silence for network maintenance T355870

Mentioned in SAL (#wikimedia-operations) [2024-02-27T15:39:51Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'T355870 - depooling es2021 db2108 db2123', diff saved to https://phabricator.wikimedia.org/P57999 and previous config saved to /var/cache/conftool/dbconfig/20240227-153951-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-27T15:41:26Z] <topranks> configuring lsw1-b3-codfw in advance of server migration T355870

Icinga downtime and Alertmanager silence (ID=94e7352f-26c7-48ff-b2c5-61b1faed7b5a) set by cmooney@cumin1002 for 1:00:00 on 4 host(s) and their services with reason: prepping for server uplink migration codfw rack b3

asw-b-codfw,cr[1-2]-codfw,lsw1-b3-codfw.mgmt

Icinga downtime and Alertmanager silence (ID=4a16f229-e545-4883-81ab-3b2ddd2d7636) set by cmooney@cumin1002 for 0:30:00 on 36 host(s) and their services with reason: Migrating servers in codfw rack B3 to lsw1-b3-codfw

conf2004.codfw.wmnet,db[2108,2123].codfw.wmnet,es2021.codfw.wmnet,kubernetes[2029-2030,2057].codfw.wmnet,mw[2259-2270,2310-2324].codfw.wmnet,restbase[2021,2028].codfw.wmnet
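The host lists in both silences use bracketed range notation, where `mw[2259-2270,2310-2324]` stands for two inclusive numeric ranges. The sketch below is a simplified, hypothetical expander (it handles at most one bracket group per comma-separated item, not the full ClusterShell/cumin grammar) that checks the lists expand to the 36 and 4 hosts reported in the two silence messages.

```python
import re


def split_top_level(expr):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


# prefix, one bracket group, suffix (simplified grammar)
PATTERN = re.compile(r"^(?P<prefix>[^\[]*)\[(?P<ranges>[^\]]+)\](?P<suffix>.*)$")


def expand(expr):
    """Expand a simplified bracketed host expression into a sorted list."""
    hosts = []
    for item in split_top_level(expr):
        m = PATTERN.match(item)
        if not m:
            hosts.append(item)  # plain host name, no brackets
            continue
        for part in m["ranges"].split(","):
            if "-" in part:
                lo, hi = part.split("-")
                width = len(lo)  # preserve zero padding if present
                values = [str(n).zfill(width) for n in range(int(lo), int(hi) + 1)]
            else:
                values = [part]
            hosts.extend(f"{m['prefix']}{v}{m['suffix']}" for v in values)
    return sorted(hosts)


silenced = ("conf2004.codfw.wmnet,db[2108,2123].codfw.wmnet,es2021.codfw.wmnet,"
            "kubernetes[2029-2030,2057].codfw.wmnet,"
            "mw[2259-2270,2310-2324].codfw.wmnet,restbase[2021,2028].codfw.wmnet")
network = "asw-b-codfw,cr[1-2]-codfw,lsw1-b3-codfw.mgmt"

hosts = expand(silenced)
print(len(hosts))             # 36
print(len(expand(network)))   # 4
```

For real use, ClusterShell's `NodeSet` class implements the complete range syntax; this sketch only covers the forms that appear in this task.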

All moves complete, everything looking good and back responding to ping :)

Mentioned in SAL (#wikimedia-operations) [2024-02-27T16:49:43Z] <claime> Uncordoning mw2260.codfw.wmnet mw2267.codfw.wmnet mw2310.codfw.wmnet mw2311.codfw.wmnet mw2312.codfw.wmnet mw2313.codfw.wmnet mw2317.codfw.wmnet mw2318.codfw.wmnet mw2319.codfw.wmnet kubernetes2030.codfw.wmnet kubernetes2029.codfw.wmnet kubernetes2057.codfw.wmnet for T355870

Mentioned in SAL (#wikimedia-operations) [2024-02-27T16:51:35Z] <claime> Repooling mw2324.codfw.wmnet,mw2323.codfw.wmnet,mw2259.codfw.wmnet,mw2261.codfw.wmnet,mw2262.codfw.wmnet,mw2263.codfw.wmnet,mw2264.codfw.wmnet,mw2265.codfw.wmnet,mw2266.codfw.wmnet,mw2268.codfw.wmnet,mw2269.codfw.wmnet,mw2270.codfw.wmnet,mw2314.codfw.wmnet,mw2315.codfw.wmnet,mw2316.codfw.wmnet,mw2320.codfw.wmnet,mw2321.codfw.wmnet,mw2322.codfw.wmnet for T355870

cmooney claimed this task.