
Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
Closed, Resolved (Public)

Description

Currently scheduled for Feb 15 16:00 UTC

The following server uplink moves need to be completed as part of the wider migration from our old top-of-rack switches in codfw to their new replacements. The work is just to move the cable, so we expect an interruption of 60 seconds or less per host. Moves will be sequential, so only 1 host will be disconnected at any given moment.
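As a rough sanity check on the window size, the figures above give an upper bound on total disruption: moves are sequential and each is expected to take 60 seconds or less. The sketch below is only an illustration. The per-group counts are tallied from the host table in this task, and the function name is made up for the example.

```python
# Worst-case estimate for a sequential cable-move window.
# Counts are tallied from the host table in this task; 60 s is the
# per-host upper bound stated in the description.
HOSTS_PER_GROUP = {
    "aqs": 4,
    "db": 6,
    "dbproxy": 1,
    "es": 3,
    "ml-staging": 1,
    "kubernetes": 7,
    "mw": 13,
}
MAX_SECONDS_PER_HOST = 60


def worst_case_minutes(groups, seconds_per_host):
    """Upper bound, in minutes, on the summed per-host interruptions."""
    total_hosts = sum(groups.values())
    return total_hosts * seconds_per_host / 60


print(worst_case_minutes(HOSTS_PER_GROUP, MAX_SECONDS_PER_HOST))
```

Because only one host is disconnected at a time, this bound covers the cable moves themselves and not any drain, depool or repool time around them.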

| Team | Host type | asw-a6-codfw int | lsw1-a6-codfw int |
| --- | --- | --- | --- |
| Data Engineering | aqs2001 | ge-6/0/36 | aqs2001 |
| Data Engineering | aqs2002 | ge-6/0/37 | aqs2002 |
| Data Engineering | aqs2003 | ge-6/0/38 | aqs2003 |
| Data Engineering | aqs2004 | ge-6/0/39 | aqs2004 |
| Data persistence | db2155 | ge-6/0/0 | move |
| Data persistence | db2156 | ge-6/0/1 | move |
| Data persistence | db2097 | ge-6/0/6 | move |
| Data persistence | db2105 | ge-6/0/7 | move |
| Data persistence | db2122 | ge-6/0/9 | move |
| Data persistence | db2133 | ge-6/0/10 | move |
| Data persistence | dbproxy2001 | ge-6/0/8 | dbproxy2001 |
| Data Persistence | es2024 | ge-6/0/12 | es2024 |
| Data Persistence | es2027 | ge-6/0/24 | es2027 |
| Data Persistence | es2028 | ge-6/0/25 | es2028 |
| Machine Learning | ml-staging2001 | ge-6/0/34 | drain on deployment server |
| Service Ops | kubernetes2059 | ge-6/0/2 | drain |
| Service Ops | kubernetes2028 | ge-6/0/3 | drain |
| Service Ops | kubernetes2027 | ge-6/0/4 | drain |
| Service Ops | kubernetes2060 | ge-6/0/5 | drain |
| Service Ops | kubernetes2008 | ge-6/0/13 | drain |
| Service Ops | kubernetes2007 | ge-6/0/23 | drain |
| Service Ops | kubernetes2055 | ge-6/0/33 | drain |
| Service Ops | mw2301 | ge-6/0/11 | depool |
| Service Ops | mw2302 | ge-6/0/14 | depool |
| Service Ops | mw2303 | ge-6/0/15 | depool |
| Service Ops | mw2304 | ge-6/0/16 | depool |
| Service Ops | mw2305 | ge-6/0/17 | depool |
| Service Ops | mw2306 | ge-6/0/18 | depool |
| Service Ops | mw2307 | ge-6/0/20 | depool |
| Service Ops | mw2308 | ge-6/0/21 | depool |
| Service Ops | mw2309 | ge-6/0/22 | depool |
| Service Ops | mw2424 | ge-6/0/40 | depool |
| Service Ops | mw2425 | ge-6/0/41 | depool |
| Service Ops | mw2426 | ge-6/0/42 | depool |
| Service Ops | mw2427 | ge-6/0/43 | depool |

We can track the details of the moves, and what needs to be done to prepare, in the Google Sheet linked below. If no specific action is needed for a given type of host, just state that on the first tab.

https://docs.google.com/spreadsheets/d/1PlGGLclKFYR9XaqjOLibhiwwny0fOD8gLMwsNhIzGRo

Event Timeline

cmooney triaged this task as Medium priority. Jan 25 2024, 11:43 AM
cmooney created this task.

db2155 - slave
db2156 - slave
db2097 - backups slave @jcrespo
db2105 - s3 master
db2122 - slave
db2133 - m2 master (not used)
dbproxy2001 - not used
es2024 - es5 master
es2027 - standalone
es2028 - standalone

db2105 is no longer a master. This host can be done after being depooled

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:14:29Z] <claime> Draining kubernetes2059.codfw.wmnet kubernetes2028.codfw.wmnet kubernetes2027.codfw.wmnet kubernetes2060.codfw.wmnet kubernetes2008.codfw.wmnet kubernetes2007.codfw.wmnet kubernetes2055.codfw.wmnet mw2301.codfw.wmnet mw2424.codfw.wmnet mw2425.codfw.wmnet mw2427.codfw.wmnet - T355866

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:15:11Z] <claime> Depooling mw2302|mw2303|mw2304|mw2305|mw2306|mw2307|mw2308|mw2309|mw2426 - T355866

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:45:20Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'T355866 - db2155 db2156 db2105 db2122 db2133 es2024', diff saved to https://phabricator.wikimedia.org/P56837 and previous config saved to /var/cache/conftool/dbconfig/20240215-154520-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:45:26Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2155.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:45:37Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:45:41Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2156.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:45:52Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2156.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:45:56Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:46:01Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:46:05Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2122.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:46:12Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2122.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:46:16Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2133.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:46:39Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2133.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:46:43Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on es2024.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:46:54Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2024.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Icinga downtime and Alertmanager silence (ID=dc8a2b8d-561d-404c-ac7f-f64637c16dd1) set by cmooney@cumin1002 for 1:00:00 on 4 host(s) and their services with reason: prepping for server uplink migration codfw rack a6

asw-a-codfw,cr[1-2]-codfw,lsw1-a6-codfw.mgmt

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:53:57Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on es2027.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:54:10Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on es2027.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:54:18Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on es2028.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-15T15:54:32Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on es2028.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw

Icinga downtime and Alertmanager silence (ID=23a82a8c-672f-4105-8a05-0b7dbbb4cb97) set by cmooney@cumin1002 for 0:30:00 on 38 host(s) and their services with reason: Migrating servers in codfw rack A6 to lsw1-a6-codfw

aqs[2001-2004].codfw.wmnet,db[2097,2105,2122,2133,2136,2155-2156].codfw.wmnet,dbproxy2001.codfw.wmnet,es[2024,2027-2028].codfw.wmnet,gitlab2002.wikimedia.org,kubernetes[2007-2008,2027-2028,2055,2059-2060].codfw.wmnet,kubestage2001.codfw.wmnet,ml-staging2001.codfw.wmnet,mw[2301-2309,2424-2427].codfw.wmnet
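The host list above is written in a folded range notation, which appears to follow the node-set style used by tools such as ClusterShell. To check it against the "38 host(s)" count in the downtime message, it can be expanded by hand. The sketch below is a standalone illustration, not the parser the tooling uses. The helper names `split_top_level` and `expand` are invented for the example, and it assumes numeric ranges without zero padding.

```python
import re


def split_top_level(expr):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand(expr):
    """Expand e.g. 'mw[2301-2303,2424].codfw.wmnet' into individual names."""
    hosts = []
    for part in split_top_level(expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if m is None:
            hosts.append(part)  # plain hostname, nothing to expand
            continue
        prefix, body, suffix = m.groups()
        for item in body.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo  # a single number is a range of one
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n}{suffix}")
    return hosts


EXPR = (
    "aqs[2001-2004].codfw.wmnet,"
    "db[2097,2105,2122,2133,2136,2155-2156].codfw.wmnet,"
    "dbproxy2001.codfw.wmnet,es[2024,2027-2028].codfw.wmnet,"
    "gitlab2002.wikimedia.org,"
    "kubernetes[2007-2008,2027-2028,2055,2059-2060].codfw.wmnet,"
    "kubestage2001.codfw.wmnet,ml-staging2001.codfw.wmnet,"
    "mw[2301-2309,2424-2427].codfw.wmnet"
)
print(len(expand(EXPR)))
```

The same helper applied to the earlier network-device list (`asw-a-codfw,cr[1-2]-codfw,lsw1-a6-codfw.mgmt`) yields four names, which agrees with the "4 host(s)" in that downtime message.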

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:00:54Z] <topranks> commencing move of server uplinks codfw row A6 T355866

All moves now complete, ports up on new switch and all devices pinging ok!

amazing, thanks @cmooney! will start repooling

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:13:14Z] <claime> Uncordoning kubernetes2059.codfw.wmnet kubernetes2028.codfw.wmnet kubernetes2027.codfw.wmnet kubernetes2060.codfw.wmnet kubernetes2008.codfw.wmnet kubernetes2007.codfw.wmnet kubernetes2055.codfw.wmnet mw2301.codfw.wmnet mw2424.codfw.wmnet mw2425.codfw.wmnet mw2427.codfw.wmnet - T355866

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:13:31Z] <claime> Repooling mw2302|mw2303|mw2304|mw2305|mw2306|mw2307|mw2308|mw2309|mw2426 - T355866

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:13:39Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56838 and previous config saved to /var/cache/conftool/dbconfig/20240215-161338-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:28:44Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56839 and previous config saved to /var/cache/conftool/dbconfig/20240215-162843-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:43:49Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56840 and previous config saved to /var/cache/conftool/dbconfig/20240215-164348-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:58:54Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56841 and previous config saved to /var/cache/conftool/dbconfig/20240215-165853-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:58:59Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56842 and previous config saved to /var/cache/conftool/dbconfig/20240215-165858-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T17:14:04Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56843 and previous config saved to /var/cache/conftool/dbconfig/20240215-171403-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T17:29:09Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56844 and previous config saved to /var/cache/conftool/dbconfig/20240215-172909-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T17:44:14Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56846 and previous config saved to /var/cache/conftool/dbconfig/20240215-174414-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T17:44:20Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56847 and previous config saved to /var/cache/conftool/dbconfig/20240215-174419-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T17:59:25Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56848 and previous config saved to /var/cache/conftool/dbconfig/20240215-175924-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T18:14:30Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56849 and previous config saved to /var/cache/conftool/dbconfig/20240215-181429-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T18:29:35Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56852 and previous config saved to /var/cache/conftool/dbconfig/20240215-182934-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T18:29:41Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56853 and previous config saved to /var/cache/conftool/dbconfig/20240215-182939-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T18:44:45Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56856 and previous config saved to /var/cache/conftool/dbconfig/20240215-184444-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T18:59:50Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56858 and previous config saved to /var/cache/conftool/dbconfig/20240215-185949-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T19:14:55Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56861 and previous config saved to /var/cache/conftool/dbconfig/20240215-191454-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T19:15:01Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2024 (re)pooling @ 25%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56862 and previous config saved to /var/cache/conftool/dbconfig/20240215-191500-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T19:30:06Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2024 (re)pooling @ 50%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56863 and previous config saved to /var/cache/conftool/dbconfig/20240215-193005-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T19:45:11Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56865 and previous config saved to /var/cache/conftool/dbconfig/20240215-194510-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-15T20:00:16Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56867 and previous config saved to /var/cache/conftool/dbconfig/20240215-200015-arnaudb.json
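The dbctl entries above show each database host being repooled in four steps (25%, 50%, 75%, 100%) roughly 15 minutes apart, one host after another. A minimal sketch of that stepped schedule follows. It only models the timing visible in the log and is not the tooling that produced it. The function name `repool_schedule` and its defaults are assumptions made for the example.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100),
                    interval=timedelta(minutes=15)):
    """Return (timestamp, percentage) pairs for a stepped repool."""
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


# Start time taken from the first db2155 repool entry in the log above.
start = datetime(2024, 2, 15, 16, 13, 39)
for when, pct in repool_schedule(start):
    print(when.isoformat(), f"{pct}%")
```

The logged timestamps drift a few seconds later at each step than this idealised schedule, which is consistent with each step waiting a fixed interval after the previous commit completes.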

@cmooney is there anything pending here or can this be closed?

cmooney claimed this task.

> @cmooney is there anything pending here or can this be closed?

@Marostegui it was an oversight on my part, thanks for the heads-up.