
Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
Closed, Resolved (Public)

Description

Currently scheduled for Feb 14 16:00 UTC

The following server uplink moves need to be completed as part of the wider migration from our old top-of-rack switches in codfw to their new replacements. The work is just to move the cable, so we expect an interruption of 60 seconds or less per host. Moves will be sequential, so only one host will be disconnected at any given moment.

| Team | Host type | asw-a5-codfw int | lsw1-a5-codfw int |
| --- | --- | --- | --- |
| Core Platform? | maps2005 | ge-5/0/10 | advance |
| Data Persistence | db2121 | ge-5/0/0 | move |
| Data Persistence | db2132 | ge-5/0/1 | move |
| Data Persistence | db2145 | ge-5/0/13 | move |
| Data Persistence | db2104 | ge-5/0/14 | move |
| Data Persistence | db2153 | ge-5/0/34 | move |
| Data Persistence | db2154 | ge-5/0/41 | move |
| Data Persistence | db2175 | ge-5/0/42 | move |
| Data Persistence | db2176 | ge-5/0/43 | move |
| Data Persistence | pc2011 | ge-5/0/24 | move |
| Infra Foundations | ganeti2023 | ge-5/0/5 | hosts |
| Infra Foundations | ganeti2024 | ge-5/0/7 | hosts |
| Infra Foundations | puppetmaster2001 | ge-5/0/26 | Depool |
| Infra Foundations | puppetserver2002 | ge-5/0/9 | Depool |
| Machine Learning | ml-serve2001 | ge-5/0/12 | simple downtime (klausman will do it) |
| Observability | logstash2001 | ge-5/0/18 | logstash2001 |
| Service Ops | kubernetes2019 | ge-5/0/6 | drain |
| Service Ops | kubernetes2018 | ge-5/0/25 | drain |
| Service Ops | mw2420 | ge-5/0/11 | depool |
| Service Ops | mw2402 | ge-5/0/15 | depool |
| Service Ops | mw2403 | ge-5/0/16 | depool |
| Service Ops | mw2421 | ge-5/0/17 | depool |
| Service Ops | mw2404 | ge-5/0/19 | depool |
| Service Ops | mw2405 | ge-5/0/20 | depool |
| Service Ops | mw2406 | ge-5/0/21 | depool |
| Service Ops | mw2407 | ge-5/0/22 | depool |
| Service Ops | mw2408 | ge-5/0/27 | depool |
| Service Ops | mw2409 | ge-5/0/32 | depool |
| Service Ops | mw2401 | ge-5/0/35 | depool |
| Service Ops | mw2422 | ge-5/0/36 | depool |
| Service Ops | mw2410 | ge-5/0/37 | depool |
| Service Ops | mw2411 | ge-5/0/39 | depool |
| Service Ops | mw2423 | ge-5/0/40 | depool |
| Service Ops | parse2001 | ge-5/0/2 | depool |
| Service Ops | parse2002 | ge-5/0/3 | depool |
| Service Ops | parse2003 | ge-5/0/4 | depool |
| Service Ops | rdb2007 | ge-5/0/8 | rdb2007 |

We can track the details of the moves, and what needs to be done to prepare, in the Google sheet below. If no specific action is needed for a given type of host, just state that on the first tab.

https://docs.google.com/spreadsheets/d/1PlGGLclKFYR9XaqjOLibhiwwny0fOD8gLMwsNhIzGRo


Event Timeline


rack is physically ready for tomorrow.

Draining ganeti2023.codfw.wmnet of running VMs

Draining ganeti2024.codfw.wmnet of running VMs

Mentioned in SAL (#wikimedia-operations) [2024-02-14T14:15:06Z] <claime> Draining and cordoning kubernetes2019.codfw.wmnet kubernetes2018.codfw.wmnet mw2420.codfw.wmnet mw2421.codfw.wmnet mw2406.codfw.wmnet mw2422.codfw.wmnet mw2423.codfw.wmnet for T355864

Mentioned in SAL (#wikimedia-operations) [2024-02-14T14:34:51Z] <claime> Depooling mw2402|mw2403|mw2404|mw2405|mw2407|mw2408|mw2409|mw2401|mw2410|mw2411|parse2001|parse2002|parse2003 for T355864

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:45:10Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2121.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:45:23Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2121.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:45:29Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2132.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:45:38Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2132.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:45:42Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2145.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:05Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:09Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2104.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:20Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2104.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:24Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2153.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:35Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:39Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2154.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:51Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2154.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:46:55Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2175.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:47:06Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:47:10Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on db2176.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:47:35Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2176.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:47:53Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'T355864 - Depool db2121 db2132 db2145 db2104 db2153 db2154 db2175 db2176', diff saved to https://phabricator.wikimedia.org/P56778 and previous config saved to /var/cache/conftool/dbconfig/20240214-154753-arnaudb.json

Icinga downtime and Alertmanager silence (ID=9a43620e-deca-432c-aa1f-5d6e939b51bc) set by cmooney@cumin1002 for 1:00:00 on 4 host(s) and their services with reason: prepping for server uplink migration codfw rack a5

asw-a-codfw,cr[1-2]-codfw,lsw1-a5-codfw.mgmt

Mentioned in SAL (#wikimedia-operations) [2024-02-14T15:59:38Z] <topranks> disable puppet fleet-wide to allow for disruption to puppetmaster/puppetserver during network maint T355864

Icinga downtime and Alertmanager silence (ID=ec1ab967-b8f5-4bfd-914e-e76afe369468) set by cmooney@cumin1002 for 0:30:00 on 38 host(s) and their services with reason: Migrating servers in codfw rack A5 to lsw1-a5-codfw

db[2104,2121,2132,2145,2153-2154,2157,2175-2176].codfw.wmnet,ganeti[2023-2024].codfw.wmnet,kubernetes[2018-2019].codfw.wmnet,logstash2001.codfw.wmnet,maps2005.codfw.wmnet,ml-serve2001.codfw.wmnet,mw[2401-2411,2420-2423].codfw.wmnet,parse[2001-2003].codfw.wmnet,pc2011.codfw.wmnet,puppetmaster2001.codfw.wmnet,puppetserver2002.codfw.wmnet,rdb2007.codfw.wmnet
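The host list above uses a folded bracket notation (`db[2104,2153-2154]`, `mw[2401-2411,2420-2423]`). As a rough illustration of how such an expression unfolds into individual hostnames, here is a minimal Python sketch. The `split_top_level` and `expand` helpers are hypothetical names written for this example; this is a simplified parser, not the implementation used by the downtime tooling, and it assumes at most one bracket group per token.

```python
import re

# Matches "prefix[body]suffix" with a single bracket group (assumption).
_BRACKET = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$"
)


def split_top_level(expr: str) -> list[str]:
    """Split on commas that are not inside square brackets."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand(expr: str) -> list[str]:
    """Unfold a host expression into a flat list of hostnames."""
    hosts = []
    for token in split_top_level(expr.strip()):
        match = _BRACKET.match(token)
        if match is None:
            hosts.append(token)  # plain hostname, nothing to unfold
            continue
        for item in match["body"].split(","):
            start, _, end = item.partition("-")
            end = end or start  # a single number is a range of one
            width = len(start)  # preserve any zero padding
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{match['prefix']}{n:0{width}d}{match['suffix']}")
    return hosts


# The expression copied from the downtime entry above.
A5_HOSTS = (
    "db[2104,2121,2132,2145,2153-2154,2157,2175-2176].codfw.wmnet,"
    "ganeti[2023-2024].codfw.wmnet,kubernetes[2018-2019].codfw.wmnet,"
    "logstash2001.codfw.wmnet,maps2005.codfw.wmnet,ml-serve2001.codfw.wmnet,"
    "mw[2401-2411,2420-2423].codfw.wmnet,parse[2001-2003].codfw.wmnet,"
    "pc2011.codfw.wmnet,puppetmaster2001.codfw.wmnet,"
    "puppetserver2002.codfw.wmnet,rdb2007.codfw.wmnet"
)

hosts = expand(A5_HOSTS)
print(len(hosts))  # 38, matching the "38 host(s)" in the downtime message
```

The same helper unfolds the earlier network-device list: `expand("asw-a-codfw,cr[1-2]-codfw,lsw1-a5-codfw.mgmt")` gives four names, in line with the "4 host(s)" reported for that downtime.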

Mentioned in SAL (#wikimedia-operations) [2024-02-14T16:07:53Z] <topranks> Moving server uplinks from old switch to new codfw rack A5 T355864

All links moved and all devices pinging ok again.

awesome, will start repooling, thanks @cmooney

Mentioned in SAL (#wikimedia-operations) [2024-02-14T16:16:14Z] <claime> Uncordoning kubernetes2019.codfw.wmnet kubernetes2018.codfw.wmnet mw2420.codfw.wmnet mw2421.codfw.wmnet mw2406.codfw.wmnet mw2422.codfw.wmnet mw2423.codfw.wmnet for T355864

Mentioned in SAL (#wikimedia-operations) [2024-02-14T16:16:50Z] <claime> Repooling mw2402|mw2403|mw2404|mw2405|mw2407|mw2408|mw2409|mw2401|mw2410|mw2411|parse2001|parse2002|parse2003 for T355864

Mentioned in SAL (#wikimedia-operations) [2024-02-14T16:18:25Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56779 and previous config saved to /var/cache/conftool/dbconfig/20240214-161824-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T16:33:30Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56780 and previous config saved to /var/cache/conftool/dbconfig/20240214-163330-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T16:48:35Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56781 and previous config saved to /var/cache/conftool/dbconfig/20240214-164834-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T17:03:40Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56782 and previous config saved to /var/cache/conftool/dbconfig/20240214-170339-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T17:03:56Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56783 and previous config saved to /var/cache/conftool/dbconfig/20240214-170345-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T17:18:50Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56784 and previous config saved to /var/cache/conftool/dbconfig/20240214-171850-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T17:33:56Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56785 and previous config saved to /var/cache/conftool/dbconfig/20240214-173355-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T17:49:01Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56786 and previous config saved to /var/cache/conftool/dbconfig/20240214-174900-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T17:49:06Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56787 and previous config saved to /var/cache/conftool/dbconfig/20240214-174906-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T18:04:11Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56788 and previous config saved to /var/cache/conftool/dbconfig/20240214-180411-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T18:19:16Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56790 and previous config saved to /var/cache/conftool/dbconfig/20240214-181916-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T18:34:21Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56792 and previous config saved to /var/cache/conftool/dbconfig/20240214-183421-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T18:34:27Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2153 (re)pooling @ 25%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56793 and previous config saved to /var/cache/conftool/dbconfig/20240214-183426-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T18:49:32Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2153 (re)pooling @ 50%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56795 and previous config saved to /var/cache/conftool/dbconfig/20240214-184931-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:04:37Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2153 (re)pooling @ 75%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56798 and previous config saved to /var/cache/conftool/dbconfig/20240214-190436-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:19:42Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2153 (re)pooling @ 100%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56799 and previous config saved to /var/cache/conftool/dbconfig/20240214-191941-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:19:47Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2154 (re)pooling @ 25%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56800 and previous config saved to /var/cache/conftool/dbconfig/20240214-191946-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:34:53Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2154 (re)pooling @ 50%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56801 and previous config saved to /var/cache/conftool/dbconfig/20240214-193451-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:49:57Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2154 (re)pooling @ 75%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56802 and previous config saved to /var/cache/conftool/dbconfig/20240214-194956-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:05:53Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2154 (re)pooling @ 100%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56803 and previous config saved to /var/cache/conftool/dbconfig/20240214-200501-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:06:11Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56804 and previous config saved to /var/cache/conftool/dbconfig/20240214-200507-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:20:12Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56805 and previous config saved to /var/cache/conftool/dbconfig/20240214-202012-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:35:17Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56806 and previous config saved to /var/cache/conftool/dbconfig/20240214-203517-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:50:22Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56807 and previous config saved to /var/cache/conftool/dbconfig/20240214-205021-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:50:27Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2176 (re)pooling @ 25%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56808 and previous config saved to /var/cache/conftool/dbconfig/20240214-205027-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T21:05:34Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2176 (re)pooling @ 50%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56810 and previous config saved to /var/cache/conftool/dbconfig/20240214-210531-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T21:20:40Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2176 (re)pooling @ 75%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56812 and previous config saved to /var/cache/conftool/dbconfig/20240214-212038-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-02-14T21:35:45Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2176 (re)pooling @ 100%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56814 and previous config saved to /var/cache/conftool/dbconfig/20240214-213544-arnaudb.json
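The dbctl entries above show each database being repooled in four steps (25%, 50%, 75%, 100%) roughly 15 minutes apart, one host after another. A minimal Python sketch of that cadence follows. The `repool_schedule` helper is hypothetical and written only for illustration; the step values and interval are read off the log timestamps, not taken from dbctl or any cookbook, and the real commits drift by a few seconds per step.

```python
from datetime import datetime, timedelta, timezone


def repool_schedule(start, steps=(25, 50, 75, 100),
                    interval=timedelta(minutes=15)):
    """Return (timestamp, percentage) pairs for a staged repool.

    Assumes a fixed interval between steps, which only approximates
    the timestamps seen in the log.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


# First db2121 repool commit in the log above.
start = datetime(2024, 2, 14, 16, 18, 25, tzinfo=timezone.utc)
schedule = repool_schedule(start)

for when, pct in schedule:
    print(f"{when:%H:%M:%S}Z  db2121 @ {pct}%")
```

Under these assumptions a single host takes about 45 minutes to go from 25% to 100%, which is why repooling the databases sequentially ran for several hours after the cable moves themselves were finished.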

Mentioned in SAL (#wikimedia-operations) [2024-02-15T08:50:31Z] <moritzm> rebalance Ganeti codfw/A now that the switch maintenance for A5 and A6 are completed T355864 T355863

cmooney claimed this task.

All looking good, closing task. Thanks everyone for their assistance.