Migrate servers in codfw racks D3 & D4 from asw to lsw
Closed, Resolved · Public

Description

Currently scheduled for Tues Sept 17th 2024 16:00 UTC

As part of the scheduled refresh of switch equipment in codfw rows C and D, we need to move the network connections for servers in racks D3 and D4 from the old switch to the new one.

Hosts in these racks are managed by the following teams:

Core Platform
Data Persistence
Infrastructure Foundations
Observability
Search Platform
ServiceOps
Traffic
WMCS

A full list of the specific hosts can be found in the sheet linked below. We will use the sheet to plan the moves and co-ordinate with other SRE teams on the actions required to ensure things go smoothly:

https://docs.google.com/spreadsheets/d/16xoZuDeC_-o6s70uEMnvdgn4BlT1f8__WPYprRuduIA#gid=1233730936

Server links will be moved one by one from the old switch to the new one, so no two hosts will be offline at the same time.

Based on previous experience, each host is likely to lose connectivity for only ~10 seconds. However, a small number of the new cables will inevitably not work, or there will be some minor glitch in a move, so in an edge case a host may be offline for 2-3 minutes. On previous occasions this happened with about 1 in 20 hosts.
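
As a rough illustration of those figures, the sketch below estimates how many hosts might see the longer outage and what the average outage per host would be. The host count of 40 is purely hypothetical (the real list is in the sheet), and the 180-second value assumes the upper end of the 2-3 minute range.

```python
# Back-of-the-envelope estimate based on the figures quoted above.
# ASSUMPTIONS: HOST_COUNT is hypothetical; SLOW_OUTAGE_S uses the upper
# end of the "2-3 minutes" range.
HOST_COUNT = 40          # hypothetical number of hosts being moved
SLOW_ONE_IN = 20         # "about 1 out of 20 hosts" hit a problem
FAST_OUTAGE_S = 10       # typical loss of comms per host
SLOW_OUTAGE_S = 180      # edge-case outage per host (3 minutes)


def expected_slow_moves(host_count, slow_one_in=SLOW_ONE_IN):
    """Expected number of hosts that see the longer outage."""
    return host_count / slow_one_in


def expected_outage_per_host(slow_one_in=SLOW_ONE_IN,
                             fast_s=FAST_OUTAGE_S,
                             slow_s=SLOW_OUTAGE_S):
    """Average outage per host in seconds, weighting the edge case."""
    return ((slow_one_in - 1) * fast_s + slow_s) / slow_one_in


slow_moves = expected_slow_moves(HOST_COUNT)
average_outage = expected_outage_per_host()
print(f"hosts likely to see a 2-3 minute outage: {slow_moves}")
print(f"average outage per host: {average_outage} s")
```

Because links are moved one at a time, these outages do not overlap; the estimate only describes what an individual host should be prepared to tolerate.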

Event Timeline

cmooney triaged this task as Medium priority.
Restricted Application added a subscriber: Aklapper.

No Swift/Ceph nodes affected in this one.

all hosts are depoolable for this task

Traffic hosts cp2039 and cp2040 are depooled and ready.

Mentioned in SAL (#wikimedia-operations) [2024-09-17T15:28:33Z] <sukhe@puppetmaster1001> conftool action : set/pooled=no; selector: name=(cp2039|cp2040).codfw.wmnet [reason: depool for T373103]

Mentioned in SAL (#wikimedia-operations) [2024-09-17T15:43:56Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'depool db2213 db2214 es2023 pc2016 db2209 - T373103', diff saved to https://phabricator.wikimedia.org/P69221 and previous config saved to /var/cache/conftool/dbconfig/20240917-154355-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T15:44:16Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:45:00 on 6 hosts with reason: network maintenance T373103

Mentioned in SAL (#wikimedia-operations) [2024-09-17T15:44:29Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on 6 hosts with reason: network maintenance T373103

All hosts moved successfully and responding to ping again.

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:07:45Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: T373103', diff saved to https://phabricator.wikimedia.org/P69222 and previous config saved to /var/cache/conftool/dbconfig/20240917-170745-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:07:50Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2214 (re)pooling @ 25%: T373103', diff saved to https://phabricator.wikimedia.org/P69223 and previous config saved to /var/cache/conftool/dbconfig/20240917-170749-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:07:58Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: T373103', diff saved to https://phabricator.wikimedia.org/P69224 and previous config saved to /var/cache/conftool/dbconfig/20240917-170755-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:08:07Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2209 (re)pooling @ 25%: T373103', diff saved to https://phabricator.wikimedia.org/P69225 and previous config saved to /var/cache/conftool/dbconfig/20240917-170805-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:10:33Z] <sukhe@puppetmaster1001> conftool action : set/pooled=yes; selector: name=(cp2039|cp2040).codfw.wmnet [reason: [maint done] depool for T373103]

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:22:51Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: T373103', diff saved to https://phabricator.wikimedia.org/P69226 and previous config saved to /var/cache/conftool/dbconfig/20240917-172250-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:22:56Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: T373103', diff saved to https://phabricator.wikimedia.org/P69227 and previous config saved to /var/cache/conftool/dbconfig/20240917-172255-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:23:01Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: T373103', diff saved to https://phabricator.wikimedia.org/P69228 and previous config saved to /var/cache/conftool/dbconfig/20240917-172300-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:23:11Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2209 (re)pooling @ 50%: T373103', diff saved to https://phabricator.wikimedia.org/P69229 and previous config saved to /var/cache/conftool/dbconfig/20240917-172310-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:37:57Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: T373103', diff saved to https://phabricator.wikimedia.org/P69230 and previous config saved to /var/cache/conftool/dbconfig/20240917-173756-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:38:02Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: T373103', diff saved to https://phabricator.wikimedia.org/P69231 and previous config saved to /var/cache/conftool/dbconfig/20240917-173801-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:38:06Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: T373103', diff saved to https://phabricator.wikimedia.org/P69232 and previous config saved to /var/cache/conftool/dbconfig/20240917-173806-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:38:16Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2209 (re)pooling @ 75%: T373103', diff saved to https://phabricator.wikimedia.org/P69233 and previous config saved to /var/cache/conftool/dbconfig/20240917-173816-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:53:02Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: T373103', diff saved to https://phabricator.wikimedia.org/P69234 and previous config saved to /var/cache/conftool/dbconfig/20240917-175302-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:53:07Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: T373103', diff saved to https://phabricator.wikimedia.org/P69235 and previous config saved to /var/cache/conftool/dbconfig/20240917-175306-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:53:12Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: T373103', diff saved to https://phabricator.wikimedia.org/P69236 and previous config saved to /var/cache/conftool/dbconfig/20240917-175311-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:53:22Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2209 (re)pooling @ 100%: T373103', diff saved to https://phabricator.wikimedia.org/P69237 and previous config saved to /var/cache/conftool/dbconfig/20240917-175321-arnaudb.json
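
For anyone reviewing the repool timing afterwards, the staged steps in the SAL entries above (25%, 50%, 75%, 100%) can be pulled out with a short script. This is only a sketch: the regular expression is an assumption fitted to the entries quoted in this task, not an official SAL or dbctl format, and the sample lines are the db2213 entries trimmed to the fields the pattern needs.

```python
import re
from datetime import datetime

# db2213 entries copied from the timeline above, trimmed after the message.
SAL_LINES = [
    "[2024-09-17T17:07:45Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: T373103'",
    "[2024-09-17T17:22:51Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: T373103'",
    "[2024-09-17T17:37:57Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: T373103'",
    "[2024-09-17T17:53:02Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: T373103'",
]

# ASSUMPTION: pattern inferred from the entries in this task only.
PATTERN = re.compile(
    r"\[(?P<ts>[0-9T:\-]+)Z\].*'(?P<host>\w+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return (host, percentage, timestamp) for each repool entry found."""
    steps = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a repool entry, e.g. a depool or downtime line
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
        steps.append((match.group("host"), int(match.group("pct")), ts))
    return steps


def step_gaps_seconds(steps):
    """Seconds elapsed between consecutive repool steps."""
    return [
        int((later[2] - earlier[2]).total_seconds())
        for earlier, later in zip(steps, steps[1:])
    ]


steps = repool_steps(SAL_LINES)
gaps = step_gaps_seconds(steps)
for host, pct, ts in steps:
    print(f"{ts:%H:%M:%S} {host} at {pct}%")
print("seconds between steps:", gaps)
```

For db2213 this gives gaps of roughly 15 minutes between each step, which matches the timestamps logged for the other hosts.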

cmooney claimed this task.