
Eqiad: row C/D switch refresh configuration task
Closed, Resolved, Public

Description

High-level task to track progress towards getting the new Nokia switches in Eqiad rows C and D configured and ready for us to connect servers and begin the migration from the existing switches.

This depends in large part on the two tasks below. Together they should put us in a position where Homer can configure the Nokias and our automation has been updated to generate the configuration needed to support the network functions we require.

T402511: Nokia: Support Python config generation and JSON-RPC transport in Homer

T402577: Homer: Add Python modules to configure Nokia SR Linux switches
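T402511 mentions a JSON-RPC transport for the Nokia SR Linux switches. As a rough illustration only, the Python sketch below builds a JSON-RPC 2.0 request body of the general shape that SR Linux's JSON-RPC interface accepts for `set` and `validate` calls. The function name `build_request`, the example YANG path and the description value are hypothetical, and this is not Homer's actual implementation.

```python
import json
from itertools import count

# Incrementing request IDs so each response can be matched to its request.
_request_ids = count(1)

# Methods in this sketch that take a list of configuration commands.
_CONFIG_METHODS = ("set", "validate")


def build_request(updates, method="set"):
    """Build a JSON-RPC 2.0 request body for an SR Linux switch.

    `updates` maps a YANG path string to the value to merge at that path.
    Each entry becomes an "update" command in the request's params.
    """
    if method not in _CONFIG_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    commands = [
        {"action": "update", "path": path, "value": value}
        for path, value in updates.items()
    ]
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": {"commands": commands},
    }


if __name__ == "__main__":
    # Hypothetical example: validate an interface description change.
    body = build_request(
        {"/interface[name=ethernet-1/1]/description": "hypothetical uplink"},
        method="validate",
    )
    print(json.dumps(body, indent=2))
```

A body like this would typically be POSTed to the switch's `/jsonrpc` endpoint over HTTPS. Sending `validate` before `set` checks the same commands without applying them, which loosely mirrors Homer's diff-then-commit workflow.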

Event Timeline

cmooney triaged this task as Medium priority.

Change #1180953 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add new Nokia switches to IBGP spine/leaf pod definitions in sites

https://gerrit.wikimedia.org/r/1180953

Change #1180953 merged by jenkins-bot:

[operations/homer/public@master] Add new IBGP cluster in eqiad with pod for row C/D Nokia switches

https://gerrit.wikimedia.org/r/1180953

Change #1187081 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Include statements for new netbox-generated snippet files

https://gerrit.wikimedia.org/r/1187081

Change #1187081 merged by Cathal Mooney:

[operations/dns@master] Include statements for new netbox-generated snippet files

https://gerrit.wikimedia.org/r/1187081

Mentioned in SAL (#wikimedia-operations) [2025-10-02T14:17:16Z] <topranks> drain transport circuit cr1-eqiad <-> cr1-codfw to allow for PIC card reboot on cr1-eqiad T402588

Mentioned in SAL (#wikimedia-operations) [2025-10-02T14:28:38Z] <topranks> drain link from cr1-eqiad <-> ssw1-e1-eqiad to allow PIC card reboot on cr1-eqiad T402588

Icinga downtime and Alertmanager silence (ID=626cec35-f6f7-443b-90fb-3024162d9dc9) set by cmooney@cumin1003 for 0:10:00 on 3 host(s) and their services with reason: reset PIC 0/1 in cr1-eqiad to set port 5 speed

cr[1-2]-eqiad,ssw1-e1-eqiad
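The host list above uses the bracket range notation that Cumin accepts through ClusterShell's NodeSet syntax. In this notation `cr[1-2]-eqiad,ssw1-e1-eqiad` expands to the three hosts counted in the downtime message. The following is a simplified Python sketch of that expansion. It assumes brackets contain only numbers, numeric ranges and comma lists. The `expand_hosts` helper is hypothetical and is not Cumin's actual parser, which supports a richer grammar.

```python
import re

# Matches the first [...] group in a host pattern.
_BRACKET = re.compile(r"\[([^\]]+)\]")


def _split_top_level(expr):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p for p in parts if p]


def _expand_one(pattern):
    """Expand the first bracket group, recursing for any later groups."""
    match = _BRACKET.search(pattern)
    if match is None:
        return [pattern]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    names = []
    for item in match.group(1).split(","):
        if "-" in item:
            lo, hi = item.split("-", 1)
            width = len(lo)  # keep zero-padding such as [01-03]
            values = [str(n).zfill(width) for n in range(int(lo), int(hi) + 1)]
        else:
            values = [item]
        for value in values:
            names.extend(_expand_one(prefix + value + suffix))
    return names


def expand_hosts(expr):
    """Expand a comma-separated host expression into a flat list of names."""
    hosts = []
    for part in _split_top_level(expr):
        hosts.extend(_expand_one(part))
    return hosts


if __name__ == "__main__":
    print(expand_hosts("cr[1-2]-eqiad,ssw1-e1-eqiad"))
```

Running it prints `['cr1-eqiad', 'cr2-eqiad', 'ssw1-e1-eqiad']`. The same expansion applied to the later `cr[1-2]-eqiad,cr2-eqord,cr1-magru,ssw1-f1-eqiad` list gives five hosts, which matches that downtime's "5 host(s)".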

Mentioned in SAL (#wikimedia-operations) [2025-10-02T14:36:50Z] <topranks> reset PIC 0/1 on cr1-eqiad to set port speed for port 5 T402588

Change #1193146 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] cr1-eqiad: add BGP to ssw1-d1-eqiad spine

https://gerrit.wikimedia.org/r/1193146

Change #1193146 merged by jenkins-bot:

[operations/homer/public@master] cr1-eqiad: add BGP to ssw1-d1-eqiad spine

https://gerrit.wikimedia.org/r/1193146

Mentioned in SAL (#wikimedia-operations) [2025-10-03T10:14:41Z] <topranks> drain transport circuits on PIC 1/0 of cr2-eqiad to allow for card reboot T402588

Mentioned in SAL (#wikimedia-operations) [2025-10-03T10:21:06Z] <topranks> drain traffic from cr2-codfw <-> ssw1-f1-codfw link to allow for cr2-codfw card reset T402588

Icinga downtime and Alertmanager silence (ID=da4ec6fc-e51f-4967-b0f1-8ef51813239b) set by cmooney@cumin1003 for 0:10:00 on 5 host(s) and their services with reason: reset PIC 0/1 in cr2 to set port 5 speed

cr[1-2]-eqiad,cr2-eqord,cr1-magru,ssw1-f1-eqiad

Mentioned in SAL (#wikimedia-operations) [2025-10-03T10:27:42Z] <topranks> reset PIC 1/0 on cr2-eqiad to configure port 5 speed T402588

Change #1194215 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] ssw1-e1-eqiad: Add EBGP peering to ssw1-d1-eqiad

https://gerrit.wikimedia.org/r/1194215

Change #1194215 merged by jenkins-bot:

[operations/homer/public@master] ssw1-e1-eqiad: Add EBGP peering to ssw1-d1-eqiad

https://gerrit.wikimedia.org/r/1194215