
Decommission E/F 8 Dell switches
Closed, ResolvedPublic

Description

Pulling the plug on the two Dell SONiC switches we tested in rack E8 and F8:
https://netbox.wikimedia.org/dcim/devices/4650/
https://netbox.wikimedia.org/dcim/devices/4651/

As well as the test server https://netbox.wikimedia.org/dcim/devices/275/ (former cloudvirt transformed into a sretest in T349168: Add test server to rack E8).

WARNING: setting the devices to "decom" in Netbox causes the DNS automation to remove the vlans defined on them, which requires a dns git repo patch.

Either we leave them as is and transition the vlans to the future switches, or we decom and then re-provision those vlans.

For the switches:

  • Disable interfaces [netops]
  • Remove prod IP config [netops]
  • Disconnect all the links [dcops]
  • Unrack switches [dcops]
  • Remove mgmt IP/dns [netops]
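As a rough illustration of the ordering above, the checklist can be written down as data so the handoffs between teams are explicit. This is only a sketch: the `Step` class and `handoffs` helper are hypothetical names made up for this example, not part of any existing cookbook or tooling.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Step:
    """One checklist item and the team that owns it (hypothetical model)."""
    action: str
    team: str


# Steps copied from the task description, in the order listed there.
SWITCH_DECOM_STEPS = [
    Step("Disable interfaces", "netops"),
    Step("Remove prod IP config", "netops"),
    Step("Disconnect all the links", "dcops"),
    Step("Unrack switches", "dcops"),
    Step("Remove mgmt IP/dns", "netops"),
]


def handoffs(steps):
    """Group consecutive steps by owning team, preserving the original order."""
    return [
        (team, [step.action for step in group])
        for team, group in groupby(steps, key=lambda step: step.team)
    ]


for team, actions in handoffs(SWITCH_DECOM_STEPS):
    print(f"{team}: {', '.join(actions)}")
```

Grouping only consecutive steps makes it visible that netops work brackets the on-site dcops work: the mgmt IP/dns removal comes after the switches are unracked.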

Possibly replaced with {T380017}

Event Timeline

cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: sretest1004.eqiad.wmnet

  • sretest1004.eqiad.wmnet (FAIL)
    • Host not found on Icinga, unable to downtime it
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Host steps raised exception: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above

Change #1091711 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/dns@master] Remove v6 include for e8/f8 uplinks

https://gerrit.wikimedia.org/r/1091711

I've tidied up netbox for these now.

I left the ports enabled on the ssw side with the IPs present, as we can't disable them there and keep the IPs attached. Easier to do that and re-use them for the replacement Junipers than to patch the dns repo to remove the /64 reverse include statements. I've also kept the gateway IPs in netbox, renamed to "irb-XXX" for the Junipers, but currently unattached to any interface. Again so we can reuse the subnets and avoid patching the dns repo in the short term.
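For context on the "/64 reverse include statements" mentioned above: each IPv6 /64 maps to its own `ip6.arpa` reverse zone name, which is why removing a subnet usually means touching the DNS repo. The sketch below only shows how that zone name is derived. The prefix uses the 2001:db8::/32 documentation range and is an assumption for illustration, not one of the real uplink subnets, and the function name is hypothetical; the actual include syntax in operations/dns is not shown here.

```python
import ipaddress


def reverse_zone_for_v6(prefix):
    """Return the ip6.arpa zone name covering an IPv6 prefix.

    Only prefixes on a nibble (4-bit) boundary map cleanly to a single zone.
    """
    net = ipaddress.ip_network(prefix)
    if net.version != 6 or net.prefixlen % 4 != 0:
        raise ValueError("expected an IPv6 prefix on a nibble boundary")
    # Fully expanded address without colons, truncated to the prefix nibbles.
    nibbles = net.network_address.exploded.replace(":", "")[: net.prefixlen // 4]
    # Reverse zones list the nibbles least-significant first.
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


# Example prefix from the documentation range (not a real uplink subnet).
print(reverse_zone_for_v6("2001:db8:0:1::/64"))
```

Keeping the subnets allocated in Netbox, as described above, means these zones can stay in place untouched.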

All good for them to be removed from racks and cables removed. Please delete the cables/connections from Netbox to match what is done on site. Thanks.

Please delete the cables/connections from Netbox to match what is done on site.

For the record I deleted the cables going from Dells to the Spines in Netbox, and added new ones from the Junipers to the same Spine ports. In reality all we need to do is move the cable from the Dells to the Junipers to match.

VRiley-WMF updated the task description.

Change #1091711 abandoned by Ayounsi:

[operations/dns@master] Remove v6 include for e8/f8 uplinks

Reason:

reused for https://phabricator.wikimedia.org/T382017

https://gerrit.wikimedia.org/r/1091711

@VRiley-WMF These are still listed as in rack in Netbox with decom status; please update and fix.

Yeah I'll tidy that up shortly. We re-used the original IPs for the connections to the replacement Juniper switches that went in.

But that was something of a mistake. As the Junipers are part of the EVPN cluster, they should only have IPv4 addresses on the links (everything else is tunneled in VXLAN), and those link IPs should really come from a dedicated block. So while things are working now, the links have IPv6 addresses that aren't needed, and the v4 addresses come from the wrong block.
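A quick way to spot the two problems described here (unneeded IPv6 on the links, and IPv4 from the wrong block) is to check each link address against the dedicated block. This is a minimal sketch only: the block and the addresses are placeholders from the documentation ranges, not the real allocations, and `audit_link_ips` is a hypothetical helper rather than existing tooling.

```python
import ipaddress

# Placeholder for the dedicated link block (documentation range, an assumption).
DEDICATED_LINK_BLOCK = ipaddress.ip_network("198.51.100.0/24")


def audit_link_ips(addresses, dedicated_block=DEDICATED_LINK_BLOCK):
    """Map each problematic link address to a short reason string."""
    findings = {}
    for text in addresses:
        iface = ipaddress.ip_interface(text)
        if iface.version == 6:
            # Only IPv4 is expected on these links; other traffic is tunneled.
            findings[text] = "IPv6 address not needed on this link"
        elif iface.ip not in dedicated_block:
            findings[text] = f"IPv4 address outside dedicated block {dedicated_block}"
    return findings


# Example link addresses (placeholders, not the real ones).
link_ips = ["192.0.2.1/31", "2001:db8:0:1::1/64", "198.51.100.4/31"]
for address, reason in audit_link_ips(link_ips).items():
    print(f"{address}: {reason}")
```

An address that is IPv4 and inside the dedicated block produces no finding, so an empty result would mean the renumbering is complete.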

Everything is working fine, but for the sake of avoiding confusion I'll renumber things this week so it's clear, and re-submit the above patch.

VRiley-WMF claimed this task.

Spoke to @cmooney and these switches have been set to offline. Closing this ticket.