
Import servers<->switches cables in eqiad & codfw
Closed, Resolved (Public)

Description

Prerequisites:

  • Update our tooling (cookbooks, etc.) to manage switch ports and cables in Netbox, see T265339 and T265341
  • Train SRE on how to manually edit interfaces and VLANs in Netbox, then push the changes via Homer

The different use cases I have in mind (to be completed) are:

  • Mass creation of new servers, for example by importing a CSV
  • Creation of a single new server (or a small number of them), for example through an interactive prompt
  • Decommissioning of a single server
  • Move of a single server?

Once this is ready, do a one-time mass import of the server links (already done in the POPs) using either of the following (or both, to cross-check):

  • PuppetDB LLDP info (from the host point of view) - preferred
  • Switch LLDP info (directly or via LibreNMS) (from the switches' point of view) - seems to be misreporting some devices, see T250367
  • Parse the switch interface descriptions to fill in the cable IDs (when present; core DC servers don't have cable IDs)
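
The "use both to double check" idea above can be sketched as a set comparison between the links seen from the hosts and the links seen from the switches. This is only an illustration: the input dictionaries, their keys ("neighbor", "port") and the function names are hypothetical and do not reflect the actual PuppetDB fact layout or the LibreNMS output. It also assumes device names have already been normalized (for example, short hostname versus FQDN).

```python
from typing import NamedTuple


class Link(NamedTuple):
    server: str
    server_iface: str
    switch: str
    switch_port: str


def links_from_hosts(host_lldp):
    """Build links from host-side LLDP data.

    Assumed (hypothetical) shape:
    {server: {server_iface: {"neighbor": switch, "port": switch_port}}}
    """
    return {
        Link(server, iface, nbr["neighbor"], nbr["port"])
        for server, ifaces in host_lldp.items()
        for iface, nbr in ifaces.items()
    }


def links_from_switches(switch_lldp):
    """Build links from switch-side LLDP data.

    Assumed (hypothetical) shape:
    {switch: {switch_port: {"neighbor": server, "port": server_iface}}}
    """
    return {
        Link(nbr["neighbor"], nbr["port"], switch, port)
        for switch, ports in switch_lldp.items()
        for port, nbr in ports.items()
    }


def cross_check(host_lldp, switch_lldp):
    """Split links into those both sides agree on and those only one side reports."""
    host_links = links_from_hosts(host_lldp)
    switch_links = links_from_switches(switch_lldp)
    return {
        "agreed": host_links & switch_links,
        "host_only": host_links - switch_links,
        "switch_only": switch_links - host_links,
    }
```

Links in "agreed" could be imported without review, while "host_only" and "switch_only" entries would need a manual look, since one of the two sources may be misreporting.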

Later on:

  • Update the PuppetDB or LibreNMS report to ensure consistency

Event Timeline

Change 634017 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/software/netbox-extras@master] Add Z side device/interface/vlan and cable to PuppetDB importer

https://gerrit.wikimedia.org/r/634017

Change 634048 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] PuppetDB microservice: allow LLDP fact

https://gerrit.wikimedia.org/r/634048

Change 634048 merged by Ayounsi:
[operations/puppet@production] PuppetDB microservice: allow LLDP fact

https://gerrit.wikimedia.org/r/634048

While working on this I noticed one consistency issue in the Netbox data:

Some devices (e.g. https://netbox.wikimedia.org/dcim/devices/2272/ ) were added with a primary interface named like enp175s0f0, which was manually connected to the switches.

I'm not sure why it changed (maybe because of a Debian upgrade, or the name was wrong to start with), but the same interface is now named ens3f0np0. The old interface didn't get renamed in Netbox; a new one got created instead.
We now have a situation where we have one incorrect interface with the correct cable, and one correct interface with no cable but with the IP config.

  • As they both have attributes (cables or IPs), they can't easily be renamed
  • Cables can't have one of their endpoints moved; they need to be fully deleted and re-created
  • Some of the existing cables have attributes (color, type, length) that the above script can't fetch automatically
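
To gauge how many devices are affected, the situation described above (one interface holding the cable but no IPs, another holding the IPs but no cable) could be detected with something like the following. This is a sketch over plain dictionaries; the record fields ("device", "name", "has_cable", "ip_count") are hypothetical and would need to be mapped from whatever the Netbox export actually provides.

```python
from collections import defaultdict


def find_split_interfaces(interfaces):
    """Return devices whose cable and IP config sit on different interfaces.

    `interfaces` is an iterable of dicts with the hypothetical keys
    "device", "name", "has_cable" (bool) and "ip_count" (int).
    """
    by_device = defaultdict(list)
    for iface in interfaces:
        by_device[iface["device"]].append(iface)

    suspects = {}
    for device, ifaces in by_device.items():
        cabled_without_ip = [
            i["name"] for i in ifaces if i["has_cable"] and i["ip_count"] == 0
        ]
        ip_without_cable = [
            i["name"] for i in ifaces if not i["has_cable"] and i["ip_count"] > 0
        ]
        # Only flag devices that show both halves of the problem.
        if cabled_without_ip and ip_without_cable:
            suspects[device] = {
                "cabled_without_ip": cabled_without_ip,
                "ip_without_cable": ip_without_cable,
            }
    return suspects
```

The result is only a list of candidates: a device can legitimately have a cabled interface without an IP, so each hit would still need a human check.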

If there are too many instances to deal with manually, we could save the cable's attributes, delete it, then re-create it using either the cable ID (not always present) or the remote interface name as the "primary key". But it doesn't look pretty.
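
A minimal sketch of that save/delete/re-create idea, modelled in memory only (no Netbox API calls): attributes are saved under the cable ID when there is one and under the remote (switch side) endpoint as a fallback, then merged back into the re-created cables. The `Cable` class, its field names and both helper functions are hypothetical, not part of the importer script.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cable:
    a_end: Tuple[str, str]  # (server, interface)
    z_end: Tuple[str, str]  # (switch, port), i.e. the remote interface
    label: Optional[str] = None  # cable ID, not always present
    color: Optional[str] = None
    cable_type: Optional[str] = None
    length: Optional[float] = None


def save_attributes(cables):
    """Save the attributes that can't be fetched automatically, before deletion."""
    saved = {}
    for cable in cables:
        attrs = {
            "color": cable.color,
            "cable_type": cable.cable_type,
            "length": cable.length,
        }
        if cable.label:
            saved[("label", cable.label)] = attrs
        # The remote interface is always known, so it serves as the fallback key.
        saved[("z_end", cable.z_end)] = attrs
    return saved


def restore_attributes(new_cables, saved):
    """Merge saved attributes into re-created cables, matching on cable ID first."""
    restored = []
    for cable in new_cables:
        attrs = saved.get(("label", cable.label)) if cable.label else None
        if attrs is None:
            attrs = saved.get(("z_end", cable.z_end))
        restored.append(replace(cable, **attrs) if attrs else cable)
    return restored
```

Keying on the remote interface relies on each switch port having at most one cable, which is why it can stand in for the cable ID when the latter is missing.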

The above got fixed manually by @Volans and me.

Change 634017 merged by Ayounsi:
[operations/software/netbox-extras@master] Add Z side device/interface/vlan and cable to PuppetDB importer

https://gerrit.wikimedia.org/r/634017

Change 636627 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/software/netbox-extras@master] ImportPuppetDB: add cable color/type

https://gerrit.wikimedia.org/r/636627

Change 636627 merged by Ayounsi:
[operations/software/netbox-extras@master] ImportPuppetDB: add cable color/type

https://gerrit.wikimedia.org/r/636627

Mentioned in SAL (#wikimedia-operations) [2020-11-09T16:20:08Z] <XioNoX> Netbox prod: mass import from PuppetDB (cables, etc) - T262899

ayounsi claimed this task.

All went as expected, thanks @Volans