
Spicerack: add network devices support
Closed, Resolved, Public


The only options we have to make network changes so far are:
1/ manual change
2/ automated but full config change via Homer

Unfortunately, the 2nd option is slow (it gathers data from multiple locations and generates a whole config), inflexible, and error-prone (e.g. risk of pushing an unrelated change).

I suggest we introduce a 3rd way in the form of a Spicerack module that will abstract specific parts of the configuration.
This should work hand in hand with Netbox to be dynamic (for example: no dependency on Homer yaml files).

It will be more and more valuable as vendors are moving away from flat configurations, towards REST-like APIs.

Short term use cases:

  • server facing switch port config for host provisioning and decommissioning - DONE
  • Troubleshooting automation - DONE

Medium term:

  • Circuits draining (see T260355)
  • BGP automation (server facing) - T306649

Longer term:

  • Maintenance automation (upgrade device, etc)
  • DDoS mitigation automation (T251767, blackholing, etc)

Focusing on the short term, as a proof of concept this module would be used by both cookbooks:

  • sre.hosts.provision
  • sre.hosts.decommission

The switch configuration would be abstracted with:

def configure_switch_interface(self, netbox_device)

Which would:
1/ get the switch/interface data from Netbox (requires Netbox to already be up to date, which can be done right before, in the same cookbook)
2/ get the current switch port(s) config
3/ warn the user (or refuse to do the change) if the interface is used for something else
4/ configure the switch port(s) - for now just replace the interface config; later on this could be idempotent and offer reconciliation options (e.g. only change the MTU or VLANs)
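
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Spicerack implementation: `PortConfig`, `FakeNetbox`, `FakeSwitch`, `NetworkDevices` and the `force` parameter are hypothetical names, both Netbox and the switch are replaced by in-memory stand-ins, and `netbox_device` is assumed to be a plain device name.

```python
from dataclasses import dataclass, field


@dataclass
class PortConfig:
    """Hypothetical, simplified view of a switch port configuration."""
    description: str = ""
    mtu: int = 1500
    vlans: list = field(default_factory=list)
    enabled: bool = True


class FakeNetbox:
    """In-memory stand-in for Netbox: device name -> (switch, port, desired config)."""

    def __init__(self, data):
        self._data = data

    def switch_port_for(self, device_name):
        return self._data[device_name]


class FakeSwitch:
    """In-memory stand-in for a network device holding per-port configuration."""

    def __init__(self, ports=None):
        self.ports = dict(ports or {})

    def get_port(self, port):
        return self.ports.get(port)

    def replace_port(self, port, config):
        self.ports[port] = config


class NetworkDevices:
    """Hypothetical module wrapping the switch port workflow."""

    def __init__(self, netbox, switches):
        self._netbox = netbox
        self._switches = switches

    def configure_switch_interface(self, netbox_device, force=False):
        # 1/ get the switch/interface data from Netbox (assumed up to date)
        switch_name, port, desired = self._netbox.switch_port_for(netbox_device)
        switch = self._switches[switch_name]
        # 2/ get the current switch port config
        current = switch.get_port(port)
        # 3/ refuse the change if the port looks used by something else
        in_use = (
            current is not None
            and current.enabled
            and current.description not in ("", netbox_device)
        )
        if in_use and not force:
            raise RuntimeError(
                f"{switch_name}:{port} is in use by {current.description!r}"
            )
        # 4/ replace the whole interface config (no partial reconciliation yet)
        switch.replace_port(port, desired)
        return desired
```

A cookbook would call `configure_switch_interface("host1")` right after updating Netbox; raising instead of warning in step 3 is one of the two behaviours proposed above, chosen here only to keep the sketch short.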

That's a high level view, feedback welcome :)

Event Timeline

ayounsi triaged this task as Medium priority. Apr 20 2022, 4:21 PM
ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

Thanks for opening the task to discuss details. As first feedback, my primary question is how you envision reconciling this new third way of configuring the network devices with the existing two.
Basically, if we make a change via this method, would Homer then be out of sync? Or will anything we do with this method be automatically included in Homer runs, and so be a noop for Homer, based on the updated Netbox configuration and the new state of the network device?

As for the generic implementation, which transport do you envision using? Cumin?
I'm asking because although python3-junos-eznc is technically available in Debian, there might be some resistance to getting paramiko included.

Yeah, I'm expecting Netbox to always be the source of truth so a homer run after a spicerack run would be a NOOP.
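
To make the NOOP expectation concrete, here is a minimal sketch of the invariant, assuming port settings can be represented as flat dictionaries. `interface_diff` and the attribute names are hypothetical, and this is not how Homer computes its diffs; it only illustrates that once the device matches Netbox, there is nothing left to change.

```python
def interface_diff(desired: dict, current: dict) -> dict:
    """Return {attribute: (current, desired)} for every attribute that differs."""
    keys = desired.keys() | current.keys()
    return {
        key: (current.get(key), desired.get(key))
        for key in sorted(keys)
        if current.get(key) != desired.get(key)
    }


# Hypothetical values: what Netbox says the port should look like.
netbox_state = {"mtu": 9192, "vlans": ["vlan-a"], "enabled": True}

# Before the Spicerack run the device still has defaults, so there is a diff.
before = {"mtu": 1500, "vlans": [], "enabled": True}
pending = interface_diff(netbox_state, before)

# After the Spicerack run the device matches Netbox, so a later run is a NOOP.
after = dict(netbox_state)
noop = interface_diff(netbox_state, after)
```

Here `pending` lists the MTU and VLAN changes, while `noop` is an empty dict. The same comparison could also back the "only change the MTU or vlans" reconciliation option mentioned in the task description.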

junos-eznc is what I'm locally using because it abstracts the ugly part. We should discuss if it's possible to use cumin.

Step 1 is done, cookbook is ready for prime time.

Change 815295 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/cookbooks@master] fix typo

Change 815295 merged by jenkins-bot:

[operations/cookbooks@master] fix typo

cookbooks.sre.hosts.decommission executed by ayounsi@cumin1001 for hosts: sretest1001.eqiad.wmnet

  • sretest1001.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Icinga/Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
ayounsi updated the task description.

Closing this task as the short term goals are done, medium terms have their own task.