Agree how to document intra-DC patch panels in Netbox
Closed, Resolved (Public)

Description

Requirement

With the current expansion in eqiad to a second cage, and the facility-provided fiber runs between cages, there is an increased requirement for us to document the patch panel usage and connections in Netbox. The new scenario means having devices directly connected to patch panels either side, which is unlike previous requirements where all panels terminated outside circuits.

Netbox Model

Netbox can model patch panels as passive devices in a fiber path. Each panel has "front" and "rear" ports: the front ports document the connections to equipment on one side, and the rear ports the connection to the equivalent patch panel on the other.

For instance I have created two example patch panels in netbox-next:

https://netbox-next.wikimedia.org/dcim/devices/?role=patch-panel

I've connected the 12 'rear' ports on these back-to-back (these represent the DC-provided runs between the panels in either cage).

We can then connect devices to the 'front' ports of each, to document the device->device connection, but also record where the fibers go and usage of the panels. For instance I've created a new dummy rack (E1) and device (SPINE E1 Dummy), and connected a port on it to cr2-eqiad via the panels:

https://netbox-next.wikimedia.org/dcim/devices/3431/interfaces/

You can then trace the cable and see the runs either side from device to panel:

https://netbox-next.wikimedia.org/dcim/interfaces/20703/trace/
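To make the front/rear idea concrete, here is a minimal sketch in plain Python of how such a trace could be followed. It is not Netbox code and does not use the Netbox API. The panel names (`pp-e1`, `pp-old-cage`) and interface names are made up for illustration, and it assumes front port N maps straight through to rear port N on the same panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    device: str
    name: str
    kind: str = "interface"  # "interface", "front" or "rear"


class Cabling:
    """Toy cable plant: every port has at most one cable attached."""

    def __init__(self):
        self._peer = {}  # Port -> Port at the far end of its cable

    def connect(self, a, b):
        self._peer[a] = b
        self._peer[b] = a

    def trace(self, start):
        """Follow cables from start, crossing panels front<->rear."""
        path = [start]
        current = start
        while current in self._peer:
            far = self._peer[current]
            path.append(far)
            if far.kind == "front":
                current = Port(far.device, far.name, "rear")
            elif far.kind == "rear":
                current = Port(far.device, far.name, "front")
            else:
                break  # reached a device interface, the trace is complete
            path.append(current)
        return path


# Hypothetical names throughout; only the device names echo the example above.
spine = Port("spine-e1-dummy", "et-0/0/0")
router = Port("cr2-eqiad", "et-1/0/0")
plant = Cabling()
plant.connect(spine, Port("pp-e1", "1", "front"))
plant.connect(Port("pp-e1", "1", "rear"), Port("pp-old-cage", "1", "rear"))
plant.connect(Port("pp-old-cage", "1", "front"), router)

path = plant.trace(spine)
for hop in path:
    print(hop.device, hop.kind, hop.name)
```

The point of the model is that the device-to-device connection falls out of the panel records, rather than being documented separately from them.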

Creating this task to facilitate discussion and gather thoughts on this approach. Seems fine from my point of view, was a major improvement in my last place when Netbox introduced this functionality. But I'd especially like to see what DC-Ops make of the proposal as it probably relates most to them.

Event Timeline

I agree that's the textbook way of doing it and where we need to be in the longer term.

We currently document patch panel information in both the "termination" box of Netbox circuits (best effort), and an X-connect spreadsheet.
Both are prone to errors as they're free form and there are no reports/checks to enforce correctness.

However, having the patch panels in Netbox introduces new complexity (all hypothetical below):

  • Requires serial/asset tags
  • Automation update
  • Additional steps when patching circuits
  • Reports for consistency/correctness

And would require back-porting all the existing patch panels to not have different ways of doing things.

I suggest we first check if we could replace the X-connect spreadsheet by introducing that new way of doing things, and possibly a report to expose directly useful information. That way this new complexity will be balanced with eliminating a pain point.

In the meantime I lean toward maintaining the status-quo for the eqiad expansion use-case by modeling those links using the well understood circuits feature.
That's me trying to keep Netbox lean, but as I'll not be the main user of the feature DCops opinion is important here as well.

I'd like to stop using the google sheet and support any alternative that moves it wholly to netbox. I think the circuits patch panel field combined with the comments field on the circuits in netbox could be enough to eliminate the gsheet. Currently I'm the only one maintaining it, and it doesn't really scale well to multiple maintainers since it has no real error checking. Having this info in netbox would get more eyes on it, as netbox is interacted with quite often in comparison (to the gsheet of xconnects). We wouldn't have to delete the old google sheet, just deprecate it and no longer update it.

Then we could move to fully utilizing the netbox patch panel model from there. On the point of asset tags, I am not exactly certain who owns the patch panels in each of our deployments. We can determine this pretty easily though by emailing our account reps with the question. If we own them, there shouldn't be a problem slapping an asset tag on each of them. If we don't own them, we'll want to perhaps put an asset tag exception in for patch panels and asset tags.

@ayounsi While I agree with your analysis of the difficulties, I wonder if a hybrid approach might be possible.

What we need to model for the eqiad expansion is something we have nowhere else in our infra, i.e. connections between our own equipment which are routed via patch panels. I think it's entirely reasonable to say for this new use-case we will use Netbox, and continue to use existing methods for cross connects going to MMR / WAN circuits etc.

Adding fake "circuits", with us as a provider, and no structured detail on how it corresponds to the actual patch panel, just seems like we are piling on more technical debt.

That said if doing it the "proper" Netbox way messes up automation or reports then maybe that is a reason to hold off. I'm not really aware what problems we might have there. We will have lots of scripting and automation changes to support the new racks either way, so it might not be a whole lot of extra work. But that may be a valid reason to hold off for sure.

I'm easy enough either way. I agree it's probably more something for DC-Ops to decide on, my main reason for creating this task was to demo how it can be done. But as long as we've an agreed way to document device->device connections in the coming weeks I'm happy.

That said if doing it the "proper" Netbox way messes up automation or reports then maybe that is a reason to hold off. I'm not really aware what problems we might have there. We will have lots of scripting and automation changes to support the new racks either way, so it might not be a whole lot of extra work. But that may be a valid reason to hold off for sure.

The asset tag checks in the report are straightforward. I don't see any problem adjusting them to whatever you decide here, it should take a few minutes. I'd say don't worry too much about that and pick the solution that works best in the longer term for you all.

My main worry is that we will end up with two ways of documenting and managing X-connects (and overall point to point links). And to me this seems more of an issue than technical debt as we already have all the tooling around the circuit feature.

That's why I'd prefer going fully one way (stick with circuits) or the other (add all patch panels).

Here is an example of having those cross connects as circuits: https://netbox-next.wikimedia.org/circuits/circuits/107/
It's set at type: cross-connect, but could be transport or ICCC as well.
A small side benefit is that it will also allow us to track unused links, see https://netbox-next.wikimedia.org/circuits/circuits/?q=&type=cross-connect&provider=equinix&commit_rate=&cf_metric=&cf_state=

Hi @Papaul, @Cmjohnson, and @Jclark-ctr - let me know if you guys have any specific preferences or general feedback around this, before we proceed forward on standardizing how we document patch-panels. Thanks, Willy

I've attempted this once more (in live Netbox this time), based on our discussions on the weekly calls and info provided by John:

https://netbox.wikimedia.org/dcim/interfaces/24122/trace/

Let me know how this looks. I attempted to model the Equinix part of the run, corresponding to their labels on ports, as "circuits", but if you do that Netbox doesn't properly register the end-to-end device path, which messes up homer. So looks like we need to do all of the hops as "port to port" rather than having any circuits in there. To record the Equinix label/id for the patch I have used that as the cable label on their run, prefixed with "EQX".

Feel free to reply here or we can discuss on Monday's call.

Thanks for putting in and working on this @cmooney, this looks really great! Let's designate some time during Monday's meeting to gather any additional feedback from folks, and then proceed from there. Thanks, Willy

Thanks for the presentation on Monday! With the rise of breakout cables and intra-rack patch panels, it makes more and more sense to document them in Netbox.

Some thoughts, some already shared in the meeting:

  • The burden of defining them and keeping them up to date will be on DCops, so they definitely have the last word here :)
  • The data will need safeguards against typos, entry mistakes (e.g. connecting something on the wrong port), and natural drift (forgetting to make a change in Netbox after doing it physically), otherwise it's going to become an untrustworthy mess with time :( All of the below would need DCops help to define them properly
    • As you mentioned, it's all passive equipment, so we can't have automatic checks, but some ideas:
    • Exporting/displaying the data in a way that facilitates visual inspection (suggested by Rob)
    • Netbox reports enforcing in house policies/conventions (ports names, etc), Netbox already enforces some of it (connectors, cable types, etc). Possibly to be defined as we find mistakes
    • Wikitech doc on how to use that feature
    • Possibly a script to help create panels or end to end circuits would be helpful to abstract the more complex fields (and avoid having to click 1000 times on the UI)
  • We're currently running an old version of Netbox; a newer version may well have improved the feature
  • On naming and conventions:
    • By convention we use small letters for all devices names
    • Cable labels should match exactly what they're physically labelled
    • We only document in Netbox devices that we own, which raises the question of what to do with the Equinix patch panels and racks (or change the convention, what consequences does it have?)
  • On X-connects (with the above point in mind):
    • I'd say it makes sense to define all the cables we don't own (X-cages as well) as Netbox circuits,
    • And thus have the Equinix patch panel documented in the circuit-termination, which is for now limited, but could be improved by custom fields once we upgrade Netbox
    • The X-connect spreadsheet could be fully migrated to Netbox by adding custom fields to the circuit termination (once we upgrade), but I don't think it would benefit much from the in-house patch cables
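As a rough illustration of the kind of safeguard discussed above, here is a minimal sketch of a consistency check over cable records. It is plain Python rather than an actual Netbox report, and the record layout, cable labels, panel name (`pp-e1`) and interface names are assumptions made up for the example. It only covers what can be checked mechanically: lowercase device names, missing labels, and a port used by two cables. Whether a label matches the physical one still needs a human.

```python
def check_cables(cables):
    """Return a list of human-readable problems found in cable records."""
    problems = []
    port_owner = {}  # (device, port) -> label of the cable already using it
    for cable in cables:
        label = cable["label"].strip() or "(unlabelled)"
        if label == "(unlabelled)":
            problems.append("cable without a label")
        for device, port in (cable["a"], cable["b"]):
            # Convention: device names are all lowercase.
            if device != device.lower():
                problems.append(f"{label}: device name {device!r} is not lowercase")
            # A port can only terminate one cable.
            if (device, port) in port_owner:
                owner = port_owner[(device, port)]
                problems.append(f"{label}: {device} {port} is already used by {owner}")
            else:
                port_owner[(device, port)] = label
    return problems


# Hypothetical records with three deliberate mistakes.
cables = [
    {"label": "c-001", "a": ("lsw1-e1", "et-0/0/48"), "b": ("pp-e1", "front-1")},
    {"label": "c-002", "a": ("CR1-eqiad", "et-1/0/2"), "b": ("pp-e1", "front-1")},
    {"label": "", "a": ("cr2-eqiad", "et-1/0/3"), "b": ("pp-e1", "front-2")},
]
problems = check_cables(cables)
for problem in problems:
    print(problem)
```

Further checks could be added as mistakes are found in practice, in line with the "to be defined as we find mistakes" idea.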

Exporting/displaying the data in a way that facilitates visual inspection (suggested by Rob)

I find the 'trace' in Netbox excellent, but happy to discuss if we need more than that.

By convention we use small letters for all devices names

Yep fair enough, I think the existing names probably need agreement on how to name anyway, as Rob observed I didn't quite catch the Equinix "rack" naming (the '0000'). I also do think we should model those racks rather than having this kit as separate / floating.

I'd say it makes sense to define all the cables we don't own (X-cages as well) as Netbox circuits,

As I mentioned I tried this and it didn't work properly when I did this between the cages. The device -> device connection was not presented by netbox, which would break our automation I think. And even if we worked around it, not having, for instance, lsw1-e1 showing it's connected to cr1-eqiad when you look at it would not be ideal.

I wasn't of the opinion we needed to do this, until I realized Equinix give an ID to each cable. But given they do it would seem to be the better way to model.

If we can't model as circuits then they will have to remain as they are now (as cables). In which case we need agreement on how to name them, I think the 'EQX' naming I used was called out as not workable on the call.

For external circuits we can and obviously need to model them as circuits and we won't hit this issue so agree on all points there.

We only document in Netbox devices that we own, which raises the question of what to do with the Equinix patch panels and racks (or change the convention, what consequences does it have?)

We simply have to change the convention I feel. Or decide to model this completely outside Netbox.

To that end the current devices are throwing errors in a report though, based on provisioning tickets not matching, no asset IDs etc:

https://netbox.wikimedia.org/extras/reports/results/2590752/

So we need to sort that out, whether by adding those details or modifying the report to allow exceptions.

Thanks @ayounsi and @cmooney for all your feedback and forward thinking suggestions around this. What I'm leaning towards on this is to only document the info for the patch panels we own in Netbox. I think @Jclark-ctr was playing around with that last week with the expansion related stuff and seems to have that part down, so I think we should be good here. When it comes to patch panels that are on the vendor side, I'm thinking we should just continue adding the information under the patch panel section for circuits (example: https://netbox.wikimedia.org/circuits/circuits/6/). I'd rather not make too many changes in documenting things on the vendor side until we have a more recent version of Netbox in place. Once a new version of Netbox is in place, maybe we can revisit again and see how things look under circuits at that point? Does that work for everyone?

Also, thanks @ayounsi for sending over the demo link to the latest Netbox version - https://demo.netbox.dev/ - much appreciated!

Thanks,
Willy

@wiki_willy no problem with that.

The only requirement on our side that it might impact is the ability to see the usage of a particular panel when planning new links/designs.

i.e. with the Equinix panel in netbox I can click in and see we've 2 free ports: https://netbox.wikimedia.org/dcim/devices/4097/front-ports/

Documenting the ports in the free-text field of circuits doesn't give a single view on this, and would take some time to compile them all each time we wanted to check the usage on the panel. It wouldn't be a disaster, but if we used something, a spreadsheet even, to document this in one place, it'd be a help.

Maybe a nitpick, and let me know if I'm mistaken, but what we need to document is the usage of the circuits/links/x-connects between the two cages (as we have a limited number of those),
not the actual Equinix patch panel ports (Equinix will add patch panels as needed when they fill up).
If so we should have all those links in the "circuits" feature of Netbox, with status "offline" for the unused ones. That way we can easily see where we're at with them.
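For illustration, a minimal sketch of how the "where we're at" view could be derived once every inter-cage link exists as a circuit with a status. This is plain Python over made-up records, not a Netbox query. The circuit IDs are hypothetical, and only the "offline means unused" convention comes from the suggestion above.

```python
from collections import Counter


def link_usage(circuits):
    """Summarise circuits by status and list the unused (offline) ones."""
    counts = Counter(c["status"] for c in circuits)
    free = [c["cid"] for c in circuits if c["status"] == "offline"]
    return counts, free


# Hypothetical circuit IDs; "offline" marks a link that exists but is unused.
circuits = [
    {"cid": "link-01", "status": "active"},
    {"cid": "link-02", "status": "active"},
    {"cid": "link-03", "status": "offline"},
    {"cid": "link-04", "status": "offline"},
]
counts, free = link_usage(circuits)
print(f"{counts['active']} in use, {counts['offline']} free: {', '.join(free)}")
```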

ayounsi claimed this task.

I went ahead and created circuits instead of the existing cables, moved the patch cables over (and fixed some mis-cabling), and deleted the Equinix patch panels.

You can see here the list of Equinix intra-customer links: https://netbox.wikimedia.org/circuits/circuits/?q=&provider_id=4&type_id=4

This cleared the Coherence report errors.

Previous trace:

Screenshot from 2022-06-16 16-07-42.png (1×378 px, 59 KB)

New/current trace:

Screenshot from 2022-06-16 16-08-12.png (802×372 px, 48 KB)

One point we were not sure about is whether we should call those links "transports", as their function is in between transport and core. We decided to keep "transport" for now, and change it later on if we feel the need. Renaming will be quick in Netbox, but would require an easy patch in Homer at least.

I also looked at replacing the cross-connect google spreadsheet with Netbox.

Unfortunately, adding all the needed data (X-connect ticket, install date, Z-side, etc.) won't be possible until https://github.com/netbox-community/netbox/issues/8511 is fixed.

But it seems totally doable, see: https://netbox.wikimedia.org/circuits/circuits/?export=cross-connects
Or go to the circuit page, then export and "cross-connects". It can of course be a CSV instead.

Using https://netbox.wikimedia.org/extras/export-templates/2/
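
As a rough sketch of what a CSV version of that export could contain, the snippet below writes circuit records to CSV with Python's standard `csv` module. It is not the export template linked above. The column names are assumptions based on the data listed earlier (X-connect ticket, install date, Z-side), and the sample values are placeholders.

```python
import csv
import io

# Assumed column names, not the fields of the real export template.
FIELDS = ["cid", "xconnect_ticket", "install_date", "z_side"]


def export_cross_connects(circuits):
    """Render circuit records as CSV text, one row per cross-connect."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for circuit in circuits:
        writer.writerow(circuit)  # missing fields are left empty
    return buf.getvalue()


# Placeholder records for illustration only.
circuits = [
    {"cid": "xc-0001", "xconnect_ticket": "TICKET-1",
     "install_date": "2022-01-01", "z_side": "example carrier"},
    {"cid": "xc-0002", "z_side": "example peer"},
]
output = export_cross_connects(circuits)
print(output)
```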