Cookbook sre.network.configure-switch-interfaces failing on upgraded Juniper switch
Open, Medium, Public

Description

DC-Ops hit an issue when running sre.network.configure-switch-interfaces today for mc2055 connected to lsw1-a3-codfw.

The failure occurred while parsing the output of show interfaces <interface> | display json for the switch interface (see P93718):

  File "/srv/deployment/spicerack/cookbooks/sre/network/__init__.py", line 216, in parse_results
    result = RemoteHosts.results_to_list(results_raw)[0][1]
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

Problem

The issue seems to have occurred after lsw1-a3-codfw was upgraded to JunOS 23.4 yesterday. Digging a little deeper, the new release appears to create an interface as soon as an SFP+ is connected, even without any config:

cmooney@lsw1-a3-codfw> show configuration interfaces | display set | match "0/0/41" 

{master:0}
cmooney@lsw1-a3-codfw> show interfaces terse | match 0/0/41        
xe-0/0/41               up    up
xe-0/0/41.16386         up    up

By contrast, when I got DC-Ops to insert an SFP in port 41 of lsw1-a4-codfw, still on JunOS 22.2, no interface appears:

cmooney@lsw1-a4-codfw> show interfaces terse | match "0/0/41" 

{master:0}
cmooney@lsw1-a4-codfw>

The end result is that the newer software returns JSON like this to the cookbook, whereas the older one returns something like this.

I'll take a closer look and see how we can adjust the cookbook to deal with this scenario.

Event Timeline

cmooney triaged this task as Medium priority.
cmooney updated the task description.

@ayounsi do you feel it's worth fixing this?

I got as far as the fact that the JSON is passed to spicerack.remote.RemoteHosts.results_to_list(), and although it's a small function I can't quite wrap my head around how it handles the 'show interfaces' output.

As far as I understand, the cookbook runs show configuration interfaces xe-0/0/41 | display json and not show interfaces xe-0/0/41 | display json, so the change of behavior between the old and new switch *shouldn't* be relevant.

A first guess is that results_raw = remote_host.run_sync() ran the command on the switch, but as the switch didn't return any data, RemoteHosts.results_to_list(results_raw) returned an empty list, and thus accessing [0] fails.

What I find surprising is that running the same test on lsw1-a4-codfw (still on the old JunOS) shows the same issue:

spicerack-shell --live
>>> s1 = spicerack.remote().query("D{lsw1-a4-codfw.mgmt.codfw.wmnet}")
>>> s1_result_raw = s1.run_sync(f"show configuration interfaces xe-0/0/41 | display json")
>>> from spicerack.remote import Remote, RemoteHosts
>>> s1_result = RemoteHosts.results_to_list(s1_result_raw)
>>> s1_result
[]
>>> s1_result[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Without digging further, I sent a patch to handle that case more gracefully.
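
For illustration only, here is a minimal sketch of what guarding against the empty result could look like. It is not the content of the patch. The helper name parse_single_result is hypothetical, and the shape of the input (a list of (hosts, output) tuples, as suggested by the [0][1] indexing in the traceback) is an assumption rather than a statement about the spicerack API.

```python
import json
from typing import Any, Optional

# Assumed shape of what RemoteHosts.results_to_list() hands back:
# a list of (hosts, output) tuples, empty when the device printed nothing.
ResultsList = list[tuple[Any, str]]


def parse_single_result(results: ResultsList) -> Optional[dict]:
    """Decode the JSON output of the first result, or return None if there is none.

    Hypothetical helper: avoids the IndexError raised by results[0][1]
    when the switch returns no data for the interface.
    """
    if not results:
        # No output at all, e.g. the interface has no configuration.
        return None
    output = results[0][1]
    if not output.strip():
        # Output present but blank: treat it the same as no output.
        return None
    return json.loads(output)
```

A caller could then treat None as "nothing configured on this interface" and carry on, instead of crashing on the index lookup.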

As soon as the VCs are gone we can switch to using Homer all the time (short run time).

Change #1298100 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/cookbooks@master] configure_switch_interfaces: handle error case

https://gerrit.wikimedia.org/r/1298100

Ok thanks!

My bad on the command getting run. Let's see how we get on with the patch <3

Change #1298100 merged by jenkins-bot:

[operations/cookbooks@master] configure_switch_interfaces: handle error case

https://gerrit.wikimedia.org/r/1298100