
an-worker hosts: Netbox - PuppetDB interfaces discrepancies
Closed, Resolved, Public

Description

As a result of an audit (see parent task), I've noticed some discrepancies between the data in Netbox and the data in PuppetDB for some an-worker hosts and their interfaces. It looks as if they were reimaged but, for some reason, the PuppetDB import script was not run afterwards to update the interface names that changed in the process.

If the changes are OK to be committed to Netbox, we can simply run the PuppetDB import script for all of these hosts and commit the changes.

This is the list of affected hosts: an-worker[1104-5,1108,1124,1128,1130]

Did anything in particular happen to them compared to the rest of the cluster?

To see which changes would be applied, you can run the https://netbox.wikimedia.org/extras/scripts/interface_automation.ImportPuppetDB/ script for them, without checking the "Commit changes" checkbox, using as input:
an-worker1108 an-worker1104 an-worker1105 an-worker1124 an-worker1128 an-worker1130
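For reference, the compact range in the host list above expands to exactly this input. A minimal sketch of that expansion follows; `expand_host_range` is a hypothetical helper (not part of Netbox or the import script), and it assumes the abbreviated end in `1104-5` reuses the leading digits of the start, i.e. means 1104-1105.

```python
import re


def expand_host_range(expr: str) -> list[str]:
    """Expand a compact host expression such as "an-worker[1104-5,1108]".

    Hypothetical helper. Assumption: an abbreviated range end ("1104-5")
    borrows the missing leading digits from the range start (1104-1105).
    """
    match = re.fullmatch(r"(.+)\[([0-9,-]+)\]", expr)
    if match is None:
        raise ValueError(f"unsupported host expression: {expr!r}")
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            start_s, end_s = part.split("-")
            if len(end_s) < len(start_s):
                # Borrow the missing leading digits from the start value.
                end_s = start_s[: len(start_s) - len(end_s)] + end_s
            width = len(start_s)  # preserve zero padding, if any
            for n in range(int(start_s), int(end_s) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
        else:
            hosts.append(f"{prefix}{part}")
    return hosts


if __name__ == "__main__":
    hosts = expand_host_range("an-worker[1104-5,1108,1124,1128,1130]")
    # Space-separated, as pasted into the script's input field.
    # prints: an-worker1104 an-worker1105 an-worker1108 an-worker1124 an-worker1128 an-worker1130
    print(" ".join(hosts))
```

The order differs from the list above, but the script input is a set of hostnames, so the order should not matter.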

Event Timeline

Volans triaged this task as Medium priority.

The only recent event I recall is T276239, but it didn't involve all of the workers mentioned. I quickly checked the dry run for an-worker1104 and it looks consistent; in theory we could review each host case by case and commit the changes to fix Netbox's state.

Looping in @BTullis, who was the last to work on these nodes :)

Yes, I thought this was a bit odd. I saw that there was some re-imaging here: T231067#6891049, but that was before my time and nothing jumped out at me as unusual.
I'm happy to work through them and check for any inconsistencies.

an-worker1104

Current interfaces snapshot:

[Screenshot: current Netbox interfaces of an-worker1104]

Current interfaces:
  • eno1 - SFP+ - connected - cable #1949 - asw2-c2-eqiad xe-2/0/34 - IP 10.64.36.136/24 and 2620:0:861:106:10:64:36:136/64
  • eno2d1 - 1000BASE-T
  • eno3 - 1000BASE-T
  • eno4 - 1000BASE-T
Script output (dry run):

[Screenshot: ImportPuppetDB script output for an-worker1104]

Output from ip a sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether bc:97:e1:50:e6:4c brd ff:ff:ff:ff:ff:ff
    inet 10.64.36.136/24 brd 10.64.36.255 scope global eno1np0
       valid_lft forever preferred_lft forever
    inet6 2620:0:861:106:10:64:36:136/64 scope global 
       valid_lft 2591993sec preferred_lft 604793sec
    inet6 fe80::be97:e1ff:fe50:e64c/64 scope link 
       valid_lft forever preferred_lft forever
3: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:97:e1:50:e6:4a brd ff:ff:ff:ff:ff:ff
4: eno2np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:97:e1:50:e6:4d brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:97:e1:50:e6:4b brd ff:ff:ff:ff:ff:ff
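To compare these names with Netbox without eyeballing, the interface name and MAC can be pulled out of the text output above. This is a hypothetical parsing sketch (`parse_ip_addr` is not an existing tool); on hosts with a recent iproute2, `ip -json addr show` would avoid text parsing altogether.

```python
import re
from typing import Optional

# "2: eno1np0: <BROADCAST,...>" -> "eno1np0" (also stops at "@" for VLANs)
IP_LINK_RE = re.compile(r"^\d+: ([^:@\s]+)[:@]")
# "    link/ether bc:97:e1:50:e6:4c brd ..." -> "bc:97:e1:50:e6:4c"
ETHER_RE = re.compile(r"^\s+link/ether ([0-9a-f:]{17})")


def parse_ip_addr(output: str) -> dict[str, Optional[str]]:
    """Map interface name -> MAC address from `ip addr show` text output."""
    interfaces: dict[str, Optional[str]] = {}
    current = None
    for line in output.splitlines():
        link = IP_LINK_RE.match(line)
        if link:
            current = link.group(1)
            interfaces[current] = None  # e.g. loopback has no link/ether line
            continue
        ether = ETHER_RE.match(line)
        if ether and current is not None:
            interfaces[current] = ether.group(1)
    return interfaces


# Trimmed excerpt of the an-worker1104 output above.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether bc:97:e1:50:e6:4c brd ff:ff:ff:ff:ff:ff
    inet 10.64.36.136/24 brd 10.64.36.255 scope global eno1np0
3: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:97:e1:50:e6:4a brd ff:ff:ff:ff:ff:ff
4: eno2np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:97:e1:50:e6:4d brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:97:e1:50:e6:4b brd ff:ff:ff:ff:ff:ff
"""

if __name__ == "__main__":
    for name, mac in parse_ip_addr(SAMPLE).items():
        print(name, mac)
```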
Output from puppetboard
interfaces :
  lo :
    ip : 127.0.0.1
    bindings6 : [
      address : ::1
      netmask : ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
      network : ::1
    ]
    mtu : 65536
    bindings : [
      address : 127.0.0.1
      netmask : 255.0.0.0
      network : 127.0.0.0
    ]
    network6 : ::1
    netmask6 : ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
    ip6 : ::1
    netmask : 255.0.0.0
    network : 127.0.0.0

  eno3 :
    mac : bc:97:e1:50:e6:4a
    mtu : 1500

  eno4 :
    mac : bc:97:e1:50:e6:4b
    mtu : 1500

  eno1np0 :
    ip : 10.64.36.136
    bindings6 : [
      address : 2620:0:861:106:10:64:36:136
      netmask : ffff:ffff:ffff:ffff::
      network : 2620:0:861:106::

      address : fe80::be97:e1ff:fe50:e64c
      netmask : ffff:ffff:ffff:ffff::
      network : fe80::
    ]
    mtu : 1500
    bindings : [
      address : 10.64.36.136
      netmask : 255.255.255.0
      network : 10.64.36.0
    ]
    network6 : 2620:0:861:106::
    netmask6 : ffff:ffff:ffff:ffff::
    ip6 : 2620:0:861:106:10:64:36:136
    netmask : 255.255.255.0
    network : 10.64.36.0
    mac : bc:97:e1:50:e6:4c

  eno2np1 :
    mac : bc:97:e1:50:e6:4d
    mtu : 1500

ip : 10.64.36.136
primary : eno1np0
mtu : 1500
network6 : 2620:0:861:106::
hostname : an-worker1104
fqdn : an-worker1104.eqiad.wmnet
netmask6 : ffff:ffff:ffff:ffff::
ip6 : 2620:0:861:106:10:64:36:136
netmask : 255.255.255.0
network : 10.64.36.0
domain : eqiad.wmnet
mac : bc:97:e1:50:e6:4c
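Going by the two outputs above, PuppetDB and the live host agree: every interface name and MAC in the networking fact matches the ip a sh output, and the primary interface eno1np0 carries 10.64.36.136. A small cross-check sketch, using hand-transcribed dicts trimmed to the relevant keys (the names are hypothetical) rather than a live PuppetDB query:

```python
# Hand-transcribed, trimmed copy of the networking fact above (PuppetDB).
PUPPETDB_INTERFACES = {
    "lo": {"ip": "127.0.0.1", "mtu": 65536},
    "eno3": {"mac": "bc:97:e1:50:e6:4a", "mtu": 1500},
    "eno4": {"mac": "bc:97:e1:50:e6:4b", "mtu": 1500},
    "eno1np0": {"ip": "10.64.36.136", "mac": "bc:97:e1:50:e6:4c", "mtu": 1500},
    "eno2np1": {"mac": "bc:97:e1:50:e6:4d", "mtu": 1500},
}
PUPPETDB_PRIMARY = "eno1np0"

# Interface name -> MAC as shown by `ip a sh` on an-worker1104.
HOST_MACS = {
    "lo": None,
    "eno1np0": "bc:97:e1:50:e6:4c",
    "eno3": "bc:97:e1:50:e6:4a",
    "eno2np1": "bc:97:e1:50:e6:4d",
    "eno4": "bc:97:e1:50:e6:4b",
}


def mac_mismatches(facts, host_macs):
    """Return {name: (puppetdb_mac, host_mac)} for interfaces that disagree.

    An interface present on only one side shows up with None on the other,
    so a rename (e.g. eno1 -> eno1np0) is reported as two mismatches.
    """
    mismatches = {}
    for name in sorted(set(facts) | set(host_macs)):
        fact_mac = facts.get(name, {}).get("mac")
        host_mac = host_macs.get(name)
        if fact_mac != host_mac:
            mismatches[name] = (fact_mac, host_mac)
    return mismatches


if __name__ == "__main__":
    print("mismatches:", mac_mismatches(PUPPETDB_INTERFACES, HOST_MACS))
    print("primary:", PUPPETDB_PRIMARY, PUPPETDB_INTERFACES[PUPPETDB_PRIMARY]["ip"])
```

An empty result means the PuppetDB data reflects the host as it is now, which is the basis for trusting it as the source of truth for the Netbox import.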

So at first glance, this looks like the Netbox script will do the right thing:
It will delete and recreate the cable, this time linking the correct interface to the existing switch port instead of the non-existent interface.
It uses a default cable type of Passive DAC, which matches the type of the current cable. The cable type also makes sense, but I haven't seen the physical cables to check.
It deletes the interfaces that no longer exist and creates the new ones.
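The reconciliation described above boils down to set differences between the Netbox and PuppetDB interface names. A minimal sketch, not the actual ImportPuppetDB code: `plan_interface_changes` and `InterfacePlan` are hypothetical, loopback is assumed to be skipped, and the cable is assumed to follow the PuppetDB primary interface, as observed for an-worker1104.

```python
from dataclasses import dataclass


@dataclass
class InterfacePlan:
    delete: list[str]  # in Netbox but not in PuppetDB
    create: list[str]  # in PuppetDB but not in Netbox
    cable_moves: dict[str, str]  # old Netbox interface -> new interface


def plan_interface_changes(netbox_ifaces, puppetdb_ifaces, cabled, primary):
    """Compute a dry-run style plan (hypothetical sketch)."""
    netbox, puppetdb = set(netbox_ifaces), set(puppetdb_ifaces)
    delete = sorted(netbox - puppetdb)
    create = sorted(puppetdb - netbox)
    # Assumption: a cable on an interface that is going away is re-created
    # on the primary interface. With several such cables this would need a
    # real mapping (e.g. by MAC), which this sketch does not attempt.
    cable_moves = {name: primary for name in sorted(cabled) if name in delete}
    return InterfacePlan(delete, create, cable_moves)


if __name__ == "__main__":
    plan = plan_interface_changes(
        netbox_ifaces=["eno1", "eno2d1", "eno3", "eno4"],
        puppetdb_ifaces=["eno1np0", "eno2np1", "eno3", "eno4"],  # lo skipped
        cabled=["eno1"],
        primary="eno1np0",
    )
    print(plan)
```

With the an-worker1104 data, eno3 and eno4 are left untouched, eno1 and eno2d1 are deleted, eno1np0 and eno2np1 are created, and the cable on eno1 moves to eno1np0.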

All of the servers have broadly the same pattern of changes. I'll make another pass tomorrow, but in general I'm almost happy to tick the commit box and re-run the script.

@BTullis fwiw +1 from my end, thanks for having a look.

If we look at an-worker1106, a host that is not in the list but was purchased and installed at the same time as an-worker110[45] (under T246784), we can also see that its interface names in Netbox match the ones the script will set for an-worker1104.

[Screenshot: Netbox interfaces of an-worker1106]

Therefore, I'm happy to proceed and commit these changes.

BTullis moved this task from Next Up to Done on the Data-Engineering-Kanban board.

Committed. The results are here: https://netbox.wikimedia.org/extras/scripts/results/1924060/
Results now look as expected.