User Details
- User Since
- May 10 2021, 3:25 PM (240 w, 1 d)
- Availability
- Available
- IRC Nick
- topranks
- LDAP User
- Cathal Mooney
- MediaWiki User
- CMooney (WMF)
Yesterday
I see these lines in /var/log/syslog in the busybox shell:
Dec 16 17:31:55 netcfg[1167]: INFO: Activating interface eno1np0
Dec 16 17:31:55 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:31:55 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:31:55 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:31:55 debconf: --> SUBST netcfg/link_detect_progress interface eno1np0
Dec 16 17:31:55 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:31:56 netcfg[1167]: INFO: ethtool-lite: eno1np0: carrier down
Dec 16 17:31:57 netcfg[1167]: INFO: ethtool-lite: eno1np0: carrier down
Dec 16 17:31:58 netcfg[1167]: INFO: ethtool-lite: eno1np0: carrier down
Dec 16 17:31:58 netcfg[1167]: INFO: Reached timeout for link detection on eno1np0
Dec 16 17:31:58 netcfg[1167]: INFO: found no link on interface eno1np0.
Dec 16 17:31:58 netcfg[1167]: INFO: eno1np0 is not a wireless interface. Continuing.
Dec 16 17:31:58 netcfg[1167]: INFO: Taking down interface eno1np0
Dec 16 17:31:58 netcfg[1167]: INFO: Activating interface eno2np1
Dec 16 17:31:58 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:31:58 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:31:58 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:31:58 debconf: --> SUBST netcfg/link_detect_progress interface eno2np1
Dec 16 17:31:58 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:31:59 netcfg[1167]: INFO: ethtool-lite: eno2np1: carrier down
Dec 16 17:32:00 netcfg[1167]: INFO: ethtool-lite: eno2np1: carrier down
Dec 16 17:32:01 netcfg[1167]: INFO: ethtool-lite: eno2np1: carrier down
Dec 16 17:32:01 netcfg[1167]: INFO: Reached timeout for link detection on eno2np1
Dec 16 17:32:01 netcfg[1167]: INFO: found no link on interface eno2np1.
Dec 16 17:32:01 netcfg[1167]: INFO: eno2np1 is not a wireless interface. Continuing.
Dec 16 17:32:01 netcfg[1167]: INFO: Taking down interface eno2np1
Dec 16 17:32:02 netcfg[1167]: INFO: Taking down interface eno2np1
Dec 16 17:32:02 netcfg[1167]: INFO: Activating interface eno3
Dec 16 17:32:02 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:32:02 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:32:02 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:32:02 debconf: --> SUBST netcfg/link_detect_progress interface eno3
Dec 16 17:32:02 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:32:02 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:03 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:04 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:05 netcfg[1167]: INFO: ethtool-lite: eno3: carrier down
Dec 16 17:32:05 netcfg[1167]: INFO: Reached timeout for link detection on eno3
Dec 16 17:32:05 netcfg[1167]: INFO: found no link on interface eno3.
Dec 16 17:32:05 netcfg[1167]: INFO: eno3 is not a wireless interface. Continuing.
Dec 16 17:32:05 netcfg[1167]: INFO: Taking down interface eno3
Dec 16 17:32:05 netcfg[1167]: INFO: Activating interface eno4
Dec 16 17:32:05 debconf: --> INPUT low netcfg/link_wait_timeout
Dec 16 17:32:05 debconf: --> GET netcfg/link_wait_timeout
Dec 16 17:32:05 netcfg[1167]: INFO: Waiting time set to 3
Dec 16 17:32:05 debconf: --> SUBST netcfg/link_detect_progress interface eno4
Dec 16 17:32:05 debconf: --> PROGRESS START 0 12 netcfg/link_detect_progress
Dec 16 17:32:05 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:06 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:07 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:08 netcfg[1167]: INFO: ethtool-lite: eno4: carrier down
Dec 16 17:32:08 netcfg[1167]: INFO: Reached timeout for link detection on eno4
Dec 16 17:32:08 netcfg[1167]: INFO: found no link on interface eno4.
Dec 16 17:32:08 netcfg[1167]: INFO: eno4 is not a wireless interface. Continuing.
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno4
Dec 16 17:32:08 debconf: --> GET netcfg/choose_interface
Dec 16 17:32:08 netcfg[1167]: INFO: Could not find valid BOOTIF= entry in /proc/cmdline
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno1np0
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 debconf: --> SET netcfg/choose_interface eno1np0: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno2np1
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno3
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 netcfg[1167]: INFO: Taking down interface eno4
Dec 16 17:32:08 debconf: --> METAGET netcfg/internal-wireless description
Dec 16 17:32:08 debconf: --> SUBST netcfg/choose_interface ifchoices eno1np0: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller
It seems the interface can be set through the preseed file we pass to the debian installer. Our current setting is:
d-i netcfg/choose_interface select auto
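To make it quicker to see what netcfg decided for each NIC, something like the following could summarise the syslog. It is only a rough sketch: the function name `summarize_link_detection` is made up for illustration, the patterns are taken from the messages quoted above, the "carrier up" wording is an assumption (the excerpt only ever shows "carrier down"), and it expects one syslog entry per line.

```python
import re

# Message formats copied from the netcfg lines quoted in this comment.
# "carrier up" is an assumption: the excerpt only ever shows "carrier down".
ACTIVATE = re.compile(r"INFO: Activating interface (\S+)")
CARRIER = re.compile(r"ethtool-lite: (\S+): carrier (up|down)")
NO_LINK = re.compile(r"INFO: found no link on interface (\S+?)\.?$")


def summarize_link_detection(lines):
    """Map each probed interface to its poll count and link result."""
    summary = {}

    def entry(name):
        # Create the per-interface record on first sight, keeping probe order.
        return summary.setdefault(
            name, {"polls": 0, "carrier_up": False, "no_link": False}
        )

    for raw in lines:
        line = raw.strip()
        if m := ACTIVATE.search(line):
            entry(m.group(1))
        elif m := CARRIER.search(line):
            e = entry(m.group(1))
            e["polls"] += 1
            e["carrier_up"] = e["carrier_up"] or m.group(2) == "up"
        elif m := NO_LINK.search(line):
            entry(m.group(1))["no_link"] = True
    return summary


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python3 summarize.py < syslog
    for name, result in summarize_link_detection(sys.stdin).items():
        print(name, result)
```

Run against the log above, this should report each of eno1np0, eno2np1, eno3 and eno4 with no carrier seen, which matches what the raw lines show.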
Mon, Dec 15
This has come up again in terms of the pages we have been getting of late, and we may take some action to change our QoS profiling across the WAN as a result.
Sat, Dec 13
Fri, Dec 12
So it seems this is a known problem; we actually hit it before on another card. To avoid it entirely we will need to upgrade JunOS on these routers.
Hmm so this problem is worse than I thought at first. It is not just affecting the gnmic stats, but also the SNMP counters (LibreNMS shows the same problem) and even the packet counters shown on the CLI:
cmooney@re0.cr2-codfw> show interfaces xe-0/0/1:3 | match "Output rate"
Dec 12 12:37:46
Output rate : 882741856 bps (82920 pps)
Thu, Dec 11
@JAllemandou thanks for the task, and apologies that this has hit you out of the blue. We should have reached out to warn you it was likely to increase when we fixed the ACLs to allow this traffic through to the pipeline earlier this week. We probably didn't anticipate it would be so much, but thinking it through it makes sense: most of our two core sites suddenly started sending data.
Wed, Dec 10
There will be more work to refine the configuration and add elements over time, but closing this for now as we have a working setup and live hosts connected.
This is done, or at least we have all the major coverage we need.
All the ports are now decom'ed on the switches / servers.
Folks I am going to close this one for now.
Tue, Dec 9
Mon, Dec 8
Thanks for the task @taavi. I think the idea makes sense, we probably need to take a close look at what they expose and how the networking would work if we move them to the cloud racks.
Nokia have come back to say they were able to reproduce the issue, and confirm the cause as well as the fact it is not a problem in the latest SR-Linux release:
Fri, Dec 5
Thanks for the work on this @MoritzMuehlenhoff!
Thu, Dec 4
Duplicate task made in error, will use T411781
Thanks @VRiley-WMF. I'm gonna re-open this as we still have to deal with cloudcephosd1052.
Wed, Dec 3
DC-Ops folks, we can now remove these superfluous cables from the racks, and once they are removed, delete the cables in Netbox too.
Ok I've disabled all the unused ports on the cloud switches now. The one exception is for cloudcephosd1052, not sure what is up with this one but it seems that it has the vlan interface added, but still has the physical link configured and is using it? I didn't want to touch it:
cmooney@cloudcephosd1052:~$ ip -4 -br addr show | grep -v DOWN
lo UNKNOWN 127.0.0.1/8
ens1f0np0 UP 10.64.148.31/24
ens1f1np1 UP 192.168.5.14/24
vlan1121@ens1f0np0 UP 192.168.5.14/24
cmooney@cloudcephosd1052:~$ ip route get fibmatch 192.168.5.1
192.168.5.0/24 dev ens1f1np1 proto kernel scope link src 192.168.5.14
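If we wanted to check other hosts for the same situation (one address configured on both the physical port and the vlan interface), a small script along these lines might do it. This is just a sketch: `duplicate_addresses` is a hypothetical helper, and it assumes the input is the brief `ip -4 -br addr show` format seen above (interface name, state, then addresses).

```python
from collections import defaultdict


def duplicate_addresses(brief_output):
    """Return {address: [interfaces]} for addresses on more than one interface.

    Assumes each line looks like: <ifname> <state> <addr/prefix> [<addr/prefix> ...]
    """
    owners = defaultdict(list)
    for line in brief_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            # No address on this line (or a blank line), nothing to record.
            continue
        ifname, _state, *addrs = fields
        for addr in addrs:
            owners[addr].append(ifname)
    return {addr: names for addr, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    sample = """\
lo UNKNOWN 127.0.0.1/8
ens1f0np0 UP 10.64.148.31/24
ens1f1np1 UP 192.168.5.14/24
vlan1121@ens1f0np0 UP 192.168.5.14/24
"""
    print(duplicate_addresses(sample))
```

With the cloudcephosd1052 output as input, the only address flagged is 192.168.5.14/24, held by both ens1f1np1 and vlan1121@ens1f0np0.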
Tue, Dec 2
Thanks to the awesome work of @jhathaway this is no longer a requirement. We can use --no82 with a host in BIOS boot mode, and that flag will be set automatically when reimaging on a Nokia switch where it's needed.
Thu, Nov 27
I'm going to close this task as the original ask is now complete. The wider question of how the changelog is displayed, or how best to communicate new options, is something we can work on longer term.
Wed, Nov 26
@taavi broadly this looks good to me, nicely done.