While investigating a performance issue affecting backups between eqiad and codfw (T274234), we found output drops on the ToR switches in eqiad, for example on asw2-c-eqiad:
cmooney@asw2-c-eqiad> show interfaces xe-2/0/46 detail
Physical interface: xe-2/0/46, Enabled, Physical link is Up
  Interface index: 918, SNMP ifIndex: 602, Generation: 609
  Description: Core: cr2-eqiad:xe-3/0/2 {#3464}
  Link-level type: Ethernet, MTU: 9192, MRU: 0, Speed: 10Gbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 4c:16:fc:fb:9d:72, Hardware address: 4c:16:fc:fb:9c:b1
  Last flapped   : 2020-09-02 09:59:19 UTC (39w6d 08:53 ago)
  Statistics last cleared: 2021-06-08 17:54:19 UTC (00:58:35 ago)
  Traffic statistics:
   Input  bytes  :         352431957036            645678192 bps
   Output bytes  :        1455185482573           3198397656 bps
   Input  packets:            523378678               138276 pps
   Output packets:           1682927702               465949 pps
   IPv6 transit statistics:
   Input  bytes  :                    0
   Output bytes  :                    0
   Input  packets:                    0
   Output packets:                    0
  Egress queues: 12 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0           1683660591               335031
    3                                0                    0                    0
    4                                0                    0                    0
    7                                0                 4870                    0
    8                                0                20654                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   fcoe
    4                   no-loss
    7                   network-control
    8                   mcast

These manifest in the SNMP metrics as Output Discards:
https://librenms.wikimedia.org/graphs/to=1623178200/id=15215/type=port_errors/from=1620499800/
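To put the dropped-packet counter in context, here is a small sketch that parses the queue-counter rows from the CLI output above and computes, per queue, the share of offered packets that were dropped and the average drop rate since the statistics were cleared. The row format (four integer columns: queue, queued, transmitted, dropped) is assumed from this single output, "offered" is taken to mean transmitted plus dropped, and `parse_queue_counters` and `drop_ratio` are hypothetical helper names.

```python
# Sketch: summarise Junos egress queue drops from pasted
# `show interfaces ... detail` queue-counter rows.

# Queue counter rows copied from the asw2-c-eqiad output above.
QUEUE_COUNTERS = """\
0 0 1683660591 335031
3 0 0 0
4 0 0 0
7 0 4870 0
8 0 20654 0
"""

# Seconds since "Statistics last cleared" (00:58:35 ago).
ELAPSED_S = 58 * 60 + 35


def parse_queue_counters(text):
    """Return {queue: (queued, transmitted, dropped)} from counter rows."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, or anything that is not four integers.
        if len(fields) != 4 or not all(f.isdigit() for f in fields):
            continue
        queue, queued, transmitted, dropped = map(int, fields)
        counters[queue] = (queued, transmitted, dropped)
    return counters


def drop_ratio(transmitted, dropped):
    """Fraction of packets offered to the queue (transmitted + dropped) that were dropped."""
    offered = transmitted + dropped
    return dropped / offered if offered else 0.0


counters = parse_queue_counters(QUEUE_COUNTERS)
for queue, (_, tx, dropped) in sorted(counters.items()):
    if dropped:
        print(
            f"queue {queue}: {dropped} dropped, "
            f"{drop_ratio(tx, dropped):.4%} of offered, "
            f"{dropped / ELAPSED_S:.1f} drops/s on average"
        )
```

For queue 0 (best-effort) this works out to roughly 0.02% of offered packets, or about 95 drops per second averaged over the roughly 58 minutes since the counters were cleared. Because this is an average, the drop rate during bursts such as backup transfers may be considerably higher.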
We should probably alert when we see a sustained rate of these discards. Creating this task to track progress.
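One possible shape for such an alert is sketched below. IF-MIB `ifOutDiscards` (OID 1.3.6.1.2.1.2.2.1.19) is a 32-bit counter, so the increase between two polls has to be computed modulo 2^32 to survive a wrap. The threshold (50 discards/s), the requirement that three consecutive polling intervals exceed it, the 300 s polling interval in the example, and all function names are hypothetical placeholders rather than values agreed for this task. This is not how LibreNMS implements its own alert rules.

```python
# Sketch: decide whether to alert on output discards from periodic SNMP polls
# of IF-MIB ifOutDiscards (a Counter32, so it wraps at 2**32).

COUNTER32_MODULUS = 2**32

# Hypothetical values, not agreed thresholds.
DISCARD_RATE_THRESHOLD = 50.0  # discards per second
MIN_CONSECUTIVE = 3            # polling intervals that must all exceed the threshold


def counter_delta(prev, curr, modulus=COUNTER32_MODULUS):
    """Increase of a wrapping counter between two polls, assuming at most one wrap.

    A counter reset (e.g. a switch reboot) is indistinguishable from a wrap here.
    """
    return (curr - prev) % modulus


def discard_rate(prev, curr, interval_s, modulus=COUNTER32_MODULUS):
    """Average discards per second over one polling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev, curr, modulus) / interval_s


def should_alert(samples, threshold=DISCARD_RATE_THRESHOLD, min_consecutive=MIN_CONSECUTIVE):
    """samples: time-ordered list of (unix_time_s, ifOutDiscards) tuples.

    Returns True only if each of the last `min_consecutive` intervals
    exceeded `threshold`, so a single short burst does not trigger an alert.
    """
    rates = [
        discard_rate(c0, c1, t1 - t0)
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]
    recent = rates[-min_consecutive:]
    return len(recent) == min_consecutive and all(r > threshold for r in recent)


if __name__ == "__main__":
    # Illustrative polls every 300 s; the second value has wrapped past 2**32.
    polls = [(0, 2**32 - 1000), (300, 29000), (600, 59000), (900, 89000)]
    print(should_alert(polls))  # True: each interval averages 100 discards/s
```

Requiring several consecutive intervals trades detection latency for fewer alerts on brief microbursts; a rate threshold rather than a raw counter threshold keeps the rule independent of when the counters were last cleared.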