While investigating a performance issue with backups between eqiad and codfw (T274234), we found output drops on ToR switches in eqiad. For example, queue 0 (best-effort) on this uplink dropped 335031 packets in the 58 minutes since its statistics were last cleared:
  cmooney@asw2-c-eqiad> show interfaces xe-2/0/46 detail
  Physical interface: xe-2/0/46, Enabled, Physical link is Up
    Interface index: 918, SNMP ifIndex: 602, Generation: 609
    Description: Core: cr2-eqiad:xe-3/0/2 {#3464}
    Link-level type: Ethernet, MTU: 9192, MRU: 0, Speed: 10Gbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Disabled, Media type: Fiber
    Device flags   : Present Running
    Interface flags: SNMP-Traps Internal: 0x4000
    Link flags     : None
    CoS queues     : 12 supported, 12 maximum usable queues
    Hold-times     : Up 0 ms, Down 0 ms
    Current address: 4c:16:fc:fb:9d:72, Hardware address: 4c:16:fc:fb:9c:b1
    Last flapped   : 2020-09-02 09:59:19 UTC (39w6d 08:53 ago)
    Statistics last cleared: 2021-06-08 17:54:19 UTC (00:58:35 ago)
    Traffic statistics:
     Input  bytes  :         352431957036            645678192 bps
     Output bytes  :        1455185482573           3198397656 bps
     Input  packets:            523378678               138276 pps
     Output packets:           1682927702               465949 pps
     IPv6 transit statistics:
      Input  bytes  :                    0
      Output bytes  :                    0
      Input  packets:                    0
      Output packets:                    0
    Egress queues: 12 supported, 5 in use
    Queue counters:       Queued packets  Transmitted packets      Dropped packets
      0                                0           1683660591               335031
      3                                0                    0                    0
      4                                0                    0                    0
      7                                0                 4870                    0
      8                                0                20654                    0
    Queue number:         Mapped forwarding classes
      0                   best-effort
      3                   fcoe
      4                   no-loss
      7                   network-control
      8                   mcast
These manifest in the SNMP metrics as Output Discards:
https://librenms.wikimedia.org/graphs/to=1623178200/id=15215/type=port_errors/from=1620499800/
We should probably alert when we see a lot of these. Creating this task to track progress.
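As a starting point for discussion, here is a minimal sketch of how such an alert could be evaluated: take two samples of a port's output discard counter, convert the difference to a per-second rate, and compare it with a threshold. The `Sample` type, the function names and the 50 discards/sec threshold are all hypothetical, not anything we run today, and the sketch assumes the counter is the 32-bit `ifOutDiscards` from IF-MIB.

```python
from dataclasses import dataclass

# ifOutDiscards is a Counter32 in IF-MIB, so it wraps at 2**32.
COUNTER32_MODULUS = 2**32


@dataclass
class Sample:
    """One poll of a port's output discard counter (hypothetical type)."""
    timestamp: float    # seconds since epoch
    out_discards: int   # raw counter value at that time


def discard_rate(prev: Sample, curr: Sample) -> float:
    """Return discards per second between two samples.

    A negative delta is treated as a single counter wrap. Note that a
    manual counter reset on the device would be misread as a wrap here.
    """
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.out_discards - prev.out_discards
    if delta < 0:
        delta += COUNTER32_MODULUS
    return delta / elapsed


def should_alert(prev: Sample, curr: Sample, threshold_pps: float = 50.0) -> bool:
    """True if the discard rate exceeds the (assumed) threshold."""
    return discard_rate(prev, curr) > threshold_pps


if __name__ == "__main__":
    # Figures from the xe-2/0/46 output above: 335031 drops in the
    # 00:58:35 (3515 seconds) since the statistics were cleared.
    start = Sample(timestamp=0.0, out_discards=0)
    end = Sample(timestamp=3515.0, out_discards=335031)
    print(f"{discard_rate(start, end):.1f} discards/sec")
    print("alert" if should_alert(start, end) else "ok")
```

With the figures from the interface output above this works out to roughly 95 discards per second averaged over the hour. The right threshold would need to be tuned against the LibreNMS history, and a ratio of discards to transmitted packets may turn out to be a better signal than an absolute rate.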