The VC link between asw2-A6:1/0 and asw2-A7:0/52 failed yesterday in a way that made it silently drop packets.
We can see the errors with:
asw2-a-eqiad> show virtual-chassis vc-port statistics member 6 vcp-255/1/0 extensive
RX CRC alignment errors: 43488980
Disabling the asw2-a6 side of that link fixed the issue.
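To tell whether the link is still accumulating errors (rather than just showing an old total), two captures of the command above can be compared. The following is a minimal sketch, assuming the counter appears on a line of the form `CRC alignment errors: <number>` as in the output quoted above; the function names are hypothetical and the captures are expected to be pasted or collected by other means.

```python
import re

# Assumed line format, based only on the output quoted in this task.
CRC_RE = re.compile(r"CRC alignment errors:\s*(\d+)")


def crc_errors(cli_output: str) -> int:
    """Return the CRC alignment error count found in the CLI output."""
    match = CRC_RE.search(cli_output)
    if match is None:
        raise ValueError("no 'CRC alignment errors' counter in output")
    return int(match.group(1))


def crc_delta(before: str, after: str) -> int:
    """Errors accumulated between two captures of the same command."""
    return crc_errors(after) - crc_errors(before)
```

A delta of 0 between two captures taken a few minutes apart would suggest the link is no longer taking CRC errors; a growing delta would suggest it still is.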
On the physical side it's a regular fiber with a QSFP optic on each end. So 3 parts are most likely at fault: the two optics and the fiber. There are 2 possible causes: a physical "bump" to an optic or the fiber damaging them, or a power fluctuation damaging an optic.
Because it's a VC link there is no way of getting the optics levels (afaik).
Next steps:
- Check how many spare QSFPs we have (not needed, DAC)
- Investigate the fiber for any obvious damage
- Even if no obvious damage, run a new fiber
- Keep an mtr/ping running between mw1312<->puppetmaster1001
- Warn people on -sre that we're enabling the previously faulty interface
- Enable the interface: request virtual-chassis vc-port set interface member 6 vcp-255/1/0
- Monitor packet loss
- Check that the VC path goes directly from fpc6 to fpc7, for example with: show virtual-chassis vc-path source-interface ge-6/0/7 destination-interface ge-7/0/16
- If there are issues, disable the interface, replace the fpc7:0/52 QSFP, and try the above again
- If there are still issues, do the same with the fpc6:1/0 QSFP
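
For the "keep a ping running" and "monitor packet loss" steps, something like the following could be used to get a loss percentage per batch of pings. This is only a sketch: it assumes a Linux `ping` that accepts `-c` and prints a summary line containing `N% packet loss`, and the function names are made up for illustration.

```python
import re
import subprocess

# Matches the summary line printed by common Linux ping implementations,
# e.g. "20 packets transmitted, 19 received, 5% packet loss, time 19027ms".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output: str) -> float:
    """Extract the packet loss percentage from ping's summary output."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet loss summary found")
    return float(match.group(1))


def measure_loss(host: str, count: int = 20) -> float:
    """Send `count` pings to `host` and return the loss percentage.

    Assumes a ping binary on PATH that supports `-c <count>`.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want the output
    )
    return parse_loss(result.stdout)
```

Run from mw1312, `measure_loss("puppetmaster1001")` in a loop before and after enabling the interface would give a rough before/after comparison (assuming the short hostname resolves from there). mtr remains the better tool for seeing where along the path the loss happens.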