
Inbound interface errors
Closed, Resolved (Public)

Description

Common information

  • description: Rule: Inbound interface errors Faults: #1: ae0 - cr2-eqiad:ae0 #2: xe-3/3/0 - Core: cr2-eqiad:xe-3/3/0 {#2651}

https://wikitech.wikimedia.org/wiki/Network_monitoring#LibreNMS_alerts

  • summary: Alert for device cr1-eqiad.wikimedia.org - Inbound interface errors
  • timestamp: 2022-07-20 12:16:51
  • alertname: Inbound interface errors
  • instance: cr1-eqiad.wikimedia.org
  • scope: global
  • severity: task
  • source: librenms
  • team: dcops
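
The description field packs the affected ports into a single line. As a rough sketch, the faults can be split out as below. The pattern is an assumption inferred from this one alert and is not a documented LibreNMS format; `parse_faults` and `FAULT_RE` are hypothetical names used only for illustration.

```python
import re

# Alert text copied from the description field above.
DESCRIPTION = (
    "Rule: Inbound interface errors Faults: "
    "#1: ae0 - cr2-eqiad:ae0 "
    "#2: xe-3/3/0 - Core: cr2-eqiad:xe-3/3/0 {#2651}"
)

# Assumed shape of one fault: "#<n>: <port> - <label>", where the label runs
# until the next "#<n>:" marker or the trailing "{#...}" tag.
FAULT_RE = re.compile(r"#(\d+):\s+(\S+)\s+-\s+(.*?)(?=\s+#\d+:|\s*\{#|$)")


def parse_faults(description):
    """Return one dict per fault listed after 'Faults:' in the description."""
    _, _, faults = description.partition("Faults:")
    return [
        {"index": int(number), "port": port, "label": label.strip()}
        for number, port, label in FAULT_RE.findall(faults)
    ]


if __name__ == "__main__":
    for fault in parse_faults(DESCRIPTION):
        print(fault)
```

For this alert the sketch yields two entries, one for ae0 and one for xe-3/3/0, each with the port description naming the cr2-eqiad side.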


Event Timeline

The optic for cr2 xe-3/0/3 has been swapped

Mentioned in SAL (#wikimedia-operations) [2022-07-20T13:33:53Z] <XioNoX> cr2-eqiad# deactivate interfaces xe-3/3/0 - T313337

Looks like two interfaces are/were showing errors:
cr2-eqiad:xe-3/0/3 - remote side seeing inbound errors: https://librenms.wikimedia.org/graphs/to=1658306400/id=12731/type=port_errors/from=1658133600/
I re-enabled the interface but it still shows as down even though light levels seem fine.
@Cmjohnson could you have a look? Maybe clean the fiber, etc

And now
cr2-eqiad:xe-3/3/0 - remote side seeing inbound errors: https://librenms.wikimedia.org/graphs/to=1658323200/id=11622/type=port_errors/from=1658236800/
I disabled cr2-eqiad:xe-3/3/0 for optic replacement as well

Both are suspicious though.
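
The two graph links above carry their time window as Unix timestamps in the `from=` and `to=` path segments. A minimal sketch for decoding them to UTC follows, assuming the URL layout seen in these two links. `graph_window` is a hypothetical helper, not part of LibreNMS.

```python
import re
from datetime import datetime, timezone

# Graph links copied from the comment above.
GRAPH_URLS = {
    "cr2-eqiad:xe-3/0/3": "https://librenms.wikimedia.org/graphs/to=1658306400/id=12731/type=port_errors/from=1658133600/",
    "cr2-eqiad:xe-3/3/0": "https://librenms.wikimedia.org/graphs/to=1658323200/id=11622/type=port_errors/from=1658236800/",
}


def graph_window(url):
    """Return (start, end) as timezone-aware UTC datetimes for a graph URL."""
    # Assumes "/from=<epoch>" and "/to=<epoch>" path segments, as in the links above.
    fields = dict(re.findall(r"/(from|to)=(\d+)", url))
    start = datetime.fromtimestamp(int(fields["from"]), tz=timezone.utc)
    end = datetime.fromtimestamp(int(fields["to"]), tz=timezone.utc)
    return start, end


if __name__ == "__main__":
    for port, url in GRAPH_URLS.items():
        start, end = graph_window(url)
        hours = (end - start).total_seconds() / 3600
        print(f"{port}: {start.isoformat()} to {end.isoformat()} ({hours:.0f} h)")
```

Decoded this way, the first graph spans the 48 hours ending 2022-07-20 08:40 UTC and the second the 24 hours ending 2022-07-20 13:20 UTC.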

ayounsi triaged this task as High priority.

I swapped the optics for both and cleaned the fiber.

Both are back to normal. Thanks!