mgmt host interfaces down for rack D7 in codfw due to ge-0/0/30 on msw1-codfw down
Closed, Resolved, Public

Description

Icinga complained today about not being able to reach the mgmt interfaces of rack D7 in codfw. The related link on msw1-codfw appears to be down:

Jan 15 06:30:44  msw1-codfw chassism[1399]: ifd_process_flaps IFD: ge-0/0/30, sent flap msg to RE, Downstate
Jan 15 06:30:44  msw1-codfw chassism[1399]: 	 Link status change event: ifd ge-0/0/30 MAC ctrl reg0 :: 0x8BE5, MAC port status reg0 :: 0x6802, MAC auto-neg reg :: 0xB1F4
Jan 15 06:30:44  msw1-codfw chassism[1399]: 	Link status change event: ifd ge-0/0/30 PHY Link Status: DOWN,LP-AN capable: NO
Jan 15 06:30:44  msw1-codfw chassism[1399]: 	Link status change event: ifd ge-0/0/30 AN Status: Pending, Speed: 1000 Mbps, Duplex: HALF DUPLEX,Remote Link Fault: NO
Jan 15 06:30:44  msw1-codfw rpd[1422]: EVENT <UpDown> ge-0/0/30.0 index 2147404488 <Broadcast Multicast> address #0 b0.c6.9a.db.5.a1
Jan 15 06:30:44  msw1-codfw rpd[1422]: EVENT <UpDown> ge-0/0/30 index 159 <Broadcast Multicast> address #0 b0.c6.9a.db.5.a1
Jan 15 06:30:44  msw1-codfw mib2d[1421]: SNMP_TRAP_LINK_DOWN: ifIndex 540, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/30

Timing matches with the alarms :)
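
For anyone who wants to correlate these events with alarms automatically, here is a minimal sketch that pulls the host, ifIndex and interface name out of the `SNMP_TRAP_LINK_DOWN` line quoted above. The regex is derived from that single sample line only, and `parse_link_down` is a hypothetical helper name, not an existing tool; other Junos releases may format the message differently.

```python
import re

# Pattern built from the one mib2d sample line in this task; treat the exact
# field layout as an assumption rather than a documented format.
LINK_DOWN_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+mib2d\[\d+\]:\s+SNMP_TRAP_LINK_DOWN:\s+"
    r"ifIndex\s+(?P<ifindex>\d+),.*?ifName\s+(?P<ifname>\S+)"
)


def parse_link_down(line):
    """Return a dict describing a link-down trap, or None if the line does not match."""
    match = LINK_DOWN_RE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "host": match.group("host"),
        "ifindex": int(match.group("ifindex")),
        "ifname": match.group("ifname"),
    }


sample = (
    "Jan 15 06:30:44  msw1-codfw mib2d[1421]: SNMP_TRAP_LINK_DOWN: "
    "ifIndex 540, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/30"
)
event = parse_link_down(sample)
print(event)
```

The syslog timestamp carries no year, so it is kept as a string here rather than converted to a datetime.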

elukey@msw1-codfw> show interfaces ge-0/0/30
Physical interface: ge-0/0/30, Enabled, Physical link is Down
  Interface index: 159, SNMP ifIndex: 540
  Description: msw-d7-codfw {#10548} [1Gbps Cu]
  Link-level type: Ethernet, MTU: 1514, Speed: Auto, Duplex: Auto, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled,
  Auto-negotiation: Enabled, Remote fault: Online, Media type: Copper
  Device flags   : Present Running Down
  Interface flags: Hardware-Down SNMP-Traps Internal: 0x0
  Link flags     : None
  CoS queues     : 8 supported, 8 maximum usable queues
  Current address: b0:c6:9a:db:05:a1, Hardware address: b0:c6:9a:db:05:a1
  Last flapped   : 2019-01-15 06:30:44 UTC (00:35:31 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : LINK
  Active defects : LINK
  Interface transmit statistics: Disabled

  Logical interface ge-0/0/30.0 (Index 97) (SNMP ifIndex 541)
    Flags: Device-Down SNMP-Traps 0x0 Encapsulation: ENET2
    Input packets : 7059995
    Output packets: 3873668
    Protocol eth-switch
      Flags: None

{master:0}
elukey@msw1-codfw> show interfaces descriptions ge-0/0/30
Interface       Admin Link Description
ge-0/0/30       up    down msw-d7-codfw {#10548} [1Gbps Cu]
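
As a rough sketch, the `show interfaces descriptions` output above can be filtered for ports that are administratively up but have no link, which is the state ge-0/0/30 is in. The function name `admin_up_link_down` is hypothetical, and the parsing assumes the whitespace-separated four-column layout shown in this task.

```python
def admin_up_link_down(cli_output):
    """List (interface, description) pairs that are admin up but link down."""
    results = []
    for line in cli_output.splitlines():
        # Split into at most four fields so the description keeps its spaces.
        parts = line.split(None, 3)
        if len(parts) < 3 or parts[0] == "Interface":
            continue  # skip blank lines, prompts and the header row
        name, admin, link = parts[:3]
        description = parts[3] if len(parts) > 3 else ""
        if admin == "up" and link == "down":
            results.append((name, description))
    return results


sample_output = """Interface       Admin Link Description
ge-0/0/30       up    down msw-d7-codfw {#10548} [1Gbps Cu]
"""
down = admin_up_link_down(sample_output)
print(down)
```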

List of hosts affected (adding their service owners to the subscribers list as FYI):

https://netbox.wikimedia.org/dcim/racks/73/

elastic2054
elastic2053
ms-be2050
ms-be2039
ms-be2027
ms-be2026
ms-be2025
cp2026
cp2025
cp2024
cp2023

Event Timeline

elukey triaged this task as High priority. Jan 15 2019, 7:13 AM
elukey created this task.
Restricted Application added a subscriber: Aklapper. Jan 15 2019, 7:13 AM
Peachey88 updated the task description. Jan 15 2019, 7:45 AM
elukey updated the task description. Jan 15 2019, 8:51 AM
elukey added subscribers: Gehel, Mathew.onipe, fgiunchedi, BBlack.
ayounsi assigned this task to Papaul. Jan 15 2019, 4:03 PM

Papaul, can you please verify the status of msw-d7-codfw and its link to msw1-codfw, and replace any faulty parts if necessary?

Thank you.

Papaul closed this task as Resolved. Jan 15 2019, 5:06 PM

It looks like the mgmt switch froze; I had to unplug the power and plug it back in. The switch is back up.