Comms to msw-d2-codfw down
Closed, Resolved, Public

Description

Hey,

We've got another one of these; I'm wondering if there may be some general issue? It seems odd that we'd get so many in quick succession. This one's a re-purposed Juniper though, unlike the ones last week. It's from 2014, so maybe it's just getting old. Do we have a refresh schedule for these?

Anyway, all hosts in rack D2 are down on the management network, as the link from msw-d2-codfw to msw1-codfw is down. Or, more to the point, it has been flapping up and down rapidly, as seen on the msw1 side, since approx. 15:35 UTC yesterday (May 7th):

cmooney@msw1-codfw> show interfaces descriptions | match msw-d2    
ge-0/0/25       up    down Core: msw-d2-codfw:ge-0/0/47 {#10553}
May  8 08:15:01  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_DOWN: ifIndex 554, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/25
May  8 08:15:01  msw1-codfw fpc0 [EX-BCM PIC] ex_bcm_pic_ifd_config: ge-0/0/25, enable - 1
May  8 08:15:01  msw1-codfw fpc0 [EX-BCM PIC] ex_bcm_pic_ifd_config: ge-0/0/25 is already in state = 1
May  8 08:15:04  msw1-codfw rpd[1950]: RPD_IFL_NOTIFICATION: EVENT [UpDown] ge-0/0/25.0 index 578 [Up Broadcast Multicast] address #0 88.e6.4b.38.c3.3c
May  8 08:15:04  msw1-codfw rpd[1950]: RPD_IFD_NOTIFICATION: EVENT <UpDown> ge-0/0/25 index 671 <Up Broadcast Multicast> address #0 88.e6.4b.38.c3.3c
May  8 08:15:04  msw1-codfw fpc0 [EX-BCM PIC] ex_bcm_pic_ifd_config: ge-0/0/25, enable - 1
May  8 08:15:04  msw1-codfw fpc0 [EX-BCM PIC] ex_bcm_pic_ifd_config: ge-0/0/25 is already in state = 1
May  8 08:15:04  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_UP: ifIndex 554, ifAdminStatus up(1), ifOperStatus up(1), ifName ge-0/0/25
May  8 08:15:04  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_UP: ifIndex 579, ifAdminStatus up(1), ifOperStatus up(1), ifName ge-0/0/25.0
May  8 08:15:04  msw1-codfw rpd[1950]: RPD_IFL_NOTIFICATION: EVENT [UpDown] ge-0/0/25.0 index 578 [Broadcast Multicast] address #0 88.e6.4b.38.c3.3c
May  8 08:15:04  msw1-codfw rpd[1950]: RPD_IFD_NOTIFICATION: EVENT <UpDown> ge-0/0/25 index 671 <Broadcast Multicast> address #0 88.e6.4b.38.c3.3c
May  8 08:15:04  msw1-codfw fpc0 ifp ge-0/0/25 ifd_mdown: 38589115748 ms
May  8 08:15:04  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_DOWN: ifIndex 554, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/25
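
To put a number on the flapping, the link traps in a saved copy of the log can be tallied per interface. The following is only a rough sketch: it assumes the syslog lines keep the `SNMP_TRAP_LINK_UP` / `SNMP_TRAP_LINK_DOWN ... ifName <port>` shape seen above, and the function name `count_link_traps` is made up for illustration.

```python
import re
from collections import Counter

# Matches the mib2d link traps in the excerpt above, e.g.
# "... SNMP_TRAP_LINK_DOWN: ifIndex 554, ..., ifName ge-0/0/25"
TRAP_RE = re.compile(r"SNMP_TRAP_LINK_(UP|DOWN): .*ifName (\S+)")


def count_link_traps(lines, interface):
    """Count LINK_UP and LINK_DOWN traps for one interface name.

    The match on the interface name is exact, so the logical unit
    (ge-0/0/25.0) is not counted against the physical port (ge-0/0/25).
    """
    counts = Counter()
    for line in lines:
        m = TRAP_RE.search(line)
        if m and m.group(2) == interface:
            counts[m.group(1)] += 1
    return counts


# The four trap lines from the excerpt above, copied verbatim.
sample = [
    "May  8 08:15:01  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_DOWN: ifIndex 554, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/25",
    "May  8 08:15:04  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_UP: ifIndex 554, ifAdminStatus up(1), ifOperStatus up(1), ifName ge-0/0/25",
    "May  8 08:15:04  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_UP: ifIndex 579, ifAdminStatus up(1), ifOperStatus up(1), ifName ge-0/0/25.0",
    "May  8 08:15:04  msw1-codfw mib2d[1949]: SNMP_TRAP_LINK_DOWN: ifIndex 554, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/25",
]

print(count_link_traps(sample, "ge-0/0/25"))
```

On the four-second excerpt above this counts two link-down traps and one link-up trap for ge-0/0/25; run over the full log since May 7th it would give a rough flap count for the port.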

Devices connected to msw-d2-codfw show their ports as being "up", same as the ones last week, so it doesn't seem like a power failure on the msw in the rack.

DC-Ops can you have a look when on site? Thanks.

Event Timeline

cmooney triaged this task as High priority. Wed, May 8, 10:58 AM
cmooney created this task.

@cmooney I think this is just a human error issue. We were racking all the lsw1-d* yesterday and maybe we accidentally bumped into the cable. We will check once on site.

Thanks

Ah cool, no worries. Yeah, bound to be a few of those.

Jhancock.wm claimed this task.
Jhancock.wm subscribed.

Port 47 on the msw was going up and down on its own. Replaced the RJ-45 terminator; the link has remained steady since.