
msw-c5-eqiad offline
Closed, Resolved (Public)

Description

At 13:05 UTC today msw-c5-eqiad went offline, and the port facing it on msw1-eqiad went hard down:

Oct 20 13:05:41  msw1-eqiad mib2d[2003]: SNMP_TRAP_LINK_DOWN: ifIndex 551, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/22
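
For anyone wanting to pull the fields out of a trap line like the one above, here is a minimal Python sketch. The regex and the `parse_link_trap` helper are illustrative assumptions based only on the log format shown in this task, not an existing tool.

```python
import re

# Pattern written against the mib2d line format shown in this task (an assumption;
# other Junos versions may log a slightly different layout).
TRAP_RE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+mib2d\[\d+\]:\s+"
    r"(?P<event>SNMP_TRAP_LINK_(?:DOWN|UP)):\s+"
    r"ifIndex\s+(?P<if_index>\d+),\s+"
    r"ifAdminStatus\s+(?P<admin>\w+)\(\d+\),\s+"
    r"ifOperStatus\s+(?P<oper>\w+)\(\d+\),\s+"
    r"ifName\s+(?P<if_name>\S+)"
)


def parse_link_trap(line):
    """Return the trap fields as a dict, or None if the line does not match."""
    match = TRAP_RE.search(line)
    return match.groupdict() if match else None


line = (
    "Oct 20 13:05:41  msw1-eqiad mib2d[2003]: SNMP_TRAP_LINK_DOWN: ifIndex 551, "
    "ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/22"
)
trap = parse_link_trap(line)
print(trap["host"], trap["if_name"], trap["oper"])  # msw1-eqiad ge-0/0/22 down
```

Admin status `up` combined with oper status `down` is what marks this as a link failure rather than a port that was deliberately disabled.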

This has broken management network access to the following devices:

DNS Name                              IP
an-conf1002.mgmt.eqiad.wmnet          10.65.5.119
an-db1002.mgmt.eqiad.wmnet            10.65.1.53
an-test-worker1002.mgmt.eqiad.wmnet   10.65.0.69
cloudcontrol1005.mgmt.eqiad.wmnet     10.65.4.188
cloudmetrics1001.mgmt.eqiad.wmnet     10.65.2.112
db1120.mgmt.eqiad.wmnet               10.65.1.5
db1145.mgmt.eqiad.wmnet               10.65.1.139
db1146.mgmt.eqiad.wmnet               10.65.1.140
db1168.mgmt.eqiad.wmnet               10.65.0.181
db1169.mgmt.eqiad.wmnet               10.65.0.187
db1181.mgmt.eqiad.wmnet               10.65.0.218
db1189.mgmt.eqiad.wmnet               10.65.3.2
dbproxy1018.mgmt.eqiad.wmnet          10.65.2.173
dbproxy1019.mgmt.eqiad.wmnet          10.65.2.174
dbproxy1020.mgmt.eqiad.wmnet          10.65.2.175
dbproxy1021.mgmt.eqiad.wmnet          10.65.2.176
es1022.mgmt.eqiad.wmnet               10.65.4.146
ganeti1010.mgmt.eqiad.wmnet           10.65.5.105
ganeti1024.mgmt.eqiad.wmnet           10.65.1.208
gitlab-runner1003.mgmt.eqiad.wmnet    10.65.2.91
kubernetes1012.mgmt.eqiad.wmnet       10.65.4.194
mw1484.mgmt.eqiad.wmnet               10.65.2.216
pc1013.mgmt.eqiad.wmnet               10.65.1.189
ps1-c5-eqiad.mgmt.eqiad.wmnet         10.65.0.52
wdqs1013.mgmt.eqiad.wmnet             10.65.4.185

DC-Ops, can we get someone to investigate the issue? Hopefully we can get it back up; I'm not sure whether we have a suitable replacement on site.

Event Timeline

cmooney triaged this task as High priority. Oct 20 2022, 1:31 PM
cmooney created this task.

msw-c5-eqiad was unresponsive. Used a previously decommissioned switch to bring the management connection back online. Netbox updated.

cmooney closed this task as Resolved. Oct 20 2022, 2:09 PM
cmooney claimed this task.

Awesome @Jclark-ctr thanks for the speedy response!

I can confirm port is back up:

Oct 20 13:58:27  msw1-eqiad mib2d[2003]: SNMP_TRAP_LINK_UP: ifIndex 551, ifAdminStatus up(1), ifOperStatus up(1), ifName ge-0/0/22
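
As a rough check on how long the management link was down, the two mib2d timestamps in this task can be subtracted. This is only a sketch: syslog lines carry no year, so `year=2022` is an assumption taken from the task timeline, and `outage_duration` is a hypothetical helper name. `%b` parsing also assumes an English locale.

```python
from datetime import datetime


def outage_duration(down_stamp, up_stamp, year=2022):
    """Return the time between two syslog-style timestamps as a timedelta."""
    fmt = "%Y %b %d %H:%M:%S"
    down = datetime.strptime(f"{year} {down_stamp}", fmt)
    up = datetime.strptime(f"{year} {up_stamp}", fmt)
    return up - down


# Timestamps copied from the LINK_DOWN and LINK_UP traps for ge-0/0/22.
duration = outage_duration("Oct 20 13:05:41", "Oct 20 13:58:27")
print(duration)  # 0:52:46
```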

MAC addresses are learnt:

cmooney@msw1-eqiad> show ethernet-switching table interface ge-0/0/22.0  

MAC database for interface ge-0/0/22.0

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 38 entries, 38 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR 
    name                address             flags              interface              Index     ID
    default             00:0a:9c:62:ec:e4   D             -   ge-0/0/22.0            0         0       
    default             14:58:d0:47:27:b0   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:3d:61:cf   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:68:ab:f3   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:68:e4:bc   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:68:f8:35   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:85:3a:69   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:87:f9:3d   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:8a:6b:6a   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:a7:0c:ed   D             -   ge-0/0/22.0            0         0       
    default             34:73:5a:fb:84:a6   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:66:1d:d3   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:9f:29   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:a5:98   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:a7:b3   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:aa:46   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:b0:7c   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:a6:1c:43   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:af:62:e6   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:af:78:e4   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:c4:a6:43   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:c9:f0:35   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:ca:aa:25   D             -   ge-0/0/22.0            0         0       
    default             58:8a:5a:e8:4a:06   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b0:9a:90   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b4:62:58   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b4:6c:d8   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b9:81:ee   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b9:82:36   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b9:83:b6   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:bc:ec:54   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:0d:2a   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:17:2a   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:4f:0a   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:5a:32   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:5a:d2   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:5b:0a   D             -   ge-0/0/22.0            0         0       
    default             d0:8e:79:f4:15:fa   D             -   ge-0/0/22.0            0         0
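
If it helps to sanity-check a table like this against the expected device count, the MAC addresses can be counted and grouped by their first three octets (the OUI). The sketch below is an assumption-laden illustration: `count_by_oui` is a hypothetical helper, and the sample text is just the first few rows copied from the output above.

```python
import re
from collections import Counter

# Matches a colon-separated MAC address anywhere in a line.
MAC_RE = re.compile(r"\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b", re.IGNORECASE)


def count_by_oui(table_text):
    """Count learned MAC addresses per OUI (first three octets)."""
    counts = Counter()
    for line in table_text.splitlines():
        match = MAC_RE.search(line)
        if match:
            counts[match.group(1)[:8].lower()] += 1
    return counts


sample = """\
Ethernet switching table : 38 entries, 38 learned
    default             00:0a:9c:62:ec:e4   D             -   ge-0/0/22.0            0         0
    default             14:58:d0:47:27:b0   D             -   ge-0/0/22.0            0         0
    default             2c:ea:7f:3d:61:cf   D             -   ge-0/0/22.0            0         0
    default             2c:ea:7f:68:ab:f3   D             -   ge-0/0/22.0            0         0
"""
counts = count_by_oui(sample)
print(sum(counts.values()), dict(counts))
```

Header lines contain no MAC address, so they are skipped without special handling.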

And affected devices are reachable again:

64 bytes from an-conf1002.mgmt.eqiad.wmnet (10.65.5.119): icmp_seq=1 ttl=62 time=24.2 ms
64 bytes from wmf5066.mgmt.eqiad.wmnet (10.65.1.53): icmp_seq=1 ttl=62 time=24.0 ms
64 bytes from wmf4834.mgmt.eqiad.wmnet (10.65.0.69): icmp_seq=1 ttl=62 time=0.868 ms
64 bytes from cloudcontrol1005.mgmt.eqiad.wmnet (10.65.4.188): icmp_seq=1 ttl=62 time=0.850 ms
64 bytes from wmf4659.mgmt.eqiad.wmnet (10.65.2.112): icmp_seq=1 ttl=253 time=0.456 ms
64 bytes from wmf7363.mgmt.eqiad.wmnet (10.65.1.5): icmp_seq=1 ttl=62 time=0.549 ms
64 bytes from db1145.mgmt.eqiad.wmnet (10.65.1.139): icmp_seq=1 ttl=62 time=0.786 ms
64 bytes from wmf5401.mgmt.eqiad.wmnet (10.65.1.140): icmp_seq=1 ttl=62 time=0.829 ms
64 bytes from db1168.mgmt.eqiad.wmnet (10.65.0.181): icmp_seq=1 ttl=62 time=15.2 ms
64 bytes from wmf5474.mgmt.eqiad.wmnet (10.65.0.187): icmp_seq=1 ttl=62 time=24.3 ms
64 bytes from wmf4970.mgmt.eqiad.wmnet (10.65.0.218): icmp_seq=1 ttl=62 time=23.8 ms
64 bytes from db1189.mgmt.eqiad.wmnet (10.65.3.2): icmp_seq=1 ttl=62 time=0.916 ms
64 bytes from wmf5179.mgmt.eqiad.wmnet (10.65.2.173): icmp_seq=1 ttl=62 time=0.778 ms
64 bytes from dbproxy1019.mgmt.eqiad.wmnet (10.65.2.174): icmp_seq=1 ttl=62 time=24.2 ms
64 bytes from wmf5181.mgmt.eqiad.wmnet (10.65.2.175): icmp_seq=1 ttl=62 time=24.8 ms
64 bytes from dbproxy1021.mgmt.eqiad.wmnet (10.65.2.176): icmp_seq=1 ttl=62 time=24.4 ms
64 bytes from es1022.mgmt.eqiad.wmnet (10.65.4.146): icmp_seq=1 ttl=62 time=0.843 ms
64 bytes from ganeti1010.mgmt.eqiad.wmnet (10.65.5.105): icmp_seq=1 ttl=62 time=23.3 ms
64 bytes from wmf4881.mgmt.eqiad.wmnet (10.65.1.208): icmp_seq=1 ttl=62 time=17.4 ms
64 bytes from wmf4935.mgmt.eqiad.wmnet (10.65.2.91): icmp_seq=1 ttl=62 time=0.902 ms
64 bytes from wmf5384.mgmt.eqiad.wmnet (10.65.4.194): icmp_seq=1 ttl=62 time=0.922 ms
64 bytes from mw1484.mgmt.eqiad.wmnet (10.65.2.216): icmp_seq=1 ttl=62 time=0.860 ms
64 bytes from pc1013.mgmt.eqiad.wmnet (10.65.1.189): icmp_seq=1 ttl=62 time=25.8 ms
64 bytes from ps1-c5-eqiad.mgmt.eqiad.wmnet (10.65.0.52): icmp_seq=1 ttl=253 time=1.04 ms
64 bytes from wmf5341.mgmt.eqiad.wmnet (10.65.4.185): icmp_seq=1 ttl=62 time=0.862 ms
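
Note that several replies resolve to asset-tag names (wmfNNNN) rather than the hostnames listed in the description, so matching on IP is safer than matching on name. A minimal sketch of that comparison, assuming ping output in the format above; `missing_replies` is a hypothetical helper and the sample data is a small subset of this task's list:

```python
import re

# Matches one ping reply line and captures the address inside the parentheses.
REPLY_RE = re.compile(r"bytes from \S+ \((?P<ip>[\d.]+)\): .*time=[\d.]+ ms")


def missing_replies(expected_ips, ping_output):
    """Return the expected IPs that have no reply line in the ping output."""
    replied = {m.group("ip") for m in REPLY_RE.finditer(ping_output)}
    return sorted(set(expected_ips) - replied)


expected = ["10.65.5.119", "10.65.1.53", "10.65.0.52"]
output = """\
64 bytes from an-conf1002.mgmt.eqiad.wmnet (10.65.5.119): icmp_seq=1 ttl=62 time=24.2 ms
64 bytes from wmf5066.mgmt.eqiad.wmnet (10.65.1.53): icmp_seq=1 ttl=62 time=24.0 ms
64 bytes from ps1-c5-eqiad.mgmt.eqiad.wmnet (10.65.0.52): icmp_seq=1 ttl=253 time=1.04 ms
"""
print(missing_replies(expected, output))  # []
```

An empty list means every expected management address answered.
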
ayounsi reassigned this task from cmooney to Jclark-ctr.
ayounsi subscribed.

Thanks for the quick turnaround!

There is an outstanding diff in Homer:

Changes for 1 devices: ['msw1-eqiad.mgmt.eqiad.wmnet']

[edit interfaces ge-0/0/22]
-   description "Core: msw-c5-eqiad:47 {#1544}";
+   description "Core: WMF4900:47 {#1544}";

This is because the switch port now points to the decommissioned device in Netbox (https://netbox.wikimedia.org/dcim/interfaces/7795/trace/), which needs to be updated as well.