
Three ports on asw2-d-eqiad are not working as expected
Open, Stalled, Low, Public


While debugging T247561, the hosts kafka-jumbo1006 and stat1005 were moved to different switch ports, some of which turned out not to work.


  • stat1005 on ge-1/0/4 and kafka-jumbo1006 on ge-1/0/5 both showed up as DOWN in Icinga at the same time
  • kafka-jumbo1006 was moved to ge-1/0/9 and regained connectivity
  • stat1005 was moved to ge-1/0/6 but still had no connectivity
  • stat1005 was moved to ge-1/0/43 and regained connectivity

So at least three ports on asw2-d-eqiad are not working as expected: ge-1/0/4, ge-1/0/5 and ge-1/0/6.

Arzhel suggested testing those ports with a laptop or similar device to confirm whether they are completely dead.

Event Timeline

I just attempted to use ge-1/0/6 and it did not work.

If they're dead:

  • Either we need them (e.g. we are short on ports), in which case we need to replace the switch, which is a heavy operation.
  • Or we mark the ports as dead (referencing this task), disable them, and call it a day.

If three ports have failed permanently, I'm not sure how we could ever trust that switch again. Perhaps it's better to do a painful but planned replacement than to have it fail at an inconvenient time and be forced into a rushed replacement?

ayounsi changed the task status from Open to Stalled. May 19 2020, 7:29 AM
ayounsi removed Cmjohnson as the assignee of this task.
ayounsi triaged this task as Low priority.
ayounsi added a subscriber: Cmjohnson.

Sounds good! This will have to wait until we do something like T196487, and outside of COVID times, as it's impactful and not urgent.

Change 623177 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] move dumps around on the snapshots in prep network upgrade work

Change 623177 merged by ArielGlenn:
[operations/puppet@production] move dumps around on the snapshots in prep for network upgrade work