
document all scs connections
Closed, ResolvedPublic

Description

The recent failure of scs-c1-eqiad has demonstrated that we should have documented the serial connections before the serial console server failed. Since we didn't, @Cmjohnson is going to have to trace out each serial connection, and that particular scs is completely full (so 48 ports of tracing). Each cable should have a cable ID #, but he still has to check each device, write down the label number attached to it, then find that cable at the scs end and plug it back into the replacement scs.

I suppose a wikitech page is the bare minimum, detailing each scs console, its ports, their labels, and the cable ID # that has been added to each of the serial connections.

Since we use opengear, we can also back up the configuration of each scs and store it for restoration on replacement opengear equipment.

This way, when an scs console dies, we already know what each cable (with its #) is for when we replace it.
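As a rough illustration of what one documented connection could look like, here is a minimal Python sketch. The field names, the device names, and the cable IDs are hypothetical placeholders, not the real inventory; only the scs hostname comes from this task.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SerialConnection:
    """One documented serial connection (all field names are assumptions)."""
    scs: str                 # console server the cable plugs into
    port: int                # port number on that console server
    device: str              # device at the other end of the cable
    cable_id: Optional[str]  # ID on the cable label, None if not recorded yet


def by_cable_id(connections: List[SerialConnection]) -> Dict[str, SerialConnection]:
    """Index connections by cable ID so a label read on-site maps to a port."""
    return {c.cable_id: c for c in connections if c.cable_id}


# Hypothetical sample data: device names and cable IDs are made up.
connections = [
    SerialConnection("scs-c1-eqiad", 1, "example-device-1", "1001"),
    SerialConnection("scs-c1-eqiad", 2, "example-device-2", "1002"),
    SerialConnection("scs-c1-eqiad", 3, "example-device-3", None),
]

index = by_cable_id(connections)
print(index["1002"].port)  # port that cable 1002 belongs in
```

With a table like this, replacing a dead scs becomes a lookup per cable label rather than a trace of every cable back to its device.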

Event Timeline

1/ Longer term that data should be in Netbox or similar - T170144. Until then spreadsheet or Wikitech seems fine to me.

2/ Rancid seems to be able to pull and archive configuration from OpenGear, to be investigated: https://opengear.zendesk.com/hc/en-us/articles/216369543-RANCID-Support

Change 378708 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Add OpenGear support to Rancid

https://gerrit.wikimedia.org/r/378708

Change 378708 merged by Ayounsi:
[operations/puppet@production] Add OpenGear support to Rancid

https://gerrit.wikimedia.org/r/378708

Change 381094 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Rancid: Add scs-ulsfo

https://gerrit.wikimedia.org/r/381094

Change 381094 merged by Ayounsi:
[operations/puppet@production] Rancid: Add scs-ulsfo

https://gerrit.wikimedia.org/r/381094

Change 381125 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Rancid add 4 conservers as up; 1 as down until replaced

https://gerrit.wikimedia.org/r/381125

Change 381125 merged by Ayounsi:
[operations/puppet@production] Rancid add 4 conservers as up; 1 as down until replaced

https://gerrit.wikimedia.org/r/381125

I added all the conservers to rancid except scs-c1-eqiad.mgmt.eqiad.wmnet.
Ping me when it's back online and I can take care of it.

scs-c1-eqiad.mgmt.eqiad.wmnet is online

scs-c1-eqiad.mgmt.eqiad.wmnet added to rancid.

ayounsi removed ayounsi as the assignee of this task.

All scs connections are tracked in Netbox.
Last missing info is cable IDs

faidon added a project: ops-eqiad.
faidon subscribed.

Per @ayounsi above, "Last missing info is cable IDs". I don't see that as having taken place yet, right? The Cables report is even emitting soft-warnings about it (warnings that we should convert to errors once this work completes). Reopening the task, as it was probably resolved by mistake.

DC-Ops, please document the remaining console cable IDs (I believe all are in eqiad at the moment). Some should happen at the same time as the scs-a8-eqiad replacement (T228919) to avoid doing this twice, but there are others (incl. scs-c1-eqiad's) that can happen independently.
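To make the warning-versus-error distinction concrete, here is a minimal sketch of such a check in plain Python. It is not the actual Cables report and does not use the Netbox API; the function name, the tuple layout, and the sample cable ID are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# (scs hostname, port number, cable ID or None): a hypothetical layout.
Connection = Tuple[str, int, Optional[str]]


def check_cable_ids(
    connections: Iterable[Connection], strict: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (warnings, errors) for connections that lack a cable ID.

    With strict=False a missing ID is only a warning; with strict=True it
    is reported as an error, as it could be once the documentation is done.
    """
    warnings: List[str] = []
    errors: List[str] = []
    for scs, port, cable_id in connections:
        if cable_id:
            continue
        message = f"{scs} port {port}: missing cable ID"
        (errors if strict else warnings).append(message)
    return warnings, errors


# Hypothetical sample data: the port numbers and cable ID are made up.
sample = [
    ("scs-a8-eqiad", 1, "2001"),
    ("scs-c1-eqiad", 2, None),
]
soft = check_cable_ids(sample)
hard = check_cable_ids(sample, strict=True)
```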

RobH edited subscribers, added: Jclark-ctr; removed: gerritbot, Stashbot, RobH.

I've moved this off the backlog and into the hw repair column of the ops-eqiad workboard. I'm also unsubscribing, since this documentation has to be done by one of the on-sites (Chris or John): it is basically a task of documenting onsite cables.

John has been working on 'projects' while his hand heals, so this might be ideal for that?

RobH removed RobH as the assignee of this task. Dec 4 2020, 11:00 PM
RobH subscribed.
RobH unsubscribed.
Cmjohnson claimed this task.

All the connections have been documented and the labels updated.

Thanks @ayounsi. There is another task for the duplicate labels; that is all fixed.