
Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of cr[12]-drmrs

cr1 : https://netbox.wikimedia.org/dcim/devices/3424/
cr2 : https://netbox.wikimedia.org/dcim/devices/3424/

Hostname / Racking / Installation Details

Racking info already in Netbox, as each router has been added to netbox in advance of arrival.

In T277586#7632268, @ayounsi wrote:

ETAs for the routers are tomorrow and Friday, yay!

Please have remote hands rack them and connect power/console/mgmt as usual.

This will be enough to start configuring them.

For the production ports, the following diagram is the desired state after phase 1:

2nd EU-Page-2.drawio(1).png (506×651 px, 63 KB)

  • Move the transit/peering/transport links (except Telia) from the asw to the same racks cr.
  • Disconnect cable D0065, one of the two between asw1-b12<->asw1-b13, to free up ports, fiber and 40G optics.
  • Add new links between routers and switches.

Phase 2 will consist of disconnecting the last asw1-b12<->asw1-b13 link (D0066), and moving the Telia transit link to cr1.

Please note the above diagram has a mistake, showing both routers connecting to PP:15/16 when cr1:xe-0/1/1 actually connects to Tata's port 11/12. The cable ID on the diagram is correct, so this is a single typo, which I resolved using the other information. Calling it out so netops can update their diagram and confirm that my assumption is correct.

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cr1-drmrs:
  • receive in system on procurement task T277586 & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • phase 1 connect serial, mgmt, and fibers as noted in the above diagram.
  • confirm SCS connection and hand off to netops for setup
  • phase 2 connection
cr2-drmrs:
  • receive in system on procurement task T277586 & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • phase 1 connect serial, mgmt, and fibers as noted in the above diagram.
  • confirm SCS connection and hand off to netops for setup
  • phase 2 connection

Remote Hands Directions

Rob will file a ticket with Interxion remote hands. A draft of this can be seen here: https://docs.google.com/document/d/1OjfRgix1fpiko2Rz1uqAPUe4l9eKiGkK9wGdsNft8Yo/edit?usp=sharing


Event Timeline

RobH triaged this task as High priority.Jan 27 2022, 5:39 PM
RobH created this task.
Restricted Application added a subscriber: Aklapper.
RobH added a parent task: Unknown Object (Task).Jan 27 2022, 5:39 PM

I've drafted the directions for remote hands, translating the above diagram into step-by-step directions for them to rack our routers and bring them online. Rather than attempt to parse, correct, and edit them on phab, the draft is in this google doc: https://docs.google.com/document/d/1OjfRgix1fpiko2Rz1uqAPUe4l9eKiGkK9wGdsNft8Yo/edit?usp=sharing

I'll want to get netops to proof this to ensure I've gotten everything!

RobH renamed this task from Q3:(Need By: TBD) rack/setup/install cr[12]-drmrs to Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs.Jan 27 2022, 6:51 PM
RobH reassigned this task from RobH to ayounsi.
RobH added a subscriber: cmooney.

Please note the above diagram has a mistake, showing both routers connecting to PP:15/16 when cr1:xe-0/1/1 actually connects to Tata's port 11/12.

If I read T298208#7613083 correctly, Tata is moving to port 15/16. However according to https://netbox.wikimedia.org/circuits/circuits/111/ GTT is on port 15/16.

I left comments on the doc.


In T298208#7613083, @RobH wrote:

Chatted with Arzhel about this in IRC:

  • we'll put in the remote hands to move our existing Tata connection from MRS2:2R106:R54:B12:U47:X11/12 to MRS2:2R106:R54:B12:U47:X15/16.
    • we'll lose connectivity over the circuit immediately upon the move and it won't return until Tata completes the migration on January 22nd.
  • tata has the new loa via the email thread, so they should have started the xconnect ordering. That LoA details using our port MRS2:2R106:R54:B12:U47:X15/16.

15/16 is in use, and on the LoA I put 17/18, so this comment was a mistake. Tata is moving to 17/18, but good catch!

Reviewing the rest of the comments on the doc now, thanks!
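Since the mix-up above came down to comparing patch panel port identifiers by eye, here is a minimal Python sketch of how such identifiers could be compared mechanically. It assumes only what the examples in this thread show: colon-separated fields ending in an `X<a>/<b>` port pair. The names `PanelPort`, `parse_panel_port`, and `same_panel` are hypothetical, and the meaning of the leading fields is not stated in this task, so they are kept as an opaque prefix.

```python
from typing import NamedTuple, Tuple


class PanelPort(NamedTuple):
    prefix: Tuple[str, ...]  # every field before the port pair, kept opaque
    ports: Tuple[int, ...]   # the port pair, e.g. (15, 16)


def parse_panel_port(identifier: str) -> PanelPort:
    """Split an identifier like 'MRS2:2R106:R54:B12:U47:X15/16'.

    Assumption: the last field is 'X' followed by a port pair.
    """
    *prefix, last = identifier.split(":")
    if not last.startswith("X"):
        raise ValueError(f"unexpected port field: {last!r}")
    ports = tuple(int(p) for p in last[1:].split("/"))
    return PanelPort(tuple(prefix), ports)


def same_panel(a: str, b: str) -> bool:
    """True when two identifiers differ only in their port pair."""
    return parse_panel_port(a).prefix == parse_panel_port(b).prefix


old = "MRS2:2R106:R54:B12:U47:X11/12"
new = "MRS2:2R106:R54:B12:U47:X17/18"
print(parse_panel_port(old).ports, "->", parse_panel_port(new).ports)
print("same panel:", same_panel(old, new))
```

Comparing parsed port pairs against the ports recorded for each circuit would flag a conflict such as the 15/16 one before a LoA or remote hands ticket goes out.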

Submitted the revised document (using numbered steps) to Interxion via ticket CS0433959. I listed @wiki_willy, @ayounsi, & @cmooney as additional notifications for this ticket.

They closed the old ticket due to delays in the routers' shipment arrival, so I opened a new one today, as the routers are now at drmrs shipping.

CS0447193

Arzhel reminded me in our sync-up meeting about the too-short cable for mr1, so we've appended this to the ticket:

Support,
I neglected to add an additional request:

Please replace the too short patch from our patch panel port MRS2:2R106:R54:B12:U47:X13/14 to the SRX300 device labeled mr1-drmrs, port ge-0/0/7.

This patch is too short, and has zero strain relief. Please replace with a longer single mode SC to LC fiber patch, route it cleanly from our SRX to the patch panel using our fiber ducts, and label it D0107.

Please then leave the SC-LC 1M single mode fiber on top of the servers in our rack, along with our other spare cables.

Change 763551 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Add drmrs routers

https://gerrit.wikimedia.org/r/763551

Change 763551 merged by jenkins-bot:

[operations/homer/public@master] Add drmrs routers

https://gerrit.wikimedia.org/r/763551

Current status:

  • Physical work left (I'll give the details tomorrow):
    • Planned: move Telia's link to the routers now that we have transport and the routers configured, remove the asw<->asw link
    • Unplanned: the cross-rack fibers are not run correctly; asw1-b12 only connects to cr1 (via 2 links) and asw1-b13 only to cr2 (via 2 links)
  • Software work left:
    • Cleanup the config and iron out the remaining details
    • Add to monitoring
    • Do failover testing
    • Optionally upgrade the routers (one minor version behind)
    • Fix GTT issue (email sent to GTT)
    • Optionally re-add GRE tunnel to esams (depending on GTT status)
    • Implement anycast tuning
    • Mass send peering requests

I can put in a followup ticket for them to correct the 'unplanned' items but I'll wait until you finish your setup or give the go ahead, as you may discover other things. Just comment and kick this over to me for that when ready.

No packing slips in box according to the Interxion engineer who did our remote hands work, so I've requested a copy from Myriad so I can receive these two routers in Coupa.

What's left to do on the network side:

1/

cr1-drmrs:et-0/0/2 currently connected to:
asw1-b12-drmrs:et-0/0/50 with cable {#D0101}
Needs to move to:
asw1-b13-drmrs:et-0/0/50

2/

cr2-drmrs:et-0/0/2 currently connected to:
asw1-b13-drmrs:et-0/0/50 with cable {#D0103}
Needs to move to:
asw1-b12-drmrs:et-0/0/50


3/

Disconnect the cable D0066 between:
asw1-b12-drmrs:et-0/0/49 and asw1-b13-drmrs:et-0/0/49
Store the cable and optics

4/

Move patch cable D0068 (MRS2:2R106:R54:B12:U47:X07/08)
Currently connected to:
asw1-b12-drmrs:xe-0/0/45
Needs to be connected to:
cr1-drmrs:xe-0/1/3
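The intent of moves 1/ and 2/ is that each switch ends up cabled to both routers. As a rough illustration only, the following Python sketch checks that property for the state before and after those two moves. The helper names are hypothetical, and because this task names only the et-0/0/2 links, the second link on each router is represented by a placeholder interface called `other`.

```python
ROUTERS = {"cr1-drmrs", "cr2-drmrs"}


def device(endpoint):
    """Return the device part of a 'device:interface' endpoint."""
    return endpoint.split(":", 1)[0]


def routers_per_switch(links):
    """Map each switch to the set of routers it is cabled to."""
    seen = {}
    for router_end, switch_end in links:
        seen.setdefault(device(switch_end), set()).add(device(router_end))
    return seen


def missing_uplinks(links):
    """Return {switch: routers it does not reach} for incomplete switches."""
    result = {}
    for switch, routers in routers_per_switch(links).items():
        missing = ROUTERS - routers
        if missing:
            result[switch] = missing
    return result


# "other" is a placeholder interface (assumption): the thread does not
# name the second link that each router has to its local switch.
before = [
    ("cr1-drmrs:et-0/0/2", "asw1-b12-drmrs:et-0/0/50"),  # cable D0101
    ("cr1-drmrs:other", "asw1-b12-drmrs:other"),
    ("cr2-drmrs:et-0/0/2", "asw1-b13-drmrs:et-0/0/50"),  # cable D0103
    ("cr2-drmrs:other", "asw1-b13-drmrs:other"),
]

# After moves 1/ and 2/: the two et-0/0/2 links swap switches.
after = [
    ("cr1-drmrs:et-0/0/2", "asw1-b13-drmrs:et-0/0/50"),
    ("cr1-drmrs:other", "asw1-b12-drmrs:other"),
    ("cr2-drmrs:et-0/0/2", "asw1-b12-drmrs:et-0/0/50"),
    ("cr2-drmrs:other", "asw1-b13-drmrs:other"),
]

print("before:", missing_uplinks(before))
print("after:", missing_uplinks(after))
```

An empty result for the post-move state means every switch reaches both routers, which is the cross-connected layout the moves are meant to restore.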

Note that in one of their emails they mentioned that they didn't have a long enough power cord for cr2 (currently running on 1 PSU).

CS0451899

Support,

We recently filed CS0334187 for the installation of two of our routers. During that installation, it seems a few steps need adjustment, and require the following steps for correction:

1/

cr1-drmrs:et-0/0/2(B12:U45) currently connected to: asw1-b12-drmrs:et-0/0/50 with cable {#D0101}
Needs to move to: asw1-b13-drmrs:et-0/0/50

2/

cr2-drmrs:et-0/0/2(B13:U45) currently connected to: asw1-b13-drmrs:et-0/0/50 with cable {#D0103}
Needs to move to: asw1-b12-drmrs:et-0/0/50

3/

Disconnect the cable D0066 between:
asw1-b12-drmrs:et-0/0/49 and asw1-b13-drmrs:et-0/0/49
Store the cable and optics

4/

Move patch cable D0068 (MRS2:2R106:R54:B12:U47:X07/08)
Currently connected to: asw1-b12-drmrs:xe-0/0/45
Needs to be connected to: cr1-drmrs:xe-0/1/3

Hi @ayounsi - I'm not sure if you're copied on the Interxion ticket, so just forwarding the info along that they completed the patching request on Saturday. Thanks, Willy

I can confirm that (1), (2) and (4) are done.

However, cr2-drmrs is currently fully down (the console is dead as well). My guess is that they inadvertently bumped the only power cable for cr2 (the redundant one was missing).
I need cr2 back up to confirm that (3) is done.

Unfortunately, I don't have access to comment on the Interxion ticket.

I called Tarek: the power cord on cr2 was faulty, but he was able to find 2 spare ones, which he will bill on the ticket.
I asked him to dispose of the faulty cable as well.
He connected the new cable to ps1 port 9.

I also confirmed that (3) was done properly.

@RobH could you please close the Interxion ticket?

I closed out the ticket and this is now resolved.