
Rack/cable/configure ulsfo MX204
Closed, Resolved · Public

Description

Aiming to start the work on Wednesday, June 27th, at 11am local time (6pm UTC); estimated maintenance duration is 5h.

EDIT: Blocked by T196030
EDIT2: new plan due to DC move

  • Rack/power the routers in temp location
  • Upgrade cr3/4
  • Configure cr3/4
  • Disable transit/peering links on cr3/4
deactivate protocols bgp group IX4
deactivate protocols bgp group IX6
deactivate protocols bgp group Transit4
deactivate protocols bgp group Transit6
  • Prepare DNS/Icinga/Smokeping/Rancid CRs

https://gerrit.wikimedia.org/r/461228
https://gerrit.wikimedia.org/r/461233

  • Move cr3/4 to new racks
  • Connect cr3<->cr4 link
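
The four `deactivate` statements in the list above are reversed later in the plan by matching `activate` statements. As a minimal sketch (the helper name `bgp_group_commands` and the constant `BGP_GROUPS` are hypothetical, not part of the original plan), both sets can be rendered from one list of group names so they cannot drift apart:

```python
# Hypothetical helper, not part of the original plan: renders the Junos
# statements that toggle BGP groups from a single list of group names.
BGP_GROUPS = ["IX4", "IX6", "Transit4", "Transit6"]  # group names taken from the task


def bgp_group_commands(action, groups=BGP_GROUPS):
    """Return one '<action> protocols bgp group <name>' line per group."""
    if action not in ("deactivate", "activate"):
        raise ValueError(f"unsupported action: {action}")
    return [f"{action} protocols bgp group {name}" for name in groups]


if __name__ == "__main__":
    # Print the statements used before the move; call with "activate" to reverse them.
    print("\n".join(bgp_group_commands("deactivate")))
```

The output is meant to be pasted into a Junos configuration session and reviewed before committing; the sketch does not talk to any device.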

During maintenance window:

  • Depool ulsfo from serving traffic via dns
  • Downtime all ulsfo hosts in Icinga/LibreNMS (and devices linked to ulsfo hosts)
  • Shutdown cr1/2
  • Connect cr3/4<->asw2 link
  • Reconfigure asw2 for new router
set interfaces et-1/0/24 mtu 9192
set interfaces et-1/0/24 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members private1-ulsfo
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members public1-ulsfo
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members customer-1montgomery
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members sandbox1-ulsfo
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members XLink1

set interfaces et-2/0/24 mtu 9192
set interfaces et-2/0/24 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members private1-ulsfo
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members public1-ulsfo
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members customer-1montgomery
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members sandbox1-ulsfo
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members XLink1

delete interfaces ae1
delete interfaces ae2
delete interfaces xe-2/0/18
delete interfaces xe-2/0/19
delete interfaces xe-1/0/14
delete interfaces xe-1/0/15
delete interfaces interface-range infrastructure member xe-2/0/18
delete interfaces interface-range infrastructure member xe-2/0/19
set interfaces interface-range infrastructure member et-1/0/24
set interfaces interface-range infrastructure member et-2/0/24
  • Connect cr3/4 transport links

cr3:xe-0/1/1 Telia transport
cr4:xe-0/1/2 Zayo transport

  • Verify all sessions are up, no alarms, prefixes exchanged, ulsfo devices reachable
  • Connect/enable cr3/4 peering/transit

cr3:xe-0/1/0 Zayo Transit
cr3:xe-0/1/2 Telia transit
cr4:xe-0/1/0 NTT transit
cr4:xe-0/1/1 Equinix

activate protocols bgp group IX4
activate protocols bgp group IX6
activate protocols bgp group Transit4
activate protocols bgp group Transit6
  • Verify all sessions are up, no alarms, prefixes exchanged, ulsfo devices reachable
  • Merge pending CRs (DNS then puppet)
  • In LibreNMS: delete cr3/4, then rename cr1/2 to cr3/4 (to keep history)
  • Verify monitoring is happy
  • Re-pool ulsfo
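
The asw2 trunk configuration in the list above repeats the same seven statements for et-1/0/24 and et-2/0/24. A minimal sketch, assuming a hypothetical helper `trunk_commands` (not part of the original plan), that renders the statements per port from one VLAN list so both router-facing ports stay identical:

```python
# Hypothetical helper, not part of the original plan: renders the asw2 trunk
# configuration for each router-facing port from one shared VLAN list.
TRUNK_PORTS = ["et-1/0/24", "et-2/0/24"]  # ports named in the task
VLANS = [
    "private1-ulsfo",
    "public1-ulsfo",
    "customer-1montgomery",
    "sandbox1-ulsfo",
    "XLink1",
]
MTU = 9192


def trunk_commands(port, vlans=VLANS, mtu=MTU):
    """Return the Junos 'set' lines that make `port` a trunk carrying `vlans`."""
    base = f"set interfaces {port}"
    family = f"{base} unit 0 family ethernet-switching"
    lines = [f"{base} mtu {mtu}", f"{family} interface-mode trunk"]
    lines += [f"{family} vlan members {vlan}" for vlan in vlans]
    return lines


if __name__ == "__main__":
    for port in TRUNK_PORTS:
        print("\n".join(trunk_commands(port)))
        print()
```

This only produces text for review; the `delete interfaces` and `interface-range` statements from the plan are deliberately left out and would still be entered by hand.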

After maintenance:

  • Rename neighbors' interface descriptions
  • Update racktables/Netbox
  • Wipe/unrack cr1/cr2

Details

Related Gerrit Patches:
[operations/puppet@production] Puppet, rename all instances of cr1/2-ulsfo to cr3/4
[operations/dns@master] cr1/2-ulsfo -> cr3/4-ulsfo renaming
[operations/dns@master] Add loopback IPs for cr3/4-ulsfo
[operations/dns@master] Add mgmt for cr3/4-ulsfo

Event Timeline

ayounsi triaged this task as Medium priority. · Mar 13 2018, 3:01 AM
ayounsi created this task.
Restricted Application added a project: Operations. · Mar 13 2018, 3:01 AM
Restricted Application added a subscriber: Aklapper.
ayounsi changed the task status from Open to Stalled. · Mar 13 2018, 11:13 PM
ayounsi updated the task description.
ema moved this task from Triage to Hardware on the Traffic board. · Mar 19 2018, 9:36 AM
ayounsi updated the task description. · Apr 25 2018, 8:09 PM

Note that this will involve a planned ulsfo site outage, with its traffic falling back to codfw. If things go well the outage should be brief; the 5h estimate above is the worst case, with complications. We should avoid taking other significant risks while this is ongoing (especially anything affecting codfw-vs-eqiad redundancy, or risks to eqsin).

ayounsi updated the task description. · May 2 2018, 4:16 PM
RobH added a comment. · May 2 2018, 11:23 PM

One of the two routers is now temp racked (not enough rack studs to actually mount it; it's resting on top of the other servers) with temp power/mgmt leads run.

Stole the mgmt connection for cr2-ulsfo and plugged it into the MX204.

Change 430516 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/dns@master] Add mgmt for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430516

Change 430516 merged by Ayounsi:
[operations/dns@master] Add mgmt for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430516

Change 430517 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/dns@master] Add loopback IPs for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430517

Change 430517 merged by Ayounsi:
[operations/dns@master] Add loopback IPs for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430517

The port mode combinations on the MX204 are so complex that Juniper wrote a webapp to validate plans: https://apps.juniper.net/home/port-checker/

ayounsi updated the task description. · May 31 2018, 9:24 AM
ayounsi updated the task description. · Jun 25 2018, 1:31 PM
ayounsi changed the status of subtask T196030: troubleshoot cr3/cr4 link from Open to Stalled. · Jul 13 2018, 1:40 PM
RobH added a parent task: Unknown Object (Task). · Aug 21 2018, 6:07 PM

Change 461228 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/dns@master] cr1/2-ulsfo -> cr3/4-ulsfo renaming

https://gerrit.wikimedia.org/r/461228

Change 461233 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Puppet, rename all instances of cr1/2-ulsfo to cr3/4

https://gerrit.wikimedia.org/r/461233

ayounsi updated the task description. · Sep 19 2018, 4:53 PM
ayounsi updated the task description. · Sep 19 2018, 5:53 PM
ayounsi updated the task description. · Sep 19 2018, 8:06 PM
ayounsi updated the task description.
ayounsi updated the task description. · Sep 24 2018, 10:55 PM

Change 461228 merged by Ayounsi:
[operations/dns@master] cr1/2-ulsfo -> cr3/4-ulsfo renaming

https://gerrit.wikimedia.org/r/461228

ayounsi updated the task description. · Sep 26 2018, 10:01 PM

Change 461233 merged by Ayounsi:
[operations/puppet@production] Puppet, rename all instances of cr1/2-ulsfo to cr3/4

https://gerrit.wikimedia.org/r/461233

ayounsi updated the task description. · Sep 26 2018, 11:28 PM
ayounsi updated the task description. · Sep 27 2018, 12:24 AM
RobH moved this task from Backlog to Racking Tasks on the ops-ulsfo board. · Oct 3 2018, 3:22 PM
ayounsi closed this task as Resolved. · Oct 3 2018, 4:33 PM
ayounsi updated the task description.