
Rack/cable/configure ulsfo MX204
Closed, Resolved, Public

Description

Aiming to start the work on Wednesday, June 27th at 11am local time (6pm UTC); estimated maintenance duration is 5h.

EDIT: Blocked by T196030
EDIT2: New plan due to the DC move

  • Rack/power the routers in a temporary location
  • Upgrade cr3/4
  • Configure cr3/4
  • Disable transit/peering links on cr3/4
deactivate protocols bgp group IX4
deactivate protocols bgp group IX6
deactivate protocols bgp group Transit4
deactivate protocols bgp group Transit6
  • Prepare DNS/Icinga/Smokeping/Rancid CRs

https://gerrit.wikimedia.org/r/461228
https://gerrit.wikimedia.org/r/461233

  • Move cr3/4 to new racks
  • Connect cr3<->cr4 link
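The "disable transit/peering" step above and the matching re-enable step during the window touch the same four BGP groups. A minimal Python sketch, assuming the group list stays exactly as written in this task, that generates both halves from a single list so the deactivate and activate statements cannot drift apart. The helper name `bgp_group_commands` is hypothetical; the script only prints Junos statements for review and does not connect to the routers.

```python
# Hypothetical helper: the BGP group names come from this task,
# the function itself is illustrative only.
BGP_GROUPS = ["IX4", "IX6", "Transit4", "Transit6"]


def bgp_group_commands(action: str, groups: list[str]) -> list[str]:
    """Return one Junos 'deactivate'/'activate' statement per BGP group."""
    if action not in ("deactivate", "activate"):
        raise ValueError(f"unsupported action: {action}")
    return [f"{action} protocols bgp group {group}" for group in groups]


if __name__ == "__main__":
    # Before the window: keep external sessions down while cr3/4 are staged.
    print("\n".join(bgp_group_commands("deactivate", BGP_GROUPS)))
    print()
    # During the window, once transport links are verified: bring them back.
    print("\n".join(bgp_group_commands("activate", BGP_GROUPS)))
```

Generating both lists from one source is the design point: if a group is added to the staging config, it is automatically included in the re-enable step too.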

During maintenance window:

  • Depool ulsfo from serving traffic via DNS
  • Downtime all ulsfo hosts in Icinga/LibreNMS (and devices linked to ulsfo hosts)
  • Shut down cr1/2
  • Connect the cr3/4<->asw2 links
  • Reconfigure asw2 for the new routers
set interfaces et-1/0/24 mtu 9192
set interfaces et-1/0/24 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members private1-ulsfo
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members public1-ulsfo
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members customer-1montgomery
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members sandbox1-ulsfo
set interfaces et-1/0/24 unit 0 family ethernet-switching vlan members XLink1

set interfaces et-2/0/24 mtu 9192
set interfaces et-2/0/24 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members private1-ulsfo
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members public1-ulsfo
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members customer-1montgomery
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members sandbox1-ulsfo
set interfaces et-2/0/24 unit 0 family ethernet-switching vlan members XLink1

delete interfaces ae1
delete interfaces ae2
delete interfaces xe-2/0/18
delete interfaces xe-2/0/19
delete interfaces xe-1/0/14
delete interfaces xe-1/0/15
delete interfaces interface-range infrastructure member xe-2/0/18
delete interfaces interface-range infrastructure member xe-2/0/19
set interfaces interface-range infrastructure member et-1/0/24
set interfaces interface-range infrastructure member et-2/0/24
  • Connect cr3/4 transport links

cr3:xe-0/1/1 Telia transport
cr4:xe-0/1/2 Zayo transport

  • Verify that all sessions are up, there are no alarms, prefixes are exchanged, and ulsfo devices are reachable
  • Connect/enable cr3/4 peering/transit

cr3:xe-0/1/0 Zayo transit
cr3:xe-0/1/2 Telia transit
cr4:xe-0/1/0 NTT transit
cr4:xe-0/1/1 Equinix

activate protocols bgp group IX4
activate protocols bgp group IX6
activate protocols bgp group Transit4
activate protocols bgp group Transit6
  • Verify that all sessions are up, there are no alarms, prefixes are exchanged, and ulsfo devices are reachable
  • Merge pending CRs (DNS, then Puppet)
  • In LibreNMS, delete cr3/4 and rename cr1/2 to cr3/4 (to keep history)
  • Verify monitoring is happy
  • Re-pool ulsfo
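The asw2 reconfiguration step consists of two identical trunk stanzas (one per new router uplink) plus cleanup of the old cr1/2 ports. A minimal Python sketch, with hypothetical helper names, that regenerates the same 24 set/delete statements in the same order from one VLAN list, so the two uplinks cannot end up with different VLAN membership. Interface names, VLAN names, the MTU and the `infrastructure` interface-range are copied from the plan; the script only prints text for review and does not touch the switch.

```python
# Sketch: regenerate the asw2 changes for the new cr3/4 uplinks.
# All names below are copied from the maintenance plan; helpers are hypothetical.
NEW_UPLINKS = ["et-1/0/24", "et-2/0/24"]
OLD_RANGE_MEMBERS = ["xe-2/0/18", "xe-2/0/19"]  # leave the "infrastructure" range
OLD_INTERFACES = ["ae1", "ae2", "xe-2/0/18", "xe-2/0/19", "xe-1/0/14", "xe-1/0/15"]
TRUNK_VLANS = [
    "private1-ulsfo",
    "public1-ulsfo",
    "customer-1montgomery",
    "sandbox1-ulsfo",
    "XLink1",
]
MTU = 9192


def uplink_config(ifname: str) -> list[str]:
    """Trunk stanza for one router-facing port, carrying every ulsfo VLAN."""
    base = f"set interfaces {ifname} unit 0 family ethernet-switching"
    lines = [f"set interfaces {ifname} mtu {MTU}", f"{base} interface-mode trunk"]
    lines += [f"{base} vlan members {vlan}" for vlan in TRUNK_VLANS]
    return lines


def asw2_changes() -> list[str]:
    """Full change set: new trunks, old port removal, interface-range swap."""
    lines: list[str] = []
    for ifname in NEW_UPLINKS:
        lines += uplink_config(ifname)
    lines += [f"delete interfaces {ifname}" for ifname in OLD_INTERFACES]
    lines += [
        f"delete interfaces interface-range infrastructure member {ifname}"
        for ifname in OLD_RANGE_MEMBERS
    ]
    lines += [
        f"set interfaces interface-range infrastructure member {ifname}"
        for ifname in NEW_UPLINKS
    ]
    return lines


if __name__ == "__main__":
    print("\n".join(asw2_changes()))
```

The output can be diffed against the hand-written stanza before pasting it into a `configure` session on asw2.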

After maintenance:

  • Rename neighbors' interface descriptions
  • Update Racktables/Netbox
  • Wipe/unrack cr1/cr2

Event Timeline

ayounsi triaged this task as Medium priority. Mar 13 2018, 3:01 AM
ayounsi created this task.
Restricted Application added a subscriber: Aklapper.
ayounsi changed the task status from Open to Stalled. Mar 13 2018, 11:13 PM
ayounsi updated the task description.

Note that this will involve a planned ulsfo site outage, with its traffic falling back to codfw. If things go well, the outage should be brief; the 5h estimate above is the worst case, with complications. We should avoid taking other significant risks while this is ongoing (especially anything affecting codfw-vs-eqiad redundancy, or risks to eqsin).

One of the two routers is now temporarily racked (there are not enough rack studs to actually mount it, so it is resting on top of the other servers), with temporary power/mgmt leads run.

Stole the mgmt connection for cr2-ulsfo and plugged it into the MX204.

Change 430516 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/dns@master] Add mgmt for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430516

Change 430516 merged by Ayounsi:
[operations/dns@master] Add mgmt for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430516

Change 430517 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/dns@master] Add loopback IPs for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430517

Change 430517 merged by Ayounsi:
[operations/dns@master] Add loopback IPs for cr3/4-ulsfo

https://gerrit.wikimedia.org/r/430517

The port mode combinations on the MX204 are so complex that Juniper wrote a web app to validate port plans: https://apps.juniper.net/home/port-checker/

RobH added a parent task: Unknown Object (Task). Aug 21 2018, 6:07 PM

Change 461228 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/dns@master] cr1/2-ulsfo -> cr3/4-ulsfo renaming

https://gerrit.wikimedia.org/r/461228

Change 461233 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Puppet, rename all instances of cr1/2-ulsfo to cr3/4

https://gerrit.wikimedia.org/r/461233

ayounsi updated the task description.

Change 461228 merged by Ayounsi:
[operations/dns@master] cr1/2-ulsfo -> cr3/4-ulsfo renaming

https://gerrit.wikimedia.org/r/461228

Change 461233 merged by Ayounsi:
[operations/puppet@production] Puppet, rename all instances of cr1/2-ulsfo to cr3/4

https://gerrit.wikimedia.org/r/461233

ayounsi updated the task description.