Main Tracking Task for ESAMS Migration to KNAMS
Closed, Resolved | Public | Request

Description

https://wikitech.wikimedia.org/wiki/SRE/business_case/Network_-_move_from_esams_to_knams

  • Sign off on Revised Interxion Contract T315937
  • Submit travel request by May
  • PDU order - need onsite by June 30 T323414
  • Smart hands migration of router and x-connects (Leaseweb/Lumen, Relined, GTT, LibertyGlobal, Nikhef, T-Mobile, Datahop) to new cabinets, test connection - complete before July 31 T337997
  • OOB connection with ATOM86 T333604
  • Order of network switches, servers, cables, and expendables - need onsite by July 31 T320548 T327438 T326586 T331460 T336871 T336869 T338422
  • Submit termination notice for Iron Mountain, Leaseweb, Relined - submit by July 31 (SENT TO LEGAL)
  • Migrate esams circuits to knams - Lumen, Telia, EuNetworks, AMS-IX, FiberRing T339943
  • Move mx router from esams to knams - complete in August
  • Physical install and cabling of hardware at knams - complete in August
  • Configure and test network and OOB connections - complete in August
  • Configure servers and route traffic from esams to knams - complete in August
  • Send announcement in August
  • Decommission esams - drive wiping and shipping out hardware - complete before November T341528

Details

Repo                          Branch      Lines +/-
operations/software/pywmflib  master      +1 -1
operations/dns                master      +1 -2
operations/puppet             production  +2 -5
operations/dns                master      +3 -2
operations/homer/public       master      +2 -0
operations/mediawiki-config   master      +7 -5
operations/mediawiki-config   master      +4 -4
operations/dns                master      +0 -2
operations/dns                master      +9 -12
operations/homer/public       master      +3 -0
operations/dns                master      +16 -0
operations/dns                master      +2 -2
operations/puppet             production  +27 -89
operations/dns                master      +5 -129
operations/dns                master      +5 -4
operations/homer/public       master      +0 -232
operations/homer/public       master      +2 -2
operations/homer/public       master      +11 -32
operations/puppet             production  +0 -4
operations/puppet             production  +3 -1
operations/puppet             production  +5 -1
operations/homer/public       master      +2 -0
operations/puppet             production  +1 -1
operations/homer/public       master      +1 -1
operations/dns                master      +1 -1
operations/homer/public       master      +8 -0
operations/puppet             production  +23 -0
operations/puppet             production  +4 -4
operations/puppet             production  +37 -36
operations/puppet             production  +5 -1
operations/puppet             production  +0 -5
operations/homer/public       master      +11 -9
operations/homer/public       master      +5 -19
operations/homer/public       master      +2 -2
operations/puppet             production  +4 -4
operations/puppet             production  +2 -0
operations/puppet             production  +1 -1
operations/homer/public       master      +38 -50
operations/dns                master      +0 -12
operations/homer/public       master      +0 -2
operations/puppet             production  +3 -2
operations/puppet             production  +2 -2
operations/dns                master      +2 -0

Related Objects

Status    Subtype  Assigned
Resolved  Request  None
Resolved           RobH
Resolved           RobH
Resolved           Volans
Resolved           Papaul
Resolved           cmooney
Resolved           cmooney
Resolved           RobH
Resolved           wiki_willy
Resolved           ssingh

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 949023 merged by Ayounsi:

[operations/puppet@production] Add new esams switches to icinga hostgroups

https://gerrit.wikimedia.org/r/949023

Change 949035 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Add new-esams network infra to monitoring

https://gerrit.wikimedia.org/r/949035

Change 949035 merged by Ayounsi:

[operations/puppet@production] Add new-esams network infra to monitoring

https://gerrit.wikimedia.org/r/949035

Change 949072 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Rancid: esams migration

https://gerrit.wikimedia.org/r/949072

Change 949072 merged by Ayounsi:

[operations/puppet@production] Rancid: esams migration

https://gerrit.wikimedia.org/r/949072

Change 949100 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] devices: add anycast_ and lvs_neigbhors for esams (bw27/by27)

https://gerrit.wikimedia.org/r/949100

Change 949113 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] esams/ntp: point to dns3003

https://gerrit.wikimedia.org/r/949113

Change 948581 abandoned by Ssingh:

[operations/puppet@production] hiera: enable single backend on esams and switch to F4-U hardware config

Reason:

no longer required

https://gerrit.wikimedia.org/r/948581

Change 949100 merged by Ssingh:

[operations/homer/public@master] devices: add anycast_ and lvs_neigbhors for esams (bw27/by27)

https://gerrit.wikimedia.org/r/949100

Mentioned in SAL (#wikimedia-operations) [2023-08-16T13:46:44Z] <sukhe> running homer on asw1-b*27-esams* for CR 949100: T329219

Change 949113 merged by Ssingh:

[operations/dns@master] esams/ntp: point to dns3003

https://gerrit.wikimedia.org/r/949113

Change 949525 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] common: update ntp_servers with dns300[34]

https://gerrit.wikimedia.org/r/949525

Change 949529 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: LVS: update tagged_subnets for esams

https://gerrit.wikimedia.org/r/949529

Change 949531 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] P:pybal: update bgp-peer-address for asw1-b*27-esams

https://gerrit.wikimedia.org/r/949531

Change 949533 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Update netflow collector IP

https://gerrit.wikimedia.org/r/949533

Change 949534 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Update esams netflow collector

https://gerrit.wikimedia.org/r/949534

Change 949533 merged by Ayounsi:

[operations/homer/public@master] Update netflow collector IP

https://gerrit.wikimedia.org/r/949533

Change 949542 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] hiera: decommission dns3001 and dns3002

https://gerrit.wikimedia.org/r/949542

Change 949534 merged by Ayounsi:

[operations/puppet@production] Update esams netflow collector

https://gerrit.wikimedia.org/r/949534

Change 949525 merged by Ssingh:

[operations/homer/public@master] common: update ntp_servers with dns300[34]

https://gerrit.wikimedia.org/r/949525

Change 949529 merged by Ssingh:

[operations/puppet@production] hiera: LVS: update tagged_subnets for esams

https://gerrit.wikimedia.org/r/949529

Change 949531 merged by Ssingh:

[operations/puppet@production] P:pybal: update bgp-peer-address for asw1-b*27-esams

https://gerrit.wikimedia.org/r/949531

Change 949551 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] Remove esams hosts prior to knams migration

https://gerrit.wikimedia.org/r/949551

Change 949542 merged by Fabfur:

[operations/puppet@production] hiera: decommission dns3001 and dns3002

https://gerrit.wikimedia.org/r/949542

RobH reopened subtask Unknown Object (Task) as Open. Aug 16 2023, 4:12 PM

cookbooks.sre.hosts.decommission executed by fabfur@cumin1001 for hosts: dns3001.wikimedia.org

  • dns3001.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook, run it manually

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by fabfur@cumin1001 for hosts: dns3002.wikimedia.org

  • dns3002.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by brett@cumin2002 for hosts: cp[3050-3053].esams.wmnet

  • cp3050.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp3051.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp3052.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp3053.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by brett@cumin2002 for hosts: cp[3054-3057].esams.wmnet

  • cp3054.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp3055.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp3056.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cp3057.esams.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
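
The decommission runs above target hosts with bracketed range expressions such as cp[3050-3053].esams.wmnet. As a rough illustration of how such an expression maps to the individual hosts listed in the output, here is a minimal Python sketch. It is not the implementation used by the cookbooks; `expand_hosts` is a hypothetical helper, and it assumes at most one numeric `[start-end]` range per expression.

```python
import re

# Matches a single numeric range such as "[3050-3053]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(expr: str) -> list[str]:
    """Expand one numeric [start-end] range in a host expression.

    Hypothetical helper for illustration only; expressions without a
    range are returned unchanged as a one-element list.
    """
    match = _RANGE.search(expr)
    if match is None:
        return [expr]
    prefix = expr[:match.start()]
    suffix = expr[match.end():]
    start, end = match.group(1), match.group(2)
    width = len(start)  # preserve zero padding, if the range uses any
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


print(expand_hosts("cp[3050-3053].esams.wmnet"))
# ['cp3050.esams.wmnet', 'cp3051.esams.wmnet', 'cp3052.esams.wmnet', 'cp3053.esams.wmnet']
```

The two cookbook invocations above cover cp3050 through cp3057 in two batches of four hosts each, which matches the eight per-host result blocks.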

Change 949621 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] esams: update dhcp_server

https://gerrit.wikimedia.org/r/949621

Change 949623 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] old esams cleanup

https://gerrit.wikimedia.org/r/949623

Change 949623 merged by jenkins-bot:

[operations/homer/public@master] old esams cleanup

https://gerrit.wikimedia.org/r/949623

Change 949621 merged by jenkins-bot:

[operations/homer/public@master] esams: update dhcp_server

https://gerrit.wikimedia.org/r/949621

Change 949853 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Homer: remove all mentions of old esams

https://gerrit.wikimedia.org/r/949853

Change 949930 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] knams migration: remove references to old esams

https://gerrit.wikimedia.org/r/949930

Change 949934 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Remove all mentions of old-esams, replace with new esams

https://gerrit.wikimedia.org/r/949934

Change 949853 merged by jenkins-bot:

[operations/homer/public@master] Homer: remove all mentions of old esams

https://gerrit.wikimedia.org/r/949853

Change 949938 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] 10.in-addr.arpa: remove include for 0.20.10.in-addr.arpa

https://gerrit.wikimedia.org/r/949938

Change 949938 merged by Ssingh:

[operations/dns@master] 10.in-addr.arpa: remove include for 0.20.10.in-addr.arpa

https://gerrit.wikimedia.org/r/949938

Change 949972 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] 10.in-addr.arpa: remove include for netbox/0.21.10.in-addr.arpa

https://gerrit.wikimedia.org/r/949972

Change 949975 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] Remove PTRs for 91.198.174.0/24 and 2620:0:862::/48

https://gerrit.wikimedia.org/r/949975

Change 949975 merged by Ayounsi:

[operations/dns@master] Remove PTRs for 91.198.174.0/24 and 2620:0:862::/48

https://gerrit.wikimedia.org/r/949975
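
For readers unfamiliar with reverse DNS naming, the PTR records for a prefix live under a zone whose name is derived from the prefix itself. The sketch below uses Python's standard `ipaddress` module to derive that zone name for the two prefixes named in change 949975. The helper `reverse_zone` is an illustrative name, not something from the operations/dns repository, and it assumes the prefix length falls on a label boundary (a multiple of 8 bits for IPv4, 4 bits for IPv6).

```python
import ipaddress


def reverse_zone(prefix: str) -> str:
    """Return the reverse DNS zone name covering a prefix.

    Illustrative helper: only handles prefixes aligned to a label
    boundary (octets for IPv4, nibbles for IPv6).
    """
    network = ipaddress.ip_network(prefix)
    bits_per_label = 8 if network.version == 4 else 4
    if network.prefixlen % bits_per_label:
        raise ValueError(f"{prefix} is not aligned to a label boundary")
    # reverse_pointer yields the full PTR owner name of the network address.
    labels = network.network_address.reverse_pointer.split(".")
    # Drop the labels that correspond to the host part of the address.
    host_labels = (network.max_prefixlen - network.prefixlen) // bits_per_label
    return ".".join(labels[host_labels:])


print(reverse_zone("91.198.174.0/24"))   # 174.198.91.in-addr.arpa
print(reverse_zone("2620:0:862::/48"))   # 2.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa
```

The same rule explains the zone names in the neighbouring changes: under this scheme 0.20.10.in-addr.arpa is the reverse zone for 10.20.0.0/24.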

Change 949934 merged by Ayounsi:

[operations/puppet@production] Remove all mentions of old-esams, replace with new esams

https://gerrit.wikimedia.org/r/949934

Change 949972 abandoned by Ssingh:

[operations/dns@master] 10.in-addr.arpa: remove include for netbox/0.21.10.in-addr.arpa

Reason:

no longer required

https://gerrit.wikimedia.org/r/949972

RobH closed subtask Unknown Object (Task) as Resolved. Aug 17 2023, 6:28 PM

Change 950045 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Reverse includes for new esams range

https://gerrit.wikimedia.org/r/950045

Change 950045 merged by Cathal Mooney:

[operations/dns@master] Reverse includes for new esams ranges

https://gerrit.wikimedia.org/r/950045

Change 950176 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] Repool esams after knams migration (merge on Monday Aug 21)

https://gerrit.wikimedia.org/r/950176

Icinga downtime and Alertmanager silence (ID=7e93d2ff-5842-45cb-8ae6-d620952951e6) set by cmooney@cumin1001 for 2:00:00 on 16 host(s) and their services with reason: Downtime esams hosts prior to cr1-esams reboot

asw1-bw27-esams,asw1-bw27-esams.mgmt,asw1-by27-esams,asw1-by27-esams.mgmt,cr2-drmrs,cr2-eqiad,cr[1-2]-esams,cr2-esams.mgmt,mr1-esams,mr1-esams.oob,ps1-oe[14-16]-esams,re0.cr1-esams.mgmt,scs-by27-esams

Change 951186 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Announce Anycast prefixes from esams

https://gerrit.wikimedia.org/r/951186

Change 951186 merged by jenkins-bot:

[operations/homer/public@master] Announce Anycast prefixes from esams

https://gerrit.wikimedia.org/r/951186

Change 949930 merged by Ssingh:

[operations/dns@master] knams migration: remove references to old esams

https://gerrit.wikimedia.org/r/949930

Change 950176 merged by Ssingh:

[operations/dns@master] Repool esams after knams migration (merge on Monday Aug 21)

https://gerrit.wikimedia.org/r/950176

Change 951508 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/mediawiki-config@master] wmf-config: update new esams IP ranges

https://gerrit.wikimedia.org/r/951508

Change 951508 merged by jenkins-bot:

[operations/mediawiki-config@master] wmf-config: update new esams IP ranges

https://gerrit.wikimedia.org/r/951508

Mentioned in SAL (#wikimedia-operations) [2023-08-22T14:30:34Z] <taavi@deploy1002> Started scap: Backport for [[gerrit:951508|wmf-config: update new esams IP ranges (T329219)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-22T14:32:07Z] <taavi@deploy1002> taavi and sukhe: Backport for [[gerrit:951508|wmf-config: update new esams IP ranges (T329219)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-22T14:40:25Z] <taavi@deploy1002> Finished scap: Backport for [[gerrit:951508|wmf-config: update new esams IP ranges (T329219)]] (duration: 09m 50s)

Change 951591 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/mediawiki-config@master] wmf-config: remove public subnets from reverse-proxy.php

https://gerrit.wikimedia.org/r/951591

Change 951591 merged by jenkins-bot:

[operations/mediawiki-config@master] wmf-config: remove public subnets from reverse-proxy.php

https://gerrit.wikimedia.org/r/951591

Mentioned in SAL (#wikimedia-operations) [2023-08-30T12:43:34Z] <taavi@deploy1002> Started scap: Backport for [[gerrit:951591|wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-30T12:46:50Z] <taavi@deploy1002> sukhe and taavi: Backport for [[gerrit:951591|wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-30T12:55:02Z] <taavi@deploy1002> Finished scap: Backport for [[gerrit:951591|wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219)]] (duration: 11m 28s)

Change 954965 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] asw1-b*27-esams: add durum300[34]

https://gerrit.wikimedia.org/r/954965

Change 954965 merged by Ssingh:

[operations/homer/public@master] asw1-b*27-esams: add durum300[34]

https://gerrit.wikimedia.org/r/954965

RobH closed subtask Unknown Object (Task) as Resolved. Sep 7 2023, 7:21 PM
Papaul closed subtask Unknown Object (Task) as Resolved. Sep 8 2023, 1:29 PM

Change 955943 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] 27.35.198.in-addr.arpa: update PTR for 198.35.27.27

https://gerrit.wikimedia.org/r/955943

Change 955961 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove references to nsa.wikimedia.org

https://gerrit.wikimedia.org/r/955961

Change 955943 merged by Ssingh:

[operations/dns@master] 27.35.198.in-addr.arpa: update PTR for 198.35.27.27

https://gerrit.wikimedia.org/r/955943

Change 955962 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] wikimedia.org: remove nsa.wikimedia.org

https://gerrit.wikimedia.org/r/955962

Change 955961 merged by Ssingh:

[operations/puppet@production] hiera: remove references to nsa.wikimedia.org

https://gerrit.wikimedia.org/r/955961

Change 955962 merged by Ssingh:

[operations/dns@master] wikimedia.org: remove nsa.wikimedia.org

https://gerrit.wikimedia.org/r/955962

Mentioned in SAL (#wikimedia-operations) [2023-10-12T13:43:40Z] <sukhe> remove old ns2 IP 91.198.174.239/32 from /e/n/i on A:dns-rec: T329219

ayounsi closed subtask Unknown Object (Task) as Resolved. Oct 30 2023, 4:35 PM

Change 976210 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/software/pywmflib@master] constants: update ns2 IP address

https://gerrit.wikimedia.org/r/976210

Change 976210 merged by Volans:

[operations/software/pywmflib@master] constants: update ns2 IP address

https://gerrit.wikimedia.org/r/976210

RobH closed subtask Unknown Object (Task) as Resolved. Nov 27 2023, 8:51 PM
wiki_willy closed subtask Unknown Object (Task) as Resolved. Dec 1 2023, 10:04 PM
Papaul reopened subtask Unknown Object (Task) as Open. Jan 22 2024, 11:53 PM
Papaul closed subtask Unknown Object (Task) as Resolved. Jan 23 2024, 2:19 AM
RobH closed subtask Restricted Task as Resolved. Feb 20 2024, 3:10 PM
RobH removed wiki_willy as the assignee of this task.
RobH closed subtask Unknown Object (Task) as Resolved.

Only two sub-tasks remain open, T350621 and T342239, and both are being handled in their own tasks. A master tracking task is therefore no longer required, so this one is being resolved.