
Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f6-eqiad
Closed, ResolvedPublic

Description

2024-06-13, 15:00 UTC

on this rack: https://netbox.wikimedia.org/dcim/racks/94/

  • an-worker1169
  • an-worker1170
  • an-worker1171
  • es1039 - es7 replica
  • ms-be1080

Teams involved: Data Platform, Data Persistence

Expected outage: 15-30 minutes

Please use the below sheet to detail any actions that are required in advance of the work:

https://docs.google.com/spreadsheets/d/1pLPpzGBmdExXxQ_0_eGXpO0VlUU5oPKZy-_KViMSwuM

Details

Other Assignee
ABran-WMF

Event Timeline

ABran-WMF updated the task description.
ABran-WMF renamed this task from T348977: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f6-eqiad to Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f6-eqiad. (May 27 2024, 1:15 PM)

Again, from a Swift POV, this should just be a case of checking that the cluster is happy afterwards.

cmooney triaged this task as Medium priority.
cmooney updated the task description.
cmooney added a subscriber: MatthewVernon.

Icinga downtime and Alertmanager silence (ID=94b81d4d-316b-4c68-b4a9-a2d07057d180) set by cmooney@cumin1002 for 2:40:00 on 1 host(s) and their services with reason: prep JunOS upgrade lsw1-f6-eqiad

lsw1-f6-eqiad.mgmt

Mentioned in SAL (#wikimedia-operations) [2024-06-13T14:50:35Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1039 depool ahead of T365983', diff saved to https://phabricator.wikimedia.org/P64861 and previous config saved to /var/cache/conftool/dbconfig/20240613-145035-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-13T14:50:56Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on es1039.eqiad.wmnet with reason: T365983

Mentioned in SAL (#wikimedia-operations) [2024-06-13T14:51:09Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1039.eqiad.wmnet with reason: T365983

Icinga downtime and Alertmanager silence (ID=891c00a3-b649-4659-b39f-5ad6b01367a9) set by cmooney@cumin1002 for 0:40:00 on 4 host(s) and their services with reason: JunOS upgrade lsw1-f6-eqiad

lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt

Icinga downtime and Alertmanager silence (ID=5a6a58c5-4681-4aea-8e80-e8ba2c613022) set by cmooney@cumin1002 for 0:35:00 on 5 host(s) and their services with reason: JunOS upgrade lsw1-f6-eqiad

an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-06-13T15:04:47Z] <topranks> rebooting lsw1-f6-codfw to upgrade JunOS on switch T365983

Switch has reloaded on the new version, all looks good at first glance.

cmooney@lsw1-f6-eqiad> show interfaces descriptions 
Interface       Admin Link Description
xe-0/0/1        up    up   ms-be1080 {#230304500070}
ge-0/0/4        up    up   es1039 {#9101915}
xe-0/0/20       up    up   an-worker1169 {#20231126}
xe-0/0/22       up    up   an-worker1170 {#20231127}
xe-0/0/24       up    up   an-worker1171 {#20231128}
et-0/0/48                  DISABLED
et-0/0/49                  DISABLED
et-0/0/50                  DISABLED
et-0/0/51                  DISABLED
et-0/0/52                  DISABLED
et-0/0/53                  DISABLED
et-0/0/54       up    up   Core: ssw1-e1-eqiad:et-0/0/13 {#202308006}
et-0/0/55       up    up   Core: ssw1-f1-eqiad:et-0/0/13 {#202308014}
em1             down  down DISABLED
irb.1057        up    up   private1-f6-eqiad
irb.1058        up    up   analytics1-f6-eqiad

{master:0}
cmooney@lsw1-f6-eqiad> show ethernet-switching table    

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 5 entries, 5 learned
Routing instance : default-switch
   Vlan                MAC                 MAC      Logical                SVLBNH/      Active
   name                address             flags    interface              VENH Index   source
   analytics1-f6-eqiad 14:23:f2:56:97:90   D        xe-0/0/20.0          
   analytics1-f6-eqiad 14:23:f2:56:98:30   D        xe-0/0/24.0          
   analytics1-f6-eqiad 14:23:f2:56:d0:10   D        xe-0/0/22.0          
   private1-f6-eqiad   00:62:0b:75:44:30   D        xe-0/0/1.0           
   private1-f6-eqiad   a8:3c:a5:00:53:a6   D        ge-0/0/4.0
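
The check above (every server port back up after the reload) can also be done mechanically. The following is a minimal Python sketch, not part of the tooling used in this task: it parses text in the format of the `show interfaces descriptions` output above and lists any expected host whose port is not admin/link `up`. The function names `parse_interface_descriptions` and `hosts_not_up` and the shortened `SAMPLE` input are assumptions made for illustration.

```python
import re

# Matches rows such as "xe-0/0/1        up    up   ms-be1080 {#230304500070}".
# Rows without admin/link states (e.g. "et-0/0/48    DISABLED") do not match.
ROW = re.compile(
    r"^(?P<ifname>\S+)\s+(?P<admin>up|down)\s+(?P<link>up|down)\s+(?P<descr>.+)$"
)


def parse_interface_descriptions(output):
    """Return {description: (interface, admin, link)} for each parsable row."""
    result = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            # Drop the trailing "{#...}" tag seen in the output above.
            descr = re.sub(r"\s*\{#[^}]*\}\s*$", "", match["descr"])
            result[descr] = (match["ifname"], match["admin"], match["link"])
    return result


def hosts_not_up(output, expected_hosts):
    """Return the expected hosts whose port is missing or not up/up."""
    parsed = parse_interface_descriptions(output)
    return [
        host
        for host in expected_hosts
        if parsed.get(host, (None, "down", "down"))[1:] != ("up", "up")
    ]


# Shortened, hypothetical sample in the same layout as the CLI output above.
SAMPLE = """\
Interface       Admin Link Description
xe-0/0/1        up    up   ms-be1080 {#230304500070}
ge-0/0/4        up    up   es1039 {#9101915}
xe-0/0/20       up    up   an-worker1169 {#20231126}
et-0/0/48                  DISABLED
em1             down  down DISABLED
"""

if __name__ == "__main__":
    # An empty list means every expected host has its port up/up.
    print(hosts_not_up(SAMPLE, ["ms-be1080", "es1039", "an-worker1169"]))
```

A host that is absent from the output is reported the same way as one whose link is down, so a missing or renamed port description is not silently ignored.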

Mentioned in SAL (#wikimedia-operations) [2024-06-13T15:23:00Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64866 and previous config saved to /var/cache/conftool/dbconfig/20240613-152300-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-13T15:38:06Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64871 and previous config saved to /var/cache/conftool/dbconfig/20240613-153805-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-13T15:53:11Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64874 and previous config saved to /var/cache/conftool/dbconfig/20240613-155310-arnaudb.json

Thanks for checking things. All is stable on our side, so I will close the task now.

Mentioned in SAL (#wikimedia-operations) [2024-06-13T16:08:16Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64878 and previous config saved to /var/cache/conftool/dbconfig/20240613-160816-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-13T16:23:22Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64883 and previous config saved to /var/cache/conftool/dbconfig/20240613-162321-arnaudb.json
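
The SAL entries above show es1039 being repooled in steps of 10%, 25%, 50%, 75% and 100%, roughly 15 minutes apart. As an illustration only, the Python sketch below computes such a schedule. It does not call dbctl or any cookbook; the function name `repool_schedule`, the default step list and the 15-minute interval are assumptions read off the timestamps in this log.

```python
from datetime import datetime, timedelta, timezone


def repool_schedule(start, steps=(10, 25, 50, 75, 100),
                    interval=timedelta(minutes=15)):
    """Return [(time, percentage), ...] for a gradual repool starting at `start`."""
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    # Start time taken from the first repool entry in the log above.
    start = datetime(2024, 6, 13, 15, 23, tzinfo=timezone.utc)
    for when, pct in repool_schedule(start):
        print(f"{when:%H:%M} UTC  pool at {pct}%")
```

With five steps and a 15-minute interval, the host reaches full weight one hour after the first step, which matches the 15:23 to 16:23 UTC span recorded here.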

Mentioned in SAL (#wikimedia-operations) [2024-06-18T15:20:32Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65159 and previous config saved to /var/cache/conftool/dbconfig/20240618-152031-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-18T15:35:37Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65162 and previous config saved to /var/cache/conftool/dbconfig/20240618-153537-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-18T15:50:44Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65165 and previous config saved to /var/cache/conftool/dbconfig/20240618-155042-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-18T16:05:48Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65166 and previous config saved to /var/cache/conftool/dbconfig/20240618-160548-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-18T16:20:54Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65167 and previous config saved to /var/cache/conftool/dbconfig/20240618-162053-arnaudb.json