Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e6-eqiad
Closed, Resolved, Public

Description

Scheduled for 2024-06-20 15:00 UTC

Hosts on this rack (https://netbox.wikimedia.org/dcim/racks/86/):

  • an-worker1160
  • an-worker1161
  • an-worker1162
  • es1036 - es6 replica
  • ms-be1077

Teams involved: Data Platform, Data Persistence

Expected outage: 15-30 minutes

Please use the sheet below to detail any actions required in advance of the work:

https://docs.google.com/spreadsheets/d/1pLPpzGBmdExXxQ_0_eGXpO0VlUU5oPKZy-_KViMSwuM

Details

Other Assignee
MatthewVernon

Event Timeline

ABran-WMF updated the task description.
ABran-WMF updated Other Assignee, added: MatthewVernon; removed: ABran-WMF.
cmooney triaged this task as Medium priority.
cmooney updated the task description.

This task's scheduling is swapped with T365986.

Mentioned in SAL (#wikimedia-operations) [2024-06-20T14:30:05Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1:30:00 on es1036.eqiad.wmnet with reason: T365987

Mentioned in SAL (#wikimedia-operations) [2024-06-20T14:30:29Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1036.eqiad.wmnet with reason: T365987

Mentioned in SAL (#wikimedia-operations) [2024-06-20T14:31:10Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1036 depool ahead of T365987', diff saved to https://phabricator.wikimedia.org/P65244 and previous config saved to /var/cache/conftool/dbconfig/20240620-143109-arnaudb.json

Icinga downtime and Alertmanager silence (ID=a9f67fa1-2b7b-4d95-baeb-165066794820) set by cmooney@cumin1002 for 0:50:00 on 1 host(s) and their services with reason: prep JunOS upgrade lsw1-f6-eqiad

lsw1-f6-eqiad.mgmt

Icinga downtime and Alertmanager silence (ID=502400bd-26c4-4ef2-ba8d-f7e60f2e1fda) set by cmooney@cumin1002 for 0:50:00 on 1 host(s) and their services with reason: prep JunOS upgrade lsw1-f6-eqiad

lsw1-e6-eqiad.mgmt

Icinga downtime and Alertmanager silence (ID=a252ad13-6c0c-413b-ac6e-611240bae83b) set by cmooney@cumin1002 for 0:40:00 on 4 host(s) and their services with reason: JunOS upgrade lsw1-e6-eqiad

lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt

Icinga downtime and Alertmanager silence (ID=91fd4f9d-94ae-4b80-9210-cd0554343592) set by cmooney@cumin1002 for 0:40:00 on 5 host(s) and their services with reason: JunOS upgrade lsw1-e6-eqiad

an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-06-20T15:01:30Z] <topranks> rebooting lsw1-e6-eqiad to upgrade JunOS on switch T365987

Mentioned in SAL (#wikimedia-operations) [2024-06-20T15:18:22Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65249 and previous config saved to /var/cache/conftool/dbconfig/20240620-151820-arnaudb.json

Upgrade complete; all looks good on the network side.

Mentioned in SAL (#wikimedia-operations) [2024-06-20T15:33:26Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1036 (re)pooling @ 5%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65251 and previous config saved to /var/cache/conftool/dbconfig/20240620-153326-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-20T15:48:32Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65252 and previous config saved to /var/cache/conftool/dbconfig/20240620-154831-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-20T16:03:38Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65253 and previous config saved to /var/cache/conftool/dbconfig/20240620-160337-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-20T16:18:43Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65254 and previous config saved to /var/cache/conftool/dbconfig/20240620-161842-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-06-20T16:33:52Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65256 and previous config saved to /var/cache/conftool/dbconfig/20240620-163348-arnaudb.json
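
The repool entries above step es1036 back in at 2%, 5%, 10%, 25%, 50% and 75%, roughly 15 minutes apart. As a rough way to check that cadence from the log text, here is a minimal Python sketch. The regex, the function names and the abbreviated sample lines are illustrative assumptions and are not part of dbctl or any Wikimedia tooling.

```python
import re
from datetime import datetime

# Illustrative pattern (an assumption) for SAL lines such as:
#   [2024-06-20T15:18:22Z] <user@host> dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: ...'
REPOOL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\].*?"
    r"\(re\)pooling @ (?P<pct>\d+)%"
)


def parse_repool_steps(lines):
    """Return (timestamp, percentage) pairs for every repool entry found."""
    steps = []
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
            steps.append((ts, int(match.group("pct"))))
    return steps


def step_gaps_seconds(steps):
    """Return the gap in seconds between consecutive repool steps."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(steps, steps[1:])]


# Sample lines shortened from the first three repool entries in the timeline above.
SAMPLE = [
    "[2024-06-20T15:18:22Z] <arnaudb@cumin1002> dbctl commit (dc=all): "
    "'es1036 (re)pooling @ 2%: post T365987 repool'",
    "[2024-06-20T15:33:26Z] <arnaudb@cumin1002> dbctl commit (dc=all): "
    "'es1036 (re)pooling @ 5%: post T365987 repool'",
    "[2024-06-20T15:48:32Z] <arnaudb@cumin1002> dbctl commit (dc=all): "
    "'es1036 (re)pooling @ 10%: post T365987 repool'",
]

if __name__ == "__main__":
    steps = parse_repool_steps(SAMPLE)
    for (ts, pct), gap in zip(steps[1:], step_gaps_seconds(steps)):
        print(f"{ts.isoformat()}Z -> {pct}% ({gap} s after previous step)")
```

Lines that do not match the pattern, such as the depool commit or the downtime notices, are simply skipped, so the full timeline text can be passed in unchanged.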