Upgrade webperf hosts to Bullseye
Closed, Resolved, Public

Description

@dpifke upgraded the webperf hosts in deployment-prep to Bullseye; the next step is to migrate the production instances:

Upgrade procedure for Coal hosts:

Upgrade procedure for Arclamp hosts:

Event Timeline

Change 777787 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add webperf[12]00[34] to DHCP config/site.pp

https://gerrit.wikimedia.org/r/777787

Change 777787 merged by Muehlenhoff:

[operations/puppet@production] Add webperf[12]00[34] to DHCP config/site.pp

https://gerrit.wikimedia.org/r/777787
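
The change title above uses the shorthand `webperf[12]00[34]` for the new hosts. As a minimal sketch, assuming each bracket group is read as a set of single-character alternatives (the helper name `expand_host_pattern` is hypothetical and not part of any Wikimedia tooling), the pattern can be expanded like this:

```python
import itertools
import re


def expand_host_pattern(pattern: str) -> list[str]:
    """Expand bracket groups such as [12] into every combination.

    Assumption: each bracket group lists single-character alternatives,
    so "[12]" means "1 or 2". Ranges like "[1-4]" are not handled.
    """
    # re.split with a capture group alternates literal text (even indices)
    # with the characters found inside brackets (odd indices).
    parts = re.split(r"\[([^\]]+)\]", pattern)
    choices = [list(part) if index % 2 else [part]
               for index, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for host in expand_host_pattern("webperf[12]00[34]"):
        print(host)
```

Under that reading the title covers four hosts: webperf1003, webperf1004, webperf2003 and webperf2004, which matches the host names that appear in the later changes on this task.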

Change 785115 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Apply role::webperf::processors_and_site to webperf1003/2003

https://gerrit.wikimedia.org/r/785115

Change 785117 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Extend Ferm rules for new webperf hosts

https://gerrit.wikimedia.org/r/785117

Change 785118 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove obsolete webperf hosts

https://gerrit.wikimedia.org/r/785118

Change 785117 merged by Muehlenhoff:

[operations/puppet@production] Extend Ferm rules for new webperf hosts

https://gerrit.wikimedia.org/r/785117

Mentioned in SAL (#wikimedia-operations) [2022-05-03T13:47:55Z] <moritzm> stopped/masked coal/navtiming on webperf1001/webperf2001 T305460

Change 785115 merged by Muehlenhoff:

[operations/puppet@production] Apply role::webperf::processors_and_site to webperf1003/2003

https://gerrit.wikimedia.org/r/785115

Change 789084 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove webperf1001/webperf2001 from Kafka Ferm config

https://gerrit.wikimedia.org/r/789084

Change 789084 merged by Muehlenhoff:

[operations/puppet@production] Remove webperf1001/webperf2001 from Kafka Ferm config

https://gerrit.wikimedia.org/r/789084

Change 793403 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove webperf1001/webperf2001

https://gerrit.wikimedia.org/r/793403

Change 793403 merged by Muehlenhoff:

[operations/puppet@production] Remove webperf1001/webperf2001

https://gerrit.wikimedia.org/r/793403

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: webperf2001.codfw.wmnet

  • webperf2001.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: webperf1001.eqiad.wmnet

  • webperf1001.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox

MoritzMuehlenhoff updated the task description.

Change 802750 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Point active arclamp host to webperf1004 and update dsh groups

https://gerrit.wikimedia.org/r/802750

Change 802752 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] arclamp: add rsync config to migrate Xenon data

https://gerrit.wikimedia.org/r/802752

dpifke updated the task description.
dpifke updated the task description.

Change 802752 merged by Muehlenhoff:

[operations/puppet@production] arclamp: add rsync config to migrate Xenon data

https://gerrit.wikimedia.org/r/802752

Mentioned in SAL (#wikimedia-operations) [2022-06-07T14:45:42Z] <moritzm> adding additional disk for /srv to webperf2004 T305460

Mentioned in SAL (#wikimedia-operations) [2022-06-08T07:50:43Z] <moritzm> adding additional disk for /srv to webperf1004 T305460

Mentioned in SAL (#wikimedia-operations) [2022-06-09T14:09:31Z] <moritzm> masking Excimer/Arclamp services/timers on webperf1002/2002 T305460

Change 802750 merged by Muehlenhoff:

[operations/puppet@production] Point active arclamp host to webperf1004 and update dsh groups

https://gerrit.wikimedia.org/r/802750

Change 804333 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] ALso point codfw to the new node

https://gerrit.wikimedia.org/r/804333

Change 804333 merged by Muehlenhoff:

[operations/puppet@production] ALso point codfw to the new node

https://gerrit.wikimedia.org/r/804333

Change 804334 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove webperf1002/webperf2002 from Kafka firewall rules

https://gerrit.wikimedia.org/r/804334

Change 804339 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove rsync config only needed for stretch->bullseye migration

https://gerrit.wikimedia.org/r/804339

Change 804340 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] coal: Remove support for pre Bullseye installs

https://gerrit.wikimedia.org/r/804340

Change 804341 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Switch old Stretch arclamp nodes to role::insetup until eventual decom

https://gerrit.wikimedia.org/r/804341

Change 804339 merged by Muehlenhoff:

[operations/puppet@production] Remove rsync config only needed for stretch->bullseye migration

https://gerrit.wikimedia.org/r/804339

Change 804341 merged by Muehlenhoff:

[operations/puppet@production] Switch old Stretch arclamp nodes to role::insetup until eventual decom

https://gerrit.wikimedia.org/r/804341

Change 804334 merged by Muehlenhoff:

[operations/puppet@production] Remove webperf1002/webperf2002 from Kafka firewall rules

https://gerrit.wikimedia.org/r/804334

Change 804340 merged by Muehlenhoff:

[operations/puppet@production] coal: Remove support for pre Bullseye installs

https://gerrit.wikimedia.org/r/804340

Change 785118 abandoned by Muehlenhoff:

[operations/puppet@production] Remove obsolete webperf hosts

Reason:

A variant of this patch was merged

https://gerrit.wikimedia.org/r/785118

Change 808192 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove old Arclamp buster VMs

https://gerrit.wikimedia.org/r/808192

Change 808192 merged by Muehlenhoff:

[operations/puppet@production] Remove old Arclamp buster VMs

https://gerrit.wikimedia.org/r/808192

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: webperf2002.codfw.wmnet

  • webperf2002.codfw.wmnet (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Host steps raised exception: Cannot find cluster row_A (expected ('ganeti01.svc.eqiad.wmnet', 'ganeti01.svc.codfw.wmnet', 'ganeti01.svc.esams.wmnet', 'ganeti01.svc.ulsfo.wmnet', 'ganeti01.svc.eqsin.wmnet', 'ganeti-test01.svc.codfw.wmnet', 'ganeti01.svc.drmrs.wmnet', 'ganeti02.svc.drmrs.wmnet')).

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by volans@cumin2002 for hosts: webperf2002.codfw.wmnet

  • webperf2002.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: webperf1002.eqiad.wmnet

  • webperf1002.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox