Upgrade Traffic hosts to bullseye
Closed, Resolved, Public

Description

This task tracks the upgrade of the Traffic hosts to bullseye, affecting the services below (identified by their cumin aliases). There is no particular order, but we will do the cp hosts first to resolve T319067.

Progress (order of execution):

Host | Debian Packages | Reimaging
cp
acmechief
ncredir
dns-rec
dns-auth
durum
Wikidough
lvs
lvs experimental
pybal-test
  • A:cp

Debian packages upgraded:

  • varnish_6.0.10-1wm2
  • trafficserver_9.1.3-1wm3
  • fifo-log-demux_0.6.3
  • file-read-backwards_2.0.0-3
  • prometheus-rdkafka-exporter_0.3
  • python-logstash_0.4.6-3
  • prometheus-varnishkafka-exporter_0.1-2
  • varnishkafka_1.1.0-2
  • libvmod-netmapper_1.9-2
  • libvmod-querysort_0.3
  • purged_0.19
  • libvmod-re2_1.5.3-3
  • varnish-modules_0.15.0-2
  • A:dns-auth
  • gdnsd_3.8.0-1~wmf2
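
The package entries above follow the usual Debian `name_version` pattern, where the part of the version after the last hyphen is the Debian revision. A minimal sketch for splitting such an entry, assuming no epoch prefix (none of the entries above has one); the helper name `split_package` is hypothetical and not part of any tooling mentioned in this task:

```python
def split_package(entry):
    """Split 'name_version' into (name, upstream_version, debian_revision).

    The Debian revision is whatever follows the last hyphen of the version;
    entries without a hyphen (native packages) get a revision of None.
    Epochs ('1:2.3-4') are not handled in this sketch.
    """
    name, sep, version = entry.partition("_")
    if not sep or not version:
        raise ValueError(f"not a name_version entry: {entry!r}")
    upstream, hyphen, revision = version.rpartition("-")
    if not hyphen:
        # No hyphen at all: the whole string is the upstream version.
        return name, version, None
    return name, upstream, revision


if __name__ == "__main__":
    for entry in ("varnish_6.0.10-1wm2", "gdnsd_3.8.0-1~wmf2", "purged_0.19"):
        print(split_package(entry))
```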

eqiad

text

  • cp1075.eqiad.wmnet
  • cp1077.eqiad.wmnet
  • cp1079.eqiad.wmnet
  • cp1081.eqiad.wmnet
  • cp1083.eqiad.wmnet
  • cp1085.eqiad.wmnet
  • cp1087.eqiad.wmnet
  • cp1089.eqiad.wmnet

eqiad

upload

  • cp1076.eqiad.wmnet
  • cp1078.eqiad.wmnet
  • cp1080.eqiad.wmnet
  • cp1082.eqiad.wmnet
  • cp1084.eqiad.wmnet
  • cp1086.eqiad.wmnet
  • cp1088.eqiad.wmnet
  • cp1090.eqiad.wmnet

eqsin

text

  • cp5017.eqsin.wmnet
  • cp5018.eqsin.wmnet
  • cp5019.eqsin.wmnet
  • cp5020.eqsin.wmnet
  • cp5021.eqsin.wmnet
  • cp5022.eqsin.wmnet
  • cp5023.eqsin.wmnet
  • cp5024.eqsin.wmnet

eqsin

upload

  • cp5025.eqsin.wmnet
  • cp5026.eqsin.wmnet
  • cp5027.eqsin.wmnet
  • cp5028.eqsin.wmnet
  • cp5029.eqsin.wmnet
  • cp5030.eqsin.wmnet
  • cp5031.eqsin.wmnet

codfw

text

  • cp2027.codfw.wmnet
  • cp2029.codfw.wmnet
  • cp2031.codfw.wmnet
  • cp2033.codfw.wmnet
  • cp2035.codfw.wmnet
  • cp2037.codfw.wmnet
  • cp2039.codfw.wmnet

codfw

upload

  • cp2028.codfw.wmnet
  • cp2030.codfw.wmnet
  • cp2032.codfw.wmnet
  • cp2034.codfw.wmnet
  • cp2036.codfw.wmnet
  • cp2038.codfw.wmnet
  • cp2040.codfw.wmnet

esams

text

  • cp3050.esams.wmnet
  • cp3052.esams.wmnet
  • cp3054.esams.wmnet
  • cp3056.esams.wmnet
  • cp3058.esams.wmnet
  • cp3060.esams.wmnet
  • cp3062.esams.wmnet
  • cp3064.esams.wmnet

esams

upload

  • cp3051.esams.wmnet
  • cp3053.esams.wmnet
  • cp3055.esams.wmnet
  • cp3057.esams.wmnet
  • cp3059.esams.wmnet
  • cp3061.esams.wmnet
  • cp3063.esams.wmnet
  • cp3065.esams.wmnet

ulsfo

text

  • cp4037.ulsfo.wmnet
  • cp4038.ulsfo.wmnet
  • cp4039.ulsfo.wmnet
  • cp4040.ulsfo.wmnet
  • cp4041.ulsfo.wmnet
  • cp4042.ulsfo.wmnet
  • cp4043.ulsfo.wmnet
  • cp4044.ulsfo.wmnet

ulsfo

upload

  • cp4045.ulsfo.wmnet
  • cp4046.ulsfo.wmnet
  • cp4047.ulsfo.wmnet
  • cp4048.ulsfo.wmnet
  • cp4049.ulsfo.wmnet
  • cp4050.ulsfo.wmnet
  • cp4051.ulsfo.wmnet
  • cp4052.ulsfo.wmnet

drmrs

text

  • cp6009.drmrs.wmnet
  • cp6010.drmrs.wmnet
  • cp6011.drmrs.wmnet
  • cp6012.drmrs.wmnet
  • cp6013.drmrs.wmnet
  • cp6014.drmrs.wmnet
  • cp6015.drmrs.wmnet
  • cp6016.drmrs.wmnet

drmrs

upload

  • cp6001.drmrs.wmnet
  • cp6002.drmrs.wmnet
  • cp6003.drmrs.wmnet
  • cp6004.drmrs.wmnet
  • cp6005.drmrs.wmnet
  • cp6006.drmrs.wmnet
  • cp6007.drmrs.wmnet
  • cp6008.drmrs.wmnet
  • A:dns-auth
  • A:dns-rec
  • dns1001.wikimedia.org
  • dns1002.wikimedia.org
  • dns2001.wikimedia.org
  • dns2002.wikimedia.org
  • dns3001.wikimedia.org
  • dns3002.wikimedia.org
  • dns4003.wikimedia.org
  • dns4004.wikimedia.org
  • dns5003.wikimedia.org
  • dns5004.wikimedia.org
  • dns6001.wikimedia.org
  • dns6002.wikimedia.org
  • A:acmechief
  • acmechief1001.eqiad.wmnet
  • acmechief2001.codfw.wmnet
  • acmechief-test1001.eqiad.wmnet
  • acmechief-test2001.codfw.wmnet

Debian packages upgraded:

  • acme-chief_0.36-1
  • A:ncredir
  • ncredir1001.eqiad.wmnet
  • ncredir1002.eqiad.wmnet
  • ncredir2001.codfw.wmnet
  • ncredir2002.codfw.wmnet
  • ncredir3001.esams.wmnet
  • ncredir3002.esams.wmnet
  • ncredir4001.ulsfo.wmnet
  • ncredir4002.ulsfo.wmnet
  • ncredir5001.eqsin.wmnet
  • ncredir5002.eqsin.wmnet
  • ncredir6001.drmrs.wmnet
  • ncredir6002.drmrs.wmnet
  • A:lvs
  • A:durum
  • durum1001.eqiad.wmnet
  • durum1002.eqiad.wmnet
  • durum2001.codfw.wmnet
  • durum2002.codfw.wmnet
  • durum3001.esams.wmnet
  • durum3002.esams.wmnet
  • durum4001.ulsfo.wmnet
  • durum4002.ulsfo.wmnet
  • durum5001.eqsin.wmnet
  • durum5002.eqsin.wmnet
  • durum6001.drmrs.wmnet
  • durum6002.drmrs.wmnet
  • A:wikidough
  • doh1001.wikimedia.org
  • doh1002.wikimedia.org
  • doh2001.wikimedia.org
  • doh2002.wikimedia.org
  • doh3001.wikimedia.org
  • doh3002.wikimedia.org
  • doh4001.wikimedia.org
  • doh4002.wikimedia.org
  • doh5001.wikimedia.org
  • doh5002.wikimedia.org
  • doh6001.wikimedia.org
  • doh6002.wikimedia.org
  • A:lvs
  • pybal_1.15.10+deb11u1

iDRAC firmware should be 6.10.00.00 [for bullseye installer]
NIC firmware should be 21.85.21.92 [for bullseye installer]

We should upgrade the iDRAC as well in this round.

Host | iDRAC Firmware | NIC Firmware
lvs1017.eqiad.wmnet | 6.10.00.00 | 21.85.21.91
lvs1018.eqiad.wmnet | 6.10.00.00 | 21.85.21.91
lvs1019.eqiad.wmnet | 6.10.00.00 | 21.85.21.91
lvs1020.eqiad.wmnet | 6.10.00.00 | 21.85.21.91
lvs2007.codfw.wmnet | 6.10.00.00 | 21.40.16.60
lvs2008.codfw.wmnet | 6.10.00.00 | 21.40.16.60
lvs2009.codfw.wmnet | 6.10.00.00 | 21.40.25.31
lvs2010.codfw.wmnet | 6.10.00.00 | 21.85.21.92
lvs3005.esams.wmnet | 6.10.00.00 | 21.40.22.20
lvs3006.esams.wmnet | 6.10.00.00 | 21.40.22.20
lvs3007.esams.wmnet | 6.10.00.00 | 21.40.22.20
lvs4008.ulsfo.wmnet | 6.10.00.00 | 21.85.21.92
lvs4009.ulsfo.wmnet | 6.10.00.00 | 21.85.21.92
lvs4010.ulsfo.wmnet | 6.10.00.00 | 21.85.21.92
lvs5004.eqsin.wmnet | 5.10.30.00 | 21.85.21.92
lvs5005.eqsin.wmnet | 5.10.30.00 | 21.85.21.92
lvs5006.eqsin.wmnet | 5.10.30.00 | 21.85.21.92
lvs6001.drmrs.wmnet | 6.10.00.00 | 21.85.21.92
lvs6002.drmrs.wmnet | 6.10.00.00 | 21.85.21.92
lvs6003.drmrs.wmnet | 6.10.00.00 | 21.85.21.92
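
A rough way to spot which hosts in the table above are still below the firmware versions required for the bullseye installer. This is only a sketch: it uses a handful of rows copied from the table, treats "should be" as "at least" (an assumption), and the names `FIRMWARE`, `as_tuple` and `pending_upgrades` are made up for illustration.

```python
# Required versions quoted in the task description for the bullseye installer.
REQUIRED_IDRAC = "6.10.00.00"
REQUIRED_NIC = "21.85.21.92"

# A few rows copied from the table above: host -> (iDRAC firmware, NIC firmware).
FIRMWARE = {
    "lvs1017.eqiad.wmnet": ("6.10.00.00", "21.85.21.91"),
    "lvs2007.codfw.wmnet": ("6.10.00.00", "21.40.16.60"),
    "lvs2010.codfw.wmnet": ("6.10.00.00", "21.85.21.92"),
    "lvs5004.eqsin.wmnet": ("5.10.30.00", "21.85.21.92"),
}


def as_tuple(version):
    """Turn '21.85.21.92' into (21, 85, 21, 92) so fields compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pending_upgrades(firmware):
    """Map each host to the components whose version is below the required one."""
    pending = {}
    for host, (idrac, nic) in firmware.items():
        needed = []
        if as_tuple(idrac) < as_tuple(REQUIRED_IDRAC):
            needed.append("iDRAC")
        if as_tuple(nic) < as_tuple(REQUIRED_NIC):
            needed.append("NIC")
        if needed:
            pending[host] = needed
    return pending


if __name__ == "__main__":
    for host, needed in pending_upgrades(FIRMWARE).items():
        print(host, "needs:", ", ".join(needed))
```

Comparing integer tuples rather than raw strings avoids surprises if a field ever changes width.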
  • lvs1017.eqiad.wmnet
  • lvs1018.eqiad.wmnet
  • lvs1019.eqiad.wmnet
  • lvs1020.eqiad.wmnet
  • lvs2007.codfw.wmnet [definitely needs NIC firmware upgrade, firmware was upgraded]
  • lvs2008.codfw.wmnet [definitely needs NIC firmware upgrade, firmware was upgraded]
  • lvs2009.codfw.wmnet [definitely needs NIC firmware upgrade, firmware was upgraded]
  • lvs2010.codfw.wmnet [definitely needs NIC firmware upgrade, firmware was upgraded]
  • lvs3005.esams.wmnet [definitely needs NIC firmware upgrade, firmware was upgraded]
  • lvs3006.esams.wmnet [definitely needs NIC firmware upgrade, firmware was upgraded]
  • lvs3007.esams.wmnet [definitely needs NIC firmware upgrade, firmware was upgraded]
  • lvs4008.ulsfo.wmnet
  • lvs4009.ulsfo.wmnet
  • lvs4010.ulsfo.wmnet
  • lvs5004.eqsin.wmnet
  • lvs5005.eqsin.wmnet
  • lvs5006.eqsin.wmnet
  • lvs6001.drmrs.wmnet
  • lvs6002.drmrs.wmnet
  • lvs6003.drmrs.wmnet

lvs (experimental, L4LB):

  • lvs1013.eqiad.wmnet
  • lvs1014.eqiad.wmnet
  • lvs1015.eqiad.wmnet
  • lvs1016.eqiad.wmnet

pybal-test:

  • pybal-test2003.codfw.wmnet

This is meant to be an umbrella task for all changes that will be part of this upgrade, such as the Debian packaging, Puppet changes, and the related testing, including reimaging.

Details

Project | Branch | Lines +/- | Subject
operations/puppet | production | +1 -1 |
operations/puppet | production | +6 -31 |
operations/puppet | production | +10 -48 |
operations/dns | master | +2 -0 |
operations/puppet | production | +0 -2 |
operations/puppet | production | +15 -13 |
operations/puppet | production | +0 -2 |
operations/puppet | production | +16 -14 |
operations/puppet | production | +0 -2 |
operations/puppet | production | +15 -13 |
operations/puppet | production | +4 -40 |
operations/puppet | production | +14 -14 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +12 -11 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +12 -11 |
operations/puppet | production | +0 -1 |
operations/puppet | production | +12 -11 |
operations/puppet | production | +11 -11 |
operations/puppet | production | +2 -11 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +2 -18 |
operations/puppet | production | +1 -11 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +8 -3 |
operations/puppet | production | +8 -3 |
operations/puppet | production | +1 -11 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +3 -3 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +6 -0 |
operations/puppet | production | +4 -1 |
operations/puppet | production | +4 -1 |
operations/puppet | production | +4 -1 |
operations/puppet | production | +4 -1 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +5 -1 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +14 -13 |
operations/puppet | production | +3 -7 |
operations/puppet | production | +8 -4 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +15 -6 |
operations/puppet | production | +14 -2 |
operations/puppet | production | +12 -2 |
operations/puppet | production | +4 -0 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +1 -7 |
operations/puppet | production | +7 -1 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +4 -4 |
operations/puppet | production | +4 -2 |
operations/puppet | production | +1 -0 |
operations/puppet | production | +0 -2 |
operations/puppet | production | +4 -6 |
operations/puppet | production | +15 -11 |
operations/homer/public | master | +6 -1 |
operations/dns | master | +1 -1 |
operations/puppet | production | +2 -2 |
operations/puppet | production | +1 -4 |
operations/puppet | production | +4 -1 |
operations/puppet | production | +3 -2 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +21 -0 |
operations/puppet | production | +4 -1 |
operations/puppet | production | +0 -2 |
operations/puppet | production | +9 -1 |
operations/debs/gdnsd | master | +613 -0 |
integration/config | master | +1 -0 |
operations/puppet | production | +0 -1 |
operations/software/varnish/libvmod-querysort | main | +10 -1 |
operations/software/acme-chief | debian | +9 -16 |
operations/software/acme-chief | debian | +2 -2 |
operations/software/acme-chief | debian | +24 -79 |
operations/software/acme-chief | master | +2 -2 |
operations/software/acme-chief | master | +24 -79 |
integration/config | master | +1 -1 |
integration/config | master | +8 -2 |
operations/software/acme-chief | debian | +12 -17 |
integration/config | master | +1 -0 |
operations/puppet | production | +5 -5 |
operations/puppet | production | +2 -0 |
operations/debs/varnish-modules | master | +466 -0 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +2 -1 |
operations/puppet | production | +4 -24 |
operations/puppet | production | +6 -6 |
operations/software/varnish/libvmod-re2 | debian-6.0 | +93 -3 |
operations/software/purged | master | +468 -30 |
operations/software/varnish/libvmod-re2 | debian-6.0 | +15 -5 |
operations/software/varnish/libvmod-netmapper | debian | +18 -5 |
operations/software/varnish/varnishkafka | debian | +13 -4 |
operations/puppet | production | +0 -25 |
operations/debs/varnish4 | debian-wmf | +12 -3 |
operations/puppet | production | +13 -0 |
operations/debs/prometheus-varnishkafka-exporter | master | +13 -3 |
operations/debs/file-read-backwards | debian | +13 -9 |
operations/software/prometheus-rdkafka-exporter | master | +11 -3 |
operations/debs/python-logstash | master | +13 -16 |
operations/software/fifo-log-demux | master | +13 -3 |
operations/puppet | production | +1 -0 |
operations/debs/trafficserver | master | +13 -2 |
operations/puppet | production | +1 -1 |
operations/puppet | production | +11 -0 |
Event Timeline

Change 908322 merged by BCornwall:

[operations/puppet@production] hiera: lvs2007: update iface names for bullseye

https://gerrit.wikimedia.org/r/908322

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs2007.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs2007.codfw.wmnet with OS bullseye executed with errors:

  • lvs2007 (FAIL)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs2007.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs2007.codfw.wmnet with OS bullseye completed:

  • lvs2007 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304122116_brett_479794_lvs2007.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
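
The log path reported in the step list above appears to encode a timestamp, the operator, a numeric identifier and the short hostname. A minimal sketch for pulling those fields apart when collecting reimage logs; the field meanings are inferred from the paths in this task rather than from any documented format, the numeric field is kept as an opaque identifier, and `parse_reimage_log` is a hypothetical helper:

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Pattern inferred from names such as 202304122116_brett_479794_lvs2007.out
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<ident>\d+)_(?P<host>.+)\.out$"
)


def parse_reimage_log(path):
    """Return the fields embedded in a reimage log file name as a dict."""
    name = PurePosixPath(path).name
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log name: {name}")
    return {
        # Assumed to be YYYYMMDDHHMM; the timezone is not stated in the name.
        "stamp": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "ident": int(match["ident"]),
        "host": match["host"],
    }


if __name__ == "__main__":
    print(parse_reimage_log(
        "/var/log/spicerack/sre/hosts/reimage/202304122116_brett_479794_lvs2007.out"
    ))
```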

Change 908552 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove bgp-med override for lvs2007

https://gerrit.wikimedia.org/r/908552

Change 908552 merged by Ssingh:

[operations/puppet@production] hiera: remove bgp-med override for lvs2007

https://gerrit.wikimedia.org/r/908552

Change 908585 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: lvs2008: update iface names for bullseye

https://gerrit.wikimedia.org/r/908585

Mentioned in SAL (#wikimedia-operations) [2023-04-13T15:46:40Z] <brett> Disable Puppet/PyBal on lvs2008 in preparation for reimaging - T321309

Change 908585 merged by BCornwall:

[operations/puppet@production] hiera: lvs2008: update iface names for bullseye

https://gerrit.wikimedia.org/r/908585

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs2008.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs2008.codfw.wmnet with OS bullseye completed:

  • lvs2008 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304131649_brett_1334141_lvs2008.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 908605 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hierdata: Remove bgp-med for lvs2008

https://gerrit.wikimedia.org/r/908605

Change 908605 merged by BCornwall:

[operations/puppet@production] hierdata: Remove bgp-med for lvs2008

https://gerrit.wikimedia.org/r/908605

Mentioned in SAL (#wikimedia-operations) [2023-04-13T17:57:30Z] <brett> Disable Puppet/PyBal on lvs2009 in preparation for reimaging - T321309

Change 908609 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: lvs2009: update iface names for bullseye

https://gerrit.wikimedia.org/r/908609

Change 908609 merged by BCornwall:

[operations/puppet@production] hiera: lvs2009: update iface names for bullseye

https://gerrit.wikimedia.org/r/908609

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs2009.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs2009.codfw.wmnet with OS bullseye completed:

  • lvs2009 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304131816_brett_1395867_lvs2009.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 908619 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove bgp-med override for lvs2009

https://gerrit.wikimedia.org/r/908619

Change 908619 merged by Ssingh:

[operations/puppet@production] hiera: remove bgp-med override for lvs2009

https://gerrit.wikimedia.org/r/908619

Change 908620 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hierdata: Remove bgp-med for lvs2009

https://gerrit.wikimedia.org/r/908620

Change 908620 abandoned by BCornwall:

[operations/puppet@production] hierdata: Remove bgp-med for lvs2009

Reason:

Duplicate I3187d937ffb2a1af9eea2f81b8167c9e67c62530

https://gerrit.wikimedia.org/r/908620

Change 908860 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] Remove outdated references to pybal-test200[12]

https://gerrit.wikimedia.org/r/908860

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1013.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1013.eqiad.wmnet with OS bullseye completed:

  • lvs1013 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304141536_sukhe_2281014_lvs1013.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1015.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1015.eqiad.wmnet with OS bullseye completed:

  • lvs1015 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304141647_sukhe_2330891_lvs1015.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs1014.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs1016.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs1014.eqiad.wmnet with OS bullseye completed:

  • lvs1014 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304141717_brett_2353586_lvs1014.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs1016.eqiad.wmnet with OS bullseye completed:

  • lvs1016 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304141729_brett_2363309_lvs1016.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 908909 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs/balancer: unify hiera post bullseye upgrade (codfw)

https://gerrit.wikimedia.org/r/908909

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1020.eqiad.wmnet with OS bullseye

Change 909294 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs1020: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/909294

Change 909294 merged by Ssingh:

[operations/puppet@production] hiera: lvs1020: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/909294

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1020.eqiad.wmnet with OS bullseye completed:

  • lvs1020 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run failed, asking the operator what to do
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304171407_sukhe_1028637_lvs1020.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1020.eqiad.wmnet with OS bullseye executed with errors:

  • lvs1020 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run failed, asking the operator what to do
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304171407_sukhe_1028637_lvs1020.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • The reimage failed, see the cookbook logs for the details

Change 909325 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs1019: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/909325

Change 908909 merged by Ssingh:

[operations/puppet@production] hiera: lvs/balancer: unify hiera post bullseye upgrade (codfw)

https://gerrit.wikimedia.org/r/908909

Change 909985 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] depool eqiad (emergency patch, do not merge)

https://gerrit.wikimedia.org/r/909985

Mentioned in SAL (#wikimedia-operations) [2023-04-19T13:41:23Z] <sukhe@deploy2002> Locking from deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309

Mentioned in SAL (#wikimedia-operations) [2023-04-19T13:41:39Z] <sukhe@deploy2002> Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309 (duration: 00m 16s)

Mentioned in SAL (#wikimedia-operations) [2023-04-19T13:41:46Z] <sukhe@deploy2002> Locking from deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309
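
The "duration: 00m 16s" in the unlock entry above can be reproduced from the SAL timestamps themselves. A small sketch, assuming the bracketed timestamps are UTC in the ISO 8601 form shown; the helper names are hypothetical:

```python
from datetime import datetime, timezone


def parse_sal_timestamp(stamp):
    """Parse a SAL timestamp such as '2023-04-19T13:41:23Z' as UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def format_duration(delta):
    """Render a timedelta in the 'MMm SSs' style used by the unlock message."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02d}m {seconds:02d}s"


locked = parse_sal_timestamp("2023-04-19T13:41:23Z")
unlocked = parse_sal_timestamp("2023-04-19T13:41:39Z")
relocked = parse_sal_timestamp("2023-04-19T13:41:46Z")

# Time the first lock was held, and the gap before the lock was taken again.
first_lock = format_duration(unlocked - locked)
gap_seconds = int((relocked - unlocked).total_seconds())

if __name__ == "__main__":
    print("first lock held for", first_lock)
    print("relocked after", gap_seconds, "seconds")
```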

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1019.eqiad.wmnet with OS bullseye

Change 909325 merged by Ssingh:

[operations/puppet@production] hiera: lvs1019: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/909325

Change 910004 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove lvs1019's bgp-med override

https://gerrit.wikimedia.org/r/910004

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1019.eqiad.wmnet with OS bullseye completed:

  • lvs1019 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304191404_sukhe_2973122_lvs1019.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 910004 merged by Ssingh:

[operations/puppet@production] hiera: remove lvs1019's bgp-med override

https://gerrit.wikimedia.org/r/910004

Change 910027 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs1018: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/910027

Mentioned in SAL (#wikimedia-operations) [2023-04-19T15:20:36Z] <sukhe> stop pybal on lvs1018 for reimaging: T321309

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1018.eqiad.wmnet with OS bullseye

Change 910027 merged by Ssingh:

[operations/puppet@production] hiera: lvs1018: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/910027

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1018.eqiad.wmnet with OS bullseye completed:

  • lvs1018 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304191548_sukhe_3045130_lvs1018.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 910047 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove lvs1018's bgp-med override

https://gerrit.wikimedia.org/r/910047

Change 910047 merged by Ssingh:

[operations/puppet@production] hiera: remove lvs1018's bgp-med override

https://gerrit.wikimedia.org/r/910047

Mentioned in SAL (#wikimedia-operations) [2023-04-19T16:39:15Z] <sukhe> restart pybal on lvs1018 to remove bgp-med change: T321309

Change 910050 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs1017: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/910050

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1017.eqiad.wmnet with OS bullseye

Change 910050 merged by Ssingh:

[operations/puppet@production] hiera: lvs1017: update iface names for bullseye (eqiad)

https://gerrit.wikimedia.org/r/910050

Change 910058 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove lvs1017's bgp-med override

https://gerrit.wikimedia.org/r/910058

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1017.eqiad.wmnet with OS bullseye completed:

  • lvs1017 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304191746_sukhe_3128361_lvs1017.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 910058 merged by Ssingh:

[operations/puppet@production] hiera: remove lvs1017's bgp-med override

https://gerrit.wikimedia.org/r/910058

Mentioned in SAL (#wikimedia-operations) [2023-04-19T18:25:40Z] <sukhe> restart pybal on lvs1017 to pick up bgp-med change: T321309

Mentioned in SAL (#wikimedia-operations) [2023-04-19T18:28:26Z] <sukhe@deploy2002> Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309 (duration: 286m 39s)

Change 909985 abandoned by Ssingh:

[operations/dns@master] depool eqiad (emergency patch, do not merge)

Reason:

no longer required

https://gerrit.wikimedia.org/r/909985

This is now complete and we have upgraded all 176 Traffic hosts to bullseye. We would like to thank @MoritzMuehlenhoff for helping with the Pybal backport that made the LVS reimaging possible, @SLyngshede-WMF and @Volans for the Ganeti reimaging cookbook, and @cmooney for all his help in configuring Netbox. Thank you all!

Thanks also to @jbond for the firmware reimaging cookbook, which saved us a lot of time by automating the iDRAC and NIC firmware updates and offering the defer reboot option.

Great work, Traffic team (and you're the first SRE sub-team to have completed your migration off Buster)!

Change 910563 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs/balancer: unify hiera post bullseye upgrade (eqiad)

https://gerrit.wikimedia.org/r/910563

Change 910563 merged by Ssingh:

[operations/puppet@production] hiera: lvs/balancer: unify hiera post bullseye upgrade (eqiad)

https://gerrit.wikimedia.org/r/910563

Change 910566 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] pybal/lvs: remove backward compatibility for buster

https://gerrit.wikimedia.org/r/910566

Change 910566 merged by Ssingh:

[operations/puppet@production] pybal/lvs: remove backward compatibility for buster

https://gerrit.wikimedia.org/r/910566

Change 930761 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] acme-chief: Fix PASSIVE_FQDN syntax

https://gerrit.wikimedia.org/r/930761

Change 930761 merged by Vgutierrez:

[operations/puppet@production] acme-chief: Fix PASSIVE_FQDN syntax

https://gerrit.wikimedia.org/r/930761

Change 941367 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/debs/varnish4@debian-wmf] Version 6.0.11-1wm2 for Debian Bookworm

https://gerrit.wikimedia.org/r/941367