Upgrade Traffic hosts to bookworm
Open, Stalled, Medium, Public

Description

This task tracks the upgrade of the Traffic hosts to bookworm. The affected services are listed below, identified by their cumin aliases.

This is an umbrella task for all changes that are part of this upgrade, such as Debian packaging, Puppet changes, and the related testing, including reimaging.

Progress:

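| Host | Debian Packages | Reimaging |
| --- | --- | --- |
| cp | done | blocked by T352744 |
| acmechief | done | done |
| ncredir | done | done |
| dns | done | done |
| durum | done | done |
| wikidough | done | done |
| lvs | N/A (no Python 2) | N/A |
| lvs experimental | | |
| pybal-test | N/A (no Python 2) | N/A |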

Debian Packaging

This includes packages that we build ourselves.

A:cp

A:acmechief

  • acme-chief

A:dns (rec/auth)

  • gdnsd (gdnsd_3.99.0~alpha2-2_amd64.changes)
  • pdns-recursor (already in Debian proper)

A:durum

A:wikidough

  • anycast-healthchecker (anycast-healthchecker_0.9.1-1+wmf12u1_amd64.changes)
  • dnsdist (dnsdist_1.8.0-1+wmf12u1_amd64.changes)
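
The `.changes` names listed in this section follow the usual Debian `package_version_arch.changes` layout. Below is a minimal sketch, assuming only that naming convention, of splitting such a name into its parts. The `ChangesFile` type and `parse_changes_filename` function are illustrative names, not part of any existing tooling.

```python
from typing import NamedTuple


class ChangesFile(NamedTuple):
    package: str
    version: str
    architecture: str


def parse_changes_filename(filename: str) -> ChangesFile:
    """Split a Debian .changes filename into package, version and architecture."""
    suffix = ".changes"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .changes file: {filename!r}")
    stem = filename[: -len(suffix)]
    # Debian package names and versions cannot contain underscores, so the
    # first underscore ends the package name and the last one starts the
    # architecture.
    package, rest = stem.split("_", 1)
    version, architecture = rest.rsplit("_", 1)
    return ChangesFile(package, version, architecture)


if __name__ == "__main__":
    for name in (
        "gdnsd_3.99.0~alpha2-2_amd64.changes",
        "anycast-healthchecker_0.9.1-1+wmf12u1_amd64.changes",
        "dnsdist_1.8.0-1+wmf12u1_amd64.changes",
    ):
        print(parse_changes_filename(name))
```

Splitting on the first and last underscore, rather than on every underscore, keeps the hyphens, `~` and `+` inside the version string intact.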

A:lvs

N/A

lvs experimental

A:pybaltest

N/A

Bookworm reimaging

A:acmechief

  • acmechief2001.codfw.wmnet
  • acmechief2002.codfw.wmnet
  • acmechief1001.eqiad.wmnet
  • acmechief-test1001.eqiad.wmnet
  • acmechief-test2001.codfw.wmnet

A:cp

  • cp1075.eqiad.wmnet
  • cp1076.eqiad.wmnet
  • cp1077.eqiad.wmnet
  • cp1078.eqiad.wmnet
  • cp1079.eqiad.wmnet
  • cp1080.eqiad.wmnet
  • cp1081.eqiad.wmnet
  • cp1082.eqiad.wmnet
  • cp1083.eqiad.wmnet
  • cp1084.eqiad.wmnet
  • cp1085.eqiad.wmnet
  • cp1086.eqiad.wmnet
  • cp1087.eqiad.wmnet
  • cp1088.eqiad.wmnet
  • cp1089.eqiad.wmnet
  • cp1090.eqiad.wmnet
  • cp2027.codfw.wmnet
  • cp2028.codfw.wmnet
  • cp2029.codfw.wmnet
  • cp2030.codfw.wmnet
  • cp2031.codfw.wmnet
  • cp2032.codfw.wmnet
  • cp2033.codfw.wmnet
  • cp2034.codfw.wmnet
  • cp2035.codfw.wmnet
  • cp2036.codfw.wmnet
  • cp2037.codfw.wmnet
  • cp2038.codfw.wmnet
  • cp2039.codfw.wmnet
  • cp2040.codfw.wmnet
  • cp2041.codfw.wmnet
  • cp2042.codfw.wmnet
  • cp3066.esams.wmnet
  • cp3067.esams.wmnet
  • cp3068.esams.wmnet
  • cp3069.esams.wmnet
  • cp3070.esams.wmnet
  • cp3071.esams.wmnet
  • cp3072.esams.wmnet
  • cp3073.esams.wmnet
  • cp3074.esams.wmnet
  • cp3075.esams.wmnet
  • cp3076.esams.wmnet
  • cp3077.esams.wmnet
  • cp3078.esams.wmnet
  • cp3079.esams.wmnet
  • cp3080.esams.wmnet
  • cp3081.esams.wmnet
  • cp4037.ulsfo.wmnet
  • cp4038.ulsfo.wmnet
  • cp4039.ulsfo.wmnet
  • cp4040.ulsfo.wmnet
  • cp4041.ulsfo.wmnet
  • cp4042.ulsfo.wmnet
  • cp4043.ulsfo.wmnet
  • cp4044.ulsfo.wmnet
  • cp4045.ulsfo.wmnet
  • cp4046.ulsfo.wmnet
  • cp4047.ulsfo.wmnet
  • cp4048.ulsfo.wmnet
  • cp4049.ulsfo.wmnet
  • cp4050.ulsfo.wmnet
  • cp4051.ulsfo.wmnet
  • cp4052.ulsfo.wmnet
  • cp5017.eqsin.wmnet
  • cp5018.eqsin.wmnet
  • cp5019.eqsin.wmnet
  • cp5020.eqsin.wmnet
  • cp5021.eqsin.wmnet
  • cp5022.eqsin.wmnet
  • cp5023.eqsin.wmnet
  • cp5024.eqsin.wmnet
  • cp5025.eqsin.wmnet
  • cp5026.eqsin.wmnet
  • cp5027.eqsin.wmnet
  • cp5028.eqsin.wmnet
  • cp5029.eqsin.wmnet
  • cp5030.eqsin.wmnet
  • cp5031.eqsin.wmnet
  • cp5032.eqsin.wmnet
  • cp6001.drmrs.wmnet
  • cp6002.drmrs.wmnet
  • cp6003.drmrs.wmnet
  • cp6004.drmrs.wmnet
  • cp6005.drmrs.wmnet
  • cp6006.drmrs.wmnet
  • cp6007.drmrs.wmnet
  • cp6008.drmrs.wmnet
  • cp6009.drmrs.wmnet
  • cp6010.drmrs.wmnet
  • cp6011.drmrs.wmnet
  • cp6012.drmrs.wmnet
  • cp6013.drmrs.wmnet
  • cp6014.drmrs.wmnet
  • cp6015.drmrs.wmnet
  • cp6016.drmrs.wmnet

A:durum

  • durum1001.eqiad.wmnet
  • durum1002.eqiad.wmnet
  • durum2001.codfw.wmnet
  • durum2002.codfw.wmnet
  • durum4001.ulsfo.wmnet
  • durum4002.ulsfo.wmnet
  • durum5001.eqsin.wmnet
  • durum5002.eqsin.wmnet
  • durum6001.drmrs.wmnet
  • durum6002.drmrs.wmnet

A:ncredir

  • ncredir1001.eqiad.wmnet
  • ncredir1002.eqiad.wmnet
  • ncredir2001.codfw.wmnet
  • ncredir2002.codfw.wmnet
  • ncredir3003.esams.wmnet
  • ncredir3004.esams.wmnet
  • ncredir4001.ulsfo.wmnet
  • ncredir4002.ulsfo.wmnet
  • ncredir5001.eqsin.wmnet
  • ncredir5002.eqsin.wmnet
  • ncredir6001.drmrs.wmnet
  • ncredir6002.drmrs.wmnet

A:dns-rec (and auth)

  • dns1004.wikimedia.org
  • dns1005.wikimedia.org
  • dns1006.wikimedia.org
  • dns2004.wikimedia.org
  • dns2005.wikimedia.org
  • dns2006.wikimedia.org
  • dns3003.wikimedia.org
  • dns3004.wikimedia.org
  • dns4003.wikimedia.org
  • dns4004.wikimedia.org
  • dns5003.wikimedia.org
  • dns5004.wikimedia.org
  • dns6001.wikimedia.org
  • dns6002.wikimedia.org

A:wikidough

  • doh1001.wikimedia.org
  • doh1002.wikimedia.org
  • doh2001.wikimedia.org
  • doh2002.wikimedia.org
  • doh3003.wikimedia.org
  • doh3004.wikimedia.org
  • doh4001.wikimedia.org
  • doh4002.wikimedia.org
  • doh5001.wikimedia.org
  • doh5002.wikimedia.org
  • doh6001.wikimedia.org
  • doh6002.wikimedia.org
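
The dns and doh hosts use `wikimedia.org` names that do not spell out their site, but the cp, durum and ncredir entries above suggest that the first digit of the host number identifies it (1 for eqiad, 2 for codfw, and so on). Below is a minimal sketch for grouping a reimaging batch by site under that assumption. The digit-to-site mapping is inferred from the lists in this task only, and the function names are illustrative.

```python
import re
from collections import defaultdict

# Mapping inferred from the FQDNs listed in this task (e.g. cp1075.eqiad.wmnet,
# cp6001.drmrs.wmnet); treat it as an assumption, not an authoritative source.
SITE_BY_DIGIT = {
    "1": "eqiad",
    "2": "codfw",
    "3": "esams",
    "4": "ulsfo",
    "5": "eqsin",
    "6": "drmrs",
}

# Role name (letters and hyphens), then a four-digit host number, then a dot.
HOST_RE = re.compile(r"^[a-z-]+(?P<digit>\d)\d{3}\.")


def site_of(fqdn: str) -> str:
    """Return the site implied by the first digit of the host number."""
    match = HOST_RE.match(fqdn)
    if match is None or match["digit"] not in SITE_BY_DIGIT:
        raise ValueError(f"cannot infer site for {fqdn!r}")
    return SITE_BY_DIGIT[match["digit"]]


def group_by_site(fqdns):
    """Group FQDNs into a dict of site name to list of hosts."""
    groups = defaultdict(list)
    for fqdn in fqdns:
        groups[site_of(fqdn)].append(fqdn)
    return dict(groups)


if __name__ == "__main__":
    hosts = [
        "dns1004.wikimedia.org",
        "dns6001.wikimedia.org",
        "doh3003.wikimedia.org",
        "acmechief-test2001.codfw.wmnet",
    ]
    for site, members in sorted(group_by_site(hosts).items()):
        print(site, members)
```

Grouping by site this way makes it easier to avoid taking down both hosts of a pair in the same site at once.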

Details

| Repo | Branch | Lines +/- |
| --- | --- | --- |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +2 -2 |
| operations/puppet | production | +4 -4 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +0 -1 |
| operations/software/acme-chief | master | +21 -23 |
| operations/software/acme-chief | debian | +31 -22 |
| operations/debs/trafficserver | master | +9 -2 |
| operations/puppet | production | +20 -7 |
| operations/puppet | production | +19 -4 |
| operations/puppet | production | +2 -1 |
| operations/debs/trafficserver | master | +8 -1 |
| operations/software/varnish/libvmod-re2 | debian-6.0 | +33 -0 |
| operations/software/varnish/libvmod-netmapper | debian | +10 -2 |
| operations/debs/varnish-modules | master | +8 -1 |
| operations/debs/gdnsd | master | +7 -50 |
| operations/debs/python-anycast-healthchecker | master | +34 -11 |
| operations/software/varnish/libvmod-querysort | main | +9 -1 |
| operations/software/varnish/varnishkafka | debian | +11 -113 |
| operations/debs/varnish4 | debian-wmf | +10 -2 |
| operations/debs/python-logstash | master | +11 -0 |
| operations/software/purged | master | +14 -19 |
| operations/debs/prometheus-varnishkafka-exporter | master | +11 -1 |
| operations/puppet | production | +3 -0 |
| operations/software/prometheus-rdkafka-exporter | master | +8 -0 |
| operations/puppet | production | +1 -0 |
| operations/debs/file-read-backwards | debian | +12 -1 |
| operations/software/fifo-log-demux | master | +15 -3 |
| operations/debs/dnsdist | master | +6 -0 |

| Title | Reference | Author | Source Branch | Dest Branch |
| --- | --- | --- | --- | --- |
| Update dependencies to match Bookworm versions | repos/sre/acme-chief!3 | brett | change-949544-update-deps-bookworm-versions | main |

Event Timeline

Change 966907 merged by BCornwall:

[operations/puppet@production] hiera: remove dns6001 from authdns_servers

https://gerrit.wikimedia.org/r/966907

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns6001.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns6001.wikimedia.org with OS bookworm completed:

  • dns6001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310181845_brett_2285734_dns6001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
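
The log path reported in the "First Puppet run completed" step appears to follow the pattern `<YYYYMMDDHHMM>_<user>_<number>_<host>.out`. Below is a minimal sketch for pulling those fields out of such a path, for example to order runs chronologically. The pattern is inferred from the paths in this timeline only. The meaning of the numeric field is not stated here, so it is kept as an opaque `run_id`, and the timestamp is parsed without a timezone. All names are illustrative.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Pattern inferred from paths such as
# /var/log/spicerack/sre/hosts/reimage/202310181845_brett_2285734_dns6001.out
LOG_RE = re.compile(
    r"(?P<stamp>\d{12})_(?P<user>[^_/]+)_(?P<run_id>\d+)_(?P<host>[^_/]+)\.out$"
)


class ReimageLog(NamedTuple):
    started: datetime  # naive: the timezone is not part of the filename
    user: str
    run_id: int        # opaque number; its meaning is an open assumption
    host: str


def parse_reimage_log_path(path: str) -> ReimageLog:
    """Extract timestamp, user, numeric id and host from a reimage log path."""
    match = LOG_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognised reimage log path: {path!r}")
    return ReimageLog(
        started=datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        user=match["user"],
        run_id=int(match["run_id"]),
        host=match["host"],
    )


if __name__ == "__main__":
    print(parse_reimage_log_path(
        "/var/log/spicerack/sre/hosts/reimage/202310181845_brett_2285734_dns6001.out"
    ))
```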

Change 967968 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns6002 from authdns_servers

https://gerrit.wikimedia.org/r/967968

Change 967968 merged by BCornwall:

[operations/puppet@production] hiera: remove dns6002 from authdns_servers

https://gerrit.wikimedia.org/r/967968

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns6002.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns6002.wikimedia.org with OS bookworm completed:

  • dns6002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310231812_brett_3585315_dns6002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 967976 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns4004 from authdns_servers

https://gerrit.wikimedia.org/r/967976

Change 967976 merged by BCornwall:

[operations/puppet@production] hiera: remove dns4004 from authdns_servers

https://gerrit.wikimedia.org/r/967976

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns4004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns4004.wikimedia.org with OS bookworm completed:

  • dns4004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310232044_brett_3618436_dns4004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 968294 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns5003 from authdns_servers

https://gerrit.wikimedia.org/r/968294

Change 968294 merged by BCornwall:

[operations/puppet@production] hiera: remove dns5003 from authdns_servers

https://gerrit.wikimedia.org/r/968294

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns5003.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns5003.wikimedia.org with OS bookworm completed:

  • dns5003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310241749_brett_3845170_dns5003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 968342 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns5004 from authdns_servers

https://gerrit.wikimedia.org/r/968342

Change 968342 merged by BCornwall:

[operations/puppet@production] hiera: remove dns5004 from authdns_servers

https://gerrit.wikimedia.org/r/968342

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns5004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns5004.wikimedia.org with OS bookworm completed:

  • dns5004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310242014_brett_3876277_dns5004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 968680 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove dns1004 for authdns_servers for reimaging

https://gerrit.wikimedia.org/r/968680

Change 968680 merged by Ssingh:

[operations/puppet@production] hiera: remove dns1004 for authdns_servers for reimaging

https://gerrit.wikimedia.org/r/968680

Change 968721 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns1005 from authdns_servers

https://gerrit.wikimedia.org/r/968721

Change 968721 merged by BCornwall:

[operations/puppet@production] hiera: remove dns1005 from authdns_servers

https://gerrit.wikimedia.org/r/968721

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns1005.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns1005.wikimedia.org with OS bookworm completed:

  • dns1005 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310251702_brett_4115897_dns1005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 968735 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns1006 from authdns_servers

https://gerrit.wikimedia.org/r/968735

Change 968735 merged by BCornwall:

[operations/puppet@production] hiera: remove dns1006 from authdns_servers

https://gerrit.wikimedia.org/r/968735

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns1006.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns1006.wikimedia.org with OS bookworm completed:

  • dns1006 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310251944_brett_4150404_dns1006.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 969185 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns2001 from authdns_servers

https://gerrit.wikimedia.org/r/969185

Change 969185 merged by BCornwall:

[operations/puppet@production] hiera: remove dns2004 from authdns_servers

https://gerrit.wikimedia.org/r/969185

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns2004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns2004.wikimedia.org with OS bookworm completed:

  • dns2004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310261908_brett_216869_dns2004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 969194 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns2005 from authdns_servers

https://gerrit.wikimedia.org/r/969194

Change 969194 merged by BCornwall:

[operations/puppet@production] hiera: remove dns2005 from authdns_servers

https://gerrit.wikimedia.org/r/969194

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns2005.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns2005.wikimedia.org with OS bookworm completed:

  • dns2005 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310262023_brett_237506_dns2005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 969212 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns2006 from authdns_servers

https://gerrit.wikimedia.org/r/969212

Change 969212 merged by BCornwall:

[operations/puppet@production] hiera: remove dns2006 from authdns_servers

https://gerrit.wikimedia.org/r/969212

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns2006.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns2006.wikimedia.org with OS bookworm completed:

  • dns2006 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310262210_brett_260534_dns2006.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 969931 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns3003 from authdns_servers

https://gerrit.wikimedia.org/r/969931

Change 969931 merged by BCornwall:

[operations/puppet@production] hiera: remove dns3003 from authdns_servers

https://gerrit.wikimedia.org/r/969931

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns3003.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns3003.wikimedia.org with OS bookworm completed:

  • dns3003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310301742_brett_1204482_dns3003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 969949 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hiera: remove dns3004 from authdns_servers

https://gerrit.wikimedia.org/r/969949

Change 969949 merged by BCornwall:

[operations/puppet@production] hiera: remove dns3004 from authdns_servers

https://gerrit.wikimedia.org/r/969949

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns3004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns3004.wikimedia.org with OS bookworm completed:

  • dns3004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202310301951_brett_1232033_dns3004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host acmechief-test1001.eqiad.wmnet with OS bookworm

Change 972886 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] acme_chief: Set acmechief-test1001 as active host

https://gerrit.wikimedia.org/r/972886

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host acmechief-test1001.eqiad.wmnet with OS bookworm completed:

  • acmechief-test1001 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311082005_brett_1618392_acmechief-test1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
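
In the runs recorded in this timeline, a WARN result coincides with the step "Icinga status is not optimal, downtime not removed", which leaves the downtime for someone to clear by hand. Below is a minimal sketch for spotting such runs in a pasted completion summary. It assumes only the bullet layout shown in these comments, is not part of the reimage cookbook, and uses illustrative function names.

```python
import re

# First bullet of a summary: "• <host> (<STATUS>)", e.g. "• dns6001 (PASS)".
SUMMARY_RE = re.compile(r"^\s*•\s*(?P<host>\S+) \((?P<status>[A-Z]+)\)\s*$")


def parse_summary(lines):
    """Return (host, status, steps) for one cookbook completion summary."""
    host = status = None
    steps = []
    for line in lines:
        match = SUMMARY_RE.match(line)
        if match and host is None:
            host, status = match["host"], match["status"]
        elif line.strip():
            # Drop the leading bullet and surrounding whitespace.
            steps.append(line.strip().lstrip("•").strip())
    if host is None:
        raise ValueError("no '<host> (<STATUS>)' line found")
    return host, status, steps


def downtime_left_in_place(steps):
    """True if the summary says the Icinga downtime was not removed."""
    return any("downtime not removed" in step for step in steps)


if __name__ == "__main__":
    sample = [
        "  • acmechief-test1001 (WARN)",
        "    • Automatic Puppet run was successful",
        "    • Icinga status is not optimal, downtime not removed",
        "    • Updated Netbox data from PuppetDB",
    ]
    host, status, steps = parse_summary(sample)
    if downtime_left_in_place(steps):
        print(f"{host}: {status}, downtime still set")
```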

Change 972886 merged by BCornwall:

[operations/puppet@production] acme_chief: Set acmechief-test1001 as active host

https://gerrit.wikimedia.org/r/972886

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host acmechief-test2001.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host acmechief-test2001.codfw.wmnet with OS bookworm completed:

  • acmechief-test2001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311091851_brett_2257575_acmechief-test2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

ssingh updated the task description.

Mentioned in SAL (#wikimedia-operations) [2023-11-13T18:42:55Z] <sukhe> pool cp4052 as first cp host for bookworm testing: T342154

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin1001 for host acmechief1001.eqiad.wmnet with OS bookworm

Change 975024 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] acme_chief: Remove acmechief1001 passive host

https://gerrit.wikimedia.org/r/975024

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin1001 for host acmechief1001.eqiad.wmnet with OS bookworm completed:

  • acmechief1001 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311161631_brett_1915187_acmechief1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2023-11-16T17:00:22Z] <brett> Disabling puppet on all acme-chief clients for acme-chief bookworm upgrades - T342154

Change 975046 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] acme-chief: Set acmechief1001 as active

https://gerrit.wikimedia.org/r/975046

Change 975046 merged by BCornwall:

[operations/puppet@production] acme-chief: Set acmechief1001 as active

https://gerrit.wikimedia.org/r/975046

Change 975047 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] acme-chief: Switch acmechief_host to acmechief1001

https://gerrit.wikimedia.org/r/975047

Change 975047 merged by BCornwall:

[operations/puppet@production] acme-chief: Switch acmechief_host to acmechief1001

https://gerrit.wikimedia.org/r/975047

Mentioned in SAL (#wikimedia-operations) [2023-11-16T17:26:59Z] <brett> Re-enabling puppet on all acme-chief clients post-bookworm upgrade - T342154

Change 975024 abandoned by BCornwall:

[operations/puppet@production] acme_chief: Remove acmechief1001 passive host

Reason:

No longer needed

https://gerrit.wikimedia.org/r/975024

Change 975853 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] acme-chief: Remove acmechief2001 passive host

https://gerrit.wikimedia.org/r/975853

Change 975853 merged by BCornwall:

[operations/puppet@production] acme-chief: Remove acmechief2001 passive host

https://gerrit.wikimedia.org/r/975853

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin1001 for host acmechief2001.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin1001 for host acmechief2001.codfw.wmnet with OS bookworm completed:

  • acmechief2001 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311201844_brett_255399_acmechief2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 975911 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] acme-chief: Remove acmechief2002 passive host

https://gerrit.wikimedia.org/r/975911

BCornwall updated the task description.

Change 975911 abandoned by BCornwall:

[operations/puppet@production] acme-chief: Remove acmechief2002 passive host

Reason:

Already upgraded

https://gerrit.wikimedia.org/r/975911

BCornwall changed the task status from In Progress to Stalled. (Tue, Apr 9, 8:47 PM)

Stalling, as the cp hosts cannot be upgraded until the performance issues with OpenSSL 3.x are resolved (T352744).