
Upgrade Traffic hosts to trixie
Open, In Progress, Low, Public

Description

This task tracks the upgrade of the Traffic hosts to trixie. The affected services are listed below, identified by their cumin aliases.

This is meant to be an umbrella task for all changes that are part of this upgrade, such as Debian packaging, Puppet changes, and related testing, including reimaging.

Progress:

Host          Debian Packages   Reimaging
acmechief
cp            done
dns
durum         done              done
lvs
ncmonitor     done              done
ncredir
wikidough
haproxykafka

Debian Packaging

This includes packages that we build ourselves.

A:acmechief

A:cp

A:dnsbox

  • gdnsd

A:durum

  • anycast-healthchecker

A:lvs

  • liberica

A:ncmonitor

A:ncredir

  • benthos/redpanda

A:wikidough

  • anycast-healthchecker
  • dnsdist

trixie reimaging

A:acmechief

  • acmechief2002.codfw.wmnet
  • acmechief1002.eqiad.wmnet
  • acmechief-test2001.codfw.wmnet
  • acmechief-test1001.eqiad.wmnet

A:cp

  • cp1100.eqiad.wmnet
  • cp1101.eqiad.wmnet
  • cp1102.eqiad.wmnet
  • cp1103.eqiad.wmnet
  • cp1104.eqiad.wmnet
  • cp1105.eqiad.wmnet
  • cp1106.eqiad.wmnet
  • cp1107.eqiad.wmnet
  • cp1108.eqiad.wmnet
  • cp1109.eqiad.wmnet
  • cp1110.eqiad.wmnet
  • cp1111.eqiad.wmnet
  • cp1112.eqiad.wmnet
  • cp1113.eqiad.wmnet
  • cp1114.eqiad.wmnet
  • cp1115.eqiad.wmnet
  • cp2027.codfw.wmnet
  • cp2028.codfw.wmnet
  • cp2029.codfw.wmnet
  • cp2030.codfw.wmnet
  • cp2031.codfw.wmnet
  • cp2032.codfw.wmnet
  • cp2033.codfw.wmnet
  • cp2034.codfw.wmnet
  • cp2035.codfw.wmnet
  • cp2036.codfw.wmnet
  • cp2037.codfw.wmnet
  • cp2038.codfw.wmnet
  • cp2039.codfw.wmnet
  • cp2040.codfw.wmnet
  • cp2041.codfw.wmnet
  • cp2042.codfw.wmnet
  • cp3066.esams.wmnet
  • cp3067.esams.wmnet
  • cp3068.esams.wmnet
  • cp3069.esams.wmnet
  • cp3070.esams.wmnet
  • cp3071.esams.wmnet
  • cp3072.esams.wmnet
  • cp3073.esams.wmnet
  • cp3074.esams.wmnet
  • cp3075.esams.wmnet
  • cp3076.esams.wmnet
  • cp3077.esams.wmnet
  • cp3078.esams.wmnet
  • cp3079.esams.wmnet
  • cp3080.esams.wmnet
  • cp3081.esams.wmnet
  • cp4037.ulsfo.wmnet
  • cp4038.ulsfo.wmnet
  • cp4039.ulsfo.wmnet
  • cp4040.ulsfo.wmnet
  • cp4041.ulsfo.wmnet
  • cp4042.ulsfo.wmnet
  • cp4043.ulsfo.wmnet
  • cp4044.ulsfo.wmnet
  • cp4045.ulsfo.wmnet
  • cp4046.ulsfo.wmnet
  • cp4047.ulsfo.wmnet
  • cp4048.ulsfo.wmnet
  • cp4049.ulsfo.wmnet
  • cp4050.ulsfo.wmnet
  • cp4051.ulsfo.wmnet
  • cp4052.ulsfo.wmnet
  • cp5017.eqsin.wmnet
  • cp5018.eqsin.wmnet
  • cp5019.eqsin.wmnet
  • cp5020.eqsin.wmnet
  • cp5021.eqsin.wmnet
  • cp5022.eqsin.wmnet
  • cp5023.eqsin.wmnet
  • cp5024.eqsin.wmnet
  • cp5025.eqsin.wmnet
  • cp5026.eqsin.wmnet
  • cp5027.eqsin.wmnet
  • cp5028.eqsin.wmnet
  • cp5029.eqsin.wmnet
  • cp5030.eqsin.wmnet
  • cp5031.eqsin.wmnet
  • cp5032.eqsin.wmnet
  • cp6001.drmrs.wmnet
  • cp6002.drmrs.wmnet
  • cp6003.drmrs.wmnet
  • cp6004.drmrs.wmnet
  • cp6005.drmrs.wmnet
  • cp6006.drmrs.wmnet
  • cp6007.drmrs.wmnet
  • cp6008.drmrs.wmnet
  • cp6009.drmrs.wmnet
  • cp6010.drmrs.wmnet
  • cp6011.drmrs.wmnet
  • cp6012.drmrs.wmnet
  • cp6013.drmrs.wmnet
  • cp6014.drmrs.wmnet
  • cp6015.drmrs.wmnet
  • cp6016.drmrs.wmnet
  • cp7001.magru.wmnet
  • cp7002.magru.wmnet
  • cp7003.magru.wmnet
  • cp7004.magru.wmnet
  • cp7005.magru.wmnet
  • cp7006.magru.wmnet
  • cp7007.magru.wmnet
  • cp7008.magru.wmnet
  • cp7009.magru.wmnet
  • cp7010.magru.wmnet
  • cp7011.magru.wmnet
  • cp7012.magru.wmnet
  • cp7013.magru.wmnet
  • cp7014.magru.wmnet
  • cp7015.magru.wmnet
  • cp7016.magru.wmnet

A:durum

  • durum1001.eqiad.wmnet
  • durum1002.eqiad.wmnet
  • durum2001.codfw.wmnet
  • durum2002.codfw.wmnet
  • durum3005.esams.wmnet
  • durum3006.esams.wmnet
  • durum4001.ulsfo.wmnet
  • durum4002.ulsfo.wmnet
  • durum5001.eqsin.wmnet
  • durum5002.eqsin.wmnet
  • durum6001.drmrs.wmnet
  • durum6002.drmrs.wmnet
  • durum7003.magru.wmnet
  • durum7004.magru.wmnet

A:ncmonitor

  • ncmonitor1001.eqiad.wmnet

A:ncredir

  • ncredir1001.eqiad.wmnet
  • ncredir1002.eqiad.wmnet
  • ncredir2001.codfw.wmnet
  • ncredir2002.codfw.wmnet
  • ncredir3005.esams.wmnet
  • ncredir3006.esams.wmnet
  • ncredir4001.ulsfo.wmnet
  • ncredir4002.ulsfo.wmnet
  • ncredir5001.eqsin.wmnet
  • ncredir5002.eqsin.wmnet
  • ncredir6001.drmrs.wmnet
  • ncredir6002.drmrs.wmnet
  • ncredir7003.magru.wmnet
  • ncredir7004.magru.wmnet

A:dnsbox

  • dns1004.wikimedia.org
  • dns1005.wikimedia.org
  • dns1006.wikimedia.org
  • dns2004.wikimedia.org
  • dns2005.wikimedia.org
  • dns2006.wikimedia.org
  • dns3003.wikimedia.org
  • dns3004.wikimedia.org
  • dns4003.wikimedia.org
  • dns4004.wikimedia.org
  • dns5003.wikimedia.org
  • dns5004.wikimedia.org
  • dns6001.wikimedia.org
  • dns6002.wikimedia.org
  • dns7001.wikimedia.org
  • dns7002.wikimedia.org

A:wikidough

  • doh1001.wikimedia.org
  • doh1002.wikimedia.org
  • doh2001.wikimedia.org
  • doh2002.wikimedia.org
  • doh3005.wikimedia.org
  • doh3006.wikimedia.org
  • doh4001.wikimedia.org
  • doh4002.wikimedia.org
  • doh5001.wikimedia.org
  • doh5002.wikimedia.org
  • doh6001.wikimedia.org
  • doh6002.wikimedia.org
  • doh7003.wikimedia.org
  • doh7004.wikimedia.org
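The host lists above follow a visible pattern: the first digit of the host number matches the datacenter (1 = eqiad, 2 = codfw, 3 = esams, 4 = ulsfo, 5 = eqsin, 6 = drmrs, 7 = magru). That mapping is inferred from these lists only, not from an authoritative source. Under that assumption, a minimal sketch for working out which hosts per site are still pending could look like this; `site_of` and `pending_by_site` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumption: mapping inferred from the host lists in this task,
# e.g. cp2027.codfw.wmnet, durum7003.magru.wmnet.
SITE_BY_DIGIT = {
    "1": "eqiad",
    "2": "codfw",
    "3": "esams",
    "4": "ulsfo",
    "5": "eqsin",
    "6": "drmrs",
    "7": "magru",
}

# Role prefix (lowercase letters and hyphens), a four-digit host number, then the domain.
HOST_RE = re.compile(r"^(?P<role>[a-z-]+?)(?P<num>\d{4})\.")


def site_of(fqdn: str) -> str:
    """Return the datacenter for a host, based on the first digit of its number."""
    match = HOST_RE.match(fqdn)
    if match is None:
        raise ValueError(f"unrecognised host name: {fqdn}")
    return SITE_BY_DIGIT[match.group("num")[0]]


def pending_by_site(hosts: list[str], done: set[str]) -> dict[str, list[str]]:
    """Group the hosts that are not yet in `done` by datacenter."""
    pending: dict[str, list[str]] = defaultdict(list)
    for host in hosts:
        if host not in done:
            pending[site_of(host)].append(host)
    return dict(pending)
```

This also works for the wikimedia.org hosts (dns*, doh*), whose FQDNs do not carry the site name.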

Event Timeline


Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1003 for host cp2043.codfw.wmnet with OS trixie completed:

  • cp2043 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602111352_slyngshede_2859730_cp2043.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Some clarification, too: T386796#10642779 suggests that 3.1 was problematic. The hope is that 3.0 does not have those issues and that a later upgrade to 3.2 (or 3.3) will skip past them.

@Vgutierrez / @Fabfur Regarding upgrading from 3.0 on Bullseye to 3.0 on Trixie: installing 3.0 on Bullseye via haproxy.debian.net would pull in the latest release, 3.0.15, but upgrading to Trixie and then using the package from the Trixie repository would mean moving back to 3.0.11-1+deb13u1. When upgrading to 3.0 on Bullseye, I propose we instead use version 3.0.11-1~bpo11+1. It appears that both distributions would then run the same codebase.
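The proposal relies on how Debian orders version strings: a `~` sorts before everything (even the end of the string), letters sort before other non-letters, and `+` therefore sorts after letters. The sketch below is an illustrative Python port of the rules described in deb-version(5); `compare_versions` is a hypothetical name, and `dpkg --compare-versions` remains the authoritative check. It shows that 3.0.11-1~bpo11+1 sorts below 3.0.11-1+deb13u1 (moving to the Trixie package is a version upgrade), whereas 3.0.15 sorts above it (the same move would be a version downgrade).

```python
def _is_digit(c: str) -> bool:
    return "0" <= c <= "9"


def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run ("" means end of string)."""
    if c == "" or _is_digit(c):
        return 0
    if c.isascii() and c.isalpha():
        return ord(c)        # letters sort before non-letters
    if c == "~":
        return -1            # tilde sorts before everything, even the end
    return ord(c) + 256      # other non-letters sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare an upstream version or a Debian revision, dpkg style."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Compare the leading non-digit runs character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac, bc = _order(a[i:i + 1]), _order(b[j:j + 1])
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following digit runs numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        while i < len(a) and j < len(b) and _is_digit(a[i]) and _is_digit(b[j]):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and _is_digit(a[i]):
            return 1         # a has the longer number
        if j < len(b) and _is_digit(b[j]):
            return -1        # b has the longer number
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split [epoch:]upstream[-revision] into its three parts."""
    epoch = 0
    if ":" in version:
        epoch_str, version = version.split(":", 1)
        epoch = int(epoch_str)
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as Debian version a is lower than, equal to or higher than b."""
    epoch_a, up_a, rev_a = _split(a)
    epoch_b, up_b, rev_b = _split(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    result = _verrevcmp(up_a, up_b) or _verrevcmp(rev_a, rev_b)
    return (result > 0) - (result < 0)
```

On a Debian host the same question can be answered with `dpkg --compare-versions 3.0.11-1~bpo11+1 lt 3.0.11-1+deb13u1`, which exits with status 0 when the relation holds.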

Change #1238403 merged by Vgutierrez:

[operations/puppet@production] aptrepo,haproxy: Allow HAProxy 3.0 on bullseye

https://gerrit.wikimedia.org/r/1238403

Mentioned in SAL (#wikimedia-operations) [2026-02-12T15:23:29Z] <vgutierrez> fetch haproxy 3.0.15 on thirdparty/haproxy30 (bullseye-wikimedia) - T401832

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum3005.esams.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum7004.magru.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum3005.esams.wmnet with OS trixie completed:

  • durum3005 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602121614_sukhe_3520396_durum3005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum7004.magru.wmnet with OS trixie completed:

  • durum7004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602121629_sukhe_3532274_durum7004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum7003.magru.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum3006.esams.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum3006.esams.wmnet with OS trixie completed:

  • durum3006 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602121723_sukhe_3592573_durum3006.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum7003.magru.wmnet with OS trixie completed:

  • durum7003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602121727_sukhe_3591897_durum7003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

All durum hosts have been upgraded to trixie; leaving this comment for the timestamp.

@SLyngshede-WMF If you could get a trixie host up and running in codfw on some of the new hardware, that would give us a great launchpad for testing.

In the meantime, @CDobbins and I are going to try to get a trixie host up and running in Horizon.

@BCornwall We currently have two hosts:

  • cp2043
  • cp2044

Both are now on Trixie.

Change #1239383 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] site.pp: Provision cp2043/cp2044 as cache nodes

https://gerrit.wikimedia.org/r/1239383

Change #1239383 merged by BCornwall:

[operations/puppet@production] site.pp: Provision cp2043/cp2044 as cache nodes

https://gerrit.wikimedia.org/r/1239383

Change #1239401 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] varnish:common: add trixie version for Python

https://gerrit.wikimedia.org/r/1239401

Change #1239401 merged by BCornwall:

[operations/puppet@production] varnish:common: add trixie version for Python

https://gerrit.wikimedia.org/r/1239401

Change #1239434 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] varnish:common: Re-add buster version for Python

https://gerrit.wikimedia.org/r/1239434

Change #1239434 merged by BCornwall:

[operations/puppet@production] varnish:common: Re-add buster version for Python

https://gerrit.wikimedia.org/r/1239434

Mentioned in SAL (#wikimedia-operations) [2026-02-13T23:01:44Z] <brett> Import varnishkafka 1.2.0~deb13+wmf1 into trixie-wikimedia (T401832)

Mentioned in SAL (#wikimedia-operations) [2026-02-13T23:25:36Z] <brett> Import prometheus-varnishkafka-exporter 0.1~deb13u1 into trixie-wikimedia (T401832)

Change #1239460 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/debs/python-logstash@master] Version 0.4.6~deb13u1

https://gerrit.wikimedia.org/r/1239460

@Fabfur I pushed a new trixie-wikimedia branch to the lua-maxminddb repo. Sadly, I can't seem to open an MR into a branch that doesn't exist yet, so I created the branch directly. Could you check it and approve the commit/packaging? :)

Mentioned in SAL (#wikimedia-operations) [2026-02-14T03:37:30Z] <brett> Import lua5.4-maxminddb 0.1.1~deb13u1 into trixie-wikimedia (T401832)

Change #1239463 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] cache::haproxy: Only use lua5.3 mmdb on haproxy28

https://gerrit.wikimedia.org/r/1239463

Change #1239602 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::wmfuniq: Use same filesystem for tempfile to avoid cross-filesystem errors

https://gerrit.wikimedia.org/r/1239602

Change #1239602 merged by Vgutierrez:

[operations/puppet@production] cache::wmfuniq: Fix cross-filesystem tempfile error on Trixie

https://gerrit.wikimedia.org/r/1239602

Change #1239631 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::dp_key: Fix cross-filesystems tempfile error on Trixie

https://gerrit.wikimedia.org/r/1239631

Change #1239631 merged by Vgutierrez:

[operations/puppet@production] cache::dp_key: Fix cross-filesystems tempfile error on Trixie

https://gerrit.wikimedia.org/r/1239631
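The two changes above are not reproduced here, so the following is only a generic sketch of the pattern their titles describe, not the actual cache::wmfuniq or cache::dp_key code. `os.rename()` and `os.replace()` cannot move a file between filesystems and fail with `EXDEV` ("Invalid cross-device link"). One plausible reason this surfaced on Trixie is that Debian 13 mounts /tmp as a tmpfs by default, so a tempfile created in /tmp no longer shares a filesystem with the destination. Creating the temporary file in the destination directory keeps the final rename on one filesystem and therefore atomic; `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data`.

    The temporary file is created next to the destination, so os.replace()
    is a same-filesystem rename and cannot fail with EXDEV.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Readers of `path` see either the old content or the new content, never a partially written file.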

Change #1239463 abandoned by BCornwall:

[operations/puppet@production] cache::haproxy: Only use lua5.3 mmdb on haproxy28

Reason:

Idee228cf20c011040052d9e4e6e1349de94893f9

https://gerrit.wikimedia.org/r/1239463

Mentioned in SAL (#wikimedia-operations) [2026-02-17T14:07:03Z] <vgutierrez> upload golang-github-florianl-go-tc_0.4.7 to trixie-wikimedia (apt.wm.o) - T401832

Mentioned in SAL (#wikimedia-operations) [2026-02-17T14:58:25Z] <vgutierrez> upload golang-github-mmatczuk-anyflag-dev 0.0~git20240709.eb9e24c-1 to trixie-wikimedia (apt.wm.o) - T401832

Change #1239963 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Also install the pbuilder hooks for trixie

https://gerrit.wikimedia.org/r/1239963

Change #1239963 merged by Muehlenhoff:

[operations/puppet@production] Also install the pbuilder hooks for trixie

https://gerrit.wikimedia.org/r/1239963

Change #1239460 abandoned by BCornwall:

[operations/debs/python-logstash@master] Version 0.4.6~deb13u1

Reason:

Moved to gitlab: https://gitlab.wikimedia.org/repos/sre/python-logstash

https://gerrit.wikimedia.org/r/1239460

Mentioned in SAL (#wikimedia-operations) [2026-02-17T17:57:00Z] <brett> Import python-logstash (python3-logstash) 0.4.6~deb13u1 to trixie-wikimedia (T401832)

Change #1240064 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] hieradata: Set HAProxy version to 3 for cp204[34]

https://gerrit.wikimedia.org/r/1240064

Change #1240064 merged by BCornwall:

[operations/puppet@production] hieradata: Set HAProxy version to 3 for cp204[34]

https://gerrit.wikimedia.org/r/1240064

Mentioned in SAL (#wikimedia-operations) [2026-02-18T11:57:48Z] <vgutierrez> upload golang-github-intel-go-cpuid 0.0~git20210602.5747e5c-2+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832

Mentioned in SAL (#wikimedia-operations) [2026-02-18T14:39:19Z] <vgutierrez> upload golang-github-u-root-u-root 0.12.0-1 to trixie-wikimedia (apt.wm.o) - T401832

Mentioned in SAL (#wikimedia-operations) [2026-02-18T14:48:10Z] <vgutierrez> upload golang-gitlab-wikimedia-sre-qemutest-dev 0.1.0+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832

Mentioned in SAL (#wikimedia-operations) [2026-02-18T14:54:12Z] <vgutierrez> uploaded tcp-mss-clamper 0.6+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832

Mentioned in SAL (#wikimedia-operations) [2026-02-18T19:05:51Z] <brett> import haproxykafka 0.3.16+deb13u1 into trixie-wikimedia (T401832)

Change #1243184 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] ats: Set secondary nvme drives for new codfw hosts

https://gerrit.wikimedia.org/r/1243184

Change #1243195 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] varnishkafka: Only enable prom exporter for text

https://gerrit.wikimedia.org/r/1243195

Mentioned in SAL (#wikimedia-operations) [2026-02-24T22:37:20Z] <brett> import ncmonitor 3.1.0~deb13u1 into trixie-wikimedia (T401832)

Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host ncmonitor1001.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host ncmonitor1001.eqiad.wmnet with OS trixie completed:

  • ncmonitor1001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602242259_brett_2931472_ncmonitor1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1243184 merged by BCornwall:

[operations/puppet@production] ats: Set secondary nvme drives for new codfw hosts

https://gerrit.wikimedia.org/r/1243184

Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp2043.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp2043.codfw.wmnet with OS trixie completed:

  • cp2043 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Set pooled=inactive for the following services on confctl:

      {"cp2043.codfw.wmnet": {"weight": 0, "pooled": "inactive"}, "tags": "dc=codfw,cluster=cache_text,service=cdn"}
      {"cp2043.codfw.wmnet": {"weight": 0, "pooled": "inactive"}, "tags": "dc=codfw,cluster=cache_text,service=ats-be"}

    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602252048_cdobbins_3572558_cp2043.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • No changes in confctl are needed to restore the previous state.
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp2044.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp2044.codfw.wmnet with OS trixie completed:

  • cp2044 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Set pooled=inactive for the following services on confctl:

      {"cp2044.codfw.wmnet": {"weight": 0, "pooled": "no"}, "tags": "dc=codfw,cluster=cache_upload,service=ats-be"}
      {"cp2044.codfw.wmnet": {"weight": 0, "pooled": "no"}, "tags": "dc=codfw,cluster=cache_upload,service=cdn"}

    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602252135_cdobbins_3606681_cp2044.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Some services have a zero weight, you have to set a weight with:

      sudo confctl select '{tags_line}' set/weight=NN
      sudo confctl select '{tags_line}' set/weight=NN

    • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

      sudo confctl select 'name=cp2044\.codfw\.wmnet,dc=codfw,cluster=cache_upload,service=ats\-be' set/pooled=no
      sudo confctl select 'name=cp2044\.codfw\.wmnet,dc=codfw,cluster=cache_upload,service=cdn' set/pooled=no

    • Updated Netbox data from PuppetDB
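The restore commands in the cp2044 output regex-escape each value in the confctl selector (`cp2044\.codfw\.wmnet`, `ats\-be`). As a minimal sketch, assuming that escaping matches Python's `re.escape()` (it reproduces both restore lines above), the commands can be regenerated from the JSON state the cookbook logged. `confctl_selector` and `restore_commands` are hypothetical helpers, not part of spicerack or conftool.

```python
import json
import re
import shlex


def confctl_selector(host: str, tags: str) -> str:
    """Build a 'name=...,key=value,...' selector with regex-escaped values."""
    parts = [f"name={re.escape(host)}"]
    for pair in tags.split(","):
        key, value = pair.split("=", 1)
        parts.append(f"{key}={re.escape(value)}")
    return ",".join(parts)


def restore_commands(state_lines: list[str]) -> list[str]:
    """Turn logged JSON state lines into 'confctl ... set/pooled=' commands."""
    commands = []
    for line in state_lines:
        record = json.loads(line)
        tags = record.pop("tags")
        # The single remaining key is the host; its value holds weight/pooled.
        [(host, attrs)] = record.items()
        selector = shlex.quote(confctl_selector(host, tags))
        commands.append(f"sudo confctl select {selector} set/pooled={attrs['pooled']}")
    return commands
```

`shlex.quote()` wraps each selector in single quotes because it contains backslashes, which matches the quoting the cookbook printed.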

Change #1243195 merged by BCornwall:

[operations/puppet@production] varnishkafka: Only enable for text

https://gerrit.wikimedia.org/r/1243195

Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp2047.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp2047.codfw.wmnet with OS trixie completed:

  • cp2047 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602262102_cdobbins_127988_cp2047.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp2057.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp2058.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp2057.codfw.wmnet with OS trixie completed:

  • cp2057 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602272114_cdobbins_866779_cp2057.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp2058.codfw.wmnet with OS trixie completed:

  • cp2058 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602272117_cdobbins_870415_cp2058.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB