Migrate sanitarium hosts to Debian Trixie
Closed, Resolved. Public.

Description

  • db1154
  • db1155

Event Timeline

Marostegui triaged this task as Medium priority. Apr 20 2026, 5:18 AM
Marostegui moved this task from Triage to Ready on the DBA board.

Change #1277296 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1155.yaml: Disable notifications

https://gerrit.wikimedia.org/r/1277296

Change #1277296 merged by Marostegui:

[operations/puppet@production] db1155.yaml: Disable notifications

https://gerrit.wikimedia.org/r/1277296

Mentioned in SAL (#wikimedia-operations) [2026-04-27T07:38:49Z] <marostegui> Reimage db1155 (sanitarium host) lag to be expected on wikireplicas: s2, s4, s6, s7 T423834

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1155.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1155.eqiad.wmnet with OS trixie completed:

  • db1155 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202604270801_marostegui_1119258_db1155.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1277433 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1154: Disable notifications

https://gerrit.wikimedia.org/r/1277433

Mentioned in SAL (#wikimedia-operations) [2026-04-27T09:11:09Z] <marostegui> Reimage db1154 (sanitarium host) lag to be expected on wikireplicas: s, s3, s5, s8 x3 T423834

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1154.eqiad.wmnet with OS trixie

Change #1277433 merged by Marostegui:

[operations/puppet@production] db1154: Disable notifications

https://gerrit.wikimedia.org/r/1277433

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1154.eqiad.wmnet with OS trixie completed:

  • db1154 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202604270935_marostegui_1173426_db1154.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
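
Both reimage summaries above end in WARN because of a single step ("Icinga status is not optimal, downtime not removed"). As a rough sketch, a pasted summary in this bullet format could be scanned for steps that need manual follow-up. The function names and the hint phrases below are assumptions made for illustration; they are not part of the reimage cookbook or Spicerack.

```python
import re

# Phrases that suggest a step needs manual follow-up.
# This list is an assumption for illustration, not defined by the cookbook.
FOLLOW_UP_HINTS = ("not optimal", "not removed", "failed")


def parse_summary(text):
    """Split a pasted cookbook summary into (host, status, steps)."""
    host, status, steps = None, None, []
    for raw in text.splitlines():
        # Drop indentation and the leading bullet character.
        line = raw.strip().lstrip("•").strip()
        if not line:
            continue
        # The first line shaped like "db1155 (WARN)" is treated as the header.
        header = re.fullmatch(r"(\S+) \((\w+)\)", line)
        if header and host is None:
            host, status = header.group(1), header.group(2)
        else:
            steps.append(line)
    return host, status, steps


def needs_follow_up(steps):
    """Return the steps whose text contains one of the hint phrases."""
    return [s for s in steps if any(h in s.lower() for h in FOLLOW_UP_HINTS)]
```

Running `parse_summary` on either summary and passing the resulting steps to `needs_follow_up` would surface the Icinga line, which is the point where the downtime has to be removed by hand once the host is healthy.
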
Marostegui updated the task description.

Done