Migrate 1P db* to Debian Trixie
Closed, Resolved (Public)

Description

We are currently testing three single-CPU (1P) config E hosts:

  • db1264 Dell - x1
  • db2248 Dell - s4
  • db2249 SM UEFI - x1

There have been no issues so far, so let's test Debian Trixie on these hosts. I don't expect any changes, but this will confirm it.

Event Timeline

Marostegui renamed this task from "Migrate 1P to Debian Trixie" to "Migrate 1P db* to Debian Trixie" (Jan 23 2026, 12:58 PM).
Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to In progress on the DBA board.

db1264 is an active m5 master.

Change #1233097 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1264: Disable notifications

https://gerrit.wikimedia.org/r/1233097

Change #1233097 merged by Marostegui:

[operations/puppet@production] db1264: Disable notifications

https://gerrit.wikimedia.org/r/1233097

Mentioned in SAL (#wikimedia-operations) [2026-01-26T08:48:53Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Depool db1264 T415358', diff saved to https://phabricator.wikimedia.org/P87925 and previous config saved to /var/cache/conftool/dbconfig/20260126-084852-marostegui.json

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1264.eqiad.wmnet with OS trixie

Started cloning db1224.eqiad.wmnet to db1264.eqiad.wmnet - marostegui@cumin1003

Completed depool of db1224 - Depool db1224.eqiad.wmnet to then clone it to db1264.eqiad.wmnet - marostegui@cumin1003

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1264.eqiad.wmnet with OS trixie completed:

  • db1264 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601260908_marostegui_3710388_db1264.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Start pool of db1224 gradually with 4 steps - Pool db1224.eqiad.wmnet in after cloning - marostegui@cumin1003

Completed pool of db1224 gradually with 4 steps - Pool db1224.eqiad.wmnet in after cloning - marostegui@cumin1003

db1264 has been reimaged with Debian Trixie. /srv was mistakenly formatted during the reimage (fixed at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1233114), so I had to reclone it.

Start pool of db1264 gradually with 4 steps - Pool db1264.eqiad.wmnet in after cloning - marostegui@cumin1003

Completed pool of db1264 gradually with 4 steps - Pool db1264.eqiad.wmnet in after cloning - marostegui@cumin1003
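
For readers unfamiliar with the "gradually with 4 steps" wording above: the host's share of traffic is raised in increments rather than all at once. The following is only a minimal sketch, assuming evenly spaced steps (25%, 50%, 75%, 100%). The real percentages and wait times used by the cookbook are not stated in this task, and `pool_steps` is a hypothetical helper, not part of dbctl or spicerack.

```python
def pool_steps(steps: int = 4) -> list[int]:
    """Return evenly spaced pooling percentages, ending at 100.

    Hypothetical illustration only: the actual step sizes used by the
    pooling cookbook are an assumption here.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    # Step i of n gets i/n of full weight, rounded to a whole percent.
    return [round(100 * i / steps) for i in range(1, steps + 1)]


if __name__ == "__main__":
    # Print the assumed schedule for a 4-step gradual pool.
    for pct in pool_steps(4):
        print(f"pool db1264 at {pct}%")
```

In practice each step would be followed by a pause to let the buffer pool warm up before the next increase, which is why a freshly cloned host is not pooled at full weight immediately.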

Finished cloning db1224.eqiad.wmnet to db1264.eqiad.wmnet - marostegui@cumin1003

Change #1233569 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] installserver: Do not format db2248

https://gerrit.wikimedia.org/r/1233569

Change #1233569 merged by Marostegui:

[operations/puppet@production] installserver: Do not format db2248

https://gerrit.wikimedia.org/r/1233569

Completed depooling of db2248 by marostegui@cumin1003: Reimage

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2248.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2248.codfw.wmnet with OS trixie completed:

  • db2248 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601270739_marostegui_3854760_db2248.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Starting pool of db2248 by marostegui@cumin1003: After reimage

Completed pooling of db2248 by marostegui@cumin1003: After reimage

Mentioned in SAL (#wikimedia-operations) [2026-02-02T09:03:29Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Depool db2249 T415358', diff saved to https://phabricator.wikimedia.org/P88364 and previous config saved to /var/cache/conftool/dbconfig/20260202-090328-marostegui.json

Change #1235741 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2249: Disable notifications

https://gerrit.wikimedia.org/r/1235741

Change #1235741 merged by Marostegui:

[operations/puppet@production] db2249: Disable notifications

https://gerrit.wikimedia.org/r/1235741

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2249.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2249.codfw.wmnet with OS trixie completed:

  • db2249 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602020924_marostegui_898222_db2249.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Starting pool of db2249 by marostegui@cumin1003: After reimage

Completed pooling of db2249 by marostegui@cumin1003: After reimage