Install a testing db with Debian Trixie
Closed, Resolved, Public

Description

Let's probably start with a test-s4 cluster host for the initial test run.
Initially with MariaDB 10.11.

Event Timeline

Marostegui triaged this task as Medium priority. Oct 16 2025, 8:37 AM
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui updated the task description.

Change #1196628 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Define mariadb packages for trixie

https://gerrit.wikimedia.org/r/1196628

Change #1196628 merged by Marostegui:

[operations/puppet@production] mariadb: Define mariadb packages for trixie

https://gerrit.wikimedia.org/r/1196628

Change #1196893 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] packages_wmf,packages_client.pp: Add trixie

https://gerrit.wikimedia.org/r/1196893

Change #1196893 merged by Marostegui:

[operations/puppet@production] packages_wmf,packages_client.pp: Add trixie

https://gerrit.wikimedia.org/r/1196893

Mentioned in SAL (#wikimedia-operations) [2025-10-20T07:28:33Z] <marostegui> Stop MariaDB on es2032 to clone sretest2003 T407472

Ignore the SAL mention above; it belongs to a different task.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db-test1003.eqiad.wmnet with OS trixie

db-test1003 is installed with Trixie and MariaDB 10.11.
I will install a vanilla MariaDB database there to run a few tests with Puppet, configuration, etc.

Change #1197629 had a related patch set uploaded (by Federico Ceratto; author: Federico Ceratto):

[operations/puppet@production] aptrepo: enable wmfmariadbpy for Trixie

https://gerrit.wikimedia.org/r/1197629

Change #1197629 merged by Federico Ceratto:

[operations/puppet@production] aptrepo: enable wmfmariadbpy for Trixie

https://gerrit.wikimedia.org/r/1197629

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db-test1003.eqiad.wmnet with OS trixie executed with errors:

  • db-test1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Set boot media to disk
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510210923_marostegui_1835076_db-test1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console db-test1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

This failure was due to T407845. The reimage itself went fine; only that Puppet issue caused the error.

Puppet works fine now as T407845 was resolved.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie

Change #1200049 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] installserver: Format /srv/ in es2028

https://gerrit.wikimedia.org/r/1200049

Change #1200049 merged by Marostegui:

[operations/puppet@production] installserver: Format /srv/ in es2028

https://gerrit.wikimedia.org/r/1200049

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:

  • es2028 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510301203_marostegui_1349427_es2028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie completed:

  • es2028 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510301320_marostegui_1359934_es2028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

MariaDB has been started on es1033 and es2028 (running Trixie).

Change #1201996 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] instances.yaml: Add es1033 to dbctl

https://gerrit.wikimedia.org/r/1201996

Change #1201996 merged by Marostegui:

[operations/puppet@production] instances.yaml: Add es1033 to dbctl

https://gerrit.wikimedia.org/r/1201996

Mentioned in SAL (#wikimedia-operations) [2025-11-05T07:16:06Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Add es1033 to es2 depooled T409257 T407472', diff saved to https://phabricator.wikimedia.org/P84834 and previous config saved to /var/cache/conftool/dbconfig/20251105-071605-marostegui.json

Change #1202002 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1033: Enable notifications

https://gerrit.wikimedia.org/r/1202002

Change #1202002 merged by Marostegui:

[operations/puppet@production] es1033: Enable notifications

https://gerrit.wikimedia.org/r/1202002

es1033 (es) is starting to serve minimal traffic in es2

Change #1218652 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] isntallserver: Do not format /srv on es2028

https://gerrit.wikimedia.org/r/1218652

Change #1218652 merged by Marostegui:

[operations/puppet@production] isntallserver: Do not format /srv on es2028

https://gerrit.wikimedia.org/r/1218652

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:

  • es2028 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:

  • es2028 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:

  • es2028 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:

  • es2028 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie completed:

  • es2028 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512220709_marostegui_3563158_es2028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

db1169 is running Trixie.