
Migrate es1 to Bookworm and MariaDB 10.6
Closed, Resolved (Public)

Description

Let's try to migrate es1 to Bookworm:

  • es1032
  • es1029
  • es1027
  • es2028
  • es2030
  • es2032

Event Timeline

Marostegui triaged this task as Medium priority. Nov 24 2023, 6:47 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2023-11-28T12:52:35Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool es2028 T351916', diff saved to https://phabricator.wikimedia.org/P53923 and previous config saved to /var/cache/conftool/dbconfig/20231128-125235-root.json

Change 978046 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2028: Disable notifications

https://gerrit.wikimedia.org/r/978046

Change 978046 merged by Marostegui:

[operations/puppet@production] es2028: Disable notifications

https://gerrit.wikimedia.org/r/978046

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host es2028.codfw.wmnet with OS bookworm

Marostegui renamed this task from Migrate es1 to Bookworm to Migrate es1 to Bookworm and MariaDB 10.6. Nov 28 2023, 1:22 PM

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host es2028.codfw.wmnet with OS bookworm completed:

  • es2028 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311281318_marostegui_960105_es2028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2023-11-29T07:23:07Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool es1027 T351916', diff saved to https://phabricator.wikimedia.org/P53931 and previous config saved to /var/cache/conftool/dbconfig/20231129-072306-root.json

Change 978368 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1027: Disable notifications

https://gerrit.wikimedia.org/r/978368

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host es1027.eqiad.wmnet with OS bookworm

Change 978368 merged by Marostegui:

[operations/puppet@production] es1027: Disable notifications

https://gerrit.wikimedia.org/r/978368

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host es1027.eqiad.wmnet with OS bookworm completed:

  • es1027 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311290740_marostegui_1466116_es1027.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-06T05:58:36Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1029 T351916', diff saved to https://phabricator.wikimedia.org/P56283 and previous config saved to /var/cache/conftool/dbconfig/20240206-055835-root.json

Change 997636 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1029: Disable notifications

https://gerrit.wikimedia.org/r/997636

Change 997636 merged by Marostegui:

[operations/puppet@production] es1029: Disable notifications

https://gerrit.wikimedia.org/r/997636

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1029.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1029.eqiad.wmnet with OS bookworm executed with errors:

  • es1029 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1029.eqiad.wmnet with OS bullseye

es1029 fails to install either Bullseye or Bookworm. First debconf fails, and if it is configured manually, the installer then fails to detect the disks.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1029.eqiad.wmnet with OS bullseye executed with errors:

  • es1029 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1029.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1029.eqiad.wmnet with OS bookworm completed:

  • es1029 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402061213_marostegui_2416204_es1029.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 998008 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2030: Disable notifications

https://gerrit.wikimedia.org/r/998008

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2030.codfw.wmnet with OS bookworm

Change 998008 merged by Marostegui:

[operations/puppet@production] es2030: Disable notifications

https://gerrit.wikimedia.org/r/998008

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2030.codfw.wmnet with OS bookworm completed:

  • es2030 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402070617_marostegui_2616063_es2030.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-07T06:36:59Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Switch es1 master T351916', diff saved to https://phabricator.wikimedia.org/P56385 and previous config saved to /var/cache/conftool/dbconfig/20240207-063659-marostegui.json

Change 998145 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1032: Disable notifications

https://gerrit.wikimedia.org/r/998145

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1032.eqiad.wmnet with OS bookworm

Change 998145 merged by Marostegui:

[operations/puppet@production] es1032: Disable notifications

https://gerrit.wikimedia.org/r/998145

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1032.eqiad.wmnet with OS bookworm completed:

  • es1032 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402070657_marostegui_2626645_es1032.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-08T06:02:05Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Promote es2020 to es1 primary T351916', diff saved to https://phabricator.wikimedia.org/P56489 and previous config saved to /var/cache/conftool/dbconfig/20240208-060204-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-08T06:02:28Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es2032 T351916', diff saved to https://phabricator.wikimedia.org/P56490 and previous config saved to /var/cache/conftool/dbconfig/20240208-060226-root.json

Change 998658 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2032: Disable notifications

https://gerrit.wikimedia.org/r/998658

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2032.codfw.wmnet with OS bookworm

Change 998658 merged by Marostegui:

[operations/puppet@production] es2032: Disable notifications

https://gerrit.wikimedia.org/r/998658

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2032.codfw.wmnet with OS bookworm completed:

  • es2032 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402080623_marostegui_2842160_es2032.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-08T06:56:07Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Promote es2032 back to es1 primary T351916', diff saved to https://phabricator.wikimedia.org/P56496 and previous config saved to /var/cache/conftool/dbconfig/20240208-065607-root.json

All done - host being repooled automatically.
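
For anyone who wants to audit the depool and primary-switch history recorded in this task, here is a minimal sketch, assuming the SAL mentions keep the exact shape seen in the timeline above (`[timestamp] <user@host> dbctl commit (dc=all): 'message'`). The names `SalEntry` and `parse_sal_line` are hypothetical helpers written for illustration; they are not part of dbctl, conftool, or spicerack.

```python
import re
from typing import NamedTuple, Optional


class SalEntry(NamedTuple):
    """One dbctl commit as recorded in a SAL mention (hypothetical helper type)."""
    timestamp: str
    user: str
    message: str
    task: Optional[str]


# Assumption: the line format matches the SAL mentions quoted in this task.
SAL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+<(?P<user>[^>]+)>\s+"
    r"dbctl commit \(dc=all\): '(?P<msg>[^']*)'"
)
# Phabricator task IDs look like T351916.
TASK_RE = re.compile(r"\bT\d+\b")


def parse_sal_line(line: str) -> Optional[SalEntry]:
    """Return the parsed entry, or None if the line is not a dbctl commit."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    task = TASK_RE.search(match.group("msg"))
    return SalEntry(
        timestamp=match.group("ts"),
        user=match.group("user"),
        message=match.group("msg"),
        task=task.group(0) if task else None,
    )


if __name__ == "__main__":
    sample = (
        "Mentioned in SAL (#wikimedia-operations) [2023-11-28T12:52:35Z] "
        "<marostegui@cumin1001> dbctl commit (dc=all): 'Depool es2028 T351916', "
        "diff saved to https://phabricator.wikimedia.org/P53923"
    )
    print(parse_sal_line(sample))
```

Lines that are not dbctl commits (cookbook results, Gerrit changes) return `None`, so the function can be mapped over the whole timeline and the results filtered.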