Upgrade es3 to MariaDB 10.6
Closed, Resolved | Public

Description

  • es1034
  • es1028
  • es1031
  • es2034
  • es2027
  • es2029

Event Timeline

Marostegui triaged this task as Medium priority. Feb 22 2024, 8:14 AM
Marostegui created this task.
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2024-02-22T11:09:15Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1028 T358180', diff saved to https://phabricator.wikimedia.org/P57694 and previous config saved to /var/cache/conftool/dbconfig/20240222-110914-root.json

Change 1005724 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1028: Disable notifications

https://gerrit.wikimedia.org/r/1005724

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1028.eqiad.wmnet with OS bookworm

Change 1005724 merged by Marostegui:

[operations/puppet@production] es1028: Disable notifications

https://gerrit.wikimedia.org/r/1005724

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1028.eqiad.wmnet with OS bookworm completed:

  • es1028 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402221129_marostegui_1449437_es1028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-23T07:19:53Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1031 T358180', diff saved to https://phabricator.wikimedia.org/P57802 and previous config saved to /var/cache/conftool/dbconfig/20240223-071952-root.json

Change 1005869 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1031: Disable notifications

https://gerrit.wikimedia.org/r/1005869

Change 1005869 merged by Marostegui:

[operations/puppet@production] es1031: Disable notifications

https://gerrit.wikimedia.org/r/1005869

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1031.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1031.eqiad.wmnet with OS bookworm completed:

  • es1031 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402230744_marostegui_1626838_es1031.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-27T06:37:07Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es2029 T358180', diff saved to https://phabricator.wikimedia.org/P57988 and previous config saved to /var/cache/conftool/dbconfig/20240227-063707-root.json

Change 1006749 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2029: Disable notifications

https://gerrit.wikimedia.org/r/1006749

Change 1006749 merged by Marostegui:

[operations/puppet@production] es2029: Disable notifications

https://gerrit.wikimedia.org/r/1006749

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2029.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2029.codfw.wmnet with OS bookworm completed:

  • es2029 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402270702_marostegui_2401751_es2029.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-27T07:20:45Z] <marostegui@cumin1002> dbctl commit (dc=all): 'es2029 (re)pooling @ 1%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57989 and previous config saved to /var/cache/conftool/dbconfig/20240227-072044-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-27T07:35:49Z] <marostegui@cumin1002> dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57990 and previous config saved to /var/cache/conftool/dbconfig/20240227-073549-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-27T07:50:54Z] <marostegui@cumin1002> dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57991 and previous config saved to /var/cache/conftool/dbconfig/20240227-075054-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-27T08:05:59Z] <marostegui@cumin1002> dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57992 and previous config saved to /var/cache/conftool/dbconfig/20240227-080559-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-27T08:36:13Z] <marostegui@cumin1002> dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57994 and previous config saved to /var/cache/conftool/dbconfig/20240227-083608-root.json

Mentioned in SAL (#wikimedia-operations) [2024-02-27T08:51:17Z] <marostegui@cumin1002> dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57995 and previous config saved to /var/cache/conftool/dbconfig/20240227-085113-root.json
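The es2029 entries above show a staged repool: the host is brought back at increasing percentages instead of going straight to 100%. A minimal sketch of such a schedule follows. The percentages and the start time are taken from the log; the function name `repool_schedule` and the uniform 15-minute interval are assumptions for illustration (the logged steps are roughly 15 minutes apart, except for about 30 minutes between 25% and 75%), and this is not the tooling that produced the log entries.

```python
from datetime import datetime, timedelta, timezone


def repool_schedule(start, percentages, interval):
    """Return (timestamp, percentage) pairs, one per step, spaced by `interval`.

    Hypothetical helper: it only computes a timetable, it does not talk to dbctl.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(percentages)]


# Percentages exactly as they appear in the SAL entries above.
STEPS = [1, 5, 10, 25, 75, 100]

# Timestamp of the first "(re)pooling @ 1%" entry.
start = datetime(2024, 2, 27, 7, 20, 45, tzinfo=timezone.utc)

# Assumption: a fixed 15-minute wait between steps.
schedule = repool_schedule(start, STEPS, timedelta(minutes=15))

for when, pct in schedule:
    print(f"{when:%H:%M:%S}Z  es2029 (re)pooling @ {pct}%")
```

Ramping up in small steps gives the freshly reimaged host time to warm its caches and gives the operator a chance to stop before the host carries full traffic.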

Mentioned in SAL (#wikimedia-operations) [2024-02-28T06:47:31Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es2027 T358180', diff saved to https://phabricator.wikimedia.org/P58015 and previous config saved to /var/cache/conftool/dbconfig/20240228-064731-root.json

Change 1007200 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2027: Disable notifications

https://gerrit.wikimedia.org/r/1007200

Change 1007200 merged by Marostegui:

[operations/puppet@production] es2027: Disable notifications

https://gerrit.wikimedia.org/r/1007200

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2027.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2027.codfw.wmnet with OS bookworm completed:

  • es2027 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402280711_marostegui_3430447_es2027.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Mentioned in SAL (#wikimedia-operations) [2024-02-28T17:01:34Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Promote es1028 to es3 eqiad master T358180', diff saved to https://phabricator.wikimedia.org/P58121 and previous config saved to /var/cache/conftool/dbconfig/20240228-170134-marostegui.json

es1028 promoted to master (it is a NOOP)

Mentioned in SAL (#wikimedia-operations) [2024-02-29T06:35:02Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1034 T358180', diff saved to https://phabricator.wikimedia.org/P58174 and previous config saved to /var/cache/conftool/dbconfig/20240229-063502-root.json

Change 1007483 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1034: Disable notifications

https://gerrit.wikimedia.org/r/1007483

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1034.eqiad.wmnet with OS bookworm

Change 1007483 merged by Marostegui:

[operations/puppet@production] es1034: Disable notifications

https://gerrit.wikimedia.org/r/1007483

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1034.eqiad.wmnet with OS bookworm completed:

  • es1034 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402290652_marostegui_3612817_es1034.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-29T07:35:26Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Promote back es1034 to es3 eqiad master T358180', diff saved to https://phabricator.wikimedia.org/P58181 and previous config saved to /var/cache/conftool/dbconfig/20240229-073523-marostegui.json

Quoting the earlier comment "es1028 promoted to master (it is a NOOP)":

Reverted as es1034 was reimaged.

Mentioned in SAL (#wikimedia-operations) [2024-02-29T08:45:03Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Promote back es2029 to es3 codfw master T358180', diff saved to https://phabricator.wikimedia.org/P58193 and previous config saved to /var/cache/conftool/dbconfig/20240229-084502-marostegui.json

Temporarily switched es3 codfw master to es2029

Mentioned in SAL (#wikimedia-operations) [2024-02-29T08:45:42Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es2034 T358180', diff saved to https://phabricator.wikimedia.org/P58194 and previous config saved to /var/cache/conftool/dbconfig/20240229-084541-root.json

Change 1007561 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2034: Disable notifications

https://gerrit.wikimedia.org/r/1007561

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2034.codfw.wmnet with OS bookworm

Change 1007561 merged by Marostegui:

[operations/puppet@production] es2034: Disable notifications

https://gerrit.wikimedia.org/r/1007561

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2034.codfw.wmnet with OS bookworm completed:

  • es2034 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402290908_marostegui_3631128_es2034.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-02-29T09:29:30Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Promote back es2034 to es3 codfw master T358180', diff saved to https://phabricator.wikimedia.org/P58203 and previous config saved to /var/cache/conftool/dbconfig/20240229-092929-marostegui.json
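Read end to end, the timeline follows the same order in both datacenters: the replicas are depooled and reimaged first, an already upgraded replica is then promoted to master, the original master is depooled and reimaged, and finally the original master is promoted back. The sketch below writes that ordering down as a plain list of steps. The function name `plan_section_upgrade` and the step wording are assumptions for illustration; the "repool" step is logged explicitly only for es2029, and the sketch does not run dbctl or any cookbook.

```python
def plan_section_upgrade(master, replicas):
    """Return the ordered steps for upgrading one datacenter's hosts.

    Hypothetical helper that mirrors the order seen in the timeline:
    replicas first, then the master behind a temporary promotion.
    """
    if not replicas:
        raise ValueError("need at least one replica to stand in for the master")

    steps = []
    for host in replicas:
        steps.append(f"depool {host}")
        steps.append(f"reimage {host}")
        steps.append(f"repool {host}")

    # In the log the first replica reimaged served as the temporary master.
    stand_in = replicas[0]
    steps.append(f"promote {stand_in} to master")
    steps.append(f"depool {master}")
    steps.append(f"reimage {master}")
    steps.append(f"promote {master} back to master")
    return steps


# Host names and roles as they appear in the timeline for codfw.
codfw_plan = plan_section_upgrade("es2034", ["es2029", "es2027"])
for step in codfw_plan:
    print(step)
```

Upgrading the replicas first means that an upgraded host is always available to take over while the original master is being reimaged.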