Upgrade all dbproxy hosts to Bullseye
Closed, Resolved (Public)

Description

HAProxy has been successfully tested on Bullseye in T295965: Test MariaDB 10.4 with Bullseye

  • dbproxy1012 (m1 - OLD primary)
  • dbproxy1013 (m2 primary since T298586#7611768)
  • dbproxy1014 (m1 primary since T298586#7608033)
  • dbproxy1015 (m2 - OLD primary)
  • dbproxy1016 (m3 - OLD primary)
  • dbproxy1017 (m5 - OLD primary)
  • dbproxy1018 to be handled by WMCS (T298940)
  • dbproxy1019 to be handled by WMCS (T298940)
  • dbproxy1020 (m3 primary since T298586#7618776)
  • dbproxy1021 (m5 primary since T298586#7621698)
  • dbproxy2001
  • dbproxy2002
  • dbproxy2003
  • dbproxy2004

Event Timeline

Change 751702 merged by Marostegui:

[operations/puppet@production] dbproxy2003: Disable notifications

https://gerrit.wikimedia.org/r/751702

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy2003.codfw.wmnet with OS bullseye

Change 751707 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Allow reiage dbproxy2003

https://gerrit.wikimedia.org/r/751707

Change 751707 merged by Marostegui:

[operations/puppet@production] install_server: Allow reiage dbproxy2003

https://gerrit.wikimedia.org/r/751707

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy2003.codfw.wmnet with OS bullseye executed with errors:

  • dbproxy2003 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy2003.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy2003.codfw.wmnet with OS bullseye completed:

  • dbproxy2003 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201051124_marostegui_6228_dbproxy2003.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Marostegui renamed this task from Upgrade all dbproxies to Bullseye to Upgrade all dbproxy hosts to Bullseye. Jan 5 2022, 12:11 PM

Change 751728 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Allow dbproxy2* reimage

https://gerrit.wikimedia.org/r/751728

Change 751728 merged by Marostegui:

[operations/puppet@production] install_server: Allow dbproxy2* reimage

https://gerrit.wikimedia.org/r/751728

Change 751735 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy200[1,2]: Disable notifications

https://gerrit.wikimedia.org/r/751735

Change 751735 merged by Marostegui:

[operations/puppet@production] dbproxy200[1,2]: Disable notifications

https://gerrit.wikimedia.org/r/751735

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy2002.codfw.wmnet with OS bullseye completed:

  • dbproxy2002 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201051238_marostegui_23845_dbproxy2002.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy2001.codfw.wmnet with OS bullseye completed:

  • dbproxy2001 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201051311_marostegui_30408_dbproxy2001.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

All dbproxy* hosts in codfw are now running Bullseye. I am going to wait until Monday (I am off tomorrow) before proceeding with the eqiad standby hosts.

Change 752375 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1013: Disable notifications

https://gerrit.wikimedia.org/r/752375

Change 752375 merged by Marostegui:

[operations/puppet@production] dbproxy1013: Disable notifications

https://gerrit.wikimedia.org/r/752375

Change 752403 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Allow reimage for dbproxy1* hosts

https://gerrit.wikimedia.org/r/752403

Change 752403 merged by Marostegui:

[operations/puppet@production] install_server: Allow reimage for dbproxy1* hosts

https://gerrit.wikimedia.org/r/752403

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1013.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1013.eqiad.wmnet with OS bullseye completed:

  • dbproxy1013 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201100558_marostegui_27779_dbproxy1013.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 752528 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1014: Disable notifications

https://gerrit.wikimedia.org/r/752528

Change 752528 merged by Marostegui:

[operations/puppet@production] dbproxy1014: Disable notifications

https://gerrit.wikimedia.org/r/752528

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1014.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1014.eqiad.wmnet with OS bullseye completed:

  • dbproxy1014 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201100629_marostegui_2057_dbproxy1014.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

I have reimaged the m1 and m2 standby proxies to Bullseye. I am now going to prepare a failover for m1 (from dbproxy1012 to dbproxy1014) so that we have an active Bullseye proxy and can confirm everything works before reimaging the rest of the standby hosts.
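The ordering described in this comment can be sketched roughly as follows. This is only an illustration of the sequence (reimage the standby, fail over to it, observe, then reimage the old primary); the `Section` class and `rollout_steps` function are hypothetical names made up for this sketch and are not part of any cookbook or Wikimedia tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Hypothetical description of one misc section and its two proxies."""
    name: str
    primary: str   # proxy serving traffic before the failover
    standby: str   # proxy that gets reimaged first


def rollout_steps(section):
    """Return the upgrade steps for one section, in the order used in this task."""
    return [
        f"reimage standby {section.standby} to bullseye",
        f"fail over {section.name} from {section.primary} to {section.standby}",
        f"observe connections on {section.standby}",
        f"reimage old primary {section.primary} to bullseye",
    ]


# Host names taken from this task; everything else is illustrative.
m1 = Section(name="m1", primary="dbproxy1012", standby="dbproxy1014")
for step in rollout_steps(m1):
    print(step)
```

The point of the ordering is that the proxy serving traffic is never the one being reimaged, and a Bullseye proxy takes live traffic before the remaining hosts are touched.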

Change 752532 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Failover m1 proxy from dbproxy1012 to dbproxy104

https://gerrit.wikimedia.org/r/752532

Change 752532 merged by Marostegui:

[operations/dns@master] wmnet: Failover m1 proxy from dbproxy1012 to dbproxy104

https://gerrit.wikimedia.org/r/752532

Mentioned in SAL (#wikimedia-operations) [2022-01-10T07:16:37Z] <marostegui> Failover m1 proxy from dbproxy1012 to dbproxy1014 T298586

All the connections have already moved to the Bullseye proxy - so far so good. I am going to wait until tomorrow before continuing with the remaining proxies.

Change 752934 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1012: Disable notifications

https://gerrit.wikimedia.org/r/752934

Change 752934 merged by Marostegui:

[operations/puppet@production] dbproxy1012: Disable notifications

https://gerrit.wikimedia.org/r/752934

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1012.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1012.eqiad.wmnet with OS bullseye completed:

  • dbproxy1012 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201110600_marostegui_16355_dbproxy1012.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 752936 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Failover m2 master to dbproxy1013

https://gerrit.wikimedia.org/r/752936

Mentioned in SAL (#wikimedia-operations) [2022-01-11T07:07:37Z] <marostegui> Failover m2 proxy from dbproxy1015 to dbproxy1013 T298586

Change 752936 merged by Marostegui:

[operations/dns@master] wmnet: Failover m2 master to dbproxy1013

https://gerrit.wikimedia.org/r/752936

m2 has been failed over to dbproxy1013. Once 24h have passed, I will reimage dbproxy1015 (now the standby).
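A minimal sketch of the 24h wait mentioned above, assuming the only gate is elapsed time since the failover. The function name `old_primary_reimage_allowed` and the `MIN_SOAK` constant are hypothetical. The failover timestamp is the SAL entry for m2; the second check uses the time encoded in the dbproxy1015 reimage log file name, which is assumed here to be UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: 24h of observation, as stated in the comment above.
MIN_SOAK = timedelta(hours=24)


def old_primary_reimage_allowed(failover_at, now, min_soak=MIN_SOAK):
    """Return True once at least min_soak has elapsed since the failover."""
    return now - failover_at >= min_soak


# SAL entry: [2022-01-11T07:07:37Z] Failover m2 proxy from dbproxy1015 to dbproxy1013
m2_failover = datetime(2022, 1, 11, 7, 7, 37, tzinfo=timezone.utc)

# A few hours after the failover: too early to reimage dbproxy1015.
print(old_primary_reimage_allowed(
    m2_failover, datetime(2022, 1, 11, 12, 0, tzinfo=timezone.utc)))

# Time taken from the dbproxy1015 reimage log name (202201130703), assumed UTC.
print(old_primary_reimage_allowed(
    m2_failover, datetime(2022, 1, 13, 7, 3, tzinfo=timezone.utc)))
```

In practice the decision also depended on the operator checking that traffic on the new proxy looked healthy; the time check is only the part that is easy to express in code.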

Change 752989 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1020: Disable notifications

https://gerrit.wikimedia.org/r/752989

Change 752989 merged by Marostegui:

[operations/puppet@production] dbproxy1020: Disable notifications

https://gerrit.wikimedia.org/r/752989

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1020.eqiad.wmnet with OS bullseye completed:

  • dbproxy1020 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201110755_marostegui_5347_dbproxy1020.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 753041 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1021: Disable notifications

https://gerrit.wikimedia.org/r/753041

Change 753041 merged by Marostegui:

[operations/puppet@production] dbproxy1021: Disable notifications

https://gerrit.wikimedia.org/r/753041

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1021.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1021.eqiad.wmnet with OS bullseye completed:

  • dbproxy1021 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201111350_marostegui_20439_dbproxy1021.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 753616 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Failover m3-master to dbproxy1020

https://gerrit.wikimedia.org/r/753616

Change 753616 merged by Marostegui:

[operations/dns@master] wmnet: Failover m3-master to dbproxy1020

https://gerrit.wikimedia.org/r/753616

Mentioned in SAL (#wikimedia-operations) [2022-01-13T06:38:14Z] <marostegui> Failover m3 proxy from dbproxy1016 to dbproxy1020 T298586

The m3 proxy has been failed over from dbproxy1016 to dbproxy1020.

Change 753617 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1015: Reimage to Bullseye

https://gerrit.wikimedia.org/r/753617

Change 753617 merged by Marostegui:

[operations/puppet@production] dbproxy1015: Reimage to Bullseye

https://gerrit.wikimedia.org/r/753617

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1015.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1015.eqiad.wmnet with OS bullseye completed:

  • dbproxy1015 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201130703_marostegui_29700_dbproxy1015.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 753870 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Failover m5-master to dbproxy1021

https://gerrit.wikimedia.org/r/753870

Mentioned in SAL (#wikimedia-operations) [2022-01-14T06:15:47Z] <marostegui> Failover m5 proxy from dbproxy1017 to dbproxy1021 T298586

Change 753870 merged by Marostegui:

[operations/dns@master] wmnet: Failover m5-master to dbproxy1021

https://gerrit.wikimedia.org/r/753870

The m5 proxy has been failed over to dbproxy1021. Normally m5 services take quite a while to move all their connections through the new proxy, but that is not a problem, as I won't reimage the "old" proxy today.

Change 754123 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1020: Enable notifications

https://gerrit.wikimedia.org/r/754123

Change 754123 merged by Marostegui:

[operations/puppet@production] dbproxy1020: Enable notifications

https://gerrit.wikimedia.org/r/754123

Change 754206 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1016: Disable notifications

https://gerrit.wikimedia.org/r/754206

Change 754206 merged by Marostegui:

[operations/puppet@production] dbproxy1016: Disable notifications

https://gerrit.wikimedia.org/r/754206

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1016.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1016.eqiad.wmnet with OS bullseye completed:

  • dbproxy1016 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201170557_marostegui_18943_dbproxy1016.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 754452 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1017: Disable notifications

https://gerrit.wikimedia.org/r/754452

Change 754452 merged by Marostegui:

[operations/puppet@production] dbproxy1017: Disable notifications

https://gerrit.wikimedia.org/r/754452

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host dbproxy1017.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host dbproxy1017.eqiad.wmnet with OS bullseye completed:

  • dbproxy1017 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201170853_marostegui_14596_dbproxy1017.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

All proxies have been reimaged to Bullseye. Still pending are the clouddb* ones, which are owned by WMCS and have their own task: T298940: Reimage WMCS db proxies to Bullseye

Change 779915 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Use both dbproxy101[89] servers for both wikireplica services

https://gerrit.wikimedia.org/r/779915

Change 779915 merged by Razzi:

[operations/puppet@production] Use both dbproxy101[89] servers for both wikireplica services

https://gerrit.wikimedia.org/r/779915