Upgrade s2 to Bullseye
Closed, Resolved (Public)

Description

  • dbstore1007:3312 (T299481)
  • db2148:3306
  • db2138:3312
  • db2126:3306
  • db2125:3306
  • db2107:3306
  • db2104:3306 (codfw master)
  • db2101:3312 (backup T299876)
  • db2095:3312
  • db2088:3312
  • db1182:3306
  • db1170:3312
  • db1162:3306
  • db1156:3306
  • db1155:3312
  • db1146:3312
  • db1139:3312 (backup T299876)
  • db1129:3306
  • db1122:3306 (master)
  • db1105:3312
  • db1102:3312 (backup T299876)
  • clouddb1021:3312 (T299480)
  • clouddb1018:3312 (T299480)
  • clouddb1014:3312 (T299480)

Event Timeline

Change 761675 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db2088: Disable notifications

https://gerrit.wikimedia.org/r/761675

Change 761675 merged by Ladsgroup:

[operations/puppet@production] db2088: Disable notifications

https://gerrit.wikimedia.org/r/761675

Mentioned in SAL (#wikimedia-operations) [2022-02-10T17:39:33Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2088:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20558 and previous config saved to /var/cache/conftool/dbconfig/20220210-173932-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-10T17:39:58Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2088:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20559 and previous config saved to /var/cache/conftool/dbconfig/20220210-173957-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db2088.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db2088.codfw.wmnet with OS bullseye completed:

  • db2088 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202101740_ladsgroup_32617_db2088.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-10T18:25:48Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2088:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20567 and previous config saved to /var/cache/conftool/dbconfig/20220210-182547-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-10T18:31:08Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2088:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20570 and previous config saved to /var/cache/conftool/dbconfig/20220210-183107-ladsgroup.json
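
The SAL entries in this timeline share a regular shape: a UTC timestamp, a user@host pair, and a quoted dbctl commit message. As a rough illustration only (the regular expression and the function name `parse_sal_entry` are assumptions made for this sketch, not part of any Wikimedia tooling), such a line could be split into fields like this:

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern, derived only from the SAL lines quoted in this task.
SAL_PATTERN = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\]\s+"
    r"<(?P<user>[^@>]+)@(?P<host>[^>]+)>\s+"
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<message>[^']*)'"
)


def parse_sal_entry(line):
    """Return a dict of fields for a dbctl SAL line, or None if it does not match."""
    match = SAL_PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    # The trailing "Z" in the log marks UTC, so attach that timezone explicitly.
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return record


# One of the entries above, shortened after the paste URL.
example = (
    "Mentioned in SAL (#wikimedia-operations) [2022-02-10T17:39:33Z] "
    "<ladsgroup@cumin1001> dbctl commit (dc=all): "
    "'Depooling db2088:3311 (T300510)', "
    "diff saved to https://phabricator.wikimedia.org/P20558"
)
entry = parse_sal_entry(example)
```

Here `entry["message"]` holds the commit message and `entry["ts"]` a timezone-aware datetime; lines that are not dbctl commits, such as the cookbook summaries, return None.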

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db2104.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db2104.codfw.wmnet with OS bullseye completed:

  • db2104 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202151145_ladsgroup_12094_db2104.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 762804 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1170: Disable notifications

https://gerrit.wikimedia.org/r/762804

Change 762804 merged by Ladsgroup:

[operations/puppet@production] db1170: Disable notifications

https://gerrit.wikimedia.org/r/762804

Mentioned in SAL (#wikimedia-operations) [2022-02-15T12:40:36Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1170:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20785 and previous config saved to /var/cache/conftool/dbconfig/20220215-124035-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-15T12:42:08Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1170:3317 (T300510)', diff saved to https://phabricator.wikimedia.org/P20787 and previous config saved to /var/cache/conftool/dbconfig/20220215-124207-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1170.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1170.eqiad.wmnet with OS bullseye completed:

  • db1170 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202151243_ladsgroup_5262_db1170.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-15T13:28:58Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20790 and previous config saved to /var/cache/conftool/dbconfig/20220215-132857-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-15T14:14:12Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20797 and previous config saved to /var/cache/conftool/dbconfig/20220215-141411-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-15T14:25:12Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300510)', diff saved to https://phabricator.wikimedia.org/P20800 and previous config saved to /var/cache/conftool/dbconfig/20220215-142511-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-15T15:10:26Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300510)', diff saved to https://phabricator.wikimedia.org/P20808 and previous config saved to /var/cache/conftool/dbconfig/20220215-151026-ladsgroup.json
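
Each db1170 instance has two "Repooling after maintenance" commits. A small sketch, using only the timestamps logged above, measures the gap between them; the dictionary layout and the helper name `gaps_in_seconds` are assumptions for illustration. The log itself does not say why the commits are repeated, so reading them as stages of a gradual repool is only a guess.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

# Timestamps copied from the SAL entries above for db1170.
repool_commits = {
    "db1170:3312": ["2022-02-15T13:28:58Z", "2022-02-15T14:14:12Z"],
    "db1170:3317": ["2022-02-15T14:25:12Z", "2022-02-15T15:10:26Z"],
}


def gaps_in_seconds(timestamps):
    """Return the gaps, in whole seconds, between consecutive timestamps."""
    parsed = [datetime.strptime(t, FMT) for t in timestamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(parsed, parsed[1:])]


gaps = {instance: gaps_in_seconds(ts) for instance, ts in repool_commits.items()}
for instance, values in gaps.items():
    print(instance, values)
```

For both instances the two commits are 2714 seconds apart, a little over 45 minutes.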

Change 762982 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1156: Disable notifications

https://gerrit.wikimedia.org/r/762982

Change 762982 merged by Ladsgroup:

[operations/puppet@production] db1156: Disable notifications

https://gerrit.wikimedia.org/r/762982

Mentioned in SAL (#wikimedia-operations) [2022-02-16T05:47:51Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1156 (T300510)', diff saved to https://phabricator.wikimedia.org/P20852 and previous config saved to /var/cache/conftool/dbconfig/20220216-054749-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1156.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1156.eqiad.wmnet with OS bullseye completed:

  • db1156 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202160551_ladsgroup_29873_db1156.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-16T06:26:11Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300510)', diff saved to https://phabricator.wikimedia.org/P20853 and previous config saved to /var/cache/conftool/dbconfig/20220216-062610-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-16T07:11:25Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300510)', diff saved to https://phabricator.wikimedia.org/P20856 and previous config saved to /var/cache/conftool/dbconfig/20220216-071125-ladsgroup.json

Change 763177 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1146: Disable notifications

https://gerrit.wikimedia.org/r/763177

Change 763177 merged by Ladsgroup:

[operations/puppet@production] db1146: Disable notifications

https://gerrit.wikimedia.org/r/763177

Mentioned in SAL (#wikimedia-operations) [2022-02-16T08:05:33Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1146:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20859 and previous config saved to /var/cache/conftool/dbconfig/20220216-080531-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-16T08:07:17Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20860 and previous config saved to /var/cache/conftool/dbconfig/20220216-080717-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-16T09:07:37Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20865 and previous config saved to /var/cache/conftool/dbconfig/20220216-090737-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-16T09:09:25Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'T300510', diff saved to https://phabricator.wikimedia.org/P20866 and previous config saved to /var/cache/conftool/dbconfig/20220216-090924-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1146.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1146.eqiad.wmnet with OS bullseye completed:

  • db1146 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202160923_ladsgroup_20285_db1146.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-16T10:23:03Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20872 and previous config saved to /var/cache/conftool/dbconfig/20220216-102302-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-16T11:08:17Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20879 and previous config saved to /var/cache/conftool/dbconfig/20220216-110816-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-16T11:21:45Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20881 and previous config saved to /var/cache/conftool/dbconfig/20220216-112145-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-16T12:07:00Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20891 and previous config saved to /var/cache/conftool/dbconfig/20220216-120659-ladsgroup.json

Change 763568 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1105: Disable notifications

https://gerrit.wikimedia.org/r/763568

Mentioned in SAL (#wikimedia-operations) [2022-02-17T17:25:06Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20992 and previous config saved to /var/cache/conftool/dbconfig/20220217-172504-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-17T17:26:50Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20993 and previous config saved to /var/cache/conftool/dbconfig/20220217-172650-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1105.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1105.eqiad.wmnet with OS bullseye completed:

  • db1105 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202171729_ladsgroup_29670_db1105.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-17T18:09:00Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20999 and previous config saved to /var/cache/conftool/dbconfig/20220217-180900-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-17T18:54:15Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P21004 and previous config saved to /var/cache/conftool/dbconfig/20220217-185414-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-17T19:07:48Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P21006 and previous config saved to /var/cache/conftool/dbconfig/20220217-190748-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-17T19:53:02Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P21009 and previous config saved to /var/cache/conftool/dbconfig/20220217-195302-ladsgroup.json

Ladsgroup changed the task status from Open to Stalled. (Feb 17 2022, 7:59 PM)
Ladsgroup updated the task description.
Ladsgroup moved this task from In progress to Blocked on the DBA board.

Only eqiad master left, pending switchover.

Ladsgroup changed the task status from Stalled to Open. (Apr 26 2022, 6:50 AM)
Ladsgroup moved this task from Blocked to In progress on the DBA board.

I will do it later today.

Talk to me first, I am finishing some pending schema changes

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1122.eqiad.wmnet with OS bullseye

Marostegui updated the task description.

Old s2 master, db1122, reimaged.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1122.eqiad.wmnet with OS bullseye completed:

  • db1122 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204260808_marostegui_2917823_db1122.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB