Upgrade s8 to Bullseye
Closed, Resolved (Public)

Description

  • dbstore1005:3318 (T299481)
  • db2152
  • db2100:3318 (T299876)
  • db2098:3318 (T299876)
  • db2094:3318
  • db2091
  • db2086:3318
  • db2085:3318
  • db2084
  • db2083
  • db2082
  • db2081
  • db2080
  • db2079 (codfw primary)
  • db1178
  • db1177
  • db1172
  • db1171:3318 (T299876)
  • db1167
  • db1154:3318
  • db1126
  • db1116:3318 (T299876)
  • db1114
  • db1111
  • db1109 (eqiad primary)
  • db1104
  • db1101:3318
  • db1099:3318
  • clouddb1021:3318 (T299480)
  • clouddb1020:3318 (T299480)
  • clouddb1016:3318 (T299480)

Event Timeline

There are a very large number of changes, so older changes are hidden.

Mentioned in SAL (#wikimedia-operations) [2022-02-28T08:53:32Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21571 and previous config saved to /var/cache/conftool/dbconfig/20220228-085329-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1126.eqiad.wmnet with OS bullseye

db1172 is on bullseye but I don't remember upgrading it :/ let me dig.

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1126.eqiad.wmnet with OS bullseye completed:

  • db1126 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202280900_ladsgroup_2641_db1126.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-28T09:32:13Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21575 and previous config saved to /var/cache/conftool/dbconfig/20220228-093212-ladsgroup.json

Change 766582 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1114: Disable notifications

https://gerrit.wikimedia.org/r/766582

Change 766582 merged by Ladsgroup:

[operations/puppet@production] db1114: Disable notifications

https://gerrit.wikimedia.org/r/766582

Mentioned in SAL (#wikimedia-operations) [2022-02-28T10:17:26Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21580 and previous config saved to /var/cache/conftool/dbconfig/20220228-101726-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-28T10:18:18Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21581 and previous config saved to /var/cache/conftool/dbconfig/20220228-101815-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1114.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1114.eqiad.wmnet with OS bullseye completed:

  • db1114 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202281023_ladsgroup_1896_db1114.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-28T10:57:17Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21585 and previous config saved to /var/cache/conftool/dbconfig/20220228-105716-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-28T11:42:31Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21590 and previous config saved to /var/cache/conftool/dbconfig/20220228-114230-ladsgroup.json

Change 766600 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1111: Disable notifications

https://gerrit.wikimedia.org/r/766600

Change 766600 merged by Ladsgroup:

[operations/puppet@production] db1111: Disable notifications

https://gerrit.wikimedia.org/r/766600

Mentioned in SAL (#wikimedia-operations) [2022-02-28T12:30:11Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21594 and previous config saved to /var/cache/conftool/dbconfig/20220228-123008-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1111.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1111.eqiad.wmnet with OS bullseye completed:

  • db1111 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202281235_ladsgroup_13835_db1111.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-28T13:06:44Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21597 and previous config saved to /var/cache/conftool/dbconfig/20220228-130644-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-02-28T13:51:58Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21600 and previous config saved to /var/cache/conftool/dbconfig/20220228-135158-ladsgroup.json

Change 766772 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1104: Disable notifications

https://gerrit.wikimedia.org/r/766772

Change 766772 merged by Ladsgroup:

[operations/puppet@production] db1104: Disable notifications

https://gerrit.wikimedia.org/r/766772

Mentioned in SAL (#wikimedia-operations) [2022-03-01T01:14:05Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-01T02:14:24Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-01T02:59:38Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-02T03:45:03Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1104.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1104.eqiad.wmnet with OS bullseye completed:

  • db1104 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203020348_ladsgroup_501_db1104.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-02T04:20:12Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json

Change 767279 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1101: Disable notifications

https://gerrit.wikimedia.org/r/767279

Change 767279 merged by Ladsgroup:

[operations/puppet@production] db1101: Disable notifications

https://gerrit.wikimedia.org/r/767279

Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:05:26Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:18:53Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:20:33Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1101.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1101.eqiad.wmnet with OS bullseye completed:

  • db1101 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203020523_ladsgroup_16570_db1101.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:54:19Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-02T06:39:33Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-02T06:50:56Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json

Change 767440 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1167: Disable notifications

https://gerrit.wikimedia.org/r/767440

Change 767440 merged by Ladsgroup:

[operations/puppet@production] db1167: Disable notifications

https://gerrit.wikimedia.org/r/767440

Mentioned in SAL (#wikimedia-operations) [2022-03-02T07:36:11Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-02T07:42:11Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1167.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1167.eqiad.wmnet with OS bullseye completed:

  • db1167 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203020809_ladsgroup_14063_db1167.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-02T08:45:14Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json

Ladsgroup moved this task from In progress to Blocked on the DBA board.

This is done except for the eqiad primary (db1109).

Mentioned in SAL (#wikimedia-operations) [2022-03-02T09:30:28Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json

Change 786176 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] db1109: Disable notifications

https://gerrit.wikimedia.org/r/786176

Change 786176 merged by Ladsgroup:

[operations/puppet@production] db1109: Disable notifications

https://gerrit.wikimedia.org/r/786176

Mentioned in SAL (#wikimedia-operations) [2022-04-26T06:15:19Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26505 and previous config saved to /var/cache/conftool/dbconfig/20220426-061519-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1109.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1109.eqiad.wmnet with OS bullseye completed:

  • db1109 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204260620_ladsgroup_2901832_db1109.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Ladsgroup updated the task description.

Ladsgroup moved this task from In progress to Done on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2022-04-26T06:51:12Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26506 and previous config saved to /var/cache/conftool/dbconfig/20220426-065112-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-04-26T07:36:27Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26509 and previous config saved to /var/cache/conftool/dbconfig/20220426-073627-ladsgroup.json
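
The SAL entries in this timeline follow a regular shape (UTC timestamp, user@host, a `dbctl commit` message naming the action, the instance, and the task), so a per-host depool/repool history can be pulled out mechanically. The following is a minimal sketch, assuming entries look exactly like the ones quoted in this task; the function name `parse_sal_line` and the returned field names are hypothetical and not part of dbctl or any Wikimedia tooling.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the SAL lines quoted in this task (an assumption,
# not an official format definition).
SAL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\]\s+"
    r"<(?P<user>[^>]+)>\s+dbctl commit \(dc=(?P<dc>[^)]+)\):\s+"
    r"'(?P<action>Depooling|Repooling after maintenance)\s+"
    r"(?P<instance>\S+)\s+\((?P<task>T\d+)\)'"
)


def parse_sal_line(line):
    """Return a dict of fields for a dbctl SAL entry, or None if no match."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps in the log end in "Z", so treat them as UTC.
    fields["ts"] = datetime.strptime(
        fields["ts"], "%Y-%m-%dT%H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return fields


if __name__ == "__main__":
    sample = (
        "Mentioned in SAL (#wikimedia-operations) [2022-02-28T08:53:32Z] "
        "<ladsgroup@cumin1001> dbctl commit (dc=all): "
        "'Depooling db1126 (T302185)', diff saved to "
        "https://phabricator.wikimedia.org/P21571"
    )
    entry = parse_sal_line(sample)
    print(entry["ts"].isoformat(), entry["action"], entry["instance"])
```

Grouping the parsed entries by `instance` gives the depool time and the subsequent repool steps for each host; multi-instance hosts such as db1101 appear once per port (`db1101:3317`, `db1101:3318`).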