Upgrade s4 to bullseye
Closed, Resolved, Public

Description

  • dbstore1007:3314 (T299481)
  • db2147
  • db2140
  • db2139:3314 (T299876)
  • db2138:3314
  • db2137:3314
  • db2136
  • db2119
  • db2110 (codfw primary)
  • db2106
  • db2099:3314 (T299876)
  • db2095:3314
  • db2090
  • db2073
  • db1160
  • db1155:3314
  • db1150:3314 (T299876)
  • db1149
  • db1148
  • db1147
  • db1146:3314
  • db1145:3314 (T299876)
  • db1144:3314
  • db1143
  • db1142
  • db1141
  • db1138 (eqiad primary)
  • db1121
  • clouddb1021:3314 (T299480)
  • clouddb1019:3314 (T299480)
  • clouddb1015:3314 (T299480)

Event Timeline

There are a very large number of changes, so older changes are hidden.

Mentioned in SAL (#wikimedia-operations) [2022-03-03T16:51:16Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21799 and previous config saved to /var/cache/conftool/dbconfig/20220303-165116-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-03T17:36:31Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21802 and previous config saved to /var/cache/conftool/dbconfig/20220303-173630-ladsgroup.json

Change 768054 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] db1144: Disable notifications

https://gerrit.wikimedia.org/r/768054

Change 767797 merged by Ladsgroup:

[operations/puppet@production] db1147: Disable notifications

https://gerrit.wikimedia.org/r/767797

Mentioned in SAL (#wikimedia-operations) [2022-03-07T05:15:38Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21858 and previous config saved to /var/cache/conftool/dbconfig/20220307-051537-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1147.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1147.eqiad.wmnet with OS bullseye completed:

  • db1147 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203070521_ladsgroup_24887_db1147.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-07T06:27:13Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21866 and previous config saved to /var/cache/conftool/dbconfig/20220307-062713-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-07T07:12:27Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21876 and previous config saved to /var/cache/conftool/dbconfig/20220307-071227-ladsgroup.json
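
The SAL mentions in this timeline share a regular shape (timestamp, user@host, dbctl commit message, task ID), so the depool/repool sequence per instance can be pulled out mechanically. Below is a minimal sketch, assuming the line format stays exactly as it appears in this task; the regular expression and the `parse_sal_entry` helper are illustrative names and are not part of dbctl or any Wikimedia tooling.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the SAL lines quoted in this task (an assumption,
# not an official log format specification).
SAL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+<(?P<user>[^@>]+)@(?P<origin>[^>]+)>\s+"
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): "
    r"'(?P<action>Depooling|Repooling after maintenance) "
    r"(?P<instance>\S+) \((?P<task>T\d+)\)'"
)


def parse_sal_entry(line):
    """Return a dict describing one dbctl SAL mention, or None if no match."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps in the log are UTC with a trailing 'Z'.
    entry["ts"] = datetime.strptime(
        entry["ts"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return entry


if __name__ == "__main__":
    sample = (
        "Mentioned in SAL (#wikimedia-operations) [2022-03-07T05:15:38Z] "
        "<ladsgroup@cumin1001> dbctl commit (dc=all): "
        "'Depooling db1147 (T302950)', diff saved to "
        "https://phabricator.wikimedia.org/P21858"
    )
    print(parse_sal_entry(sample))
```

Lines that are not dbctl commits (Gerrit changes, cookbook reports) simply return None, so the helper can be mapped over the whole timeline and filtered.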

Change 768054 merged by Ladsgroup:

[operations/puppet@production] db1144: Disable notifications

https://gerrit.wikimedia.org/r/768054

Mentioned in SAL (#wikimedia-operations) [2022-03-07T07:24:53Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21878 and previous config saved to /var/cache/conftool/dbconfig/20220307-072453-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-07T07:26:25Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21880 and previous config saved to /var/cache/conftool/dbconfig/20220307-072624-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1144.eqiad.wmnet with OS bullseye

Change 768649 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] db1143: Disable notifications

https://gerrit.wikimedia.org/r/768649

Change 768650 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] db1142: Disable notifications

https://gerrit.wikimedia.org/r/768650

Change 768651 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] db1141: Disable notifications

https://gerrit.wikimedia.org/r/768651

Change 768652 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] db1125: Disable notifications

https://gerrit.wikimedia.org/r/768652

Change 768653 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] db1124: Disable notifications

https://gerrit.wikimedia.org/r/768653

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1144.eqiad.wmnet with OS bullseye completed:

  • db1144 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203070732_ladsgroup_11785_db1144.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-07T08:45:17Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21901 and previous config saved to /var/cache/conftool/dbconfig/20220307-084516-ladsgroup.json

Change 768658 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):

[operations/puppet@production] db1121: Disable notifications

https://gerrit.wikimedia.org/r/768658

Mentioned in SAL (#wikimedia-operations) [2022-03-07T09:30:32Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21916 and previous config saved to /var/cache/conftool/dbconfig/20220307-093032-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-07T09:36:15Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21919 and previous config saved to /var/cache/conftool/dbconfig/20220307-093615-ladsgroup.json

Change 768649 merged by Ladsgroup:

[operations/puppet@production] db1143: Disable notifications

https://gerrit.wikimedia.org/r/768649

Mentioned in SAL (#wikimedia-operations) [2022-03-07T10:21:30Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21933 and previous config saved to /var/cache/conftool/dbconfig/20220307-102129-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-07T10:21:58Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21934 and previous config saved to /var/cache/conftool/dbconfig/20220307-102158-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1143.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1143.eqiad.wmnet with OS bullseye completed:

  • db1143 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203071031_ladsgroup_9281_db1143.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-07T11:22:08Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21949 and previous config saved to /var/cache/conftool/dbconfig/20220307-112207-ladsgroup.json

Change 768650 merged by Ladsgroup:

[operations/puppet@production] db1142: Disable notifications

https://gerrit.wikimedia.org/r/768650

Mentioned in SAL (#wikimedia-operations) [2022-03-07T12:07:22Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21959 and previous config saved to /var/cache/conftool/dbconfig/20220307-120722-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-07T12:48:15Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21968 and previous config saved to /var/cache/conftool/dbconfig/20220307-124815-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1142.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1142.eqiad.wmnet with OS bullseye completed:

  • db1142 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203071252_ladsgroup_8825_db1142.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-07T13:47:15Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21983 and previous config saved to /var/cache/conftool/dbconfig/20220307-134715-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-07T14:32:29Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21993 and previous config saved to /var/cache/conftool/dbconfig/20220307-143229-ladsgroup.json

Change 768651 merged by Ladsgroup:

[operations/puppet@production] db1141: Disable notifications

https://gerrit.wikimedia.org/r/768651

Mentioned in SAL (#wikimedia-operations) [2022-03-10T11:36:38Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1141.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1141.eqiad.wmnet with OS bullseye completed:

  • db1141 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203101145_ladsgroup_217108_db1141.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-10T12:26:59Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-10T13:12:14Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-10T13:22:34Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-10T14:22:48Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-10T15:08:04Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-10T15:08:39Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1121.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1121.eqiad.wmnet with OS bullseye completed:

  • db1121 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203101527_ladsgroup_247698_db1121.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-03-10T16:04:59Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2022-03-10T16:50:14Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json

Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db2110.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db2110.codfw.wmnet with OS bullseye completed:

  • db2110 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203111016_ladsgroup_395784_db2110.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Ladsgroup moved this task from In progress to Blocked on the DBA board.

Change 779331 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1138: Disable notifications

https://gerrit.wikimedia.org/r/779331

Change 779331 merged by Marostegui:

[operations/puppet@production] db1138: Disable notifications

https://gerrit.wikimedia.org/r/779331

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1138.eqiad.wmnet with OS bullseye

Marostegui updated the task description.

Old master, db1138, reimaged.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1138.eqiad.wmnet with OS bullseye completed:

  • db1138 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204120642_marostegui_1592768_db1138.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 768653 abandoned by Ladsgroup:

[operations/puppet@production] db1124: Disable notifications

Reason:

not needed

https://gerrit.wikimedia.org/r/768653

Change 768652 abandoned by Ladsgroup:

[operations/puppet@production] db1125: Disable notifications

Reason:

We did the reimage a long time ago. Let's just abandon these.

https://gerrit.wikimedia.org/r/768652

Change 768658 abandoned by Ladsgroup:

[operations/puppet@production] db1121: Disable notifications

Reason:

https://gerrit.wikimedia.org/r/768658