- dbstore1007:3314 (T299481)
- db2147
- db2140
- db2139:3314 (T299876)
- db2138:3314
- db2137:3314
- db2136
- db2119
- db2110 (codfw primary)
- db2106
- db2099:3314 (T299876)
- db2095:3314
- db2090
- db2073
- db1160
- db1155:3314
- db1150:3314 (T299876)
- db1149
- db1148
- db1147
- db1146:3314
- db1145:3314 (T299876)
- db1144:3314
- db1143
- db1142
- db1141
- db1138 (eqiad primary)
- db1121
- clouddb1021:3314 (T299480)
- clouddb1019:3314 (T299480)
- clouddb1015:3314 (T299480)
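The list above mixes bare hostnames with `host:port` entries and optional parenthetical notes (a related task ID or a role such as "eqiad primary"). The timeline shows db1144 handled as both db1144:3314 and db1144:3315, so the port form appears to mark hosts that run more than one database instance. A minimal Python sketch for splitting such entries into fields, assuming exactly the line format shown here; `parse_host` and the field names are illustrative and not part of any WMF tool:

```python
import re

# Matches "- <host>", "- <host>:<port>", each optionally followed by " (<note>)".
# The format is inferred from the list on this page only.
HOST_RE = re.compile(
    r"^- (?P<host>[a-z]+\d+)(?::(?P<port>\d+))?(?: \((?P<note>[^)]*)\))?$"
)

def parse_host(line):
    """Return a dict with host, port (int or None) and note (str or None)."""
    m = HOST_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    if entry["port"] is not None:
        entry["port"] = int(entry["port"])
    return entry

for raw in ["- db2139:3314 (T299876)", "- db2110 (codfw primary)", "- db2147"]:
    print(parse_host(raw))
```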
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T291916 Tracking task for Bullseye migrations in production
Resolved | | Marostegui | T298585 Upgrade WMF database-and-backup-related hosts to bullseye
Resolved | | Ladsgroup | T302950 Upgrade s4 to bullseye
Resolved | | Marostegui | T304933 Switchover s4 master (db1138 -> db1160)
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2022-03-03T16:51:16Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21799 and previous config saved to /var/cache/conftool/dbconfig/20220303-165116-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-03T17:36:31Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21802 and previous config saved to /var/cache/conftool/dbconfig/20220303-173630-ladsgroup.json
Change 768054 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):
[operations/puppet@production] db1144: Disable notifications
Change 767797 merged by Ladsgroup:
[operations/puppet@production] db1147: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-07T05:15:38Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21858 and previous config saved to /var/cache/conftool/dbconfig/20220307-051537-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1147.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1147.eqiad.wmnet with OS bullseye completed:
- db1147 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203070521_ladsgroup_24887_db1147.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-07T06:27:13Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21866 and previous config saved to /var/cache/conftool/dbconfig/20220307-062713-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-07T07:12:27Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21876 and previous config saved to /var/cache/conftool/dbconfig/20220307-071227-ladsgroup.json
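The db1147 entries above show the per-host cycle that repeats through the rest of this timeline: one "Depooling" commit, a reimage, then two "Repooling after maintenance" commits. A minimal Python sketch for pulling the timestamp, user, action and instance out of such SAL lines and measuring how long the host was out of full rotation. The regular expression is inferred from the lines on this page only, and `parse_sal` is a hypothetical helper name:

```python
import re
from datetime import datetime

# Pattern for the dbctl SAL entries quoted in this timeline (format assumed
# from this page; other SAL messages will simply not match).
SAL_RE = re.compile(
    r"\[(?P<ts>[\dT:\-]+Z)\] <(?P<user>[^@>]+)@(?P<origin>[^>]+)> "
    r"dbctl commit \(dc=(?P<dc>\w+)\): "
    r"'(?P<action>Depooling|Repooling after maintenance) "
    r"(?P<instance>[\w:]+) \((?P<task>T\d+)\)'"
)

def parse_sal(line):
    """Return the parsed fields of one SAL line, or None if it does not match."""
    m = SAL_RE.search(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%SZ")
    return entry

# Copied from the db1147 entries above, with the text after the commit
# message trimmed for brevity.
sample = [
    "[2022-03-07T05:15:38Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1147 (T302950)'",
    "[2022-03-07T06:27:13Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)'",
    "[2022-03-07T07:12:27Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)'",
]

entries = [parse_sal(s) for s in sample]
# Time between the depool commit and the last repool commit.
window = entries[-1]["ts"] - entries[0]["ts"]
print(entries[0]["action"], entries[0]["instance"], window)
```

The same parser handles the `host:port` form (for example db1144:3314), since the instance group accepts a colon.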
Change 768054 merged by Ladsgroup:
[operations/puppet@production] db1144: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-07T07:24:53Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21878 and previous config saved to /var/cache/conftool/dbconfig/20220307-072453-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-07T07:26:25Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21880 and previous config saved to /var/cache/conftool/dbconfig/20220307-072624-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1144.eqiad.wmnet with OS bullseye
Change 768649 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):
[operations/puppet@production] db1143: Disable notifications
Change 768650 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):
[operations/puppet@production] db1142: Disable notifications
Change 768651 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):
[operations/puppet@production] db1141: Disable notifications
Change 768652 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):
[operations/puppet@production] db1125: Disable notifications
Change 768653 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):
[operations/puppet@production] db1124: Disable notifications
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1144.eqiad.wmnet with OS bullseye completed:
- db1144 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203070732_ladsgroup_11785_db1144.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-07T08:45:17Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21901 and previous config saved to /var/cache/conftool/dbconfig/20220307-084516-ladsgroup.json
Change 768658 had a related patch set uploaded (by Gerrit maintenance bot; author: Gerrit maintenance bot):
[operations/puppet@production] db1121: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-07T09:30:32Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21916 and previous config saved to /var/cache/conftool/dbconfig/20220307-093032-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-07T09:36:15Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21919 and previous config saved to /var/cache/conftool/dbconfig/20220307-093615-ladsgroup.json
Change 768649 merged by Ladsgroup:
[operations/puppet@production] db1143: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-07T10:21:30Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21933 and previous config saved to /var/cache/conftool/dbconfig/20220307-102129-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-07T10:21:58Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21934 and previous config saved to /var/cache/conftool/dbconfig/20220307-102158-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1143.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1143.eqiad.wmnet with OS bullseye completed:
- db1143 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203071031_ladsgroup_9281_db1143.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-07T11:22:08Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21949 and previous config saved to /var/cache/conftool/dbconfig/20220307-112207-ladsgroup.json
Change 768650 merged by Ladsgroup:
[operations/puppet@production] db1142: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-07T12:07:22Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21959 and previous config saved to /var/cache/conftool/dbconfig/20220307-120722-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-07T12:48:15Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21968 and previous config saved to /var/cache/conftool/dbconfig/20220307-124815-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1142.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1142.eqiad.wmnet with OS bullseye completed:
- db1142 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203071252_ladsgroup_8825_db1142.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-07T13:47:15Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21983 and previous config saved to /var/cache/conftool/dbconfig/20220307-134715-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-07T14:32:29Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21993 and previous config saved to /var/cache/conftool/dbconfig/20220307-143229-ladsgroup.json
Change 768651 merged by Ladsgroup:
[operations/puppet@production] db1141: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-10T11:36:38Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1141.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1141.eqiad.wmnet with OS bullseye completed:
- db1141 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203101145_ladsgroup_217108_db1141.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-10T12:26:59Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-10T13:12:14Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-10T13:22:34Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-10T14:22:48Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-10T15:08:04Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-10T15:08:39Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1121.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1121.eqiad.wmnet with OS bullseye completed:
- db1121 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203101527_ladsgroup_247698_db1121.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-10T16:04:59Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-10T16:50:14Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db2110.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db2110.codfw.wmnet with OS bullseye completed:
- db2110 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203111016_ladsgroup_395784_db2110.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 779331 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1138: Disable notifications
Change 779331 merged by Marostegui:
[operations/puppet@production] db1138: Disable notifications
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1138.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1138.eqiad.wmnet with OS bullseye completed:
- db1138 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204120642_marostegui_1592768_db1138.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
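Every reimage report in this timeline is marked WARN and contains the step "Icinga status is not optimal, downtime not removed", which suggests the downtime had to be lifted by hand once the host was healthy. A minimal Python sketch that splits one report into host, status and steps, then lists the steps that look like they need follow-up. The link between that step and the WARN status is an inference from this page, the "contains the word not" heuristic is an assumption, and `parse_report` and `needs_follow_up` are hypothetical names:

```python
import re

# The db1138 report above, copied verbatim.
REPORT = """\
- db1138 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204120642_marostegui_1592768_db1138.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
"""

def parse_report(text):
    """Return (host, status, steps) from one cookbook completion report."""
    items = [ln.strip()[2:] for ln in text.splitlines() if ln.strip().startswith("- ")]
    header = re.fullmatch(r"(\S+) \((\w+)\)", items[0])
    if header is None:
        raise ValueError("first item is not '<host> (<STATUS>)'")
    return header.group(1), header.group(2), items[1:]

def needs_follow_up(steps):
    """Heuristic (an assumption): a step containing the word 'not' needs attention."""
    return [s for s in steps if re.search(r"\bnot\b", s)]

host, status, steps = parse_report(REPORT)
follow_up = needs_follow_up(steps)
print(host, status, len(steps), follow_up)
```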
Change 768653 abandoned by Ladsgroup:
[operations/puppet@production] db1124: Disable notifications
Reason:
not needed
Change 768652 abandoned by Ladsgroup:
[operations/puppet@production] db1125: Disable notifications
Reason:
We did the reimage, long time ago. Let's just abandon these.
Change 768658 abandoned by Ladsgroup:
[operations/puppet@production] db1121: Disable notifications
Reason: