- dbstore1003:3317 (T299481)
- db2150
- db2122
- db2121 (codfw primary)
- db2120
- db2118
- db2108
- db2098:3317 (T299876)
- db2095:3317
- db2087:3317
- db2086:3317
- db2077
- db1181
- db1174
- db1171:3317 (T299876)
- db1170:3317
- db1158
- db1155:3317
- db1136 (eqiad primary)
- db1127
- db1101:3317
- db1098:3317
- clouddb1021:3317 (T299480)
- clouddb1018:3317 (T299480)
- clouddb1014:3317 (T299480)
Status | Assigned | Task
---|---|---
Open | None | T31744 FlaggedRev installation (deployment) requests (tracking)
Stalled | None | T143886 Activating Flagged revisions on ar.wikinews
Stalled | None | T204354 Flagged Revisions for Vietnamese Wikipedia
Stalled | None | T205145 Deploy FlaggedRevs on bn.wikibooks
Stalled | None | T221933 Enable Flagged Revisions (for trial run purpose) at the Chinese Wikipedia
Open | None | T185664 Code stewardship review: FlaggedRevs
Resolved | Ladsgroup | T277883 Drop all low-use and unused features of FlaggedRevs to make it more maintainable
Resolved | Ladsgroup | T300774 Drop fr_img_* columns
Open | None | T291916 Tracking task for Bullseye migrations in production
Resolved | Marostegui | T298585 Upgrade WMF database-and-backup-related hosts to bullseye
Resolved | Ladsgroup | T302363 Upgrade s7 to bullseye
Resolved | Marostegui | T306001 Switchover s7 master (db1136 -> db1181)
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2022-02-23T08:13:41Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2108 (T302363)', diff saved to https://phabricator.wikimedia.org/P21337 and previous config saved to /var/cache/conftool/dbconfig/20220223-081338-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db2108.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db2108.codfw.wmnet with OS bullseye completed:
- db2108 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202230816_ladsgroup_5808_db2108.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-23T08:57:55Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2108 (T302363)', diff saved to https://phabricator.wikimedia.org/P21343 and previous config saved to /var/cache/conftool/dbconfig/20220223-085755-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-23T09:01:17Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2077 (T302363)', diff saved to https://phabricator.wikimedia.org/P21345 and previous config saved to /var/cache/conftool/dbconfig/20220223-090109-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db2077.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db2077.codfw.wmnet with OS bullseye completed:
- db2077 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202230903_ladsgroup_13739_db2077.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-23T09:46:56Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2077 (T302363)', diff saved to https://phabricator.wikimedia.org/P21351 and previous config saved to /var/cache/conftool/dbconfig/20220223-094655-ladsgroup.json
Change 765236 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1181: Disable notifications
Change 765236 merged by Ladsgroup:
[operations/puppet@production] db1181: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-02-23T11:05:43Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1181 (T302363)', diff saved to https://phabricator.wikimedia.org/P21359 and previous config saved to /var/cache/conftool/dbconfig/20220223-110540-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1181.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1181.eqiad.wmnet with OS bullseye completed:
- db1181 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202231117_ladsgroup_29172_db1181.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-23T11:52:33Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302363)', diff saved to https://phabricator.wikimedia.org/P21365 and previous config saved to /var/cache/conftool/dbconfig/20220223-115233-ladsgroup.json
Change 765255 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1174: Disable notifications
Change 765255 merged by Ladsgroup:
[operations/puppet@production] db1174: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-02-23T12:37:47Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302363)', diff saved to https://phabricator.wikimedia.org/P21374 and previous config saved to /var/cache/conftool/dbconfig/20220223-123747-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-23T12:40:32Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1174 (T302363)', diff saved to https://phabricator.wikimedia.org/P21375 and previous config saved to /var/cache/conftool/dbconfig/20220223-124027-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1174.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1174.eqiad.wmnet with OS bullseye completed:
- db1174 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202231244_ladsgroup_17864_db1174.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-23T13:38:59Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302363)', diff saved to https://phabricator.wikimedia.org/P21385 and previous config saved to /var/cache/conftool/dbconfig/20220223-133858-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-23T14:24:13Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302363)', diff saved to https://phabricator.wikimedia.org/P21392 and previous config saved to /var/cache/conftool/dbconfig/20220223-142413-ladsgroup.json
Change 765308 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1127: Disable notifications
Change 765308 merged by Ladsgroup:
[operations/puppet@production] db1127: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-02-23T16:44:56Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1127 (T302363)', diff saved to https://phabricator.wikimedia.org/P21403 and previous config saved to /var/cache/conftool/dbconfig/20220223-164453-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1127.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1127.eqiad.wmnet with OS bullseye completed:
- db1127 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202231648_ladsgroup_21330_db1127.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-23T17:22:07Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302363)', diff saved to https://phabricator.wikimedia.org/P21404 and previous config saved to /var/cache/conftool/dbconfig/20220223-172206-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-23T18:07:22Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302363)', diff saved to https://phabricator.wikimedia.org/P21408 and previous config saved to /var/cache/conftool/dbconfig/20220223-180722-ladsgroup.json
Change 765316 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1158: Disable notifications
Change 765316 merged by Ladsgroup:
[operations/puppet@production] db1158: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-02-23T18:13:56Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1158 (T302363)', diff saved to https://phabricator.wikimedia.org/P21409 and previous config saved to /var/cache/conftool/dbconfig/20220223-181350-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1158.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1158.eqiad.wmnet with OS bullseye completed:
- db1158 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202231818_ladsgroup_5104_db1158.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-23T18:57:41Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1158 (T302363)', diff saved to https://phabricator.wikimedia.org/P21410 and previous config saved to /var/cache/conftool/dbconfig/20220223-185740-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-23T19:42:55Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1158 (T302363)', diff saved to https://phabricator.wikimedia.org/P21414 and previous config saved to /var/cache/conftool/dbconfig/20220223-194254-ladsgroup.json
Change 765489 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db2079: Disable notifications
Change 765539 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db2121: Disable notifications
Change 765539 merged by Ladsgroup:
[operations/puppet@production] db2121: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-02-24T13:23:57Z] <Amir1> dbmaint on s7@codfw (T302363)
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db2121.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db2121.codfw.wmnet with OS bullseye completed:
- db2121 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202241326_ladsgroup_19047_db2121.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 784078 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1136: Disable notifications
Change 784078 merged by Marostegui:
[operations/puppet@production] db1136: Disable notifications
@Ladsgroup I will take care of db1136's reimage and close this task once done. I need to do lots of other maintenance to this host before the reimage.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1136.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1136.eqiad.wmnet with OS bullseye executed with errors:
- db1136 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204191146_marostegui_3355307_db1136.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
- Failed to get Netbox script results, try manually: https://netbox.wikimedia.org/api/extras/job-results/2896452/
- The reimage failed, see the cookbook logs for the details