- dbstore1005:3318 (T299481)
- db2152
- db2100:3318 (T299876)
- db2098:3318 (T299876)
- db2094:3318
- db2091
- db2086:3318
- db2085:3318
- db2084
- db2083
- db2082
- db2081
- db2080
- db2079 (codfw primary)
- db1178
- db1177
- db1172
- db1171:3318 (T299876)
- db1167
- db1154:3318
- db1126
- db1116:3318 (T299876)
- db1114
- db1111
- db1109 (eqiad primary)
- db1104
- db1101:3318
- db1099:3318
- clouddb1021:3318 (T299480)
- clouddb1020:3318 (T299480)
- clouddb1016:3318 (T299480)
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T291916 Tracking task for Bullseye migrations in production
Resolved | | Marostegui | T298585 Upgrade WMF database-and-backup-related hosts to bullseye
Resolved | | Ladsgroup | T302185 Upgrade s8 to Bullseye
Resolved | | Ladsgroup | T303927 Switchover s8 master (db1109 -> db1104)
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2022-02-28T08:53:32Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21571 and previous config saved to /var/cache/conftool/dbconfig/20220228-085329-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1126.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1126.eqiad.wmnet with OS bullseye completed:
- db1126 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202280900_ladsgroup_2641_db1126.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
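Each reimage report in this timeline follows the same shape: a `- host (STATUS)` header followed by one bullet per step. A minimal sketch of reading such a report mechanically might look like the following. The function names `parse_reimage_report` and `steps_needing_attention` are hypothetical and not part of the cookbook, and treating any step that contains the word "not" as needing attention is an assumption made for illustration only.

```python
import re

# Header line as it appears in this timeline, e.g. "- db1126 (WARN)".
HEADER_RE = re.compile(r"^- (?P<host>\S+) \((?P<status>[A-Z]+)\)$")


def parse_reimage_report(text):
    """Split a pasted cookbook report into host, status and step strings."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty report")
    m = HEADER_RE.match(lines[0])
    if m is None:
        raise ValueError("first line is not a '- host (STATUS)' header")
    # Every following bullet is one step; drop the leading "- ".
    steps = [ln[2:] for ln in lines[1:] if ln.startswith("- ")]
    return {"host": m["host"], "status": m["status"], "steps": steps}


def steps_needing_attention(report):
    """Return steps that mention 'not' (a heuristic, assumed for this sketch)."""
    return [s for s in report["steps"] if re.search(r"\bnot\b", s)]
```

Applied to the db1126 report above, the only step this heuristic picks out is "Icinga status is not optimal, downtime not removed", which may be the step behind the WARN status.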
Mentioned in SAL (#wikimedia-operations) [2022-02-28T09:32:13Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21575 and previous config saved to /var/cache/conftool/dbconfig/20220228-093212-ladsgroup.json
Change 766582 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1114: Disable notifications
Change 766582 merged by Ladsgroup:
[operations/puppet@production] db1114: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-02-28T10:17:26Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21580 and previous config saved to /var/cache/conftool/dbconfig/20220228-101726-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-28T10:18:18Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21581 and previous config saved to /var/cache/conftool/dbconfig/20220228-101815-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1114.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1114.eqiad.wmnet with OS bullseye completed:
- db1114 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202281023_ladsgroup_1896_db1114.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-28T10:57:17Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21585 and previous config saved to /var/cache/conftool/dbconfig/20220228-105716-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-28T11:42:31Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21590 and previous config saved to /var/cache/conftool/dbconfig/20220228-114230-ladsgroup.json
Change 766600 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1111: Disable notifications
Change 766600 merged by Ladsgroup:
[operations/puppet@production] db1111: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-02-28T12:30:11Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21594 and previous config saved to /var/cache/conftool/dbconfig/20220228-123008-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1111.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1111.eqiad.wmnet with OS bullseye completed:
- db1111 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202281235_ladsgroup_13835_db1111.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-02-28T13:06:44Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21597 and previous config saved to /var/cache/conftool/dbconfig/20220228-130644-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-02-28T13:51:58Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21600 and previous config saved to /var/cache/conftool/dbconfig/20220228-135158-ladsgroup.json
Change 766772 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1104: Disable notifications
Change 766772 merged by Ladsgroup:
[operations/puppet@production] db1104: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-01T01:14:05Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-01T02:14:24Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-01T02:59:38Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-02T03:45:03Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1104.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1104.eqiad.wmnet with OS bullseye completed:
- db1104 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203020348_ladsgroup_501_db1104.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-02T04:20:12Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json
Change 767279 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1101: Disable notifications
Change 767279 merged by Ladsgroup:
[operations/puppet@production] db1101: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:05:26Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:18:53Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:20:33Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1101.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1101.eqiad.wmnet with OS bullseye completed:
- db1101 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203020523_ladsgroup_16570_db1101.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-02T05:54:19Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-02T06:39:33Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-02T06:50:56Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json
Change 767440 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1167: Disable notifications
Change 767440 merged by Ladsgroup:
[operations/puppet@production] db1167: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-03-02T07:36:11Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-02T07:42:11Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1167.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1167.eqiad.wmnet with OS bullseye completed:
- db1167 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203020809_ladsgroup_14063_db1167.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-03-02T08:45:14Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-03-02T09:30:28Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json
Change 786176 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] db1109: Disable notifications
Change 786176 merged by Ladsgroup:
[operations/puppet@production] db1109: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-04-26T06:15:19Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26505 and previous config saved to /var/cache/conftool/dbconfig/20220426-061519-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host db1109.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host db1109.eqiad.wmnet with OS bullseye completed:
- db1109 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204260620_ladsgroup_2901832_db1109.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-04-26T06:51:12Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26506 and previous config saved to /var/cache/conftool/dbconfig/20220426-065112-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-04-26T07:36:27Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1109 (T302185)', diff saved to https://phabricator.wikimedia.org/P26509 and previous config saved to /var/cache/conftool/dbconfig/20220426-073627-ladsgroup.json
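The dbctl entries in this timeline share one format, so the time each instance spent depooled can be estimated from the log itself. The sketch below is an illustration only: `parse_sal` and `depool_windows` are hypothetical helper names, the regular expression is written against the lines as they appear here rather than any documented SAL format, and it assumes that the first "Repooling after maintenance" commit after a "Depooling" commit ends the depool window (later repool commits for the same instance are ignored).

```python
import re
from datetime import datetime

# Matches the dbctl SAL lines as they appear in this timeline.
SAL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\] "
    r"<(?P<user>[^>]+)> dbctl commit \(dc=all\): "
    r"'(?P<action>Depooling|Repooling after maintenance) "
    r"(?P<instance>\S+) \((?P<task>T\d+)\)'"
)


def parse_sal(line):
    """Return a dict for a dbctl SAL line, or None if the line does not match."""
    m = SAL_RE.search(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%S")
    return entry


def depool_windows(lines):
    """Pair each depool with the first later repool of the same instance."""
    open_depools = {}
    windows = []
    for line in lines:
        entry = parse_sal(line)
        if entry is None:
            continue  # cookbook reports, Gerrit notices, etc.
        if entry["action"] == "Depooling":
            open_depools[entry["instance"]] = entry["ts"]
        elif entry["instance"] in open_depools:
            start = open_depools.pop(entry["instance"])
            windows.append((entry["instance"], start, entry["ts"]))
    return windows
```

Instances such as `db1101:3317` and `db1101:3318` are kept as separate keys, since the log depools and repools them individually.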