- es2034
- es2029
- es2027
- es1034
- es1031
- es1028
Status | Assigned | Task
---|---|---|
Open | None | T291916 Tracking task for Bullseye migrations in production
Resolved | Marostegui | T298585 Upgrade WMF database-and-backup-related hosts to bullseye
Resolved | Ladsgroup | T299911 Upgrade es3 to Bullseye
Event Timeline
Change 756565 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] es2034: Disable notifications
Change 756586 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] es2029: Disable notifications
Change 756565 merged by Ladsgroup:
[operations/puppet@production] es2034: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-01-24T13:06:00Z] <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reimage for upgrade - T299911
Icinga downtime set by ladsgroup@cumin1001 for 1 day, 0:00:00 1 host(s) and their services with reason: reimage for upgrade - T299911
es2034.codfw.wmnet
Mentioned in SAL (#wikimedia-operations) [2022-01-24T13:06:03Z] <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reimage for upgrade - T299911
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host es2034.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host es2034.codfw.wmnet with OS bullseye executed with errors:
- es2034 (FAIL)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host es2034.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host es2034.codfw.wmnet with OS bullseye completed:
- es2034 (WARN)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201241400_ladsgroup_1151_es2034.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 756586 merged by Ladsgroup:
[operations/puppet@production] es2029: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-01-25T10:00:36Z] <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - T299911
Icinga downtime set by ladsgroup@cumin1001 for 1 day, 0:00:00 1 host(s) and their services with reason: reimage for upgrade - T299911
es2029.codfw.wmnet
Mentioned in SAL (#wikimedia-operations) [2022-01-25T10:00:45Z] <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - T299911
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host es2029.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host es2029.codfw.wmnet with OS bullseye completed:
- es2029 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201251001_ladsgroup_9377_es2029.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 756960 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] es2027: Disable notifications
Change 756960 merged by Ladsgroup:
[operations/puppet@production] es2027: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-01-25T10:52:55Z] <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - T299911
Icinga downtime set by ladsgroup@cumin1001 for 1 day, 0:00:00 1 host(s) and their services with reason: reimage for upgrade - T299911
es2027.codfw.wmnet
Mentioned in SAL (#wikimedia-operations) [2022-01-25T10:52:59Z] <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - T299911
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host es2027.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host es2027.codfw.wmnet with OS bullseye completed:
- es2027 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201251053_ladsgroup_26115_es2027.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-01-25T12:33:03Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depool es1031 (T299911)', diff saved to https://phabricator.wikimedia.org/P19136 and previous config saved to /var/cache/conftool/dbconfig/20220125-123303-ladsgroup.json
Change 756971 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] es1031: Disable notifications
Change 756971 merged by Ladsgroup:
[operations/puppet@production] es1031: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-01-25T13:06:33Z] <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - T299911
Icinga downtime set by ladsgroup@cumin1001 for 1 day, 0:00:00 1 host(s) and their services with reason: reimage for upgrade - T299911
es1031.eqiad.wmnet
Mentioned in SAL (#wikimedia-operations) [2022-01-25T13:06:37Z] <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - T299911
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host es1031.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host es1031.eqiad.wmnet with OS bullseye completed:
- es1031 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201251324_ladsgroup_5946_es1031.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-01-25T14:15:39Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance es1031 (T299911)', diff saved to https://phabricator.wikimedia.org/P19163 and previous config saved to /var/cache/conftool/dbconfig/20220125-141538-ladsgroup.json
Change 757008 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] es1034: Disable notifications
Change 757008 merged by Ladsgroup:
[operations/puppet@production] es1034: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-01-25T15:00:53Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance es1031 (T299911)', diff saved to https://phabricator.wikimedia.org/P19175 and previous config saved to /var/cache/conftool/dbconfig/20220125-150052-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-01-25T15:02:57Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling es1034 (T299911)', diff saved to https://phabricator.wikimedia.org/P19176 and previous config saved to /var/cache/conftool/dbconfig/20220125-150256-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host es1034.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host es1034.eqiad.wmnet with OS bullseye completed:
- es1034 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201251520_ladsgroup_28989_es1034.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-01-25T15:56:04Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance es1034 (T299911)', diff saved to https://phabricator.wikimedia.org/P19193 and previous config saved to /var/cache/conftool/dbconfig/20220125-155604-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-01-25T16:41:19Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance es1034 (T299911)', diff saved to https://phabricator.wikimedia.org/P19204 and previous config saved to /var/cache/conftool/dbconfig/20220125-164118-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-01-25T16:43:25Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Make es1031 master of es3 T299911', diff saved to https://phabricator.wikimedia.org/P19206 and previous config saved to /var/cache/conftool/dbconfig/20220125-164324-ladsgroup.json
Change 757032 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/puppet@production] es1028: Disable notifications
Change 757032 merged by Ladsgroup:
[operations/puppet@production] es1028: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-01-25T16:49:00Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling es1028 (T299911)', diff saved to https://phabricator.wikimedia.org/P19208 and previous config saved to /var/cache/conftool/dbconfig/20220125-164900-ladsgroup.json
Cookbook cookbooks.sre.hosts.reimage was started by ladsgroup@cumin1001 for host es1028.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ladsgroup@cumin1001 for host es1028.eqiad.wmnet with OS bullseye completed:
- es1028 (WARN)
- Downtimed on Icinga
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201251744_ladsgroup_24763_es1028.out
- Checked BIOS boot parameters are back to normal
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2022-01-25T18:24:36Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance es1028 (T299911)', diff saved to https://phabricator.wikimedia.org/P19215 and previous config saved to /var/cache/conftool/dbconfig/20220125-182435-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-01-25T19:09:50Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance es1028 (T299911)', diff saved to https://phabricator.wikimedia.org/P19220 and previous config saved to /var/cache/conftool/dbconfig/20220125-190949-ladsgroup.json
Mentioned in SAL (#wikimedia-operations) [2022-01-25T19:12:39Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Make es1028 master of es3 T299911', diff saved to https://phabricator.wikimedia.org/P19221 and previous config saved to /var/cache/conftool/dbconfig/20220125-191238-ladsgroup.json