Site/Location: ulsfo
Number of systems: 1
Service: prometheus4002
Networking Requirements: internal
Processor Requirements: 2
Memory: 8 GB
Disks: 128 GB
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| prometheus: Add the prometheus Bullseye node definitions | operations/puppet | production | +1 -1 |
| Status | Assigned | Task |
|---|---|---|
| Resolved | andrea.denisse | T324725 Observability Bullseye upgrades |
| Resolved | andrea.denisse | T309979 Upgrade Prometheus VMs in PoPs to Bullseye |
| Resolved | andrea.denisse | T333719 Site: ulsfo 1 VM request for prometheus4002 |
Event Timeline
Change 904841 had a related patch set uploaded (by Andrea Denisse; author: Andrea Denisse):
[operations/puppet@production] prometheus: Add the prometheus Bullseye node definitions
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host prometheus4002.ulsfo.wmnet with OS bullseye
Change 904841 merged by Andrea Denisse:
[operations/puppet@production] prometheus: Add the prometheus Bullseye node definitions
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host prometheus4002.ulsfo.wmnet with OS bullseye executed with errors:
- prometheus4002 (FAIL)
  - Removed from Puppet and PuppetDB if present
  - Deleted any existing Puppet certificate
  - Removed from Debmonitor if present
  - Forced PXE for next reboot
  - Host rebooted via gnt-instance
  - Host up (Debian installer)
  - Set boot to disk
  - Host up (new fresh bullseye OS)
  - Generated Puppet certificate
  - Signed new Puppet certificate
  - Run Puppet in NOOP mode to populate exported resources in PuppetDB
  - The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host prometheus4002.ulsfo.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host prometheus4002.ulsfo.wmnet with OS bullseye completed:
- prometheus4002 (WARN)
  - Downtimed on Icinga/Alertmanager
  - Unable to disable Puppet, the host may have been unreachable
  - Removed from Puppet and PuppetDB if present
  - Deleted any existing Puppet certificate
  - Removed from Debmonitor if present
  - Forced PXE for next reboot
  - Host rebooted via gnt-instance
  - Host up (Debian installer)
  - Set boot to disk
  - Host up (new fresh bullseye OS)
  - Generated Puppet certificate
  - Signed new Puppet certificate
  - Run Puppet in NOOP mode to populate exported resources in PuppetDB
  - Found Nagios_host resource for this host in PuppetDB
  - Downtimed the new host on Icinga/Alertmanager
  - First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303312240_denisse_1088056_prometheus4002.out
  - configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
  - Rebooted
  - Automatic Puppet run was successful
  - Forced a re-check of all Icinga services for the host
  - Icinga status is optimal
  - Icinga downtime removed
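The two reimage summaries above share a simple shape: a host line with a status in parentheses, followed by one line per completed step. As a rough sketch for comparing runs, the Python below parses that shape. The names `ReimageSummary` and `parse_summary` are hypothetical, and the format is assumed only from the text pasted in this task, not from any documented cookbook output.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ReimageSummary:
    """Hypothetical container for one cookbook summary pasted in a task."""
    host: str
    status: str
    steps: list[str] = field(default_factory=list)


# Assumed shape of the first bullet: "- <host> (<STATUS>)", e.g. "- prometheus4002 (FAIL)".
HOST_LINE = re.compile(r"^-\s*(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")


def parse_summary(text: str) -> ReimageSummary:
    """Split a pasted summary into its host, status, and ordered step list."""
    summary = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("-"):
            continue  # skip anything that is not a bullet
        match = HOST_LINE.match(line)
        if match and summary is None:
            summary = ReimageSummary(match["host"], match["status"])
        elif summary is not None:
            # Every later bullet is treated as a step description.
            summary.steps.append(line.lstrip("-").strip())
    if summary is None:
        raise ValueError("no host status line found")
    return summary


# Abbreviated excerpts of the two runs recorded in this task.
failed = parse_summary("""\
- prometheus4002 (FAIL)
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
""")

retried = parse_summary("""\
- prometheus4002 (WARN)
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
""")

# Steps the retry reached that the failed run did not report.
new_steps = [step for step in retried.steps if step not in failed.steps]
```

Comparing the step lists this way shows where the first run stopped: it never reported the `Nagios_host` lookup that follows the NOOP Puppet run in the second attempt.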