Site/Location: drmrs
Number of systems: 1
Service: prometheus6002
Networking Requirements: internal
Processor Requirements: 2
Memory: 8 GB
Disks: 128 GB
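The request fields above are plain `Key: value` pairs, so they can be read into a small structure before being passed to provisioning tooling. The following is a minimal sketch only: `parse_request` and `size_in_gb` are hypothetical helper names, not part of any Wikimedia tool, and the assumption that the sizes are whole gigabytes is mine. The size pattern accepts both the `8Gb` and `8 GB` spellings.

```python
import re

# Request fields copied from this task.
REQUEST = """\
Site/Location: drmrs
Number of systems: 1
Service: prometheus6002
Networking Requirements: internal
Processor Requirements: 2
Memory: 8Gb
Disks: 128Gb
"""


def parse_request(text):
    """Split 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def size_in_gb(value):
    """Read sizes such as '8Gb' or '8 GB' as whole gigabytes (assumed unit)."""
    match = re.fullmatch(r"(\d+)\s*gb", value.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1))


request = parse_request(REQUEST)
vcpus = int(request["Processor Requirements"])
memory_gb = size_in_gb(request["Memory"])
disk_gb = size_in_gb(request["Disks"])
print(vcpus, memory_gb, disk_gb)
```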
Description
Details
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
prometheus: Add the prometheus Bullseye node definitions | operations/puppet | production | +1 -1 |
Status | Subtype | Assigned | Task |
---|---|---|---|
Open | | andrea.denisse | T324725 Observability Bookworm upgrades |
Resolved | | andrea.denisse | T309979 Upgrade Prometheus VMs in PoPs to Bullseye |
Resolved | | andrea.denisse | T333721 Site: drmrs 1 VM request for prometheus6002 |
Event Timeline
Change 904841 had a related patch set uploaded (by Andrea Denisse; author: Andrea Denisse):
[operations/puppet@production] prometheus: Add the prometheus Bullseye node definitions
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host prometheus6002.drmrs.wmnet with OS bullseye
Change 904841 merged by Andrea Denisse:
[operations/puppet@production] prometheus: Add the prometheus Bullseye node definitions
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host prometheus6002.drmrs.wmnet with OS bullseye executed with errors:
- prometheus6002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host prometheus6002.drmrs.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host prometheus6002.drmrs.wmnet with OS bullseye completed:
- prometheus6002 (WARN)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303312243_denisse_1088283_prometheus6002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
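The two cookbook summaries above share one shape: a `- host (STATUS)` line followed by one `- step` line per action. A rough sketch of pulling the host, status, and steps out of such a pasted summary follows. The format is inferred only from the lines in this task, not from Spicerack documentation, and `summarize` is a hypothetical helper name.

```python
import re

# Matches the per-host status line, e.g. "- prometheus6002 (WARN)".
HEADER = re.compile(r"^- (?P<host>\S+) \((?P<status>[A-Z]+)\)$")


def summarize(lines):
    """Return (host, status, steps) from a pasted cookbook summary."""
    host = status = None
    steps = []
    for line in lines:
        line = line.strip()
        match = HEADER.match(line)
        if match and host is None:
            # The first matching line names the host and its overall status.
            host, status = match["host"], match["status"]
        elif line.startswith("- "):
            steps.append(line[2:])
    return host, status, steps


# A shortened excerpt of the second run logged in this task.
log = [
    "- prometheus6002 (WARN)",
    "- Downtimed on Icinga/Alertmanager",
    "- Unable to disable Puppet, the host may have been unreachable",
    "- Icinga status is optimal",
    "- Icinga downtime removed",
]
host, status, steps = summarize(log)
print(host, status, len(steps))
```

A status other than the expected success value (here `FAIL` on the first run and `WARN` on the second) is the signal to open the cookbook log file named in the step list.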