Site/Location: codfw
Number of systems: 1
Service: doc2002
Networking Requirements: private IP
Processor Requirements: 2
Memory: 2 GB
Disks: 120 GB
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| doc: Add the doc2002 node definition | operations/puppet | production | +4 -0 |

| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T291916 Tracking task for Bullseye migrations in production |
| Resolved | | Dzahn | T327068 Bullseye upgrade for remaining Collab hosts |
| Resolved | | eoghan | T319477 Migrate doc hosts to Bullseye |
| Resolved | | andrea.denisse | T332819 Site: 1 VM request for doc2002 |
Event Timeline
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
cookbooks.sre.hosts.decommission executed by denisse@cumin1001 for hosts: doc2002
- doc2002 (WARN)
- Host not found on Icinga, unable to downtime it
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Change 902489 had a related patch set uploaded (by Andrea Denisse; author: Andrea Denisse):
[operations/puppet@production] doc: Add the doc2002 node definition
Mentioned in SAL (#wikimedia-operations) [2023-03-23T21:24:33Z] <denisse@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
Change 902489 merged by Andrea Denisse:
[operations/puppet@production] doc: Add the doc2002 node definition
Mentioned in SAL (#wikimedia-operations) [2023-03-23T21:25:39Z] <denisse@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye completed:
- doc2002 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303232131_denisse_3186993_doc2002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Mentioned in SAL (#wikimedia-operations) [2023-03-24T23:57:29Z] <denisse@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
Mentioned in SAL (#wikimedia-operations) [2023-03-24T23:58:45Z] <denisse@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
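
The timeline above records several reimage attempts, each ending in a per-host result line such as `- doc2002 (FAIL)`. As a rough aid for reviewing a long timeline like this one, the sketch below tallies those result lines from pasted text. It is not part of Spicerack or the cookbooks; the function name `tally_results` and the regular expression are assumptions based only on the line format visible in this task.

```python
import re
from collections import Counter

# Hypothetical pattern for the per-host result line seen in this task,
# e.g. "- doc2002 (FAIL)". Step lines such as "- Host up (Debian installer)"
# do not match because the status must be PASS, FAIL, or WARN.
RESULT_LINE = re.compile(r"^- (?P<host>[\w.-]+) \((?P<status>PASS|FAIL|WARN)\)$")


def tally_results(timeline: str) -> Counter:
    """Count (host, status) pairs found in pasted timeline text."""
    counts = Counter()
    for line in timeline.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            counts[(match["host"], match["status"])] += 1
    return counts


# Small excerpt in the same shape as the timeline entries above.
sample = """\
- doc2002 (FAIL)
- The reimage failed, see the cookbook logs for the details
- doc2002 (WARN)
- Found Ganeti VM
- Host up (Debian installer)
- doc2002 (PASS)
- Icinga status is optimal
"""

results = tally_results(sample)
for (host, status), count in sorted(results.items()):
    print(f"{host}: {status} x{count}")
```

Pasting the full timeline text in place of `sample` would give a quick count of failed versus successful runs per host; the cookbook logs themselves remain the place to look for the cause of each failure.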