Site: 1 VM request for doc2002
Closed, Resolved, Public

Description

Site/Location: codfw
Number of systems: 1
Service: doc2002
Networking Requirements: private IP
Processor Requirements: 2
Memory: 2 GB
Disks: 120 GB

Event Timeline

Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:

  • doc2002 (FAIL)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:

  • doc2002 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:

  • doc2002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

cookbooks.sre.hosts.decommission executed by denisse@cumin1001 for hosts: doc2002

  • doc2002 (WARN)
    • Host not found on Icinga, unable to downtime it
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:

  • doc2002 (FAIL)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:

  • doc2002 (FAIL)
    • The reimage failed, see the cookbook logs for the details

Change 902489 had a related patch set uploaded (by Andrea Denisse; author: Andrea Denisse):

[operations/puppet@production] doc: Add the doc2002 node definition

https://gerrit.wikimedia.org/r/902489

Mentioned in SAL (#wikimedia-operations) [2023-03-23T21:24:33Z] <denisse@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"

Change 902489 merged by Andrea Denisse:

[operations/puppet@production] doc: Add the doc2002 node definition

https://gerrit.wikimedia.org/r/902489

Mentioned in SAL (#wikimedia-operations) [2023-03-23T21:25:39Z] <denisse@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"

Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye completed:

  • doc2002 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303232131_denisse_3186993_doc2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
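
To see where the earlier attempts stopped, the step list of a failed run can be compared against the step list of this passing run. The snippet below is a minimal sketch, not part of the cookbook: the step strings are copied from the timeline above, while the helper name `first_missing_step` and the two list names are hypothetical and exist only for illustration.

```python
# Steps logged by the second failed reimage attempt (copied from the timeline).
FAILED_STEPS = [
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via gnt-instance",
    "Host up (Debian installer)",
    "Set boot to disk",
    "Host up (new fresh bullseye OS)",
    "Generated Puppet certificate",
    "Signed new Puppet certificate",
    "Run Puppet in NOOP mode to populate exported resources in PuppetDB",
    "The reimage failed, see the cookbook logs for the details",
]

# Leading steps logged by the passing run (list truncated after the downtime step).
PASS_STEPS = [
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via gnt-instance",
    "Host up (Debian installer)",
    "Set boot to disk",
    "Host up (new fresh bullseye OS)",
    "Generated Puppet certificate",
    "Signed new Puppet certificate",
    "Run Puppet in NOOP mode to populate exported resources in PuppetDB",
    "Found Nagios_host resource for this host in PuppetDB",
    "Downtimed the new host on Icinga/Alertmanager",
]


def first_missing_step(failed, passed):
    """Return the first step of the passing run that the failed run never logged."""
    done = set(failed)
    for step in passed:
        if step not in done:
            return step
    return None


print(first_missing_step(FAILED_STEPS, PASS_STEPS))
```

Under these assumptions the first step absent from the failed run is the Nagios_host lookup in PuppetDB. The timeline shows the doc2002 node definition (change 902489) being merged before the passing run, which may be related, but the cookbook logs remain the authoritative source for the actual failure cause.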

Mentioned in SAL (#wikimedia-operations) [2023-03-24T23:57:29Z] <denisse@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"

Mentioned in SAL (#wikimedia-operations) [2023-03-24T23:58:45Z] <denisse@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"
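
The SAL entries above follow a regular START / END pattern, so the duration of a cookbook run can be derived from a pair of them. The following is a rough sketch under that assumption: the regular expression and the names `SAL_RE`, `parse_sal`, and `run_seconds` are hypothetical and were written only against the lines shown in this task, not against any official SAL format definition.

```python
import re
from datetime import datetime

# Pattern inferred from the SAL lines in this task (an assumption, not a spec).
SAL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] <(?P<who>[^>]+)> "
    r"(?P<state>START|END \((?P<result>\w+)\)) - Cookbook (?P<cookbook>\S+)"
)


def parse_sal(line):
    """Extract timestamp, user, state, result and cookbook name from one SAL line."""
    m = SAL_RE.search(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ"),
        "who": m["who"],
        "state": m["state"].split()[0],  # "START" or "END"
        "result": m["result"],           # None for START lines
        "cookbook": m["cookbook"],
    }


def run_seconds(start_line, end_line):
    """Seconds elapsed between a START entry and its matching END entry."""
    start, end = parse_sal(start_line), parse_sal(end_line)
    return (end["time"] - start["time"]).total_seconds()


START = ('Mentioned in SAL (#wikimedia-operations) [2023-03-24T23:57:29Z] '
         '<denisse@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera '
         'generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819"')
END = ('Mentioned in SAL (#wikimedia-operations) [2023-03-24T23:58:45Z] '
       '<denisse@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera '
       '(exit_code=0) generate netbox hiera data: '
       '"doc2002 - denisse@cumin1001 - T332819"')

print(run_seconds(START, END))  # 76.0
```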