I was attempting to create a brand new VM for a brand new service using the makevm cookbook:
$ sudo cookbook sre.ganeti.makevm --vcpus 1 --memory 2 --disk 20 --cluster eqiad --group A -t T356710 --os bookworm -p 7 ncmonitor1001
The imaging portion is failing to start the first boot:
Set boot media to network for VM ncmonitor1001.eqiad.wmnet in cluster eqiad
Forced PXE for next reboot
Shutting down VM ncmonitor1001.eqiad.wmnet in cluster eqiad
----- OUTPUT of 'gnt-instance shu...1001.eqiad.wmnet' -----
Waiting for job 2397616 for ncmonitor1001.eqiad.wmnet ...
Mon Feb 12 23:15:53 2024 - WARNING: Ignoring offline instance check
================
PASS |██████████████████████████████████████████████████████| 100% (1/1) [00:10<00:00, 10.10s/hosts]
FAIL | | 0% (0/1) [00:10<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'gnt-instance shu...1001.eqiad.wmnet'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Starting VM ncmonitor1001.eqiad.wmnet in cluster eqiad
----- OUTPUT of 'gnt-instance sta...1001.eqiad.wmnet' -----
Waiting for job 2397617 for ncmonitor1001.eqiad.wmnet ...
================
PASS |██████████████████████████████████████████████████████| 100% (1/1) [00:05<00:00, 5.02s/hosts]
FAIL | | 0% (0/1) [00:05<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'gnt-instance sta...1001.eqiad.wmnet'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Host rebooted via gnt-instance
[1/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Reboot for ncmonitor1001.eqiad.wmnet not found yet, keep polling for it: unable to get uptime
Caused by: Cumin execution failed (exit_code=2)
[2/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Reboot for ncmonitor1001.eqiad.wmnet not found yet, keep polling for it: unable to get uptime
Caused by: Cumin execution failed (exit_code=2)
[3/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Reboot for ncmonitor1001.eqiad.wmnet not found yet, keep polling for it: unable to get uptime
Caused by: Cumin execution failed (exit_code=2)
[4/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Reboot for ncmonitor1001.eqiad.wmnet not found yet, keep polling for it: unable to get uptime
Caused by: Cumin execution failed (exit_code=2)
[5/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Reboot for ncmonitor1001.eqiad.wmnet not found yet, keep polling for it: unable to get uptime
Caused by: Cumin execution failed (exit_code=2)
Found reboot since 2024-02-12 23:15:48.576663 for hosts ncmonitor1001.eqiad.wmnet
Host up (Debian installer)
Add puppet_version metadata to Debian installer
----- OUTPUT of 'gnt-instance mod...1001.eqiad.wmnet' -----
Modified instance ncmonitor1001.eqiad.wmnet
 - hv/boot_order -> disk
Please don't forget that most parameters take effect only at the next (re)start of the instance initiated by ganeti; restarting from within the instance will not be enough.
Note that changing hypervisor parameters without performing a restart might lead to a crash while performing a live migration. This will be addressed in future Ganeti versions.
================
PASS |██████████████████████████████████████████████████████| 100% (1/1) [00:03<00:00, 3.12s/hosts]
FAIL | | 0% (0/1) [00:03<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'gnt-instance mod...1001.eqiad.wmnet'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Set boot media to disk for VM ncmonitor1001.eqiad.wmnet in cluster eqiad
Set boot media to disk
[1/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Reboot for ncmonitor1001.eqiad.wmnet not found yet, keep polling for it: uptime 66.0 > threshold 4.49
[...]
I re-attempted with:
$ sudo cookbook sre.hosts.reimage -t T356710 --os bookworm ncmonitor1001 --new
But sadly the same issue occurs.
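For anyone reading the last retry message ("uptime 66.0 > threshold 4.49"): the check appears to treat a host as rebooted only if its reported uptime is smaller than the number of seconds elapsed since a reference timestamp (here, roughly when the boot media was set back to disk). The sketch below illustrates that comparison under this assumption only; `check_reboot_since` and `RebootNotFound` are hypothetical names, not Spicerack's actual API, and the timestamps are illustrative. Under that reading, the VM had been up for 66 s and so had not yet rebooted since the reference point about 4.5 s earlier.

```python
from datetime import datetime, timedelta, timezone


class RebootNotFound(Exception):
    """Hypothetical error: the host has not rebooted since the reference time."""


def check_reboot_since(uptime_seconds: float, since: datetime, now: datetime) -> None:
    # Assumption: a host that rebooted after `since` must report an uptime
    # smaller than the seconds elapsed between `since` and `now`.
    threshold = (now - since).total_seconds()
    if uptime_seconds >= threshold:
        raise RebootNotFound(f"uptime {uptime_seconds} > threshold {threshold:.2f}")


# Illustrative values mirroring the log: the reference time was ~4.49 s ago,
# but the VM reports 66 s of uptime, so no reboot is detected yet.
now = datetime(2024, 2, 12, 23, 17, 0, tzinfo=timezone.utc)
since = now - timedelta(seconds=4.49)
try:
    check_reboot_since(66.0, since, now)
except RebootNotFound as exc:
    print(exc)  # prints: uptime 66.0 > threshold 4.49
```

In the real cookbook this check is retried (the log shows up to 240 attempts, 10 s apart), so early failures are expected while the installer runs; it only succeeds once the VM reports an uptime shorter than the time elapsed since the reference point.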