makevm cookbook fails get_vm() call
Closed, Invalid (Public)

Description

I created a Ganeti VM with the sre.ganeti.makevm cookbook. The cookbook completed successfully, but a get_vm() call failed along the way. Maybe this needs some kind of timeout? I guess the sync didn't complete in time here.

Ready to create Ganeti VM ldap-replica1001.wikimedia.org in the ganeti01.svc.eqiad.wmnet cluster on row B with 1 vCPUs, 4GB of RAM, 20GB of disk in the public network.
Is this correct?
Type "done" to proceed
> done
The command output will be printed at the end.
Creating VM ldap-replica1001.wikimedia.org in cluster ganeti01.svc.eqiad.wmnet with row=B vcpus=1 memory=4GB disk=20GB link=public. This may take a few minutes.
Fri Oct  2 09:38:22 2020  - INFO: No-installation mode selected, disabling startup
Fri Oct  2 09:38:35 2020  - INFO: Selected nodes for instance ldap-replica1001.wikimedia.org via iallocator hail: ganeti1014.eqiad.wmnet, ganeti1017.eqiad.wmnet
Fri Oct  2 09:38:36 2020 * creating instance disks...
Fri Oct  2 09:38:40 2020 adding instance ldap-replica1001.wikimedia.org to cluster config
Fri Oct  2 09:38:40 2020 adding disks to cluster config
Fri Oct  2 09:38:40 2020  - INFO: Waiting for instance ldap-replica1001.wikimedia.org to sync disks
Fri Oct  2 09:38:41 2020  - INFO: - device disk/0:  0.10% done, 41m 35s remaining (estimated)
Fri Oct  2 09:39:41 2020  - INFO: - device disk/0: 10.90% done, 8m 14s remaining (estimated)
Fri Oct  2 09:40:41 2020  - INFO: - device disk/0: 21.60% done, 7m 1s remaining (estimated)
Fri Oct  2 09:41:41 2020  - INFO: - device disk/0: 32.40% done, 6m 8s remaining (estimated)
Fri Oct  2 09:42:41 2020  - INFO: - device disk/0: 43.20% done, 5m 13s remaining (estimated)
Fri Oct  2 09:43:42 2020  - INFO: - device disk/0: 53.90% done, 4m 5s remaining (estimated)
Fri Oct  2 09:44:42 2020  - INFO: - device disk/0: 64.70% done, 3m 11s remaining (estimated)
Fri Oct  2 09:45:42 2020  - INFO: - device disk/0: 75.40% done, 2m 15s remaining (estimated)
Fri Oct  2 09:46:42 2020  - INFO: - device disk/0: 86.20% done, 1m 16s remaining (estimated)
Fri Oct  2 09:47:42 2020  - INFO: - device disk/0: 97.00% done, 16s remaining (estimated)
Fri Oct  2 09:47:59 2020  - INFO: - device disk/0: 99.90% done, 0s remaining (estimated)
Fri Oct  2 09:47:59 2020  - INFO: - device disk/0: 99.90% done, 0s remaining (estimated)
Fri Oct  2 09:47:59 2020  - INFO: - device disk/0: 99.90% done, 0s remaining (estimated)
Fri Oct  2 09:47:59 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Fri Oct  2 09:47:59 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Fri Oct  2 09:47:59 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Fri Oct  2 09:47:59 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Fri Oct  2 09:48:00 2020  - INFO: Instance ldap-replica1001.wikimedia.org's disks are in sync
Fri Oct  2 09:48:00 2020  - INFO: Waiting for instance ldap-replica1001.wikimedia.org to sync disks
Fri Oct  2 09:48:00 2020  - INFO: Instance ldap-replica1001.wikimedia.org's disks are in sync
MAC address for ldap-replica1001.wikimedia.org is: aa:00:00:16:63:55
Syncing VMs in DC eqiad to Netbox
Failed to call 'cookbooks.sre.ganeti.makevm.get_vm' [1/20, retrying in 3.00s]:
Created interface ##PRIMARY## on VM ldap-replica1001
Attached IPv4 208.80.154.139/26 and IPv6 2620:0:861:2:208:80:154:139/64 to VM ldap-replica1001 and marked as primary IPs
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)

Event Timeline

So, I've created three VMs and this happened in one out of three cases only.

herron triaged this task as Medium priority. Oct 2 2020, 4:39 PM
herron added a project: SRE-tools.
herron added a subscriber: Volans.

Nothing to do here. That is not a failure; the cookbook is just polling to get the created VM.
The [1/20, retrying in 3.00s] message refers to the first call, which failed; the second one succeeded and the cookbook continued its job as expected.
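
For anyone else puzzled by that log line, here is a minimal sketch of the polling pattern described above. It is not the cookbook's actual code: `poll`, `make_flaky_get_vm` and the error text are hypothetical names made up for illustration, and the defaults of 20 tries and a 3 second delay are only inferred from the `[1/20, retrying in 3.00s]` message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll(func: Callable[[], T], tries: int = 20, delay: float = 3.0,
         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func until it succeeds, logging every failed attempt.

    Hypothetical helper: the defaults mirror the "[1/20, retrying in 3.00s]"
    log line, not any documented setting of the real cookbook.
    """
    for attempt in range(1, tries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == tries:
                raise  # out of attempts: only now is it a real failure
            print(f"Failed to call '{func.__name__}' "
                  f"[{attempt}/{tries}, retrying in {delay:.2f}s]: {exc}")
            sleep(delay)
    raise RuntimeError("tries must be at least 1")


def make_flaky_get_vm(fail_times: int) -> Callable[[], str]:
    """Build a stand-in get_vm() that fails fail_times times, then succeeds."""
    calls = {"count": 0}

    def get_vm() -> str:
        calls["count"] += 1
        if calls["count"] <= fail_times:
            # Assumed cause: the new VM is not visible yet right after the sync.
            raise LookupError("VM not found yet")
        return "ldap-replica1001"

    return get_vm


if __name__ == "__main__":
    # One failed attempt is logged, the second one returns the VM.
    # The no-op sleep keeps the demo instant.
    print(poll(make_flaky_get_vm(1), sleep=lambda _seconds: None))
```

A single "Failed to call" line followed by normal output, as in the log in the description, just means one attempt missed. The run only fails if all attempts are used up, in which case the last exception is re-raised.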