New instance has been stuck in "scheduling" for more than an hour
Closed, Resolved · Public · BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Try to create a new instance in Horizon; in this case I selected:
  • image: bullseye
  • flavor: g3.cores4.ram8.disk20

What happens?:
The instance status stays stuck in "scheduling"

What should have happened instead?:
The instance should get created and then boot

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc:
Project: Wikilink
Instance ID: 60897ce3-f33a-40c6-ba48-55384d4ca745
This is the second instance I've tried creating. I was able to delete the first one after it had also been stuck in scheduling for over an hour.

Event Timeline

Hello @jsn.sherman -- I'm interested in this issue but won't have time to investigate for a few hours. If your quota permits it, please leave that stuck host as is and go ahead and try to re-schedule. Either it'll work and get you unstuck or it'll fail and I'll have more testing data :)

@Andrew when you say reschedule, do you mean launch a new instance? We don't have enough capacity to add another one of the same flavor, but I'd be happy to try to launch a smaller instance for troubleshooting.

Andrew claimed this task.

This issue is surfaced very poorly, but I've found this in the logs:

nova-conductor.log:2021-12-15 15:22:45.990 19270 ERROR oslo_messaging.rpc.server [req-3a977bf4-449a-48ac-a6cc-d62d107da895 jsn wikilink - default default] Exception during message handling: nova.exception.InstanceExists: Instance prod01 already exists.
nova-conductor.log:2021-12-15 15:22:45.990 19270 ERROR oslo_messaging.rpc.server nova.exception.InstanceExists: Instance prod01 already exists.
nova-conductor.log:2021-12-15 16:21:27.089 19270 ERROR oslo_messaging.rpc.server [req-44cfae6d-6988-4662-afa0-db30de115ff1 jsn wikilink - default default] Exception during message handling: nova.exception.InstanceExists: Instance prod01 already exists.
nova-conductor.log:2021-12-15 16:21:27.089 19270 ERROR oslo_messaging.rpc.server nova.exception.InstanceExists: Instance prod01 already exists.

And indeed there's already a VM named prod01 'twl'. So I suggest you delete and recreate with a different name and see if that gets you anywhere. Meanwhile I'll have a look at why this is happening at all.
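
Since the error only shows up in the conductor log, a small filter can make it easier to spot. The following is a minimal sketch, assuming log lines shaped like the excerpt above (optionally prefixed with a file name, as grep output is). The names `PATTERN` and `find_name_conflicts` are hypothetical and not part of nova or any existing tooling.

```python
import re

# Matches the "Exception during message handling" line for InstanceExists.
# Requiring the [req-...] block skips the follow-up line that repeats the
# same exception without a request ID, so each failure is reported once.
PATTERN = re.compile(
    r"^(?:[\w.-]+:)?"                                    # optional "file:" prefix
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "  # timestamp
    r".*?\[(?P<req>req-[0-9a-f-]+) .*?"                  # request ID
    r"nova\.exception\.InstanceExists: "
    r"Instance (?P<name>\S+) already exists"
)


def find_name_conflicts(lines):
    """Return (timestamp, request_id, instance_name) per InstanceExists error."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group("ts"), match.group("req"), match.group("name")))
    return hits
```

Feeding it the four log lines quoted above should yield two entries, one per failed request, both naming prod01. The regular expression is tied to this particular message format and would need adjusting if the log layout differs.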

Verified that prod-wikilink spawns just fine, which is enough to get us un-stalled. Thanks!