I have been working on T372498: Figure out how to provision a Kubernetes cluster using Magnum and OpenTofu for about a week and a half. The project uses tofu to trigger OpenStack Magnum to build a Kubernetes cluster. As reported in T372498#10065541, this automation was working as recently as 2024-08-14.
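For context, the tofu side of the workflow boils down to two resources from the terraform-provider-openstack plugin: a cluster template and a cluster built from it. This is a minimal sketch, not the actual deployment-prep-opentofu config; the names, image, flavors, and network below are placeholder values.

```hcl
# Sketch of a Magnum cluster driven by tofu. All values are placeholders,
# not the real deployment-prep settings.
resource "openstack_containerinfra_clustertemplate_v1" "k8s" {
  name                = "k8s-template"           # placeholder
  image               = "magnum-fedora-coreos"   # placeholder image name
  coe                 = "kubernetes"
  master_flavor       = "g3.cores2.ram4.disk20"  # placeholder flavor
  flavor              = "g3.cores2.ram4.disk20"  # placeholder flavor
  network_driver      = "flannel"
  external_network_id = "wan-transport"          # placeholder network
}

resource "openstack_containerinfra_cluster_v1" "k8s" {
  name                = "k8s-cluster"            # placeholder
  cluster_template_id = openstack_containerinfra_clustertemplate_v1.k8s.id
  master_count        = 1
  node_count          = 1
}
```

A `tofu apply` on something like this creates the template, then the cluster, and from there Magnum takes over instance creation.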
On 2024-08-22 I decided to do a full test again by destroying the existing 2 node cluster and attempting to create a new one. Unfortunately, every attempt to build a new cluster has failed. The tofu apply workflow creates the cluster template and then the cluster resource. The Magnum automation then starts creating the instances that will make up the cluster. I can see in Horizon that the "master" node for the cluster is created, but Magnum never recognizes that the instance creation completed and eventually fails the cluster build step.
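When the cluster hangs like this, the Heat stack that Magnum drives the build through is usually where the real error shows up. This is a hypothetical diagnostic helper, not something from the repo; the cluster name is a placeholder, and `DRY_RUN=1` just prints the commands instead of running them.

```shell
# Hypothetical helper for poking at a stuck Magnum build. Set DRY_RUN=1 to
# echo the commands rather than execute them (no OpenStack creds needed).
coe_debug() {
    cluster="$1"
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }
    # Cluster status and status_reason as Magnum sees them
    run openstack coe cluster show "$cluster"
    # The Heat stacks Magnum created on the cluster's behalf
    run openstack stack list
    # Nova's view of the instances that were actually built
    run openstack server list
}
```

Comparing Magnum's `status_reason` against what Nova and Heat report is one way to confirm that the instance really did finish building even though Magnum never noticed.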
Using tofu destroy to clean up after a failed cluster build also requires multiple attempts. Again via Horizon I can see the "master" instance change state (this time being deleted rather than created), but Magnum never seems to get the notification that the delete succeeded and eventually times out waiting for it. A subsequent tofu destroy succeeds, apparently because the system recognizes that the instance is already gone before completing the removal of the cluster and template objects.
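Since the second destroy reliably succeeds once Nova has actually removed the instance, the two-pass cleanup could be papered over with a small retry wrapper. This is a hypothetical helper, not part of the deployment-prep-opentofu repo:

```shell
# Hypothetical retry wrapper: run a command up to N times, returning success
# on the first attempt that exits 0.
retry() {
    attempts="$1"
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i of $attempts failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}

# Usage for the cleanup workflow described above:
#   retry 2 tofu destroy -auto-approve
```

Obviously this is a workaround, not a fix; the underlying problem is Magnum losing track of the instance state changes.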
The GitLab CI pipeline at https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/jobs/350333 shows a typical failure matching the description above. When running under GitLab, tofu apply just keeps going until the job's time limit kills it. When run locally, things seem to time out sooner, with opentofu itself giving up. This is likely just a behavioral difference triggered by the TF_IN_AUTOMATION=1 envvar set in the GitLab run.
I don't have any hard proof that this change in behavior was caused by T369044: Upgrade cloud-vps openstack to version 'Caracal', but the timing is suspicious.