
Investigate how to run OpenTofu to manage Cloud VPS admin-only resources
Closed, Resolved, Public

Description

We want to use OpenTofu to manage some of the admin-only resources we need to define on OpenStack. Right now the use case is defining our flavors, since we need to replace those, but this could also expand to other uses (e.g. networks, DNS) in the future. To me the big question is where to run it:
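As a sketch of the flavor use case, assuming we keep the flavor list in Python and let OpenTofu consume its JSON configuration syntax (`*.tf.json`), a small generator could emit one `openstack_compute_flavor_v2` resource per flavor. The flavor names and sizes below are hypothetical placeholders, not the flavors we actually plan to define.

```python
import json

# Hypothetical flavor catalogue; names and sizes are illustrative only.
FLAVORS = [
    {"name": "g4.cores1.ram2.disk20", "vcpus": 1, "ram_mb": 2048, "disk_gb": 20},
    {"name": "g4.cores2.ram4.disk20", "vcpus": 2, "ram_mb": 4096, "disk_gb": 20},
]


def flavor_resources(flavors):
    """Build JSON-syntax config declaring one openstack_compute_flavor_v2 per flavor."""
    resources = {}
    for flavor in flavors:
        # Resource labels may not contain dots, so derive one from the flavor name.
        label = flavor["name"].replace(".", "_")
        resources[label] = {
            "name": flavor["name"],
            "vcpus": flavor["vcpus"],
            "ram": flavor["ram_mb"],    # the provider expects RAM in megabytes
            "disk": flavor["disk_gb"],  # and root disk size in gigabytes
            "is_public": True,
        }
    return {"resource": {"openstack_compute_flavor_v2": resources}}


if __name__ == "__main__":
    # Print to stdout so the caller decides where the .tf.json file goes.
    print(json.dumps(flavor_resources(FLAVORS), indent=2))
```

Redirecting the output to a file such as `flavors.tf.json` (hypothetical name) next to the provider configuration would let `tofu plan` pick the flavors up. Writing plain HCL by hand works just as well; a generator only pays off if the flavor list grows large or is derived from another source.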

Options

Run on cloudcontrol/cloudcumin nodes

The simplest option in my mind is to install OpenTofu on either the per-deployment cloudcontrol nodes or the new cloudcumin nodes. This would involve mirroring the Tofu packages to our apt repository, installing those packages, and then using Puppet to provision credentials, either for the current novaadmin user or for a dedicated service account with full admin access.
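For the cloudcontrol/cloudcumin option, one possible shape of a wrapper is sketched below, assuming Puppet renders the service account's credentials into a root-only `KEY=VALUE` file. The path `/etc/opentofu/cloudvps-admin.env` and the file format are hypothetical; the `OS_*` names are the standard OpenStack environment variables that the OpenStack provider falls back to when the provider block does not set credentials.

```python
import os
import subprocess

# Hypothetical path and format: a root-only KEY=VALUE file rendered by Puppet.
CREDENTIALS_FILE = "/etc/opentofu/cloudvps-admin.env"

# Environment variables the OpenStack provider reads for password authentication.
REQUIRED_KEYS = ("OS_AUTH_URL", "OS_USERNAME", "OS_PASSWORD", "OS_PROJECT_NAME")


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        env[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in env]
    if missing:
        raise ValueError(f"missing credentials: {', '.join(missing)}")
    return env


def run_tofu(args):
    """Run tofu with the admin credentials added to the child's environment."""
    with open(CREDENTIALS_FILE) as fh:
        creds = parse_env_file(fh.read())
    # Credentials go into the environment only, never onto the command line.
    return subprocess.call(["tofu", *args], env={**os.environ, **creds})
```

`run_tofu(["plan"])` would then behave like `tofu plan` with the admin credentials loaded. Passing them through the environment rather than as `-var` flags keeps the password out of the process list and shell history.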

Run on GitLab CI

The other option, at least worth exploring in my mind, is to have GitLab CI apply any Tofu changes automatically instead of running them manually somewhere. This would require special care in handling the admin credentials (it should use a dedicated service account and definitely not novaadmin), but it could unlock interesting use cases like showing a diff (tofu plan) before merging an MR.
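To show a diff on a merge request, a CI job could save the plan to a file and render it with `tofu show -json`, whose `resource_changes` entries carry an `address` and a `change.actions` list. The sketch below condenses that into a short summary suitable for an MR comment; the plan file name `mr.tfplan` and the summary format are assumptions, and posting the comment to GitLab is left out.

```python
import json
import subprocess
from collections import Counter


def summarize_plan(plan):
    """Return (counts, lines) for the non-trivial changes in `tofu show -json` output."""
    counts = Counter()
    lines = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing would change for this resource
        # A replacement is reported as a delete/create pair, in either order.
        kind = "replace" if set(actions) == {"create", "delete"} else actions[0]
        counts[kind] += 1
        lines.append(f"{kind}: {rc['address']}")
    return counts, lines


def plan_summary_for_mr(workdir="."):
    """Run a plan in workdir and return a plain-text summary of it."""
    subprocess.run(["tofu", "plan", "-input=false", "-out=mr.tfplan"],
                   cwd=workdir, check=True)
    shown = subprocess.run(["tofu", "show", "-json", "mr.tfplan"],
                           cwd=workdir, check=True, capture_output=True, text=True)
    counts, lines = summarize_plan(json.loads(shown.stdout))
    header = ", ".join(f"{n} to {kind}" for kind, n in sorted(counts.items()))
    return "\n".join([header or "No changes.", *lines])
```

Keeping `summarize_plan` separate from the subprocess calls means it can be tested against a recorded plan JSON without any credentials.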

Event Timeline

I think we could do both: being able to manually run Tofu from a shared server is nice, and I would want to have it for debugging/emergency purposes even if we had CI integration. I think we could start by implementing this part. I vote for using cloudcontrols, but cloudcumins are also fine.

If the state is stored in Object Storage, we can then also explore running some or all of the Tofu workflows from CI. Initially, I would probably only run tofu plan in CI, but once we trust it to be reliable we could also run tofu apply automatically after merge. A potential issue: if we store the state in Object Storage, I don't think we get state locking. Maybe we could explore using the Postgres backend.
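If we try the Postgres backend, which does support state locking, its backend block could be generated in the same JSON configuration syntax. This is a minimal sketch: the schema name is hypothetical, and `conn_str` is intentionally left out so the connection string (including the password) comes from the `PG_CONN_STR` environment variable instead of being committed.

```python
import json


def pg_backend_config(schema_name="cloudvps_admin_state"):
    """Return a JSON-syntax `terraform { backend "pg" { ... } }` block."""
    # conn_str is omitted on purpose: the pg backend can read it from PG_CONN_STR,
    # which keeps database credentials out of the repository.
    # schema_name is a hypothetical choice; the backend has its own default.
    return {"terraform": {"backend": {"pg": {"schema_name": schema_name}}}}


if __name__ == "__main__":
    print(json.dumps(pg_backend_config(), indent=2))
```

`tofu init` would then need `PG_CONN_STR` set in whatever environment runs it, whether that is a cloudcontrol node or a CI job.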

Another idea: we could run tofu plan every X hours (e.g. from a systemd timer on the cloudcontrols or somewhere else) and alert if the plan is not clean. I imagine a workflow where I create a patch/MR, CI shows me the plan, and somebody merges the patch; if we then forget to manually run tofu apply after the merge (or the CI apply fails), we get an alert after X hours.
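The periodic check could rely on `tofu plan -detailed-exitcode`, which exits with 0 when there are no changes, 2 when the plan contains changes, and 1 on error. Below is a minimal sketch of the check a systemd timer might invoke; how the result reaches alerting is an assumption, and here it is only printed to stderr.

```python
import subprocess
import sys

# Exit codes of `tofu plan -detailed-exitcode`; 1 (or anything else) means an error.
NO_CHANGES = 0
CHANGES_PRESENT = 2


def classify(returncode):
    """Map a `tofu plan -detailed-exitcode` return code to a status string."""
    if returncode == NO_CHANGES:
        return "clean"
    if returncode == CHANGES_PRESENT:
        return "drift"
    return "error"


def check_drift(workdir):
    """Run a non-interactive plan in workdir and report whether it is clean."""
    result = subprocess.run(
        ["tofu", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir, capture_output=True, text=True,
    )
    status = classify(result.returncode)
    if status != "clean":
        # Placeholder: a real timer job would hand this to the alerting system.
        print(f"tofu plan is not clean ({status}) in {workdir}", file=sys.stderr)
    return status
```

Keeping errors separate from drift may be worth it: an expired credential or an unreachable API is a different problem from a forgotten `tofu apply`, and could be alerted on differently.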

Change #1039677 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] aptrepo: Add mirror for OpenTofu packages

https://gerrit.wikimedia.org/r/1039677

Change #1039677 merged by Majavah:

[operations/puppet@production] aptrepo: Add mirror for OpenTofu packages

https://gerrit.wikimedia.org/r/1039677

Mentioned in SAL (#wikimedia-operations) [2024-06-07T08:49:22Z] <taavi> import opentofu 1.7.2 to apt.wikimedia.org T365696

Change #1040099 had a related patch set uploaded (by Majavah; author: Majavah):

[labs/private@master] Add fake opentofu admin passwords and tokens

https://gerrit.wikimedia.org/r/1040099

Change #1040099 merged by Majavah:

[labs/private@master] Add fake opentofu admin passwords and tokens

https://gerrit.wikimedia.org/r/1040099

Change #1040115 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:openstack: New profile for running OpenTofu for Cloud VPS

https://gerrit.wikimedia.org/r/1040115

Change #1040115 merged by Majavah:

[operations/puppet@production] P:openstack: New profile for running OpenTofu for Cloud VPS

https://gerrit.wikimedia.org/r/1040115

Change #1041045 merged by Majavah:

[operations/puppet@production] P:openstack: opentofu: Add a variable for region

https://gerrit.wikimedia.org/r/1041045

Change #1041622 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] O:openstack: Install OpenTofu in eqiad1

https://gerrit.wikimedia.org/r/1041622

Change #1041622 merged by Majavah:

[operations/puppet@production] O:openstack: Install OpenTofu in eqiad1

https://gerrit.wikimedia.org/r/1041622