
Cloud VPS: extend tofu-infra coverage
Open, Medium, Public

Description

The tofu integration was developed in T365696: Investigate how to run OpenTofu to manage Cloud VPS admin-only resources. As of this writing, the Cloud VPS tofu-infra project mostly covers flavors.

This task is to consider and track work to extend the coverage to pretty much everything admin-defined:

  • projects (i.e., tenant definitions)
  • quotas
  • glance images
  • neutron network settings
  • DNS zones and related configuration

Otherwise these resources are just defined manually in the database.

Details

Related Changes in Gerrit:
Related Changes in GitLab:

Title | Reference | Author | Source Branch | Dest Branch
routers: consolidate router_interfaces into the same module | repos/cloud/cloud-vps/tofu-infra!27 | aborrero | arturo-220-routers-consolidate | main
data/: add routers for eqiad1 | repos/cloud/cloud-vps/tofu-infra!26 | aborrero | arturo-220-data-add-routers-fo | main
tofu-infra: refactor imports | repos/cloud/cloud-vps/tofu-infra!25 | aborrero | arturo-244-tofu-infra-refactor | main
Draft: tofu-infra: put all networking data in the same yaml file | repos/cloud/cloud-vps/tofu-infra!24 | aborrero | arturo-314-tofu-infra-put-all | main
tofu-infra: define neutron routers | repos/cloud/cloud-vps/tofu-infra!23 | aborrero | arturo-252-tofu-infra-define-n | main
tofu-infra: refactor providers into its own file | repos/cloud/cloud-vps/tofu-infra!22 | aborrero | arturo-172-tofu-infra-refactor | main
data/: introduce eqiad1-r network and subnet information | repos/cloud/cloud-vps/tofu-infra!21 | aborrero | arturo-175-data-introduce-eqia | main
tofu-infra: import codfw1dev subnets | repos/cloud/cloud-vps/tofu-infra!17 | aborrero | arturo-269-tofu-infra-import-c | main
networks: refactor to remove _set indirection and cloudvps keyword | repos/cloud/cloud-vps/tofu-infra!16 | aborrero | arturo-158-networks-refactor-t | main
tofu-infra: introduce Cloud VPS networks for codfw1dev | repos/cloud/cloud-vps/tofu-infra!13 | aborrero | arturo-689-tofu-infra-introduc | main

Event Timeline

aborrero changed the task status from Open to In Progress. Jul 17 2024, 1:31 PM
aborrero claimed this task.
aborrero triaged this task as Medium priority.
aborrero moved this task from Next to Doing on the User-aborrero board.
fnegri renamed this task from "Cloud VPS: consider extending tofu-infra coverage" to "Cloud VPS: extend tofu-infra coverage". Jul 18 2024, 1:23 PM

Change #1056117 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openstack: opentofu: init modules before runnig plan

https://gerrit.wikimedia.org/r/1056117

Change #1056117 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: opentofu: init modules before runnig plan

https://gerrit.wikimedia.org/r/1056117

aborrero changed the task status from In Progress to Open. Sep 27 2024, 5:56 AM

I'm thinking this over again, and I'm fairly compelled by the fact that resources managed by tofu are in source control with a history.

That said, I know that @fgiunchedi found the current workflow very confusing. Filippo, can you talk about your experience a bit here? One thing I wonder about is how both deployments (codfw1dev and eqiad1) are currently coupled, so if tofu can't apply in codfw1dev then we also can't change things in eqiad1; I'm pretty sure that needs to be changed.

> deployments (codfw1dev and eqiad1) are currently coupled, so if tofu can't apply in codfw1dev then we also can't change things in eqiad1; I'm pretty sure that needs to be changed.

+1 for splitting codfw1dev and eqiad1, similarly to what we did in the tofu-provisioning repo (that was created at a later time than the tofu-infra one).

> I'm thinking this over again, and I'm fairly compelled by the fact that resources managed by tofu are in source control with a history.
>
> That said, I know that @fgiunchedi found the current workflow very confusing. Filippo, can you talk about your experience a bit here? One thing I wonder about is how both deployments (codfw1dev and eqiad1) are currently coupled, so if tofu can't apply in codfw1dev then we also can't change things in eqiad1; I'm pretty sure that needs to be changed.

Certainly, I stubbed my toe on the interaction between tofu and cookbooks when working on the project NFS server. Specifically, the wmcs.nfs cookbooks can change DNS service records for the NFS server, e.g. when changing networks. This works when DNS records are not managed by tofu, of course. When DNS service records are managed by tofu (e.g. tools, toolsbeta), the cookbook and tofu step on each other's toes, so the cookbook works on some projects but not others. This was my main gripe with the situation at the time. We ended up un-managing NFS service records for tools tofu as a middle-ground solution.

I realize the above may be a corner case, though to me it raises the question: in a world where all/most resources are managed by tofu, what is the story for cookbook interaction or other general programmatic changes we want to make to resources?

We do have an example today with creating new instances: the cookbook AIUI sends a merge request for tofu and then operators merge it. I thought about doing the same for the NFS cookbook, though programmatically editing HCL files is not trivial without hacks (e.g. anchors in the file). Creating new instances works fine, I think, because it is an append operation, not an in-place edit.
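As a rough illustration of the append-versus-edit distinction above, here is a minimal Python sketch. It is not taken from wmcs-cookbooks or tofu-infra: the function names and the `# BEGIN` / `# END` marker comments are hypothetical, chosen only to show why appending needs no knowledge of the file while an in-place edit depends on anchors being present.

```python
import re


def append_block(hcl_text: str, block: str) -> str:
    """Append a new block at the end of the file; no parsing is needed."""
    return hcl_text.rstrip("\n") + "\n\n" + block.strip("\n") + "\n"


def replace_between_anchors(hcl_text: str, name: str, new_body: str) -> str:
    """Replace the text between '# BEGIN <name>' and '# END <name>' lines.

    This is the "anchors in the file" hack: it only works if the
    (hypothetical) marker comments were placed in the file beforehand,
    and it raises an error when they are missing or ambiguous.
    """
    pattern = re.compile(
        rf"(# BEGIN {re.escape(name)}\n).*?(# END {re.escape(name)}\n)",
        re.DOTALL,
    )
    new_text, count = pattern.subn(
        # A function replacement avoids backslash handling in new_body.
        lambda m: m.group(1) + new_body.strip("\n") + "\n" + m.group(2),
        hcl_text,
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one anchored region for {name!r}, found {count}"
        )
    return new_text
```

The append path never inspects existing content, which is why a cookbook can generate a merge request for a new instance safely. The anchored path fails as soon as someone reformats or removes the markers, which is the fragility described above.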

From the task description, the scope here is "admin-defined" resources. I take it this is different from tools tofu-provisioning, where a project itself is managed by tofu? Or is "projects managed by tofu" in scope here too?

> "admin-defined" resources

I think that admin-defined just means 'things in cloud-vps managed and supported by staff rather than by random users'.

> deployments (codfw1dev and eqiad1) are currently coupled, so if tofu can't apply in codfw1dev then we also can't change things in eqiad1; I'm pretty sure that needs to be changed.

+1 for splitting codfw1dev and eqiad1, similarly to what we did in the tofu-provisioning repo (that was created at a later time than the tofu-infra one).

I created T411090: [tofu-infra] [wmcs-cookbooks] Allow running "tofu apply" on a single cluster.
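To make the decoupling idea concrete, the following Python sketch runs tofu once per deployment so that a failure in one does not block the other. It assumes each deployment has its own root module directory, which is a hypothetical layout and not necessarily how tofu-infra or the cookbooks are organized. The function names are also illustrative. The only tofu feature relied on is the global `-chdir=DIR` option.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], int]


def tofu_command(deployment_dir: str, action: str = "plan") -> List[str]:
    """Build a tofu command scoped to a single deployment directory."""
    return ["tofu", f"-chdir={deployment_dir}", action]


def run_subprocess(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def run_each(
    deployments: Dict[str, str],
    action: str = "plan",
    runner: Runner = run_subprocess,
) -> Dict[str, bool]:
    """Run tofu independently per deployment and record success per name.

    A failure in one deployment (e.g. codfw1dev) is recorded but does
    not prevent the next one (e.g. eqiad1) from running.
    """
    results: Dict[str, bool] = {}
    for name, directory in deployments.items():
        results[name] = runner(tofu_command(directory, action)) == 0
    return results
```

A caller could pass something like `{"codfw1dev": "deployments/codfw1dev", "eqiad1": "deployments/eqiad1"}` (paths are placeholders) and inspect the returned mapping to decide which deployments need attention.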