Decision Request - How openstack projects relate to tofu-infra
Open, Stalled, Medium, Public

Description

Problem

As part of T370037: Cloud VPS: extend tofu-infra coverage, we are using opentofu to track virtual resource definitions for Cloud VPS with an Infrastructure-as-Code approach.

The primary implementation is the tofu-infra repository: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/tofu-infra

There are, however, certain virtual resources that may not make sense to track via tofu-infra, for a number of reasons:

  • we need to define them via other methods (such as cookbook automation or Horizon)
  • we need their definition to be self-service for users

This decision request is specifically about project resources (also known as tenants).

This has been discussed multiple times in the past, for example in internal team meetings (with notes shared in public).

NOTE: the initial implementation of tofu-infra made by @aborrero considered the following:

  • tracking project resource definitions in tofu-infra was a requirement for tracking other things like security groups, or DNS records
    • this is not true: we can associate resources within a project to that project using a data reference, so the project itself does not need to be defined in the repo
  • tracking project resources would give us a strong benefit from the IaC/gitops point of view
    • but maybe not? Especially if we ever become interested in scenarios such as self-service project creation

Constraints and risks

  • We may shoot ourselves in the foot, for example:
    • if we "track" too many resources in tofu-infra that we don't really need
    • if we miss the opportunity to track some resources
  • Refactors are becoming heavy. The tofu-infra project has already seen a few refactors, each heavier than the previous one as the resource count grows

Options

Option 1

Track all project definitions (for projects in the "default" domain) via tofu-infra.

Cookbooks can create a patch in the repo if required.

This is what is implemented as of this writing.
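
The scope restriction to the default domain can be illustrated with a small sketch. It is not taken from tofu-infra: the `Project` record, the `tracked_by_tofu` helper, and the sample inventory are hypothetical, and it assumes the default Keystone domain uses the ID `default`.

```python
from dataclasses import dataclass

# Assumption: the deployment's default Keystone domain has the ID "default".
DEFAULT_DOMAIN_ID = "default"


@dataclass(frozen=True)
class Project:
    """Hypothetical, minimal view of an OpenStack project."""
    name: str
    domain_id: str


def tracked_by_tofu(projects: list[Project]) -> list[str]:
    """Return the names of projects in the default domain, sorted for stable diffs."""
    return sorted(p.name for p in projects if p.domain_id == DEFAULT_DOMAIN_ID)


if __name__ == "__main__":
    inventory = [
        Project("tools", "default"),
        Project("admin", "default"),
        # Hypothetical project that lives outside the default domain.
        Project("autogenerated-example", "some-other-domain"),
    ]
    print(tracked_by_tofu(inventory))
```

Under Option 1, only the names returned by such a filter would have a definition in the repository; anything in another domain stays outside it.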

Pros:

  • We already have the projects defined in tofu-infra

Cons:

  • There are certain conflicts - or role overlap - between automation via cookbooks and automation via opentofu
  • Tracking hundreds of resources (the project resources themselves) may make the repository a bit more complex to maintain
  • May get in our way if we ever move to a self-service scenario in the future (users creating their own projects) -- this is not in the roadmap today anyway

Option 2

Only track admin projects via tofu-infra.

That is, projects that are under full control of the WMCS team, with the purpose of offering and maintaining the Cloud VPS service itself.

Examples of projects to track: admin, admin-monitoring, metrics-infra, project-proxy, cloudinfra, bastion, etc.
Examples of projects not to track in the repo: tools, toolsbeta

See also: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Infrastructure_projects

This will require removing from the tofu-infra state the projects that we no longer want to track.
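
As a rough sketch of what that cleanup could look like, the script below generates one `tofu state rm` command per non-admin project. `tofu state rm` only drops a resource from the state and leaves the real OpenStack project untouched. The resource address layout in `state_address` is hypothetical and would need to match how tofu-infra actually names its project resources; the admin list simply reuses the examples above.

```python
import shlex

# Admin projects named as examples in this option; the real list is an assumption.
ADMIN_PROJECTS = frozenset(
    {"admin", "admin-monitoring", "metrics-infra", "project-proxy", "cloudinfra", "bastion"}
)


def state_address(project: str) -> str:
    """Hypothetical resource address; the real layout of tofu-infra may differ."""
    return f'module.project["{project}"].openstack_identity_project_v3.this'


def plan_state_removals(
    tracked: list[str], admin: frozenset[str] = ADMIN_PROJECTS
) -> list[str]:
    """Return one `tofu state rm` command per tracked project that is not an admin project."""
    return [
        f"tofu state rm {shlex.quote(state_address(name))}"
        for name in sorted(tracked)
        if name not in admin
    ]


if __name__ == "__main__":
    for command in plan_state_removals(["admin", "tools", "bastion", "toolsbeta"]):
        print(command)
```

Printing the commands instead of running them leaves room for review before anything touches the state.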

Pros:

  • This may make the tofu-infra repository simpler, as we don't need to track hundreds of resources (the project resources themselves)
  • This relaxes any conflict between automation via cookbook and automation via opentofu, given they would have different roles
  • If we ever implement a self-service approach for Cloud VPS projects, then this is the right option for that scenario -- this is not in the roadmap today anyway

Cons:

  • We don't track project creation on a git repository, for non-admin projects anyway.

Option 3

Similar to option 1, but split the project definitions into their own repo, let's call it "tofu-projects". All the resources inside the "admin" projects ("admin", "cloudinfra", "bastion", etc.) would remain in the "tofu-infra" repo.

Pros:

  • cleaner separation between "admin" projects and resources (managed by cloud vps admins), and projects and resources for cloud vps tenants

Cons:

  • potentially harder to share resources between the two (like network policies that might apply both to admin projects and generic projects). But I think we could still do it using a module that can be imported from one repo to the other, if needed.

Event Timeline

aborrero renamed this task from Decision Request - How to openstack projects relate to tofu-infra to Decision Request - How openstack projects relate to tofu-infra. Feb 4 2025, 5:08 PM
aborrero triaged this task as Medium priority.

I like option 1 in theory, but in practice I think option 2 is the best choice at the moment.

Implementing option 1 well requires some work; we can reconsider it in the future.

I would like to see the 'TBD' parts filled in before expressing an opinion. Some ideas for filling them in:

I think it would be useful to define the user flows and audiences that we foresee this serving. A pseudo-random example:

Actors:

  • wmcs sres (high infra knowledge)
  • clinic duty assignee (little infra knowledge)
  • cloudvps user (little infra knowledge)

Flows:

  • wmcs sres creating a new infra project (ex. metricsinfra or admin-monitoring)
  • current clinic duty creating a new project for a user (ex. tools)
  • current clinic duty changing a project member list
  • cloudvps user self-creating a project

...

From there it will be easier to define the interfaces and the shape they should have to serve the audiences they are built for.

"There are certain conflicts - or role overlap - between automation via cookbooks and automation via opentofu"

I don't see this as a conflict. As long as the flows are defined, it's equivalent imo (ex. if you are doing clinic duty and have to create a project -> cookbook, if you are wmcs sre and have to create an admin project -> tofu directly).

"Tracking hundreds of resources (the project resources themselves) may make the repository a bit more complex to maintain"

This raises the question of whether tofu is actually making things simpler/easier for us overall. If it is only better than the alternative for a very small number of resources, it might not be a good idea long-term. From my point of view it should be better than cookbooks alone for managing many resources, and more than able to scale up, but please elaborate with your experience and the issues you have found so far.

In option 1, I do see the benefit of having the revisions tracking the project creation, even if the patches are done automatically from a cookbook (or even merged automatically!).

Another benefit of option 1 is consolidating the project creation flow. Having projects created only by tofu interacting with openstack (as opposed to sometimes interacting directly, sometimes with tofu) allows for better checks, standardization, debugging, etc.

Continuing with this example, I see:

  • actors: WMCS SREs
  • flows: WMCS SREs creating a new infra project (ex. metricsinfra or admin-monitoring)

The other examples:

  • current clinic duty changing a project member list --> We already decided not to track project membership via tofu-infra. Given project member list is self-service, it does not make sense to track in version control. Users can change this at any time.
  • cloudvps user self-creating a project --> Not supported today. It would be nice to have, but something based on opentofu is maybe not the right implementation for this, as it would be self-service. Similar to the previous point.
  • current clinic duty creating a new project for a user (ex. tools) --> I'm leaning towards not having this in tofu-infra, i.e. option 2.

I propose we move forward with option 2.

It seems this is waiting for my vote, so although I still consider the pros and cons of both options incomplete, I lean towards option 1.

I find having a single system for project creation worth it, even if it means creating automatic patches. It also allows attaching metadata to the project creation (ex. the task, rationale, etc.), both in the commit and in a comment/readme file, that is otherwise hard to keep track of (project description? openstack logs?).
That, in my mind, is one of the greatest advantages of using git to track infrastructure.

I think tofu should be able to scale to the number of projects we have without many issues. If it does not, then maybe it's not the right tool to manage our setup. Having 500 directories/files/entries is not overwhelming imo (they are quite clearly named, and self-contained, thanks to the refactor).

And looking forward to managing quotas and other user-project resources, it makes even more sense to me to use the same methods to deploy them (tofu being the one that creates the resources), and the same entry point (cookbook being the one that uses tofu). I don't see tofu being a blocker for self-service in any way, just a building block/tool for it.

That said, if the issue is the effort to implement, then sure: if we don't have the capacity, let's not do it. That is not in the rationale for any of the options, though, so I assume it's not an issue.

dcaro changed the task status from Open to Stalled. Sep 18 2025, 3:56 PM
dcaro removed dcaro as the assignee of this task.

Leaving it open and stalled, until we have capacity to act on whatever is decided.

Having reviewed and reconsidered this, I now vote for a modified version of option 1:

"Track all project definitions /in the default domain/ via tofu-infra."

I think that gets tofu out of the way of projects that are auto-generated by e.g. magnum, which was my primary concern with managing all tenants.

Agree with you, I was not taking this into account; they are probably better left on their own.

I modified the description to clarify that Option 1 would only apply to projects in the default domain, as I think that was always the intent although it was not clearly specified.

I also added Option 3 with another path that I would like to explore. Worth noting that choosing Option 1 would not preclude us from considering Option 3 later on.

I vote for Option 1 (with Andrew's note that it applies only to non-automatic projects), though Option 3 would be a close second (I don't completely understand the flows, though they look interesting).

I'm fine with going with Option 1 if there is consensus; we can re-evaluate option 3 in the future.

Seems like consensus around option 1 -- let's close this next week if no one objects.