
[toolforge,storage] Provide per-tool access to cloud-vps object storage
Open, High, Public

Description

Cloud-vps object storage via swift and S3 would be vastly more useful if individual tools could have secure access to it. Because toolforge is a single keystone tenant, the current default access policy means that /everything/ in toolforge is accessed via the same credentials, which is not very useful.

There are several approaches we could take.

  • On-demand bespoke keystone project creation
    • Users request a cloud-vps project with limited quotas that only permit object storage (no VMs, etc since that's not what they need.)
      • Pro: requires no development, we can start doing this today. Supports UI access via horizon, all existing keystone access patterns
      • Con: not a great approach if we have more than a dozen or so tools using this. Can't be used to integrate with build service or other toolforge services since it's not always present.
      • Con: tedious user experience
      • Con: tedious maintenance for users (different UI, different tooling)
      • Con: no self-service, requires us to create the project and keep it in sync with user permissions
  • Automatic creation of per-tool keystone project (projects in database)
    • An agent creates a specially-prefixed keystone project for each existing tool account (in a database-backed domain), creates storage-specific app credentials, injects them into the toolforge account (as with replica.cnf)
      • Pro: Keystone projects could be re-used for trove or other non-compute openstack resources.
      • Con: Database-stored keystone projects should work (they're the standard use case) but they'll introduce a bunch of new pathways in our deployment.
      • Con: If we want to provide Horizon access we need passwords as well as app credentials, will need to figure out a 'forgot/create password' Horizon workflow that doesn't interfere with our existing password model for 'normal' cloud-vps projects.
      • Con: tedious user experience
      • Con: tedious maintenance for users (different UI, different tooling)
      • Mystery: Are there any scaling concerns with having thousands of projects rather than dozens?
      • Note: we can create these only on demand instead of pre-creating them, which considerably reduces the number of projects (most tools won't need one)
  • On-demand creation of per-tool keystone project (projects in database), exposed as a toolforge API (have a service that allows creating projects + s3 storage in cloudVPS, then expose it as a Toolforge API)
    • Same as before, but users only interact with it through a toolforge API, and projects are created only on demand
      • Pro: Keystone projects could be re-used for trove or other non-compute openstack resources.
      • Pro: unified user experience, users don't have to change tools/UIs/environments to manage toolforge resources
      • Pro: easy maintenance for users (self-service), they create resources when needed without admin interaction
      • Con: Database-stored keystone projects should work (they're the standard use case) but they'll introduce a bunch of new pathways in our deployment.
      • Mystery: Are there any scaling concerns with having thousands of projects rather than dozens?
      • Note: we can create these only on demand instead of pre-creating them, which considerably reduces the number of projects (most tools won't need one)
  • Automatic creation of per-tool keystone project (projects in ldap)
    • Create a new ldap-backed keystone domain (via keystone config) that consumes our existing toolforge account records in ldap as tenant descriptions. An agent creates and injects application credentials for swift/s3 usage.
      • Pro: We wouldn't need an agent to worry about account creation or deletion, that happens for free as ldap records are created/deleted. We'd still probably need an agent for credential management.
      • Pro: Keystone projects could be re-used for trove or other non-compute openstack resources.
      • Con: If we want to provide Horizon access we need passwords as well as app credentials, will need to figure out a 'forgot/create password' Horizon workflow that doesn't interfere with our existing password model for 'normal' cloud-vps projects.
      • Con: tedious user experience
      • Con: tedious maintenance for users (different UI, different tooling)
      • Mystery: Are there any scaling concerns with having thousands of projects rather than dozens? I'm less worried about this with ldap but it could still crop up as a problem in places.
  • Per-tool bucket creation and access
    • An agent or privileged script creates a bucket/container for a given tool, creates *waves hands* bucket-specific credentials and provides them to the toolforge admin.
      • Pro: diy means we'd have total control over the user experience.
      • Pro: Smaller maintenance footprint as Keystone is unaware of this entirely, it's just us and the radosgw.
      • Pro: unified user experience, users don't have to change tools/UIs/environments to manage toolforge resources
      • Pro: easy maintenance for users (self-service), they create resources when needed without admin interaction
      • Con: access via CLI only, no Horizon ever - this might not be a con
      • Con: doesn't help with Trove at all
      • Mystery: Is it possible/practical to make per-container credentials? Does radosgw support this at the same time as the existing keystone integration? (note to self, this is something to do with IAM policies; a sketch follows this list)
  • toolforge-specific radosgw server with one storage account per tool
    • An agent or privileged script creates a rados account for each tool, injects credentials into tool account.
      • Pro: Each tool is a first-class member of object storage service, with access to the full feature set and arbitrarily many buckets or containers.
      • Pro: diy means we'd have total control over the user experience.
      • Pro: Smaller maintenance footprint as Keystone is unaware of this entirely, it's just us and the radosgw.
      • Pro: unified user experience, users don't have to change tools/UIs/environments to manage toolforge resources
      • Pro: easy maintenance for users (self-service), they create resources when needed without admin interaction
      • Con: access via CLI and API only, no Horizon involvement
      • Con: doesn't help with Trove or other openstack services
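
For the per-container credentials 'Mystery' above: radosgw implements a subset of the AWS S3 bucket policy language, so per-bucket grants could plausibly look like the sketch below. Whether this coexists cleanly with the existing keystone integration is exactly the open question; the endpoint (taken from the IRC log below), keys, principal ARN, and bucket name are all illustrative assumptions.

```python
import json

import boto3

# Admin-credentialed S3 client pointed at radosgw.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object.eqiad1.wikimediacloud.org",  # assumption
    aws_access_key_id="<admin-access-key>",
    aws_secret_access_key="<admin-secret-key>",
)

# Allow a single (hypothetical) per-tool principal to use one bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam:::user/tools.mytool"]},  # illustrative
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::mytool-bucket",
                     "arn:aws:s3:::mytool-bucket/*"],
    }],
}
s3.put_bucket_policy(Bucket="mytool-bucket", Policy=json.dumps(policy))
```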

Event Timeline

[15:54]  <    taavi> andrewbogott: did you consider the option of having a second radosgw instance where authentication is not tied to openstack?
[16:00]  <andrewbogott> I didn't but I also don't think that would be very hard.
[16:01]  <   arturo> and a dedicated ceph pool
[16:01]  <   arturo> sounds interesting
[16:01]  <   arturo> could they share the same ingress port?
[16:01]  <andrewbogott> yeah, I think a separate radosgw instance would imply a different pool (as far as I know)
[16:01]  <    taavi> arturo: we can certainly do host-based http routing with haproxy, that's not a problem
[16:02]  <   arturo> what would be the new fqdn ?
[16:02]  <andrewbogott> wouldn't we want it to be a different endpoint anyway?
[16:02]  <andrewbogott> Oh, I see what you mean
[16:02]  <    taavi> yeah, we would want it either on some subpath of object.eqiad1.wikimediacloud.org or we could invent a new subdomain
[16:02]  <    taavi> anyhow, that seems a relatively minor detail to me
[16:03]  <   arturo> or that service domain you were thinking for toolforge
dcaro renamed this task from "Provide per-tool access to cloud-vps object storage" to "[toolforge,storage] Provide per-tool access to cloud-vps object storage". Feb 26 2024, 4:21 PM
dcaro triaged this task as High priority.
dcaro updated the task description.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.
Andrew updated the task description.

I'm striking out the 'keystone projects in ldap' option because keystone doesn't really support that one.

My favorite option is 'Automatic creation of per-tool keystone project'. Since that's a simple extension of 'On-demand creation of per-tool keystone project', I'm going to start with that (with a cli tool rather than an API endpoint for now).

Here's what we need for that:

  • New domains 'tools' and 'toolsbeta' with user mappings for the tool account (tools.<toolname>) and tool member groups.
  • cli script, runs on cloudcontrol (a sketch follows this list):
    • confirms tool exists
    • If project does not exist:
      • creates equivalent keystone project
      • sets default quotas (e.g. no VMs)
      • adds tool group to project with member role
      • adds service user ('tool.<toolname>') to project with member role
      • creates or refreshes app credentials to provide radosgw access to tool.<toolname>
      • calls os_app_cred_server to inject creds into tool $home
    • --or--, if tool exists:
      • confirms no trove databases
      • deletes all rados buckets
      • deletes project
      • calls os_app_cred_server to delete creds from tool $home
  • rest server, runs on tools-nfs server:
    • Similar to toolsdb_replica_cnf, injects or removes app credentials for tool.<toolname> service user
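
A rough sketch of the creation path of that cli script, assuming openstacksdk. The cloud name, domain layout, quota fields, and the os_app_cred_server hand-off all come from the plan above or are placeholders, not a finished design:

```python
import openstack

def ensure_tool_project(tool_name: str) -> None:
    # Admin-credentialed connection from the cloudcontrol host (assumption).
    conn = openstack.connect(cloud="admin")
    domain = conn.identity.find_domain("tools")
    project = conn.identity.find_project(tool_name, domain_id=domain.id)
    if project is None:
        project = conn.identity.create_project(
            name=tool_name,
            domain_id=domain.id,
            description=f"Object storage project for tools.{tool_name}",
        )
        # Default quotas: no VMs etc., object storage only.
        conn.set_compute_quotas(project.id, instances=0, cores=0, ram=0)
        member = conn.identity.find_role("member")
        # Tool member group and the per-tool service user both get 'member'.
        group = conn.identity.find_group(f"tools.{tool_name}", domain_id=domain.id)
        conn.identity.assign_project_role_to_group(project, group, member)
        user = conn.identity.find_user(f"tools.{tool_name}", domain_id=domain.id)
        conn.identity.assign_project_role_to_user(project, user, member)
    # App credential refresh and the os_app_cred_server call would follow;
    # both depend on decisions discussed later in this task.
```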

I'm working on T363983: [toolforge] Investigate authentication, and I think both are very related, specially when we talk about how to authenticate to access the buckets/storage.

Some random questions xd:
Is the authentication only per-tool?
If I'm a user, do I need to go gather the tool credentials to be able to access its buckets?
How do users manage the buckets? Through horizon?
If so, do they login into horizon with their user or the tool?
Will they be able to login as the tool at all?
Is this credential for everything related to openstack (ex. trove/puppetenc/web proxies/...) or only for the storage?

Will there be any possibility of authenticating in any other way? Does this lock us in on using keystone for any authentication? (essentially, taking the decision of T363983: [toolforge] Investigate authentication already).

Eventually, we will want to move to keystone using idp.w.o for authentication right?
If so, does this setup allow for it? Does it block us from using idp.w.o? (sso and such)

A bit of an unrelated question: should we create app credentials for every toolforge user too?

Calls os_app_cred_server to inject creds into tool $home
rest server, runs on tools-nfs server:

Small note on where to put the generated credentials: we should avoid putting anything new on NFS; we should be using only envvars (same as toolsdb_replica_cnf changed to), so there's no need to run on the NFS server.

I'm also thinking that we might want to move all that into its own "component" inside toolforge from the beginning, instead of having a script/api on a cloudcontrol + a random http service somewhere, and give toolforge itself the credentials to manage the keystone projects; otherwise it's going to become really hard to do later (hacks stay for very long).

As in, having something like 'storage-api' that wraps it. Maybe starting with a script as you mention, but that's run from within toolforge instead of cloudcontrol.

Moving the parts that toolforge relies on into toolforge is the direction that the maintain-dbusers service will eventually be going (becoming a db service inside toolforge instead of a rest api running on the NFS servers + a cron running somewhere else, unless completely replaced by trove before that).

That makes me think about how to authenticate toolforge to be able to create the prefixes and such; if there's no way to easily grant only those permissions, then we might want to put that api in between toolforge and openstack to only allow creating the project skeleton.

I'm currently leaning toward using idp.w.o for toolforge api auth, and leaving the openstack auth only for when it's required for storage or database management. Unless we are 100% sure that we are never going to move to idp.w.o for horizon (let me know if so).

I have to play a bit more with both, to understand the user models and such, and see how some of the auth flows would work.

So many questions!

I'm working on T363983: [toolforge] Investigate authentication, and I think both are very related, specially when we talk about how to authenticate to access the buckets/storage.

Some random questions xd:
Is the authentication only per-tool?

Keeping in mind that I don't yet know for sure that this is all possible... I was imagining we'd have a per-tool service user and also provide access to all tool members to the project. The per-tool service user would not have a password, only app credentials. Then, automated workflows would use the service user app creds (thus ensuring workflow survival if users join or leave the tool) but users could still auth as themselves via Horizon to see the UI.

(I'm not 100% sure that it's possible to create app credentials for account Y when running as novaadmin, which is what we would need for the passwordless service accounts. More research needed.)

My rationale for most of this is that we don't ever want people using actual stored-in-ldap passwords for automatic workflows, only app creds and tokens. This is based on a basic long-standing "don't put your passwords on nfs" policy, which could be revisited once we move off of nfs.

If I'm a user, do I need to go gather the tool credentials to be able to access its buckets?

They'd be injected like replica.cnf. When replica.cnf moves out of nfs we can move app creds as well.
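
For illustration, once injected, a tool workflow could authenticate with those app creds roughly like this (a minimal sketch assuming keystoneauth1; the auth URL and credential values are placeholders):

```python
from keystoneauth1 import session
from keystoneauth1.identity.v3 import ApplicationCredential

# Values would come from the injected file/envvars, as replica.cnf does today.
auth = ApplicationCredential(
    auth_url="https://openstack.eqiad1.wikimediacloud.org:25000/v3",  # assumption
    application_credential_id="<injected-id>",
    application_credential_secret="<injected-secret>",
)
sess = session.Session(auth=auth)
token = sess.get_token()  # keystone token usable against swift/radosgw
```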

How do users manage the buckets? Through horizon?

Horizon or (more likely) cli or API.

If so, do they login into horizon with their user or the tool?

User

Will they be able to login as the tool at all?

Nope.

Is this credential for everything related to openstack (ex. trove/puppetenc/web proxies/...) or only for the storage?

Everything

Will there be any possibility to authenticating in any other way? Does this lock us in on using keystone for any authentication? (essentially, taking the decision of T363983: [toolforge] Investigate authentication already).

If we rely on app credentials then things would need to happen via Keystone only, as app credentials only exist as a keystone concept.

Eventually, we will want to move to keystone using idp.w.o for authentication right?

I don't know -- I hadn't really thought about adding an idp backend to keystone although it might be possible.

If so, does this setup allow for it? Does it block us from using idp.w.o? (sso and such)

It doesn't block it for users (since in both cases the id/password are consumed from ldap). The per-tool creds would be keystone-only though.

A bit of an unrelated question: should we create app credentials for every toolforge user too?

I don't think that's needed; I would encourage any non-horizon workflows to happen via the tool service account.

Calls os_app_cred_server to inject creds into tool $home
rest server, runs on tools-nfs server:

Small note on where to put the generated credentials: we should avoid putting anything new on NFS; we should be using only envvars (same as toolsdb_replica_cnf changed to), so there's no need to run on the NFS server.

I'm also thinking that we might want to move all that into its own "component" inside toolforge from the beginning, instead of having a script/api on a cloudcontrol + a random http service somewhere, and give toolforge itself the credentials to manage the keystone projects; otherwise it's going to become really hard to do later (hacks stay for very long).

OK -- if it doesn't run on a cloudcontrol then we'll need to figure out WHO has the power to create accounts and universal credentials and decide if we trust those creds to be stored in toolforge. If the script runs on cloudcontrol then it can use the existing creds which are already trusted there.

As in, having something like 'storage-api' that wraps it. Maybe starting with a script as you mention, but that's run from within toolforge instead of cloudcontrol.

Moving the parts that toolforge relies on into toolforge is the direction that the maintain-dbusers service will eventually be going (becoming a db service inside toolforge instead of a rest api running on the NFS servers + a cron running somewhere else, unless completely replaced by trove before that).

That makes me think about how to authenticate toolforge to be able to create the prefixes and such; if there's no way to easily grant only those permissions, then we might want to put that api in between toolforge and openstack to only allow creating the project skeleton.

A bit of an unrelated question: should we create app credentials for every toolforge user too?

I don't think that's needed; I would encourage any non-horizon workflows to happen via the tool service account.

Can users also access object storage with their user credentials/tokens (not app credentials)? If a user can see/edit object storage items in Horizon, I think we can use the same mechanism for CLI user access to object storage?

If that is too hard to implement, we could start with no user access (neither Horizon nor CLI), and only "tool service user" access:

  • our new agent creates a "tool service user" (user/pwd for that user will not be available to users)
  • our new agent creates app credentials tied to the "tool service user" (app credentials will be visible to users of that tool via a .cnf file or env vars).

Side note: we should ideally rotate the app credentials quite frequently, so that if a user is removed from a tool, they won't be able to reuse the tool's app credentials for too long.
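
That rotation could be a plain delete-and-recreate, roughly as in this sketch (assuming openstacksdk, and assuming the agent can authenticate as the service user itself; the per-user creation problem is discussed just below):

```python
import openstack

def rotate_app_credential(conn: openstack.connection.Connection,
                          user_id: str,
                          name: str = "tool-object-storage") -> tuple[str, str]:
    # Drop any existing credential with this (illustrative) name...
    for cred in conn.identity.application_credentials(user=user_id):
        if cred.name == name:
            conn.identity.delete_application_credential(user_id, cred)
    # ...then mint a fresh one; the secret is only returned at creation time.
    new = conn.identity.create_application_credential(user=user_id, name=name)
    return new.id, new.secret  # to be re-injected into the tool's env
```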

I'm hitting a roadblock with the service user plan -- because of keystone's belt-and-suspenders approach to security, I can override the policy to allow an admin user to create app creds for another user (e.g. novaadmin creating creds for tool.mytool) but there's an explicit check in the code comparing context ID to cred ID and erroring out. IMO this is a keystone bug (https://launchpad.net/bugs/2065212) but it's unlikely to be changed upstream anytime soon.

The per-tool service user would not have a password, only app credentials.

If I understand it correctly, this might be entirely impossible in the current Keystone model (it could be a bug or intended behaviour). Is it a problem though to have a password associated with the service tool user, if that password is stored securely and only available to the agent that uses it to create the associated app credentials?

The per-tool service user would not have a password, only app credentials.

If I understand it correctly, this might be entirely impossible in the current Keystone model (it could be a bug or intended behaviour). Is it a problem though to have a password associated with the service tool user, if that password is stored securely and only available to the agent that uses it to create the associated app credentials?

It might be possible to add a password; right now I'm engaged in a bit of a hack, telling keystone "treat this groupOfNames object as though it's a user." It will probably take some ldap schema changes in order to convince ldap that a groupOfNames can have a secure password associated with it.

I'm going to give that keystone bug and patch a few days and see if anyone agrees or disagrees with me. If it looks like it's on track for merge we can just hack our install to get a head start.

Thanks for all the replies!

If I'm a user, do I need to go gather the tool credentials to be able to access it's buckets?

They'd be injected like replica.cnf. When replica.cnf moves out of nfs we can move app creds as well.

replica.cnf already exists outside of NFS :) (as in, it has an old/backwards-compatible nfs file + new future-compatible envvars)


I'm still thinking about the use cases.

This task is specifically tackling this one:

  • As a tool, I want to be able to access the s3 buckets I created (from horizon) from within toolforge

Some other ones we are considering (though outside the scope of the task) seem to be:

  1. s3/storage related:
    1. From toolforge (cli/api) I want to be able to create/delete s3 buckets on demand for my tool
      • This is currently "open a task and get to horizon"
      • This could be "run toolforge storage s3 create/delete/list ..."
      • Note that keystone does not provide sso, so the authentication done for the toolforge API/UI will not work as-is to access the bucket files (users will need to re-authenticate at least, or use a specific app credential)
    2. As a user, I want to be able to access the public s3 buckets I created, from anywhere (ex. using the url)
      • UPDATED: this works well :), using the 'link' provided on the left side panel of the horizon container management UI (not the 'download' button).
    3. As a user, I want to be able to access the private s3 buckets I created, from anywhere (ex. using the url + user/pass/token/app_cred?)
      • UPDATED: this works well, when you provide the authentication on the request itself (will not do the login dance for you), and you have to use the direct url to swift, instead of passing through horizon (see the closed subtask).
  2. Toolforge API (and future UI):
    1. As a user, I want to be able to access the toolforge APIs for the tools I'm member of
  3. Trove/DBaaS related:
    1. As a user, I want to be able to create/delete databases for my tools
      • Currently through horizon UI (creating a task, etc.), logging in through horizon/keystone
      • This could be toolforge db create/delete/list ...
    2. As a tool, I want to be able to access the database I created
      • Currently done through horizon UI, by getting the user/pass from the trove management and manually putting it somewhere
      • This could be automatically done by populating envvars on DB creation (ex. <DBNAME>_USER <DBNAME>_PASS)

The only times that "users" need to authenticate directly to horizon would be when accessing the s3 private buckets, as that's the only time they need to pass directly through horizon; all the other times they can hit a toolforge API that then has the credentials needed to authenticate against keystone/openstack.

Is that correct? Are there any others I missed?

May I ask why are we are talking about tools interacting with Horizon? I would hope everything happens either via the APIs directly or via Striker instead of introducing a second admin interface for Toolforge :-)

May I ask why are we are talking about tools interacting with Horizon? I would hope everything happens either via the APIs directly or via Striker instead of introducing a second admin interface for Toolforge :-)

Well, resources that are hosted by openstack are authenticated through keystone, and as of now, the only way to manage them is through horizon, so unless we change the way we authenticate for openstack resources, we will have to deal with horizon.

I would strongly prefer if we did not have to at any point though, and as you say deal only with the toolforge API, or with toolforge UI (aka. striker) to manage any resource related to toolforge tools (ex. buckets, trove dbs, ...)

<snip>

I'm still thinking about the use cases.

This task is specifically tackling this one:

  • As a tool, I want to be able to access the s3 buckets I created (from horizon) from within toolforge

Your description of the status quo seems correct to me. Your problem statement is missing a bit though, I think. I would say:

  1. As a user or tool, I want to be able to create/delete s3 buckets that are scoped to a tool. I want to be able to do this via a cli tool or api endpoint.
    • This is currently not supported at all, unless we create a bespoke project by hand that is (by convention only) intended for use by a single tool
    • Scope can be implemented via a keystone project, or via some home-made radosgw ACL to be determined.
    • Some of my options above support a two-step process (create the scope, then create the buckets) and some a one-step process (because the scope is implicit or automatically created on tool creation). If we pursue a one-step process (implicit scope creation) then S3 or swift access can happen directly via app credential and existing radosgw apis, as in the sketch below.
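
A hedged sketch of that direct access path, assuming an injected S3-style key pair; the endpoint and names are placeholders:

```python
import boto3

# S3-style access straight to radosgw, no Horizon involved.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object.eqiad1.wikimediacloud.org",  # assumption
    aws_access_key_id="<tool-access-key>",
    aws_secret_access_key="<tool-secret-key>",
)
s3.create_bucket(Bucket="mytool-data")  # bucket name is illustrative
s3.put_object(Bucket="mytool-data", Key="hello.txt", Body=b"hello from toolforge")
```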

Some other ones we are considering (though outside the scope of the task) seem to be:

  1. Toolforge API (and future UI):
    1. As a user, I want to be able to access the toolforge APIs for the tools I'm member of

Yep, seems right.

  1. Trove/DBaaS related:
    1. As a user, I want to be able to create/delete databases for my tools
      • Currently through horizon UI (creating a task, etc.), logging in through horizon/keystone
      • This could be toolforge db create/delete/list ...
    2. As a tool, I want to be able to access the database I created
      • Currently done through horizon UI, by getting the user/pass from the trove management and manually putting it somewhere
      • This could be automatically done by populating envvars on DB creation (ex. <DBNAME>_USER <DBNAME>_PASS)

This all seems correct, although I reiterate that the interesting part is the scope creation or management. We don't currently have a good way to create databases that are explicitly tied to a particular tool -- even with the bespoke/by hand approach we take now there's nothing to keep the tool and trove ACLS in sync.

The only times that "users" need to authenticate directly to horizon would be when accessing the s3 private buckets, as that's the only time they need to pass directly through horizon; all the other times they can hit a toolforge API that then has the credentials needed to authenticate against keystone/openstack.

Is that correct? Are there any others I missed?

I would only consider a solution 'correct' if it allows users to ignore Horizon entirely -- that's where all the 'inject credentials' parts come in. Some of my proposed solutions would get us Horizon support as a bonus, some would not.

This all seems correct, although I reiterate that the interesting part is the scope creation or management. We don't currently have a good way to create databases that are explicitly tied to a particular tool -- even with the bespoke/by hand approach we take now there's nothing to keep the tool and trove ACLS in sync.

Can you elaborate on this?

Here, we are only talking about allowing tools to set up and use s3-style buckets, correct? Is there any intersection/dependency between this use case and enabling ceph-backed PVCs in toolforge k8s, e.g. the toolforge k8s nodes being able to access and authenticate with the ceph cluster?

Here, we are only talking about allowing tools to set up and use s3-style buckets, correct? Is there any intersection/dependency between this use case and enabling ceph-backed PVCs in toolforge k8s, e.g. the toolforge k8s nodes being able to access and authenticate with the ceph cluster?

If we use the toolforge API intermediate, then we can implement any type of storage we want there, while making it more or less transparent to the users (ex. toolforge storage volume create for volumes mounted in the containers).

If that is done by directly communicating to ceph, or through openstack cinder volumes is for us to decide.

This intersects with the current setup, in the sense that we are defining how tools authenticate against openstack (keystone) and if we implement the volumes using cinder, then we have that authentication part that would be the same (or very similar).

We could create a couple of pools in ceph with authentication using idp (https://docs.ceph.com/en/latest/radosgw/oidc/), and connect to ceph directly for both s3 and rbd volumes, for example, skipping the middle openstack layer. That would force us to also manage quotas outside openstack.

It would not solve "at the same time" access to trove, though that would allow users to start their own databases as deployments in k8s using PVCs.

This all seems correct, although I reiterate that the interesting part is the scope creation or management. We don't currently have a good way to create databases that are explicitly tied to a particular tool -- even with the bespoke/by hand approach we take now there's nothing to keep the tool and trove ACLS in sync.

Can you elaborate on this?

The current database-for-a-tool solution is that we've created trove-only openstack projects to manage databases used by toolforge tools. Those projects may or may not have the same members as the tool that project is supporting; it's entirely ad-hoc. If we have a model where a tool corresponds directly to an openstack tenant then we can put the trove DBs in there and have consistent access and membership between tools and trove.

Here, we are only talking about allowing tools to set up and use s3-style buckets, correct?

That's correct.

Is there any intersection/dependency between this use case and enabling ceph-backed PVCs in toolforge k8s, e.g. the toolforge k8s nodes being able to access and authenticate with the ceph cluster?

Probably not. This task is about integrating with radosgw, which is an object store. PVCs are block volumes so they'd need to be created in ceph via different channels. There's some chance that the same tenancy/auth model would be useful but I have not investigated that at all.

This all seems correct, although I reiterate that the interesting part is the scope creation or management. We don't currently have a good way to create databases that are explicitly tied to a particular tool -- even with the bespoke/by hand approach we take now there's nothing to keep the tool and trove ACLS in sync.

Can you elaborate on this?

The current database-for-a-tool solution is that we've created trove-only openstack projects to manage databases used by toolforge tools. Those projects may or may not have the same members as the tool that project is supporting; it's entirely ad-hoc. If we have a model where a tool corresponds directly to an openstack tenant then we can put the trove DBs in there and have consistent access and membership between tools and trove.

Interesting, I thought that the openstack projects created for trove were already mapped to a tool (and the auth was through ldap, matching the user to that tool, which then matches the openstack tenant).
Just to verify: as of today, the trove database for a tool is in an arbitrary openstack project that is managed by whoever requested the database creation, right? (so completely independent of the LDAP tool group)

I think that should for sure be mapped (in whichever way).

This all seems correct, although I reiterate that the interesting part is the scope creation or management. We don't currently have a good way to create databases that are explicitly tied to a particular tool -- even with the bespoke/by hand approach we take now there's nothing to keep the tool and trove ACLS in sync.

Can you elaborate on this?

The current database-for-a-tool solution is that we've created trove-only openstack projects to manage databases used by toolforge tools. Those projects may or may not have the same members as the tool that project is supporting; it's entirely ad-hoc. If we have a model where a tool corresponds directly to an openstack tenant then we can put the trove DBs in there and have consistent access and membership between tools and trove.

Interesting, I thought that the openstack projects created for trove were already mapped to a tool (and the auth was through ldap, matching the user to that tool, which then matches the openstack tenant).
Just to verify: as of today, the trove database for a tool is in an arbitrary openstack project that is managed by whoever requested the database creation, right? (so completely independent of the LDAP tool group)

That's correct. Of course the owner of the db project can manage project access, so ideally they keep things in sync manually.

I think that should for sure be mapped (in whichever way).

Yep, would be better :)

I'm converging on a new design, which is a variant of 'Automatic creation of per-tool keystone project (projects in database)':

  • An agent (probably maintain-kubeusers) reconciles the tool list with keystone projects in the tools or toolsbeta domain. New projects are created or removed as needed. These are simplified projects without keystone hooks and with most quotas zero'd out. For example, toolsbeta.testtool88 will have a project named 'testtool88' in the 'toolsbeta' domain.
  • The corresponding ldap group (cn=toolsbeta.testtool88,ou=servicegroups,dc=wikimedia,dc=org) is mapped to a keystone user group, and that group is given the 'member' role in the associated project.

At this point, all tool members have Horizon access within that tool project. They can create app credentials and manage object storage as needed. But we should probably also automate credential creation so that object storage access doesn't require wrestling with Horizon:

  • Each tool domain (tools, toolsbeta) will have a service user with inherited project-wide domain membership: tools-admin and toolsbeta-admin. Maintain-kubeusers will have access to this user's credentials, and will use them to create project-specific app credentials (possibly restricted to object storage only) and inject them into the tool-specific secrets.

I think this gets us what we need. My only real concerns with this scheme are a) creating a service user with tools-wide access and b) the fact that the automatic app credentials will attribute all object actions to tools/toolsbeta-admin rather than to the specific tool, which may cause some auditing confusion. Of course the project containing the objects will still be associated with the correct tool so that probably doesn't matter.
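
The reconciliation step might look roughly like this (a sketch assuming openstacksdk; the domain names and the group naming convention follow the description above and are not final):

```python
import openstack

def reconcile_tool_projects(conn: openstack.connection.Connection,
                            domain_name: str, tools: set[str]) -> None:
    domain = conn.identity.find_domain(domain_name)
    member = conn.identity.find_role("member")
    existing = {p.name for p in conn.identity.projects(domain_id=domain.id)}
    # Create missing projects and map each tool's ldap-backed group into them.
    for tool in sorted(tools - existing):
        project = conn.identity.create_project(name=tool, domain_id=domain.id)
        group = conn.identity.find_group(f"{domain_name}.{tool}",
                                         domain_id=domain.id)
        conn.identity.assign_project_role_to_group(project, group, member)
    # Remove projects whose tool no longer exists.
    for orphan in sorted(existing - tools):
        conn.identity.delete_project(
            conn.identity.find_project(orphan, domain_id=domain.id))
```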

Each tool domain (tools, toolsbeta) will have a service user with inherited project-wide domain membership: tools-admin and toolsbeta-admin. Maintain-kubeusers will have access to this user's credentials, and will use them to create project-specific app credentials (possibly restricted to object storage only) and inject them into the tool-specific secrets.

I'm not convinced on how to expose this; I think that maintain-kubeusers is already too overloaded to also have to handle openstack interactions. I think it would be better to have some other on-demand process instead.

How would users in this case create buckets? Through horizon?

If so we might want instead to have a new component that does it for them (just simple bucket management, and simple app credential management for the buckets).

Each tool domain (tools, toolsbeta) will have a service user with inherited project-wide domain membership: tools-admin and toolsbeta-admin. Maintain-kubeusers will have access to this user's credentials, and will use them to create project-specific app credentials (possibly restricted to object storage only) and inject them into the tool-specific secrets.

I'm not convinced on how to expose this; I think that maintain-kubeusers is already too overloaded to also have to handle openstack interactions. I think it would be better to have some other on-demand process instead.

That's fine, I don't think it much matters where it runs. If we're using keystone as a general auth mechanism for toolforge, or if we wind up using object storage as part of standard build-service workflows then of course we'll need to create the needed project for every tool on creation anyway, so it might as well run as a daemon rather than an on-demand API.

How would users in this case create buckets? Through horizon?

Horizon or via any swift or s3 api.

If so we might want instead to have a new component that does it for them (just simple bucket management, and simple app credential management for the buckets).

Yeah, that's likely just a matter of having a 10-line script that users can run as needed.

I've just learned from Taavi that app credentials don't actually provide adequate tenant separation when issued by a user with multiple project membership (T348857). So the global service user plan is probably a bad one. So instead, how about...

  • Each tool service user (uid=toolsbeta.test,ou=people,ou=servicegroups,dc=wikimedia,dc=org) is mapped to an existing keystone user via domain config
  • The same agent that creates per-tool keystone projects generates and assigns a password to that service user via direct ldap calls; then injects that password into tool secrets
  • Same agent also adds service user as a member of the per-tool keystone project
  • These service users will have a different password safelist policy from human accounts: they /are/ allowed to login via password but (probably) are not allowed to log into Horizon
  • ...now toolforge services or users can use that user to create app credentials, ec2 creds, buckets, whatever.

We'll also need to provide some kind of facility for rotating that password. Ideally it gets automatically rotated anytime a user is removed from the tool.

This is starting to sound complicated, but I don't think it is actually too bad. I'll write some pseudocode shortly.
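
For a sense of scale, the "generates and assigns a password via direct ldap calls" step could look something like this sketch (assuming the ldap3 library; the DN layout is copied from the list above, everything else is illustrative):

```python
import secrets

from ldap3 import Connection, MODIFY_REPLACE

def rotate_service_user_password(conn: Connection, tool: str) -> str:
    # DN layout from the proposal above; the 'toolsbeta' prefix is an example.
    dn = f"uid=toolsbeta.{tool},ou=people,ou=servicegroups,dc=wikimedia,dc=org"
    password = secrets.token_urlsafe(32)
    conn.modify(dn, {"userPassword": [(MODIFY_REPLACE, [password])]})
    return password  # the agent then injects this into the tool's secrets
```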

Change #1039799 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Keystone: map toolsbeta groups and users to keystone groups and users

https://gerrit.wikimedia.org/r/1039799

Change #1039799 merged by Andrew Bogott:

[operations/puppet@production] Keystone: map toolsbeta groups and users to keystone groups and users

https://gerrit.wikimedia.org/r/1039799

Change #1040243 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add 'keystoneify' admin script

https://gerrit.wikimedia.org/r/1040243

I've just learned from Taavi that app credentials don't actually provide adequate tenant separation when issued by a user with multiple project membership (T348857). So the global service user plan is probably a bad one. So instead, how about...

  • Each tool service user (uid=toolsbeta.test,ou=people,ou=servicegroups,dc=wikimedia,dc=org) is mapped to an existing keystone user via domain config

Just to have it clear myself, correct me if I'm wrong.

  • 'tool service user' is a user in LDAP with ou=servicegroups (that is, the currently existing tool ldap group).
  • 'existing keystone user' is just telling keystone to look those up in LDAP, where they already exist.
  • The same agent that creates per-tool keystone projects generates and assigns a password to that service user via direct ldap calls; then injects that password into tool secrets

I think that we might not need to inject that password anywhere, as the users should never need to use it directly, no?
They will be using app/ec2 credentials but not the password, right?

That password would be needed only to manage buckets, rotate app credentials and such.

That would minimize the need to rotate those credentials (the password), as those are the ones that seem hardest to rotate.

It's a pity though that we need one account per tool to be able to manage those; now we need to store 3k passwords xd

  • Same agent also adds service user as a member of the per-tool keystone project
  • These service users will have a different password safelist policy from human accounts: they /are/ allowed to login via password but (probably) are not allowed to log into Horizon
  • ...now toolforge services or users can use that user to create app credentials, ec2 creds, buckets, whatever.

Toolforge users should not need to interact with openstack directly; that should be delegated to toolforge services only.

We'll also need to provide some kind of facility for rotating that password. Ideally it gets automatically rotated anytime a user is removed from the tool.

this is starting to sound complicated but I don't think it is actually too bad. I'll write some pseudocode shortly.

From the toolforge side, the authentication has different aspects:

  • Validate that a user is part of a tool (access to toolforge)
    • Interactive (user/pass/totp), for a future web UI
    • Non-interactive (some sort of token) for toolforge cli/api clients -> this one should have expiry and be easy to rotate from within toolforge
  • Validate a tool (access to toolforge)
    • Non-interactive (some sort of token or ssl certificate, preferably the token) -> this one should have expiry and be easy to rotate too; this might happen only from within toolforge

(public API users use their own tokens).

  • Tools' access to their private buckets (access to buckets)
    • Non-interactive (some sort of token)
  • Toolforge system access to manage buckets (toolforge access to openstack swift)
    • Non-interactive (can be a token, or even user/pass if needed, as long as there's no one-time password)

I say that to keep in mind that in order to add authentication for toolforge, all we need is for keystone to allow us to map user<->tool; it could be done using the accounts you mention above, but we don't need to give them any permissions, as all we need there is the relationship.

Once toolforge can verify that some user belongs to some tool, we will need a service account for the toolforge system to be able to rotate the app credentials, though this can be a different account to which only toolforge roots have access.

And the same or a different account can then be used to manage users' buckets.

The only thing that's missing is how tools access the buckets (users can use the tool's credentials, so we don't care about them for this).

If I understand correctly, ec2 credentials would fit there, allowing that toolforge service account to rotate them; would that be ok?
I have to look a bit more closely at that bug xd
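
For reference, minting EC2-style credentials for the S3 API might look like this sketch (assuming python-keystoneclient and the service-user password flow described above; the auth URL and names are placeholders):

```python
from keystoneauth1 import session
from keystoneauth1.identity.v3 import Password
from keystoneclient.v3 import client

auth = Password(
    auth_url="https://openstack.eqiad1.wikimediacloud.org:25000/v3",  # assumption
    username="toolsbeta.test", password="<service-user-password>",
    user_domain_name="toolsbeta",
    project_name="test", project_domain_name="toolsbeta",
)
sess = session.Session(auth=auth)
keystone = client.Client(session=sess)
cred = keystone.ec2.create(user_id=sess.get_user_id(),
                           project_id=sess.get_project_id())
print(cred.access, cred.secret)  # plug these into any S3 client, e.g. boto3
```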

<snip>

Just to have it clear myself, correct me if I'm wrong.

  • 'tool service user' is a user in LDAP with ou=servicegroups (that is, the currently existing tool ldap group).
  • 'existing keystone user' is just telling keystone to look those up in LDAP, where they already exist.

Yep. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039799 tells keystone to map those existing toolsbeta ldap users to keystone users in the 'toolsbeta' domain.

  • The same agent that creates per-tool keystone projects generates and assigns a password to that service user via direct ldap calls; then injects that password into tool secrets

I think that we might not need to inject that password anywhere, as the users should never need to use it directly, no?
They will be using app/ec2 credentials but not the password, right?

Yep, that would work. I was assuming we'd publish the password in userspace and then run any associated scripts creating creds &c also in userspace, but we could store the password in some infra database and only use it with infra services.

That password would be needed only to manage buckets, rotate app credentials and such.

That would minimize the need to rotate those credentials (the password), as those are the ones that seem hardest to rotate.

That's a good point, I doubt that rotating the password invalidates already-issued app credentials.

It's a pity though that we need one account per tool to be able to manage those; now we need to store 3k passwords xd

It will be annoying to store them, but it also reduces the attack surface by a whole lot! I'm interested in trying to fix T348857 but @taavi thinks that even if that bug is fixed we still shouldn't trust app creds to be properly scoped to a tenant if they're created by a domain-wide user.

<snip>

I say that to keep in mind that in order to add authentication for toolforge, all we need is for keystone to allow us to map user<->tool; it could be done using the accounts you mention above, but we don't need to give them any permissions, as all we need there is the relationship.

If a user is interacting with toolforge directly from their laptop using APIs, are they doing that as themselves or as the toolforge service user?

(I think the answer to this is 'as themselves')

And, similar question:

If a user is interacting with a tool-owned S3 service or Trove database directly from their laptop using APIs, are they doing that as themselves or as the toolforge service user?

(I think the answer to this is: 'as the service user')

If my assumptions are correct, then we should probably create a new keystone role to limit the actions that a human account can make within an openstack tool project.

If I understand correctly, ec2 credentials would fit there, allowing that toolforge service account to rotate them, would that be ok?

Yep, that would be fine. Are you thinking we'll support swift and s3 both, or just s3?

<snip>
If a user is interacting with toolforge directly from their laptop using APIs, are they doing that as themselves or as the toolforge service user?

Ideally as the user; nothing would really prevent them from using a tool "token" or similar (that's why those need rotating often).
If they authenticate as the user, they don't need to rotate any creds when they join/leave a tool.

(I think the answer to this is 'as themselves')

And, similar question:

If a user is interacting with a tool-owned S3 service or Trove database directly from their laptop using APIs, are they doing that as themselves or as the toolforge service user?

Access to the resource -> uses the resource's specific auth (ex. ec2 credential to access a bucket, database user/pass when using trove; those only exist per-tool, and will need to be rotated when a user leaves a tool)

Managing the resources (ex. creating a bucket) -> the user's authentication to access the corresponding toolforge API, which, after verifying that the user can act for that tool, will use the openstack credentials to manage those resources.

(I think the answer to this is: 'as the service user')

If my assumptions are correct, then we should probably create a new keystone role to limit the actions that a human account can make within an openstack tool project.

If I understand correctly, ec2 credentials would fit there, allowing that toolforge service account to rotate them, would that be ok?

Yep, that would be fine. Are you thinking we'll support swift and s3 both, or just s3?

I'd focus on s3 first and have swift as a plus if it's easy. Would you need different auth methods for each?