
[toolforge] Investigate authentication
Open, HighPublic

Description

We want to be able to open the API to users outside the toolforge infrastructure.

This task is to investigate current practices and options for us to do authentication in toolforge.

Currently we do TLS client-certificate authentication, using the certificates that were generated for the tools (which live in the NFS shared folders).

What we want

This authentication should grant access to anything you can do through the API at least, with access to k8s as an optional plus.

Restrictions

  • We want to avoid having to process on toolforge infra any LDAP credentials directly or Openstack app credentials, only short-lived tokens generated for the authentication. Once we have validated the user against some identity provider, we can generate our own session for the rest of the interaction.
  • Should be as simple as possible for users

Authenticating on toolforge API

This just needs a way to map User (be that a human user or a tool user) <-> Tool (as in tool group).

We don't need any roles/scopes or similar fine grained access control (RBAC/etc.).

Interactive authentication (ex. browser authentication)

For the future UI, we want to be able to use a browser to authenticate in an easy way (ex. avoid certificates)

Non-interactive authentication (ex. cli authentication, gitlab/github workflows...)

We also want to enable other clients, like the command line interface, to authenticate in an easy way. This might require creating some sort of "api tokens" or similar, to avoid having to authenticate on every cli execution (and to enable automations like github/gitlab actions and such).

Toolforge tokens

This means that we need to store some tokens somewhere (be that our own miniservice, or idp/keystone/anything that has the capabilities)

Easy rotation

These "api tokens" will need some way for users to easily rotate them in case of expiry, leakage or loss.
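A minimal sketch of what issuing and rotating such tokens could look like, whatever service ends up storing them (all helper names are hypothetical): only a hash of the token is persisted, so a leak of the store does not leak usable tokens, and rotation is just replacing the stored hash.

```python
import hashlib
import secrets


def issue_token() -> tuple[str, str]:
    """Generate an opaque API token; return (token, stored_hash).

    The plain token is shown to the user exactly once; only the
    hash is persisted, so rotating it means deleting the old hash
    and issuing a fresh token.
    """
    token = secrets.token_urlsafe(32)
    stored = hashlib.sha256(token.encode()).hexdigest()
    return token, stored


def verify_token(presented: str, stored_hash: str) -> bool:
    """Constant-time check of a presented token against the stored hash."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(digest, stored_hash)
```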

Authenticating against radosgw for bucket access

We also want users to be able to access the buckets that belong to their tools.

Authentication sources

Some options to investigate are:

idp.wmcloud.org

This uses CAS (the software) as the sso server.

Supports:

  • oauth
  • cas (the protocol)
  • openid
  • saml
  • rest

The data that we get back from the server includes the LDAP groups in the memberOf key:

memberOf 	[cn=tools.sqlchecker,ou=servicegroups,dc=wikimedia,dc=org, cn=tools.wm-lol,ou=servicegroups,dc=wikimedia,dc=org, cn=tools.jobs,ou=servicegroups,dc=wikimedia,dc=org, cn=project-account-creation-assistance,ou=groups,dc=wikimedia,dc=org, cn=project-cloudvirt-canary,ou=groups,dc=wikimedia,dc=org, cn=project-dumps,ou=groups,dc=wikimedia,dc=org, cn=project-wmflabsdotorg,ou=groups,dc=wikimedia,dc=org, cn=tools.toolschecker,ou=servicegroups,dc=wikimedia,dc=org, cn=tools.cloud-ceph-performance-tests,ou=servicegroups, ...
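For the API's User <-> Tool mapping, those DNs are enough by themselves; a sketch of extracting the tool names from the memberOf values, assuming the `cn=tools.<name>,ou=servicegroups,...` shape shown above:

```python
def tools_from_member_of(member_of: list[str]) -> list[str]:
    """Extract tool names from LDAP memberOf DNs.

    Tool memberships live under ou=servicegroups with a cn of the
    form 'tools.<name>'; project groups (ou=groups) are ignored.
    """
    tools = []
    for dn in member_of:
        attrs = dict(part.split("=", 1) for part in dn.split(","))
        cn = attrs.get("cn", "")
        if attrs.get("ou") == "servicegroups" and cn.startswith("tools."):
            tools.append(cn.removeprefix("tools."))
    return tools
```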

It has no support for token-based auth

other projects using it

wikitech

This has no advantage over using idp/keystone

keystone

Has its own authentication protocol (see the docs: https://docs.openstack.org/keystone/pike/user/index.html)

It can be federated (use idp underneath) - https://docs.openstack.org/keystone/latest/admin/federation/introduction.html

keycloak

Similar to CAS; it can be federated with CAS, keystone or LDAP, and can be deployed locally.

The advantage might be being able to add toolforge-specific bits to it.

Implementations

For any of the solutions, we would want users not to have to deal with more than one system (ex. not have to go to keystone themselves, get a token, then go to toolforge). Instead, toolforge should deal with anything needed, unless it's implementing standard auth flows for the web UI (ex. OIDC, etc.) where you get redirected to the identity provider's login page.

All keystone

A starting diagram with all the flows (including the current ssl certificate one) is here, still unfinished:
https://drive.google.com/file/d/1f1F2XT9ppFaeYcuNVBd8s9DjYwXrrGix/view?usp=sharing

Blockers

We can't use keystone for interactive logins directly, as that would require providing either your user-pass or an application credential to the toolforge server, to forward to keystone.

idp for web UI, new service for app tokens

(work in progress)

Interactive client identification flows

Besides using one of the app tokens, there's the possibility to use oauth2 to authenticate the cli against idp, see:

https://medium.com/@balaajanthan/openid-flow-from-a-cli-ac45de876ead

Essentially you can use a localhost-based callback url, or an application private uri (for desktop apps); the localhost url might be the easiest to support.

See specification for 'native apps' here https://datatracker.ietf.org/doc/html/rfc8252#section-7.1

It seems that using localhost for the auth flow should work for linux, mac and windows on default installations (nice).

To open urls on different oses automatically (if we want to), we can try using xdg-open on linux, open on mac, and start on windows.
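Python's stdlib webbrowser module already wraps those per-OS commands, so a cli sketch could get away with something like (the fallback message is our own addition for headless/ssh sessions):

```python
import webbrowser


def open_login_page(url: str) -> bool:
    """Open the IDP login page in the user's default browser.

    webbrowser picks the right mechanism per platform (xdg-open on
    linux, open on mac, start on windows); when no browser can be
    launched (e.g. over a plain ssh session), fall back to printing
    the URL so the user can open it manually.
    """
    if webbrowser.open(url):
        return True
    print(f"Please open this URL in a browser: {url}")
    return False
```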


Event Timeline

I'm trying to understand pros & cons of the different protocols, especially OIDC (OpenID Connect) vs CAS.

I played a bit with OIDC in a previous job, and I remember it as fairly complex but also quite well supported by a number of libraries that can be used to implement a client. I'm not sure how that compares with CAS.

A quick Google search led me to this presentation (although not very recent) with some useful comparisons: https://ldapcon.org/2017/wp-content/uploads/2017/08/16_Cl%C3%A9ment-Oudot_PRE_LDAPCon2017_SSO-1.pdf


Nice!

Just checked the openid python libraries mentioned in the openid page and all have been archived :/

We might be a bit restricted here on what we can actually use underneath, though: our users are in LDAP no matter what, and we have one single-sign-on implementation in prod that we can use (idp.w.o).

My current ideas (still exploring) are:

  • directly LDAP: we want to avoid this; it's what keystone, toolsadmin and wikitech currently do, and ldap is kinda flaky and load-sensitive
  • keystone (as LDAP proxy): the main downside I currently see, without having played much with it, is that you need many things to authenticate, around 9 settings between domain, user, region, ...; not many libraries (that I have found), and no sso. Big advantage: if we get storage/trove integration, the auth is probably going to be very similar, so users would be used to it xd
  • idp.w.o (as LDAP proxy): we get sso, the protocol is CAS (kinda easy, probably don't need any extra library, only user-pass), though no two-factor auth yet
  • keycloak with ldap as federated backend: we get an extra layer, but no sso; we can add local users, have different deployments, etc., but it requires having the extra service
  • keycloak with idp.w.o as federated backend: this allows us to add an extra layer on top of idp while keeping sso; we can for example add local users and such (easier local deployment, for example), but it requires having an extra service

We also might want to change keystone to authenticate using idp.w.o at some point, to allow sso. If that's the case, we will have to move to idp auth eventually, so we might want to do so already.

I have not tried to set up a local keystone (will try with https://quay.io/repository/openstack.kolla/keystone?tab=tags&tag=latest from today's check-in) or idp instance; if it's easy, we might not gain anything by adding keycloak (as we would be able to run keystone in local deployments, for example).

In any case, it seems quite possible that the storage access (s3 buckets) will have to be done through keystone with app credentials, so users will have to authenticate that way to access non-public buckets. I was thinking of exposing that as part of the 'storage-api' service thingie, like:

> toolforge storage s3 list

+---------+--------------+-----------------------+
| name    | credential   | url                   |
+---------+--------------+-----------------------+
| bucket1 | somelonghash | https://url-to-bucket |
...

Depending on how it's implemented, the credentials might not be per-bucket though, but for the whole tool there (app credential associated to the tool user, if I understand correctly... @Andrew please correct me if I'm wrong xd).

In any case, it seems that unless we use some front (ex. keycloak) and add extra info there, what we end up having is the LDAP data; that is, when you log in (as your user), we get the list of groups you belong to (the tools).

So there's no per-tool authentication token, making T363808: [builds-api, builds-cli] Prefix all endpoints with `/tool/<toolname>` needed (or similar, like passing always a tool parameter, though I'd prefer the path for namespacing) right?

I think using idp.w.o is my favourite solution, as it can potentially be a true "single" sign-on that all applications can rely on. Using CAS without any extra library is probably a good first step, and we can evaluate migrating to OIDC iff we find it provides any advantage.

> Just checked the openid python libraries mentioned in the openid page and all have been archived :/

I found a discussion about that here: https://www.reddit.com/r/Python/comments/16pin4l/a_maintained_library_for_oidc_in_python/

Looks like https://github.com/IdentityPython/idpy-oidc is maintained and certified, it includes both a server implementation (which we don't need) and a client.

> So there's no per-tool authentication token, making T363808: [builds-api] Prefix all endpoints with /tool/<toolname> needed (or similar, like passing always a tool parameter, though I'd prefer the path for namespacing) right?

I agree, after thinking about it I'm in favour of adding the /tool prefix. We can discuss the implementation in T363808.

dcaro changed the task status from Open to In Progress.Jun 4 2024, 10:49 AM
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 10) board.

If you would like to experiment with tool-specific keystone auth, I've set up an example account. The creds for the tool user are in cloudcontrol1006:/home/dcaro/.config/openstack/clouds.yaml.

project: toolsbeta.test (id: a40d6252508b4fdaa057279aa306d151)
domain: toolsbeta (id: 31a151b68a61450a9a75a187a3ab4eb9)
service user: toolsbeta.test (id: toolsbeta.test)
group: toolsbeta.test (id: 51595)

You can also add/remove human users to/from that tool with striker and they will also be added to/removed from the toolsbeta.test project automatically due to the group assignment.

This was set up using 'keystoneify' at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1040243 -- we can adapt that code to run within the tools domain as a daemon if we want projects like that for every tool.


<3 thanks!

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/387

api-gateway: bump to 0.0.29-20240704101259-2cfd83b3

We will use custom deploy tokens for deployments, and now horizon has moved to using idp, so that reduces the options.

I'll move this back to the queue for now, but will probably pick it up again in q3/q4 2024-25, before we start working on the UI.

dcaro removed dcaro as the assignee of this task.Nov 5 2024, 10:32 AM
dcaro edited projects, added Toolforge; removed Toolforge (Toolforge iteration 16).
dcaro changed the task status from In Progress to Open.Nov 5 2024, 10:39 AM
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.

Quick update: I've been looking into enabling CAS or OIDC (openid connect) in nginx, and it seems it's not supported unless you have the paid "plus" version (https://docs.nginx.com/nginx/admin-guide/security-controls/configuring-oidc/) :/. There's a small lua plugin, which seems a bit unmaintained, to do some of the CAS process.

On the other hand, apache has a certified module for oidc (mod_auth_openidc), so we might want to move from nginx to apache and take advantage of that; might need some investigation. Another alternative is implementing the auth on the api-gateway fastapi side, though it might be faster/simpler on apache/nginx; we can keep the fastapi option as a fallback if that gets complicated.

Some notes on CAS, and client authentication.

By default CAS does not enable proxy tokens (https://apereo.github.io/cas/7.2.x/services/Configuring-Service-Proxy-Policy.html), so the only action you can do with the token is validate it and get the groups/info of the user, as far as I understand it (to be tested).

I think this is not a security issue, especially because all that info is already public. So, in order to enable clis to authenticate users, I think it would be ok to enable a service matching http://127.0.0.1.* (the callback url), similar to how oauth2 enables native apps; and given that all we need and want is just info about the user's groups, it's way simpler and safer (as the app can't act in the name of the user anywhere else).
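Validating the ticket is then one GET against the CAS serviceValidate endpoint and parsing the XML it returns; a sketch of the parsing side, using only the stdlib (the exact attributes released, like memberOf here, depend on how the idp is configured):

```python
import xml.etree.ElementTree as ET

# The CAS protocol XML namespace, as used in serviceValidate responses.
CAS_NS = {"cas": "http://www.yale.edu/tp/cas"}


def parse_validation_response(body: str):
    """Parse a CAS /serviceValidate XML response.

    Returns (username, list of memberOf attribute values) on
    success, or None when the ticket was rejected.
    """
    root = ET.fromstring(body)
    success = root.find("cas:authenticationSuccess", CAS_NS)
    if success is None:
        return None
    user = success.findtext("cas:user", namespaces=CAS_NS)
    groups = [
        el.text
        for el in success.findall("cas:attributes/cas:memberOf", CAS_NS)
    ]
    return user, groups
```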

This is a draft MR trying to create a POC for it https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/69

There's also the question of how to expire/renew these tokens, but I'll investigate that later.

For the cli flow, it would be something like:

  • Listen on 127.0.0.1:<randomport> for the callback
  • Ask the user to log in in the browser giving the url of the idp with said callback in the command line
  • Retrieve the token from the callback, then call the toolforge api callback with that token
  • Retrieve the session cookie from that call and store it for the session
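The listening and callback-retrieval steps above could be sketched with the stdlib only (all names hypothetical; the real idp url and query parameter names depend on the CAS setup):

```python
import http.server
import urllib.parse


def make_callback_server(received: dict):
    """Bind a one-shot HTTP server on a random localhost port.

    The idp redirects the browser here with ?ticket=...; the handler
    stashes the query parameters into `received` and tells the user
    they can close the tab.
    """

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            query = urllib.parse.urlparse(self.path).query
            received.update(urllib.parse.parse_qs(query))
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Logged in - you can close this tab now.")

        def log_message(self, *args):  # keep the cli output clean
            pass

    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    callback = f"http://127.0.0.1:{server.server_address[1]}/callback"
    return server, callback


# cli flow sketch:
#   received = {}
#   server, callback = make_callback_server(received)
#   print(f"Log in at: https://idp.wmcloud.org/login?service={callback}")
#   server.handle_request()   # blocks until the idp redirects back
#   ticket = received["ticket"][0]
#   ...exchange the ticket for a toolforge session cookie and store it
```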

We might want, at some point, to have somewhat longer-lived tokens for the cli and other automated applications, maybe something similar to the deployment-tokens for the components-api (or even something that includes those).

This looks great! I also checked the POC code and it looks good.

Do you plan on testing the OIDC protocol as well, in addition to the CAS protocol? OIDC protocol is supported by Apereo CAS, but it might require tweaking some settings in the Apereo config. I did some more research today after seeing your POC, and it looks like OIDC might give us some advantages, in particular it could entirely replace the need for generating session tokens in the Toolforge API.

Note: I don't think that testing OIDC should necessarily be a blocker, we could start with CAS if it's easier, and we should be able to migrate to OIDC at a later moment in a way that is transparent to users.

OIDC flow (RFC 8252) as I understand it:

  • CLI listens on 127.0.0.1
  • CLI points user to log in with the browser, the local server on 127.0.0.1 receives an Authorization code
  • CLI sends that Authorization code to IDP (either directly or proxied by our API gateway) and gets an Access Token
  • Once the CLI has an Access Token, it can attach it to each Toolforge API request
  • The Toolforge API can validate and decode that token on each request, extracting the user/group information
  • This means the Toolforge API would become fully OAuth-based, no need for API session tokens/cookies at all
  • Access Tokens are short lived and can be refreshed automatically by the CLI
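The per-request validation step could then roughly look like this. This sketch verifies an HS256 signature with the stdlib only, just to show the shape of the check; a real deployment would verify the idp's RS256 signature against its published JWKS (e.g. with authlib) and also check the exp/aud claims:

```python
import base64
import hashlib
import hmac
import json


def b64url_decode(data: str) -> bytes:
    """Decode base64url with the padding JWTs strip off."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def decode_access_token(token: str, key: bytes) -> dict:
    """Validate an HS256-signed JWT and return its claims.

    Raises ValueError when the signature does not match; the claims
    would carry the user/group info the Toolforge API needs.
    """
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```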

> I've been looking into enabling CAS or OIDC (openid connect) into nginx, and it seems it's not supported unless you have the paid "plus" version :/

That sucks. :( It looks like Envoy supports it though, maybe we could consider replacing Nginx with Envoy? But it also looks easy enough to implement the token validation in Python in the Toolforge API gateway.

I found this Python library that might be helpful to implement both the token validation (in the API gateway) and the Access Token generation and refresh (in the CLI client): https://docs.authlib.org/. The docs are not as clear as I would like, but it looks quite powerful and in active development.

> We might want at some point to be able to have a bit longer-lived tokens for the cli and other automated applications

I agree this will be necessary for example for toolforge cron jobs that need to call the toolforge API. We should provide a way for users to generate long-lived tokens. But I would only use them for non-interactive jobs and applications, not for the CLI. As a data point, the AWS CLI recently added an OAuth-based login flow.