Configure LDAP authentication for the DataHub frontend
Closed, Resolved, Public

Description

update - We have decided to use LDAP authentication without CAS-SSO for the MVP phase.
We will look into whether OIDC or CAS-SSO is the better solution for us in the later production phase.

For now we only need to restrict access to members of either the wmf or nda LDAP groups.
~~~~

We should aim to enable single-sign-on authentication as soon as possible for the DataHub web frontend.

The DataHub authentication mechanism can use JAAS: https://datahubproject.io/docs/how/auth/jaas/

Our Apereo CAS system can also use JAAS: https://apereo.github.io/cas/6.1.x/installation/JAAS-Authentication.html

The question is, how do we get DataHub to use Apereo CAS?

Event Timeline

BTullis triaged this task as High priority.

Now that I've thought more about it, I'm not 100% sure that SSO is actually the best thing to configure at the moment.
My reasoning is that in the longer term (post MVP phase) the aim is to enable public, unauthenticated access to https://datahub.wikimedia.org
The end goal is a bit like our current Grafana frontend authentication, I think. Can our Apereo CAS system be used in such a configuration?
Perhaps @MoritzMuehlenhoff or @jbond would be able to advise.

However, this unauthenticated, read-only role isn't yet available in DataHub, so we can't use it right now anyway. We have discussed the requirement with the upstream authors and they are keen to add the feature. Here is a related feature request about read-only roles, which is already planned for the near future.

So for the MVP phase we will keep it private, and we would like users to be able to authenticate using their Wikimedia Developer Account.
Anyone in either the wmf or nda LDAP groups should be able to access DataHub during this phase.

I think that I can do this without CAS-SSO by using a modified jaas.conf file a little like this:

WHZ-Authentication {
  // Authentication of the user
  com.sun.security.auth.module.LdapLoginModule required
  storePass="true"
  userProvider="ldaps://ldap-ro.eqiad.wikimedia.org:636/ou=people,dc=wikimedia,dc=org"
  authIdentity="{USERNAME}"
  userFilter="(&(objectClass=person)(uid={USERNAME}))"
  java.naming.security.authentication="simple"
  debug="false"
  useSSL="true";

  // Ensuring that this user is a member of either the wmf or nda groups
  com.sun.security.auth.module.LdapLoginModule required
  useFirstPass="true"
  userProvider="ldaps://ldap-ro.eqiad.wikimedia.org:636/ou=groups,dc=wikimedia,dc=org"
  authIdentity="{USERNAME}"
  userFilter="(&(member=uid={USERNAME},ou=people,dc=wikimedia,dc=org)(|(cn=wmf)(cn=nda)))"
  java.naming.security.authentication="simple"
  debug="false"
  useSSL="true";
};

(n.b. I haven't tested this yet.)
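
To make the intent of the two userFilter values above concrete, here is a minimal Python sketch that expands the same filter templates for a given username. The function names are hypothetical, and the RFC 4515 escaping is an assumption added for illustration; nothing in this task confirms whether LdapLoginModule escapes the substituted {USERNAME} value itself.

```python
# Illustrative only: helper names are hypothetical; the filter templates
# are copied from the draft jaas.conf above.

def escape_filter_value(value: str) -> str:
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    # Replace character by character so an escaped backslash is never re-escaped.
    return "".join(replacements.get(ch, ch) for ch in value)


def user_filter(username: str) -> str:
    """First stanza: find the person entry for this uid."""
    return f"(&(objectClass=person)(uid={escape_filter_value(username)}))"


def group_filter(username: str, groups=("wmf", "nda")) -> str:
    """Second stanza: match a group entry that lists the user as a member
    and whose cn is one of the allowed groups."""
    member_dn = f"uid={escape_filter_value(username)},ou=people,dc=wikimedia,dc=org"
    any_group = "".join(f"(cn={escape_filter_value(g)})" for g in groups)
    return f"(&(member={member_dn})(|{any_group}))"
```

Calling group_filter("btullis") yields the second filter from the draft with the username filled in, which makes it easier to try the same search by hand against the groups subtree before relying on it in JAAS.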

Here is more information about this LDAP setup in the frontend: https://datahubproject.io/docs/datahub-frontend/#authentication

I don't yet know how I would do this with CAS-SSO in this Kubernetes configuration and whether it's worth it, given the long-term aim for unauthenticated access.

An alternative mechanism for authentication is OIDC - which is outlined here: https://datahubproject.io/docs/how/auth/sso/configure-oidc-react
I see that Apereo CAS can act as an OIDC provider: https://apereo.github.io/cas/6.1.x/installation/OIDC-Authentication.html
...but I don't know whether or not that option is currently available to us.

I see that we can use Okta with OIDC: https://datahubproject.io/docs/how/auth/sso/configure-oidc-react-okta
...but I don't know how that could refer to our LDAP nda group.

If this is a temporary setup for the MVP phase and the end target is to make it publicly available, then I'd recommend simply sticking with LDAP auth. Looking at https://datahubproject.io/docs/metadata-ingestion/source_docs/ldap/ it seems simple to set up.

CAS supports OIDC (that would be another option), but at present we only use clients with the native CAS protocol (and SAML with limited scope), so OIDC would probably require some fine-tuning of our Puppet manifests. We can totally do that, but if it's just for a time-limited MVP, LDAP seems equally fine to use.

Great, thanks for that. I will proceed with the LDAP configuration for now.

BTullis renamed this task from Configure CAS-SSO authentication for the DataHub frontend to Configure LDAP authentication for the DataHub frontend. Mar 17 2022, 1:24 PM
BTullis updated the task description.

Change 778345 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Configure LDAP authentication for DataHub

https://gerrit.wikimedia.org/r/778345

Got the first LDAP enabled login working on the prototype (stat1008) as well as a CR to enable it for the MVP.

Apr 07 20:41:22 stat1008 playBinary[27554]:                 [LdapLoginModule] authentication-only mode; SSL enabled
Apr 07 20:41:22 stat1008 playBinary[27554]:                 [LdapLoginModule] user provider: ldaps://ldap-ro.eqiad.wikimedia.org:636
Apr 07 20:41:22 stat1008 playBinary[27554]:                 [LdapLoginModule] attempting to authenticate user: btullis
Apr 07 20:41:23 stat1008 playBinary[27554]:                 [LdapLoginModule] authentication succeeded
Apr 07 20:41:23 stat1008 playBinary[27554]:                 [LdapLoginModule] added LdapPrincipal "uid=btullis,ou=people,dc=wikimedia,dc=org" to Subject
Apr 07 20:41:23 stat1008 playBinary[27554]:                 [LdapLoginModule] added UserPrincipal "btullis" to Subject

I have tried really hard to get the following filter to work to restrict access to the nda or wmf groups, but it currently isn't working.

userFilter="(&(objectClass=wikimediaPerson)(uid=btullis)(|(memberof=cn=nda,ou=groups,dc=wikimedia,dc=org)(memberof=cn=wmf,ou=groups,dc=wikimedia,dc=org)))"
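
As a reading aid, the intended semantics of that filter can be sketched in Python as a predicate over a directory entry. This is only a sketch of what the filter is meant to express, not of how the LDAP server evaluates it. The function name and the entry layout (a dict of attribute name to list of values) are hypothetical, and it assumes the server exposes a memberOf attribute on person entries.

```python
# Illustrative only: models the AND/OR structure of the userFilter above.
ALLOWED_GROUP_DNS = {
    "cn=nda,ou=groups,dc=wikimedia,dc=org",
    "cn=wmf,ou=groups,dc=wikimedia,dc=org",
}


def matches_filter(entry: dict, uid: str) -> bool:
    """Return True when the entry satisfies
    (&(objectClass=wikimediaPerson)(uid=<uid>)(|(memberof=...nda...)(memberof=...wmf...))).
    """
    # Object class names and DNs are compared case-insensitively here.
    object_classes = {v.lower() for v in entry.get("objectClass", [])}
    member_of = {v.lower() for v in entry.get("memberOf", [])}
    return (
        "wikimediaperson" in object_classes
        and uid in entry.get("uid", [])
        and bool(member_of & ALLOWED_GROUP_DNS)
    )
```

An entry with no memberOf values never matches, whatever groups list that user as a member, which is the behaviour the filter is supposed to give for accounts outside wmf and nda.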

The currently working configuration is only this, which permits anyone with a valid LDAP account to log in.

WHZ-Authentication {
  com.sun.security.auth.module.LdapLoginModule required
  userProvider="ldaps://ldap-ro.eqiad.wikimedia.org:636"
  authIdentity="uid={USERNAME},ou=people,dc=wikimedia,dc=org"
  debug="true";
};
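
A minimal sketch of why this configuration admits any valid LDAP account: with only authIdentity set, the module substitutes the supplied username into the DN template and binds as that DN, with no search or group check. The Python below mirrors just the substitution step; the function name is hypothetical.

```python
# Illustrative only: the template is copied from the working jaas.conf above.
AUTH_IDENTITY_TEMPLATE = "uid={USERNAME},ou=people,dc=wikimedia,dc=org"


def bind_dn(username: str) -> str:
    """Expand the {USERNAME} token into the DN used for the LDAP bind."""
    return AUTH_IDENTITY_TEMPLATE.replace("{USERNAME}", username)
```

For the username btullis this gives the same DN that appears as the LdapPrincipal in the debug log from stat1008.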

Change 778345 merged by jenkins-bot:

[operations/deployment-charts@master] Configure LDAP authentication for DataHub

https://gerrit.wikimedia.org/r/778345

Change 779031 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Add the codfw LDAP server to the DataHub JAAS configuration

https://gerrit.wikimedia.org/r/779031

Change 779031 merged by jenkins-bot:

[operations/deployment-charts@master] Add the codfw LDAP server to the DataHub JAAS configuration

https://gerrit.wikimedia.org/r/779031

Change 779039 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Use the LDAP read-only replicas for datahub authentication

https://gerrit.wikimedia.org/r/779039

Change 779039 merged by jenkins-bot:

[operations/deployment-charts@master] Use the LDAP read-only replicas for datahub authentication

https://gerrit.wikimedia.org/r/779039

Change 779045 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Remove override for datahub-frontend staging egress

https://gerrit.wikimedia.org/r/779045

Change 779045 merged by jenkins-bot:

[operations/deployment-charts@master] Remove override for datahub-frontend staging egress

https://gerrit.wikimedia.org/r/779045

Thanks milimetric - that's really useful to know.

For now I don't think that we can use the CAS native protocol so we can't do it in the same way, but I think we have a way forward.
Having spoken to @MoritzMuehlenhoff about it, the best way is probably going to be to:

  1. wait a few weeks for the upgrade of CAS to version 6.5 (ref: T305518)
  2. enable OIDC server support in CAS
  3. switch the datahub authentication to OIDC with CAS backing it.

We can then get the LDAP groups filtering to work as we would like.

LDAP authentication is now working on the datahub staging deployment.

This is now confirmed working in production.

BTullis moved this task from Ready to Deploy to Done on the Data-Engineering-Kanban board.
BTullis removed a project: Patch-For-Review.

Change 791061 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Update the LDAP authentication for DataHub

https://gerrit.wikimedia.org/r/791061

I've updated the LDAP authentication so that it now correctly filters based on wmf or nda group membership.

Change 791061 merged by jenkins-bot:

[operations/deployment-charts@master] Update the LDAP authentication for DataHub

https://gerrit.wikimedia.org/r/791061