Allow login to JupyterHub via CAS
Open, Medium, Public

Description

Currently, JupyterHub users must ssh tunnel into a specific stat box, navigate to localhost:8000, and then authenticate with JupyterHub via LDAP.

There is a JupyterHub CAS Authenticator: https://github.com/cwaldbieser/jhub_cas_authenticator

I'm testing out this functionality now, but it isn't clear what is needed to get this authentication working with CAS (or if we should).

We don't yet have a public domain for JupyterHub; we'd have to set one up. In addition, there are multiple JupyterHub servers, one on each stat box. Should we set up a separate domain for each one? Can we somehow route via a proxy URL from a single domain, e.g. jupyterhub.wikimedia.org/stat1008/hub/login?

Should we bother doing this at all?

Event Timeline

Ottomata triaged this task as Medium priority. Aug 17 2020, 3:25 PM
Ottomata moved this task from Incoming to Data Exploration Tools on the Analytics board.

I'm testing out this functionality now, but it isn't clear what is needed to get this authentication working with CAS

Looking at the documentation, it seems you need the following configuration.

apereo_cas can be found in Hiera, e.g. $apereo_cas = lookup('apereo_cas')

c.CASAuthenticator.cas_login_url = $apereo_cas['production']['login_url']
c.CASAuthenticator.cas_client_ca_certs = '/etc/ssl/certs'
c.CASAuthenticator.cas_service_validate_url = $apereo_cas['production']['validate_url']

The following would also need to be set to the specific LDAP group used to authenticate to JupyterHub, e.g. nda?

c.CASAuthenticator.cas_required_attribs = {('memberOf', 'jupyterhub_users')}

We would also need to create a public endpoint for JupyterHub and route it correctly (as mentioned). Once that is done, we would know the value for c.CASAuthenticator.cas_service_url

Finally, we need to create a service definition on the CAS service, e.g. https://gerrit.wikimedia.org/r/c/operations/puppet/+/610708
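To make the pieces above concrete, here is a minimal Python sketch that collects the CASAuthenticator settings mentioned in this comment into one place. The helper name build_cas_settings, the example URLs, and the shape of the Hiera data are assumptions for illustration only; the real values would come from the apereo_cas lookup and from whatever public endpoint is eventually created.

```python
from typing import Any, Dict


def build_cas_settings(
    apereo_cas: Dict[str, Dict[str, str]],
    service_url: str,
    required_group: str,
    environment: str = "production",
) -> Dict[str, Any]:
    """Collect the CASAuthenticator options discussed above into a dict.

    The keys are the option names from the comment; the helper itself is
    hypothetical and not part of jhub_cas_authenticator.
    """
    endpoints = apereo_cas[environment]
    return {
        "cas_login_url": endpoints["login_url"],
        "cas_service_validate_url": endpoints["validate_url"],
        # Only known once a public JupyterHub endpoint exists.
        "cas_service_url": service_url,
        "cas_client_ca_certs": "/etc/ssl/certs",
        # Restrict login to members of one LDAP group.
        "cas_required_attribs": {("memberOf", required_group)},
    }


# Placeholder data standing in for lookup('apereo_cas'); URLs are made up.
example_hiera = {
    "production": {
        "login_url": "https://cas.example.org/login",
        "validate_url": "https://cas.example.org/p3/serviceValidate",
    }
}

settings = build_cas_settings(
    example_hiera,
    service_url="https://jupyterhub.example.org/hub/login",
    required_group="jupyterhub_users",
)

# In jupyterhub_config.py these could then be applied roughly like this:
# for name, value in settings.items():
#     setattr(c.CASAuthenticator, name, value)
```

Keeping the values in a plain dict makes it easy to check them before they reach the JupyterHub config, and the LDAP group (nda or otherwise) stays a single parameter.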

(or if we should).

I think @elukey or @Ottomata are probably better placed to answer that

We don't yet have a public domain for JupyterHub; we'd have to set one up. In addition, there are multiple JupyterHub servers, one on each stat box. Should we set up a separate domain for each one? Can we somehow route via a proxy URL from a single domain, e.g. jupyterhub.wikimedia.org/stat1008/hub/login?

I think both of these are possible; however, I don't know enough about the differences between the various JupyterHub servers to know what makes the most sense, so again I will leave this one to Luca and Otto
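As a rough illustration of the single-domain option, the sketch below maps the first path segment of a request (e.g. /stat1008/hub/login) to a backend JupyterHub server. The backend hostnames and the route function are hypothetical; port 8000 is taken from the current tunnel setup described in the task. It assumes each hub is configured to serve under its own prefix (JupyterHub's c.JupyterHub.base_url setting), so the path is forwarded unchanged.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping of URL prefix to internal JupyterHub address.
BACKENDS: Dict[str, str] = {
    "stat1008": "http://stat1008.example.internal:8000",
}


def route(path: str) -> Optional[Tuple[str, str]]:
    """Return (backend, path) for a request path, or None if no hub matches.

    The first path segment selects the stat box. The full path is kept
    because the hub is assumed to serve under that same prefix.
    """
    first_segment = path.lstrip("/").split("/", 1)[0]
    backend = BACKENDS.get(first_segment)
    if backend is None:
        return None
    return backend, path


print(route("/stat1008/hub/login"))
print(route("/unknown/hub/login"))
```

With separate domains per stat box this lookup would not be needed at all, which is one argument for the simpler per-host endpoints.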

Thanks @jbond, I'll leave this as a low/medium priority one for now and discuss with Luca when he gets back. I'm working on a new JupyterHub setup that should allow us to sort of easily swap out the current LDAP auth for CAS later.

And, @jbond there's no current 2FA (e.g. Google Authenticator app?) support for CAS (yet), right? I think Luca and I both would feel more comfortable if there was some extra security on this one. Jupyter basically allows users to do anything they could do on the shell.

Thanks @jbond, I'll leave this as a low/medium priority one for now and discuss with Luca when he gets back. I'm working on a new JupyterHub setup that should allow us to sort of easily swap out the current LDAP auth for CAS later.

SGTM

And, @jbond there's no current 2FA (e.g. Google Authenticator app?) support for CAS (yet), right? I think Luca and I both would feel more comfortable if there was some extra security on this one. Jupyter basically allows users to do anything they could do on the shell.

Currently we support U2F HW tokens, and we can enable them for accounts by adding the correct LDAP attribute. Further, when adding the service to CAS, we can configure it to only authorise users for the service if they authenticate with 2FA.

Regarding Google Auth/TOTP specifically, we did test it during the prototyping phase and it should be easy to enable again; however, there is an admin overhead we need to consider until we have a proper account manager portal for the IDP.

tagging @MoritzMuehlenhoff for comment when he returns from vacation

Thanks @jbond, I'll leave this as a low/medium priority one for now and discuss with Luca when he gets back. I'm working on a new JupyterHub setup that should allow us to sort of easily swap out the current LDAP auth for CAS later.

As for the questions about routing requests, these would probably be separate endpoints. After all, the current method of accessing a stat host also sets up a tunnel to a specific server.

And, @jbond there's no current 2FA (e.g. Google Authenticator app?) support for CAS (yet), right? I think Luca and I both would feel more comfortable if there was some extra security on this one. Jupyter basically allows users to do anything they could do on the shell.

Currently we support U2F HW tokens, and we can enable them for accounts by adding the correct LDAP attribute. Further, when adding the service to CAS, we can configure it to only authorise users for the service if they authenticate with 2FA.

Yeah, this can simply be enabled for all interested users (and at some point we'll most probably make 2FA mandatory as it allows access to a lot of sensitive data)

Regarding Google Auth/TOTP specifically, we did test it during the prototyping phase and it should be easy to enable again; however, there is an admin overhead we need to consider until we have a proper account manager portal for the IDP.

Most folks don't have HW tokens (yet, or will?) so I guess we can't use CAS for this until we have this proper account manager for the IDP?

Regarding Google Auth/TOTP specifically, we did test it during the prototyping phase and it should be easy to enable again; however, there is an admin overhead we need to consider until we have a proper account manager portal for the IDP.

Most folks don't have HW tokens (yet, or will?) so I guess we can't use CAS for this until we have this proper account manager for the IDP?

You already can, it's opt-in, see: https://wikitech.wikimedia.org/wiki/CAS-SSO#Enabling_U2F_as_a_second_factor

OIT/ITS sends Yubikeys to all staff on request (and I think even by default for new hires?)

Hm, I guess we could enable CAS + HW 2FA, but keep the ssh tunnel support for users without the HW?

Hm, I guess we could enable CAS + HW 2FA, but keep the ssh tunnel support for users without the HW?

That's unlikely to work on the same installation (unless maybe one stat server is kept in the tunneled setup).

Hm, I guess we could enable CAS + HW 2FA, but keep the ssh tunnel support for users without the HW?

I like this option; it seems feasible. The only thing I am worried about is that exposing the JupyterHub UI to the internet will allow users to make changes to HDFS via the UI, whereas now the most damage a user can do is dump data. I would rely on httpd + mod_cas for the authentication part if possible, rather than on JupyterHub's own authenticator (which seems less stable/secure in my opinion; a bug is less likely to happen in httpd).

httpd + mod_cas for the authentication part if possible, not relying on jupyterhub's one

Hm, makes sense. Might be easier to set up too.

The new Okta dashboard looks swell!! Now I was wondering... if only I had Jupyter notebooks on there and SSO to stat boxes, it would be even more amazing!
Just wanted to give a shoutout to @Ottomata for initiating this (though it's been long!). Please let us know if we in Product-Analytics can support this in any way. Thanks!