
Add SSO support to netbox
Closed, ResolvedPublic

Description

I have been looking at adding CAS authentication to Netbox, and it seems that external authentication sources other than LDAP are not really supported. There is a pull request to add SAML support [1] as well as an issue [2] which lists some other implementations that add SAML. Further, there is no support for handing off authentication to a front-end proxy using HTTP headers, although this seems to be the direction the Netbox community is planning to move in [3]. There is also work to add a plugin system [4], which could potentially be used to add SAML (or other third-party) authentication.

Either way, it doesn't look like either of these will be supported in the near term, and I wanted to canvass opinion on a way forward. The patches to add either SAML or authentication via HTTP headers (the preferred way forward) both seem relatively simple; however, it does mean patching the Netbox code. Is this something that is sane to consider? It seems that Netbox is already distributed via scap; does that mean we have already made modifications? If so, should I just add modifications there? Or would it be better to distribute our modifications via some other method, e.g. Puppet?

The other thing to mention is that it is currently not clear how authentication mappings would work. I believe all the examples just map users to a hard-coded group or whatever the Netbox default is. I have not seen any which can map to specific Netbox roles. I wonder how much of an issue this would cause us and how difficult it would be to add some type of role mapping.

[1] https://github.com/netbox-community/netbox/pull/3010
[2] https://github.com/netbox-community/netbox/issues/1677
[3] https://github.com/netbox-community/netbox/issues/2328
[4] https://github.com/netbox-community/netbox/issues/3351

Event Timeline

jbond triaged this task as Medium priority.Feb 11 2020, 12:34 PM

Our current LDAP setup for Netbox is [1], see AUTH_LDAP_USER_FLAGS_BY_GROUP for the current mapping.
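For readers unfamiliar with django-auth-ldap, the mapping has roughly the shape sketched below. This is illustrative only: the group DNs reuse names that appear elsewhere in this task, and which group drives which flag is an assumption, not a copy of the template in [1].

```python
# Illustrative shape of a django-auth-ldap flag mapping.
# Keys are Django user flags; values are LDAP group DNs.
# The flag-to-group assignments below are assumptions for illustration.
AUTH_LDAP_USER_FLAGS_BY_GROUP = {
    "is_active": "cn=wmf,ou=groups,dc=wikimedia,dc=org",
    "is_staff": "cn=ops,ou=groups,dc=wikimedia,dc=org",
    "is_superuser": "cn=ops,ou=groups,dc=wikimedia,dc=org",
}
```

Any SSO replacement would need to reproduce this flag-per-group behaviour to stay at parity with the LDAP setup.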

And if in Netbox you click on the top-right icon with a human shape and then click through to admin, you can see all the users and groups mirrored into Netbox from LDAP. Technically it is possible to go there and set specific permissions for each Netbox view/action for each group, although we're not using that feature right now.

Technically right now we're running a fork of Netbox, as you can see from the last 4 commits in [2], but we're quite behind upstream and @crusnov is working on the upgrade in T244291, so that situation might change soon; I'm not sure how many local patches, if any, will remain.
That said, yes, Netbox is deployed via scap through a netbox-deploy repo and it is technically possible to have local modifications, although this is not recommended in the long term.
How complex would it be to add CAS support (with one of the existing Django CAS modules)?

For full context, also take into account that there was an attempt to make a part of Netbox publicly accessible, see [3].

[1] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/netbox/templates/ldap_config.py.erb
[2] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox/+log/HEAD
[3] https://gerrit.wikimedia.org/r/c/operations/puppet/+/526686

How complex would it be to add CAS support (with one of the existing Django CAS modules)?

I have not looked specifically at adding a django-cas module; however, the SAML example above uses django_saml2_auth and the external authentication method uses django.contrib.auth.backends.RemoteUserBackend, so I imagine I would be able to use those examples to add a django-cas module. However, at the all hands we sort of agreed that we should try to use an external authentication proxy such as Apache so that:

  • we have a common implementation and workflow
  • only authenticated users get access to the underlying app (reducing exposure to vulnerabilities in the app or framework)

For full context, also take into account that there was an attempt to make a part of Netbox publicly accessible, see [3].

I'm not familiar enough with the URI layout to know whether this would prevent using Apache as the authentication front end, i.e. only enabling authentication based on Location.

On a practical level we already maintain a fork, so if any changes are needed they can be integrated into our fork (we should wait until the post-upgrade ~this week though).

On a practical level we already maintain a fork, so if any changes are needed they can be integrated into our fork (we should wait until the post-upgrade ~this week though).

We should not maintain a fork permanently. Backporting patches from newer releases/master, or applying patches that we've pushed upstream is fine, but permanently deviating from the upstream codebase is not.

On a practical level we already maintain a fork, so if any changes are needed they can be integrated into our fork (we should wait until the post-upgrade ~this week though).

We should not maintain a fork permanently. Backporting patches from newer releases/master, or applying patches that we've pushed upstream is fine, but permanently deviating from the upstream codebase is not.

Of course! I may have oversold it; in reality we maintain two patches that we forward-port on top of upstream (which has thus far been trivial). One is a change to a part of the configuration that isn't exposed by their 'smart' config setup, to make Swagger UI access less open. The other allows arbitrary configuration to be injected, which we use to inject the Swift storage backend configuration (this will go away soonish, as upstream is designing their own solution) and to inject a setting that makes Django respect the X-Forwarded-Proto header, which was the cause of numerous API woes.

My meaning is more that if changes are required to the settings module, or other aspects of the system, we can make those changes either by injecting configuration or by maintaining additional patches. Upstream is super careful about introducing changes outside of their vision of how things should work, so I feel that to a greater or lesser extent we'll be pushing these patches forward on our own, but it's all meant to be a minimal intervention.

Some notes from conversations about this:

  • We are in general agreement as to using apache to query CAS and then use headers to communicate user and group to the external authentication provider. This has the double good of using what upstream seems to be leaning towards in supporting authentication backends.
  • There may be issues with groups in this scenario. @jbond Presumably the apache module sets group headers? Can you fill us in a bit on that, so that we can ask upstream to support it.
  • We may have to defer this setup until CAS is more widely adopted since Netbox is consumed by non-SREs and not a beta-testable environment. Upstream's external auth proposal doesn't say much about supporting both external and internal auth simultaneously, which also might lead to a few complexities testing (although it's reasonable to use the secondary frontend with CAS while the primary frontend remains traditional).

There may be issues with groups in this scenario. @jbond Presumably the apache module sets group headers? Can you fill us in a bit on that, so that we can ask upstream to support it.

Currently CAS sets a bunch of HTTP headers which allow the downstream app to perform additional mapping. Below is a copy of the headers cas-puppetboard currently receives when I log in; the HTTP_X_CAS_MEMBEROF header would make the most sense.

HTTP_X_CAS_LONGTERMAUTHENTICATIONREQUESTTOKENUSED: false
HTTP_X_CAS_BYPASSMULTIFACTORAUTHENTICATION: false
HTTP_X_CAS_SAMLAUTHENTICATIONSTATEMENTAUTHMETHOD: urn:oasis:names:tc:SAML:1.0:am:password
HTTP_X_CAS_ISFROMNEWLOGIN: false
HTTP_X_CAS_AUTHENTICATIONMETHOD: LdapAuthenticationHandler
HTTP_X_CAS_AUTHENTICATIONDATE: 2020-03-02T13:01:25.888466Z
HTTP_X_CAS_SUCCESSFULAUTHENTICATIONHANDLERS: LdapAuthenticationHandler
HTTP_X_CAS_AUTHNCONTEXTCLASS: mfa-u2f
HTTP_X_CAS_MFA_METHOD: mfa-u2f
HTTP_X_CAS_CN: Jbond
HTTP_X_CAS_MAIL: jbond@wikimedia.org
HTTP_X_CAS_MEMBEROF: cn=cn=project-puppet-diffs,ou=groups,dc=wikimedia,dc=org,cn=mfa-enabled,ou=groups,dc=wikimedia,dc=org,cn=project-deployment-prep,ou=groups,dc=wikimedia,dc=org,cn=project-cloudinfra,ou=groups,dc=wikimedia,dc=org,cn=project-tools,ou=groups,dc=wikimedia,dc=org,cn=project-automation-framework,ou=groups,dc=wikimedia,dc=org,cn=ops,ou=groups,dc=wikimedia,dc=org,cn=project-sso,ou=groups,dc=wikimedia,dc=org,cn=wmf,ou=groups,dc=wikimedia,dc=org,cn=project-bastion,ou=groups,dc=wikimedia,dc=org,cn=project-monitoring,ou=groups,dc=wikimedia,dc=org,cn=project-puppet,ou=groups,dc=wikimedia,dc=org,cn=project-testlabs,ou=groups,dc=wikimedia,dc=org,cn=project-traffic,ou=groups,dc=wikimedia,dc=org
HTTP_CAS_USER: jbond
HTTP_X_CAS_CREDENTIALTYPE: UsernamePasswordCredential

We can also use these entries to set other headers or environment variables using mod_rewrite (although there is a small bug in mod_cas which can conflict with mod_rewrite, so this may not be as simple as it should be).
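One wrinkle with HTTP_X_CAS_MEMBEROF is that the group DNs are joined with commas while each DN also contains commas, so a plain split(",") will not work. A minimal sketch of pulling out the group names, assuming every group lives under the ou=groups,dc=wikimedia,dc=org suffix seen in the sample above (the function name is hypothetical):

```python
import re

# One group DN as it appears in the sample header. The fixed suffix is an
# assumption taken from that sample, not a general LDAP DN parser.
GROUP_DN = re.compile(r"cn=([^,=]+),ou=groups,dc=wikimedia,dc=org")


def groups_from_memberof(header_value):
    """Return the group common names found in a memberOf-style header value."""
    return GROUP_DN.findall(header_value)


sample = (
    "cn=ops,ou=groups,dc=wikimedia,dc=org,"
    "cn=wmf,ou=groups,dc=wikimedia,dc=org"
)
print(groups_from_memberof(sample))
```

Matching whole DNs instead of splitting also tolerates a stray leading "cn=" such as the one at the start of the header shown above.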

We may have to defer this setup until CAS is more widely adopted since Netbox is consumed by non-SREs and not a beta-testable environment.

Who are the non-SRE consumers?

Upstream's external auth proposal doesn't say much about supporting both external and internal auth simultaneously, which also might lead to a few complexities testing (although it's reasonable to use the secondary frontend with CAS while the primary frontend remains traditional).

My intention would be to use only external authentication, but I'm not familiar with any, let alone all, of the edge cases, so I don't know if this is viable.

Yeah I agree HTTP_X_CAS_MEMBEROF has all the info needed.

Who are the non-SRE consumers?

Currently anyone in WMF has read-only access to Netbox. As for regular consumers, certainly Bryan from WMCS and Jeff in FR-Tech, and I'm sure other occasional ones.

No idea if it's useful here but came across https://github.com/jeremyschulman/netbox-plugin-auth-saml2

forgot to respond to this, yes this is useful thanks @ayounsi

There is also https://djangocas.dev

I have reassigned this task to myself as it has fallen on my plate to push forward. I believe pursuing in the way that the Netbox documentation suggests (using django auth modules) may be more future-proof.

Change 666957 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] O:idp: add netbox as an authorised servie

https://gerrit.wikimedia.org/r/666957

Change 666957 merged by Jbond:
[operations/puppet@production] O:idp: add netbox as an authorised servie

https://gerrit.wikimedia.org/r/666957

Change 667109 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] O:idp: fix service pattern match for netbox

https://gerrit.wikimedia.org/r/667109

Change 667109 merged by Jbond:
[operations/puppet@production] O:idp: fix service pattern match for netbox

https://gerrit.wikimedia.org/r/667109

I've experimented with SSO on netbox-next and been reading a lot of code, and this is an update on all of that.

There are a few ways to do this, obviously, and they all have upsides and downsides to achieve what we want out of it; the main sticking point in every existing off-the-shelf system is they aren't in parity with the features of the LDAP authentication stack that Netbox currently supports, *primarily* being able to map user groups into user flags. All solutions require some level of patch, but the tradeoffs are interesting.

| solution | work needed | upsides | downsides |
| --- | --- | --- | --- |
| RemoteUserBackend | need to add group->flag mapping, group assignment | mostly already built in | does not have capacity to map groups to user flags yet, requires we rewrite the request headers to the headers it's expecting, some routing for api needs to be added to apache |
| django-auth-cas-ng | requires views and middleware, need to modify login/logout view to direct to cas views for these when cas is enabled, some code needed for mapping groups and groups->flags | tested, works for authenticating, simplish, netbox still handles authentication pipeline | needs a bit of code to support |
| django-saml2-auth | Same as above, except SAML2 instead | | SAML2 complexity on top of above complexity |
| RemoteUserBackend but mod_cas | Make a child of the RemoteUserBackend stuff to consume the CAS request headers | lots of examples on how to do this in the existing codebase | all of the downsides of RemoteUserBackend except renaming the variables |
| netbox-plugin-auth-saml2 | Would need to add configuration for mapping groups to user flags. | is pretty off the shelf, except for changes needed | authentication is at apache layer, so routing would have to be implemented partially in apache for api. brings the complexity of SAML2 for little gain |

Overall, I would very much prefer to implement a patch against the code base to have first-class CAS support through django-auth-cas-ng; I did a partial hot patch to test it, and the code was relatively easy; supporting our needed group-flag mapping should be a relatively straightforward project, and I feel like short of major changes to these flows it'd be a maintainable solution. I'm pretty averse to letting Apache handle this (e.g. RemoteUserBackend variations, netbox-plugin-auth-saml2) just because it seems like a mistake to have to mirror some routing rules in the Apache configuration so that the API can work. I'm also slightly averse to using SAML2 when we can natively support CAS, which also seems easier to reason about.

I'll be putting out a patch to support the django-auth-cas-ng solution in the next few days for us to look at. Please let me know if I've missed anything significant in this calculus.

I think the big question is whether it's realistic to upstream support for group -> user flag mappings to the RemoteUserBackend. We have had the case of Grafana, where upstream reserves such a feature for the Enterprise/non-FLOSS tier, which should not be a concern for Netbox, but there could be other blockers (lack of resources, desire to reduce complexity in an auth backend, etc.).

If that seems possible, going that route with mod_cas seems elegant, and in comparison to django-auth-cas-ng we'd need to carry fewer local patches, but django-auth-cas-ng also seems like a fine approach.

Given the current situation with Netbox upstream [1], I don't expect any new feature to be accepted or merged within a short timeframe.

[1] https://github.com/netbox-community/netbox/discussions/5853

Thanks for the analysis this all looks good to me, one note

re: RemoteUserBackend requires we rewrite the request headers to the headers its expecting

I think we can also do this in apache2. There is also a bit of flexibility to do this with the Apache cas_auth module, so depending on the details this may be simple to achieve without code modification.

Update on progress: I discussed the possibilities and situation with @jbond, with the idea that adapting RemoteUserBackend was the general consensus of the above discussion.

I have made the patch to support this by allowing the backend to be configured to process a named header into groups and group-flag mappings (similar to how the LDAP backend does this) and am testing it on -dev. The idea is that this could be adopted by upstream if they desire, but it is a relatively simple change which will be straightforward to maintain. The downside from my perspective is that it moves authentication to Apache, so the Puppet patch will necessitate some extra routing for /api.
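The core idea of such a backend can be sketched without Django. The snippet below is not the code from the patch: the setting names, the FakeUser stand-in, and the flag-to-group assignments are all hypothetical, and the header parsing assumes the ou=groups DN layout seen earlier in this task.

```python
import re
from dataclasses import dataclass, field

# Hypothetical configuration names, loosely modelled on the LDAP backend.
REMOTE_AUTH_GROUP_HEADER = "HTTP_X_CAS_MEMBEROF"
REMOTE_AUTH_FLAGS_BY_GROUP = {
    "is_active": "wmf",
    "is_staff": "ops",
    "is_superuser": "ops",
}


@dataclass
class FakeUser:
    """Stand-in for a Django user so the sketch runs without Django."""
    username: str
    is_active: bool = False
    is_staff: bool = False
    is_superuser: bool = False
    groups: set = field(default_factory=set)


def configure_user(user, meta):
    """Derive groups from the named header, then set each flag from its group."""
    raw = meta.get(REMOTE_AUTH_GROUP_HEADER, "")
    groups = set(re.findall(r"cn=([^,=]+),ou=groups", raw))
    user.groups = groups
    for flag, group in REMOTE_AUTH_FLAGS_BY_GROUP.items():
        setattr(user, flag, group in groups)
    return user


meta = {"HTTP_X_CAS_MEMBEROF": "cn=wmf,ou=groups,dc=wikimedia,dc=org"}
user = configure_user(FakeUser("example"), meta)
print(user.is_active, user.is_staff, user.is_superuser)
```

In a real backend this logic would sit in a RemoteUserBackend subclass and operate on Django's user and group models.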

Change 668574 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox@master] Add enhanced RemoteUserBackend

https://gerrit.wikimedia.org/r/668574

Just a note that I think we will be able to re-use this code in debmonitor.

Change 668753 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] netbox, profile::netbox: Switch to CAS authentication

https://gerrit.wikimedia.org/r/668753

Change 668574 merged by CRusnov:
[operations/software/netbox@master] Add enhanced RemoteUserBackend

https://gerrit.wikimedia.org/r/668574

@crusnov @jbond
I did some quick testing on netbox-next and I have some questions:

  1. Where are the logging messages saved?
  2. I removed the wmf group, then I tried the following:
    • Refresh page: I was still logged in.
    • Login from a new incognito window: I was able to login
    • Re-created the wmf group in the Netbox Django Admin page
    • Login from a new incognito window: I was able to login but the wmf group has still 0 users in it.
  3. What happens if the LDAP groups of a user change? Let's say someone that is ops and then moves to another team and is not part anymore of the ops LDAP group? Would it be removed from the group?

@crusnov @jbond
I did some quick testing on netbox-next and I have some questions:

  1. Where are the logging messages saved?

It is using the standard logging framework, so they should be saved where other log messages are saved.

  2. I removed the wmf group, then I tried the following:
    • Refresh page: I was still logged in.
    • Login from a new incognito window: I was able to login
    • Re-created the wmf group in the Netbox Django Admin page
    • Login from a new incognito window: I was able to login but the wmf group has still 0 users in it.

This is expected behavior. If your user already exists it never goes through the configure_user step again. AFAIK this is consistent with the LDAP module but if it isn't we could extend it to synchronize the groups.

  3. What happens if the LDAP groups of a user change? Let's say someone that is ops and then moves to another team and is not part anymore of the ops LDAP group? Would it be removed from the group?

They would not, see above.

IIRC the LDAP auth re-syncs the groups all the time because it needs to be sure the user reflects their current groups, to ensure that their access is consistent with their real groups and doesn't grant/revoke permissions because of a missing/stale group. But please double check this assumption.

IIRC the LDAP auth re-syncs the groups all the time because it needs to be sure the user reflects their current groups, to ensure that their access is consistent with their real groups and doesn't grant/revoke permissions because of a missing/stale group. But please double check this assumption.

After reviewing the LDAP authentication library I can say that you're correct, it does do this in its backend. If this is a required functionality it can be added to our current authentication backend in a way similar to how LDAP does it.

After reviewing the LDAP authentication library I can say that you're correct, it does do this in its backend. If this is a required functionality it can be added to our current authentication backend in a way similar to how LDAP does it.

I think that it's required to avoid the security issue of a user removed from an LDAP group keeping the previous access and the usability issue of a user that was added to a more privileged group that will not gain the expected privileges.
@jbond thoughts?
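The difference between configuring groups once at user creation and syncing them on every login comes down to a set comparison. A minimal sketch, with a hypothetical function name and plain sets standing in for Django group memberships:

```python
def plan_group_sync(current, asserted):
    """Compare stored groups with the groups asserted by the identity provider.

    Returns (to_add, to_remove). Applying both on every login keeps the
    stored membership equal to what the identity provider currently reports.
    """
    to_add = asserted - current
    to_remove = current - asserted
    return to_add, to_remove


# A user who used to be in ops and has since moved teams.
stored = {"ops", "wmf"}
from_idp = {"wmf", "project-sso"}
to_add, to_remove = plan_group_sync(stored, from_idp)
print(sorted(to_add), sorted(to_remove))
```

With creation-time-only configuration, to_remove is never applied, which is the stale-access case described above.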

Change 672548 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox@master] Group Sensitive Remote: Sync groups at auth time rather than creation time

https://gerrit.wikimedia.org/r/672548

After reviewing the LDAP authentication library I can say that you're correct, it does do this in its backend. If this is a required functionality it can be added to our current authentication backend in a way similar to how LDAP does it.

I think that it's required to avoid the security issue of a user removed from an LDAP group keeping the previous access and the usability issue of a user that was added to a more privileged group that will not gain the expected privileges.

Ack, patch uploaded.

I think that it's required to avoid the security issue of a user removed from an LDAP group keeping the previous access and the usability issue of a user that was added to a more privileged group that will not gain the expected privileges.
@jbond thoughts?

Yeah, that would be good to fix, when staff gets offboarded they keep their developer account, but get stripped from the cn=wmf LDAP group.

So we discussed this at the automation meeting, and it turns out we've all agreed that the current code and patches need to be thrown out entirely and the project redone with the django-auth-cas-ng solution because of the Logout Problem. Basically this is a previously unnoticed problem involving mod_cas keeping sessions separately from CAS and Django. Each of these keeps its own session, and even if the Django or the CAS session gets invalidated, the mod_cas session lives on and re-creates the Django session. Rather than kludging it to invalidate this session as well, removing the Apache layer seems like a better fix.
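The Logout Problem can be illustrated with a toy model in which each layer's session store is just a set of usernames. This is a deliberate simplification of the behaviour described above and not how mod_cas is implemented; the function name and flow are assumptions.

```python
def handle_request(user, cas, mod_cas, django):
    """Toy model of one request passing through Apache (mod_cas) to Django."""
    if user not in mod_cas:
        # mod_cas only consults CAS when it has no session of its own.
        if user not in cas:
            return "redirect to CAS login"
        mod_cas.add(user)
    # Apache forwards the trusted user header, which (re)creates the Django session.
    django.add(user)
    return "authenticated"


cas, mod_cas, django = set(), set(), set()

cas.add("jbond")  # the user logs in at CAS
first = handle_request("jbond", cas, mod_cas, django)

# "Log out": invalidate the Django and CAS sessions, but not the mod_cas one.
django.discard("jbond")
cas.discard("jbond")
after_logout = handle_request("jbond", cas, mod_cas, django)

print(first, "/", after_logout)
```

In this model the user is only really logged out once the mod_cas entry is removed too, which is why dropping the Apache layer removes the problem.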

So the plan is to throw out the current patches, and create a new patch which nativizes django-auth-cas-ng (involving some of the changes discussed above to overload login/logout). This patch would be of similar size to the RemoteUser change but probably is not upstream-acceptable since it will involve deeper changes to views and things, and is very specific to this authentication system.
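For orientation, a Django settings fragment for this kind of setup might look roughly like the sketch below. It assumes the package meant here is django-cas-ng (the one documented at djangocas.dev); the server URL is a placeholder, and none of this is taken from the actual patch.

```python
# Sketch of settings for django-cas-ng; all values are illustrative.
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django_cas_ng",
]

MIDDLEWARE = [
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django_cas_ng.middleware.CASMiddleware",
]

AUTHENTICATION_BACKENDS = [
    "django.contrib.auth.backends.ModelBackend",
    "django_cas_ng.backends.CASBackend",
]

# Placeholder URL for the CAS server, not the real identity provider address.
CAS_SERVER_URL = "https://idp.example.org/"
# CAS protocol version 3 returns user attributes along with the username.
CAS_VERSION = "3"
```

Group-to-flag mapping is not covered by these settings and would still need custom code, as noted in the comparison above.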

Change 672548 abandoned by CRusnov:
[operations/software/netbox@master] Group Sensitive Remote: Sync groups at auth time, not creation time

Reason:
This change will be superseded by a CAS-SSO native solution.

https://gerrit.wikimedia.org/r/672548

Change 672831 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox@master] Add CAS authentication support

https://gerrit.wikimedia.org/r/672831

As agreed on IRC, assigning to John so as not to lose momentum on this.

Change 698796 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] P:netbox: Add support for cas authentication provider

https://gerrit.wikimedia.org/r/698796

Change 698807 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] O:netbox::standalone: switch netbox-next to use cas authentication

https://gerrit.wikimedia.org/r/698807

Change 698792 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/software/netbox-deploy@master] cas: add cas_configuration symlink

https://gerrit.wikimedia.org/r/698792

Change 698792 abandoned by Jbond:

[operations/software/netbox-deploy@master] cas: add cas_configuration symlink

Reason:

https://gerrit.wikimedia.org/r/698792

Change 698792 restored by Jbond:

[operations/software/netbox-deploy@master] cas: add cas_configuration symlink

https://gerrit.wikimedia.org/r/698792

Change 698796 merged by Jbond:

[operations/puppet@production] P:netbox: Add support for cas authentication provider

https://gerrit.wikimedia.org/r/698796

Change 698807 merged by Jbond:

[operations/puppet@production] O:netbox::standalone: switch netbox-next to use cas authentication

https://gerrit.wikimedia.org/r/698807

Change 672831 merged by Jbond:

[operations/software/netbox@master] Add CAS authentication support

https://gerrit.wikimedia.org/r/672831

Change 698792 merged by Jbond:

[operations/software/netbox-deploy@master] cas: add cas_configuration symlink

https://gerrit.wikimedia.org/r/698792

Change 698962 had a related patch set uploaded (by Volans; author: Volans):

[operations/software/netbox-deploy@master] Update to v2.10.4-wmf2

https://gerrit.wikimedia.org/r/698962

Change 698962 merged by Volans:

[operations/software/netbox-deploy@master] Update to v2.10.4-wmf2

https://gerrit.wikimedia.org/r/698962

Change 698986 had a related patch set uploaded (by Volans; author: Volans):

[operations/software/netbox-deploy@master] Update to v2.10.4-wmf3

https://gerrit.wikimedia.org/r/698986

Change 698986 merged by Volans:

[operations/software/netbox-deploy@master] Update to v2.10.4-wmf3

https://gerrit.wikimedia.org/r/698986

Change 699188 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/software/netbox-deploy@master] Update to v2.10.4-wmf4

https://gerrit.wikimedia.org/r/699188

Change 699188 merged by Jbond:

[operations/software/netbox-deploy@master] Update to v2.10.4-wmf4

https://gerrit.wikimedia.org/r/699188

This is in place now

Thank you @jbond for picking this up and shepherding it - appreciate it!