There are situations where we need to log out a user centrally and promptly, e.g. because a laptop was lost. With the adoption of Apereo CAS as our identity provider, access to most web services can be centrally revoked via Single Logout (SLO). There are, however, also use cases beyond CAS-enabled web services that we want to support in the future, e.g. terminating a user's SSH session or logging them off from ttys on the serial console. As such, we'll need a mechanism which is flexible and extensible.
The eventual design/setup will look like this:
- We create some central service logout directory, e.g. /etc/wikimedia/logout.d fleet-wide. This directory contains executable scripts (which can be written in an arbitrary language, so for the most common case Python). They are managed with Puppet (or we could also not use recurse/purge and also allow scripts shipped via debs). For 99% of all cases order won't matter, but we can establish that they are executed in alphabetical order and use a scheme like 50-foo, 50-bar, 99-baz.
- Logout scripts expect the following "API":
    # {} mutually exclusive
    # [] optional
    $0 {logout,query} --user <uid> --cn <cn> [all]
(Along with possible aliases like -u or -c)
Since some services operate on the UID and some on the CN/Wikimedia Developer Name, the cookbook will simply pass both and the script can use what it needs. "logout" returns 0 and no output if a user has been successfully logged out or if the user wasn't logged in to begin with. In case of an error, it returns a non-zero exit code and an arbitrary error message. "query" returns 1 if the user is logged in and 0 if not.
- One central/initial logout script will be /etc/wikimedia/logout.d/cas which will be present on the primary IDP host. Once executed by the cookbook, it'll detect the user's current CAS TGT and send an HTTP DELETE request to https://idp-test.wikimedia.org/api/ssoSessions/$TGT which logs out the user from all active CAS sessions. The CAS SLO request is "fire and forget", so if there are transient issues (like a 5xx or a brief network blip) on the hosts which are meant to be logged out, the request won't be repeated. That's a conceptual issue and will happen rarely, but we can address it by running "query" again after the initial "logout" run and offering to re-run failed scripts.
- Other services where the CAS logout doesn't currently work (only LibreNMS at the moment) can deploy a /etc/wikimedia/logout.d/librenms which e.g. restarts Apache (which should terminate the session as well). In some cases we will need to terminate the user's session in mod_cas via the SLO call and also void some internal state in the backend to really log off the user.
- The cookbook would simply traverse /etc/wikimedia/logout.d/* with the "logout" action and pass the CN/UID. Initially it can simply run fleet-wide, but we can also make it smarter by preparing some targets depending on the level of access a user has (SREs with global access or e.g. researchers). In addition there would be a cookbook (or a flag to the logout one) which only runs "query" to detect where/if a user is currently logged in. This way we also decouple the cookbook from the service logouts (since these might change more often and we don't want to update the cookbook all the time).
- Long term, we can also use the logout script framework to provide logouts for individual services or fleet-wide, so that with something like https://idp-logout.wikimedia.org/grafana/$USER a user can call the SLO cookbook or trigger the service-specific logout (something for later, with a more detailed design once the general logout logic is established).
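To make the script "API" described above more concrete, here is a minimal Python sketch of what a script in /etc/wikimedia/logout.d could look like. It is an illustration only: the InMemoryBackend class, the main() signature and the choice of exit code 2 for errors are assumptions made for this example, and the optional [all] argument is left out because its semantics aren't spelled out yet. Writing the error message to stderr is also an assumption, since the design only says that a message is returned.

```python
import argparse
import sys


def build_parser():
    """Parse: $0 {logout,query} --user <uid> --cn <cn> (aliases -u / -c)."""
    parser = argparse.ArgumentParser(description="Example logout.d script")
    parser.add_argument("action", choices=("logout", "query"))
    parser.add_argument("-u", "--user", required=True, help="UID of the user")
    parser.add_argument("-c", "--cn", required=True,
                        help="CN / Wikimedia Developer Name of the user")
    return parser


class InMemoryBackend:
    """Hypothetical stand-in for a real service; tracks UIDs with a session."""

    def __init__(self, active_uids):
        self.active = set(active_uids)

    def is_logged_in(self, uid, cn):
        # This toy backend only looks at the UID; a real one may need the CN.
        return uid in self.active

    def terminate(self, uid, cn):
        # Removing a UID that has no session is not an error.
        self.active.discard(uid)


def main(argv, backend):
    """Return the exit code the script would hand back to the cookbook."""
    args = build_parser().parse_args(argv)
    if args.action == "query":
        # 1 = user is logged in, 0 = user is not logged in.
        return 1 if backend.is_logged_in(args.user, args.cn) else 0
    try:
        backend.terminate(args.user, args.cn)
    except Exception as exc:  # any failure becomes non-zero plus a message
        print(f"logout failed: {exc}", file=sys.stderr)
        return 2
    # Success, or the user wasn't logged in to begin with: 0 and no output.
    return 0
```

A real script would replace InMemoryBackend with service-specific logic and end with something like sys.exit(main(sys.argv[1:], backend)).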
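The per-host part of the traversal could be sketched roughly as follows. This is not the actual cookbook: it only runs scripts on the local host, and the function names (list_scripts, run_action, logout_and_verify) are made up for this example. It assumes that only executable regular files in the directory count as logout scripts, runs them sorted by file name, and treats a non-zero "logout" exit code or a "query" exit code of 1 as a reason to offer a re-run.

```python
import os
import subprocess
from pathlib import Path

# Directory proposed in the design; the functions take it as a parameter
# so the sketch can be tried against a scratch directory.
LOGOUT_DIR = Path("/etc/wikimedia/logout.d")


def list_scripts(directory):
    """Return the executable regular files in directory, sorted by name."""
    return sorted(
        (p for p in Path(directory).iterdir()
         if p.is_file() and os.access(p, os.X_OK)),
        key=lambda p: p.name,
    )


def run_action(directory, action, uid, cn):
    """Run every script with the action; map name -> (exit code, output)."""
    results = {}
    for script in list_scripts(directory):
        proc = subprocess.run(
            [str(script), action, "--user", uid, "--cn", cn],
            capture_output=True, text=True, check=False,
        )
        output = (proc.stdout + proc.stderr).strip()
        results[script.name] = (proc.returncode, output)
    return results


def logout_and_verify(directory, uid, cn):
    """Run 'logout', then 'query'; return scripts that may need a re-run."""
    logout = run_action(directory, "logout", uid, cn)
    query = run_action(directory, "query", uid, cn)
    # A non-zero exit code from 'logout' signals an error.
    failed = {name for name, (rc, _) in logout.items() if rc != 0}
    # 'query' exiting with 1 means the user is still logged in.
    still_logged_in = {name for name, (rc, _) in query.items() if rc == 1}
    return sorted(failed | still_logged_in)
```

Calling logout_and_verify(LOGOUT_DIR, uid, cn) returns the names of the scripts for which the operator could be offered a re-run; an empty list means every script reported success and no remaining session.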