Page MenuHomePhabricator

Make sure that we're taking CentralAuth into consideration for staging release
Closed, ResolvedPublic

Description

For the staging configuration, are we also including CentralAuth sessions? Should we do that, or wait until we're going to production for Group 0? Or after we've done all the project sessions?

Event Timeline

OK, I'm not sure if I'm reading this right, but it seems like CentralAuth session store uses the same backend as the per-wiki session storage, but will throw an in-memory cache around it if it's not already caching. See here:

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CentralAuth/+/master/includes/CentralAuthUtils.php#166

This might suck, since at least as far as I can tell, CentralAuth hard-codes a 24-hour TTL on the session storage:

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CentralAuth/+/master/includes/CentralAuthUtils.php#266

The default per-wiki session TTL is one hour, and I don't think we override that anywhere. So I see five options for this:

  1. Evan is reading this code wrong or hasn't found the right code for how CentralAuth really works (very likely)
  2. We configure our per-wiki sessions to have 1-day TTL
  3. We let the CentralAuth sessions silently expire after 1 hour (which I think is the current per-wiki session timeout)
  4. We change the CentralAuth code so it uses a new configuration variable for its storage, and falls back to the per-wiki session storage if that new variable is not set
  5. We change Kask and RESTBagOStuff to support per-key TTLs

Of these, I prefer number 4, but I could live with 2 or 3. 5 is a major hassle, if I am not mistaken.

All of this is a wee bit complicated by the fact that CentralAuthCookieProvider sets the session cookie expiration to null (expire at end of browser sesssion):

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CentralAuth/+/master/includes/session/CentralAuthSessionProvider.php#352

...and instead uses the Token cookie, which can be 30 or 365 days (if the user says to remember them) according to our config.

Auth tokens are stored in the MySQL globaluser table, so they're kind of out-of-scope for session storage.

My rough impression is that the session storage TTL for CentralAuth is >> per-wiki session storage TTL because you might bounce around to different wikis, so it should feel like a continuous session? Sorry, not much more than that.

From what I can quickly tell, Evan's analysis is correct - if we were to move ahead with deploying the session storage change as-is, we would also affect storage for CentralAuth. I don't pretend to understand CentralAuth well enough to know what impact that might have. Some things I am concerned about are:

  • TTLS, for which all our quick solutions appear to be varying levels of bad
  • what happens to CentralAuth sessions if we (per T224993) make some production wikis use Kask while others do not? Does it still work? What might break?
  • do we really want CentralAuth and per-wiki sessions to share the same storage long-term, or do we prefer them to be separate?

I concur with Evan that the best answer appears to be changing CentralAuth to manage its own session storage through its own configuration variables. However, I wonder why it shared storage with the per-wiki sessions in the first place.

  • Was this simply because that storage was available and well-understood?
  • What it just simpler to code that way?
  • Or does CentralAuth intentionally use the same storage for reasons that I don't yet understand?

The real question here is: if we were to make CentralAuth use its own storage, would that somehow break something? I don't see how it would, but it is worth asking the question.

It is technically possible to make Kask and RESTBagOStuff support per-key TTLs. We explored that in some depth, and decided it was unpleasant. I would prefer to make CentralAuth use its own storage, because that intuitively feels like a better solution for multiple reasons. Then again, I have never touched CentralAuth code...

Our plans for a staging release of Kask (T222099) are on hold until we better understand all this. I think we all already know that, but I just want to explicitly state it.

Looping in @brion , because he is listed as the CentralAuth author at https://www.mediawiki.org/wiki/Extension:CentralAuth. Brion, if you are not the correct person to give us some guidance on this, are you able to direct us to the correct person?

I thought we agreed on per-request TTLs?

Tokens are only used if you check "Keep me logged in" at login, which people using e.g. a shared computer might be unwilling to do. Without that, the user will be logged out when the session expires. We renew the session on requests where they are close to expiry, so it's not too much of an issue (the CentralAuth logic would have to be updated to use a smaller renewal check).

it seems like CentralAuth session store uses the same backend as the per-wiki session storage, but will throw an in-memory cache around it if it's not already caching.

Per-wiki session storage does that as well.

what happens to CentralAuth sessions if we (per T224993) make some production wikis use Kask while others do not? Does it still work? What might break?

The login process involves the local wiki and loginwiki communicating through the same central session. So edge logins and autologins would break. Possibly logins would break altogether, although I don't think so.
Also you'd end up with a split-brain where different wikis see a different central session. Not sure what effects that would have; we don't store application data in the central session, so maybe not much.

do we really want CentralAuth and per-wiki sessions to share the same storage long-term, or do we prefer them to be separate?

They are both session data so presumably they need similar cross-DC characteristics?
It doesn't have to be the same storage per se, local and central sessions use different key namespaces anyway.

@Tgr , thank you for your reply.

I thought we agreed on per-request TTLs?

TTL discussion is here https://phabricator.wikimedia.org/T222907

I wouldn't be surprised if we revisit TTLs in the future, but at this time it seemed unnecessary and we weren't able to come up with an implementation we were happy with. So we deferred.

what happens to CentralAuth sessions if we (per T224993) make some production wikis use Kask while others do not? Does it still work? What might break?

The login process involves the local wiki and loginwiki communicating through the same central session. So edge logins and autologins would break. Possibly logins would break altogether, although I don't think so.
Also you'd end up with a split-brain where different wikis see a different central session. Not sure what effects that would have; we don't store application data in the central session, so maybe not much.

This is very good to know. We will adjust accordingly.

In looking back over session storage tasks, I came across this relevant comment by @Eevans in T222907:

FYI, I think CentralAuth needs a TTL different to that of regular sessions (IIRC, this was cited as a reason for exposing the TTL). Regardless of whether or not this is true, we should probably consider using a different instance of the service for CentralAuth (it's trivial to do). If we do, we can configure it with a different default TTL as well.

Right now, I'm thinking:

  1. modify CentralAuth, allowing it to be configured to use different storage than per-wiki sessions
  2. deploy a configuration change to explicitly point CentralAuth where it is already pointed, rather than piggybacking off per-wiki session storage configuration.
  3. make sure we didn't break anything - the only change at this point should be that CentralAuth session storage and per-wiki session storage configurations are decoupled.
  4. proceed with our original rollout plan, moving only per-wiki session storage to Kask, not CentralAuth
  5. after per-wiki session storage for all production wikis is moved to Kask, consider moving CentralAuth to a separate Kask instance under its own Phab ticket

I am happy to take on the CentralAuth changes, but given that I have other tasks that I've also committed to, this would delay our schedule.

Change 521404 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/extensions/CentralAuth@master] Allow separating CentralAuth and per-wiki session storage.

https://gerrit.wikimedia.org/r/521404

Change 521409 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[operations/mediawiki-config@master] Specify CentralAuth session storage separately from per-wiki session storage.

https://gerrit.wikimedia.org/r/521409

I have pushed two Gerrit changes:

  1. a change to CentralAuth that introduces a new optional configuration variable for specifying the CentralAuth session storage location
  2. a configuration change that uses the new variable to specify the CentralAuth session storage location, pointing it to the same place that it is already pointed

It does not matter what order these deploy in. If the change to CentralAuth deploys first, then the new variable will not be set and CentralAuth will fall back to using the existing $wgSessionCacheType variable, with no change in behavior. If the config change deploys first, it simply sets a new variable that no code will use, with no change in behavior.

Once both changes are deployed, the new code will read the new variable. But because the new variable points to the same location as the existing variable, there will be no change in behavior.

The end result is that all production wikis should continue working unchanged, but because configuration for CentralAuth and per-wiki session storage will have been decoupled, we will be able to start the Kask rollout plan, beginning with a staging server, per T222099.

So, per our discussion this morning, the rollout plan will be:

  1. Make CentralAuth session storage configurable.
  2. Switch group 0 per-wiki session storage to Kask and test.
  3. Switch group 1..N per-wiki session storage to Kask.
  4. Switch CentralAuth session storage to Kask.

This is a good plan. Thanks, @BPirkle !

Change 521404 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Allow separating CentralAuth and per-wiki session storage.

https://gerrit.wikimedia.org/r/521404

@EvanProdromou We should also take this into account during Main Stash

I think we've taken this into account, so I'm going to close this task.

We still need the config change to be merged/deployed.

Change 521409 merged by jenkins-bot:
[operations/mediawiki-config@master] Specify CentralAuth and OAuth session storage separately from per-wiki session storage.

https://gerrit.wikimedia.org/r/521409

Mentioned in SAL (#wikimedia-operations) [2019-07-30T23:13:20Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Specify CentralAuth and OAuth session storage separately from per-wiki session storage (T227097, T227696) (duration: 00m 47s)

The config change has now been merged and deployed.