Page MenuHomePhabricator

Move CentralAuth sessions from redis backend to kask
Closed, DeclinedPublic1 Estimated Story Points

Description

In case we are to decommission Redis, we need to switch CentralAuth to Kask.

CentralAuth is using the store in 3 distinct way:

  • To store sessions. These are set with a key 'centralauth:session' and TTL_DAY. Important note, the TTL for session data in naive sessions we just deployed is 1 hour.
  • To briefly transfer data in a secure way between wikis. With centralauth, as far as I understand, your login request is redirected onto a central wiki, which, upon success, sets a token with a 1 minute TTL, then you're redirected back to your original wiki. It reads the token and transforms it into a local session.
  • Additionally, centalAuth can issue API tokens, which are as well written with a 1 minute TTL. Consuming the token is done via setting a TTL to negative value. I assume this is done because of the BagOStuff interface - delete doesn't give you information on whether the item existed or not, while changeTTL does - this allows to aboid race conditions (it assumes atomicity, which BagOStuff implementations doesn't seem to completely guarantee) but in general it is indeed could be atomic, while get/delete is a clear case of a race.

Given that Kask was engineered to NOT provide the opportunity to set TTL in the request, the idea was always to create separate clusters of kask for each use-case - thus configure TTL in kask itself. Which leaves us at a crossroads: we could implement per-request TTL in kask, perhaps with a default maximum TTL, or we could split CentralAuth use-cases into 2 new clusters - for sessions and for tokens. This would require quite some coding on MW side, but nothing too crazy.

The question of consuming tokens via effectively a CAS operation is an even bigger issue, but I would like to spend some more time investigating it before making any proposals.

Event Timeline

Krinkle renamed this task from Replace redis backend with kask for CentralAuth to Move CentralAuth sessions from redis backend to kask.Jul 10 2020, 5:36 PM

In my KeyValueStore refactor patches, a consume() method was added, since implementing it with TTL changes is not generically the most convenient way to best-effort atomically consume a key.

Naike set the point value for this task to 1.Sep 11 2020, 9:33 AM

I also recall that we planned to use multiple kask instances, each with their own TTL. Is there a compelling reason to go a different direction? I think I recall @Eevans saying that standing up additional kask instances would not be difficult, but I personally have no knowledge of whether that remains true, or how to go about it. Who can speak to this, and would that same person or persons be available to create the new kask instance(s) if we go that route?

Everything else in this comment assumes we're using multiple kask instances.

I see CentralAuth creating BagOStuff instances in:

CentralAuthUtils::getSessionStore(), which seems to be the main usage:
https://gerrit.wikimedia.org/g/mediawiki/extensions/CentralAuth/+/c18a80cd5a62243ea8d526e820b6c9cb1f0deae4/includes/CentralAuthUtils.php#166

SpecialCentralAutoLogin::getInlineScript(), for caching of minified javascript:
https://gerrit.wikimedia.org/g/mediawiki/extensions/CentralAuth/+/c18a80cd5a62243ea8d526e820b6c9cb1f0deae4/includes/specials/SpecialCentralAutoLogin.php#43

Maintenance script populateListOfUsersToRename.php, which I guess doesn't need changed:
https://gerrit.wikimedia.org/g/mediawiki/extensions/CentralAuth/+/c18a80cd5a62243ea8d526e820b6c9cb1f0deae4/maintenance/populateListOfUsersToRename.php#69

Maybe we add a function in addition to CentralAuthUtils::getSessionStore() for getting a kask instance with a different TTL, maybe CentralAuthUtils::getTokenStore(). Presumably, we'd need additional configuration as well.

I'm guessing we'd need to plan a migration via MultiWriteBagOStuff, similar to what we did for the initial session rollout.

Quote from Slack attributed to @Krinkle:

I’m not aware of the ca token cache having special needs. Either it can go into the session object with its regular TTL if it needs to be multi-dc (even if longer than today, doesn’t seem significant) or go the route of ChronologyProtect and keep using the dc-local redis main stash.

(That was said it to someone else that passed it along to me, and not said to me directly, so it'd be great if @Krinkle can confirm here.)

There's some relevant discussion on T252391: Reimage one memcached shard per DC to Buster starting here

The ChronologyProtector plan is discussed in more detail on T254634: Determine and implement multi-dc strategy for ChronologyProtector

With these in mind, maybe we should decline this ticket and instead plan to move CentralAuth to dc-local storage?

@BPirkle I know you're hoping more for input from @Krinkle than me :) But I would agree that moving it to the dc-local storage would likely make the most sense.

For the record, I do not know CentralAuth well-enough off the top of my head to know whether it (as-is) would be compatible with a dc-local store. It's possible that it already is, or that it would be trivial to accomodate. It might also be hard or near-impossible. I simply don't know.

If others more familiar with it have indicated this already, then feel free to ignore me.

For the record, I do not know CentralAuth well-enough off the top of my head to know whether it (as-is) would be compatible with a dc-local store.

What characteristics of the dc-local store should we be aware of? Specifically, how will that differ from the redis store that CentralAuth is currently using?

This might be better asked on T267270: Determine multi-dc strategy for CentralAuth, so feel free to either answer there, or else I'll copy over any replies to that task for completeness.

This comment was removed by Krinkle.

Closing this as declined because it seems we need to take a step back and consider our approach, and to not split discussion between this task and T267270: Determine multi-dc strategy for CentralAuth.