Determine multi-dc strategy for CentralAuth
Closed, Resolved · Public · 1 Estimated Story Points

Description

Background

MainStash is moving to a SQL cluster, per T212129: Move MainStash out of Redis to a simpler multi-dc aware solution. CentralAuth currently uses the same redis storage as MainStash (redis_local, configured via $wgCentralAuthSessionCacheType for CentralAuth and $wgMainStash for MainStash).

There was talk of decommissioning redis T243520: Decommission the "session redis" cluster. But it now seems that we will retain redis in a dc-local fashion for (at least) ChronologyProtector, per T254634: Determine and implement multi-dc strategy for ChronologyProtector. There's a bit more relevant discussion here

Current State

CentralAuth is using storage in 3 distinct ways (this is copied from T254422, with a little editing):

  • To store sessions. These are set with a key 'centralauth:session' and TTL_DAY. Important note, the TTL for session data in native sessions we just deployed is 1 hour.
  • To briefly transfer data in a secure way between wikis. With CentralAuth, your login request is redirected onto a central wiki, which, upon success, sets a token with a 1 minute TTL, then you're redirected back to your original wiki. It reads the token and transforms it into a local session.
  • Additionally, CentralAuth can issue API tokens, which are also written with a 1 minute TTL. A token is consumed by setting its TTL to a negative value.

These all use CentralAuthUtils::getSessionStore() . As mentioned above, this points at redis_local via $wgCentralAuthSessionCacheType.
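The three usage patterns above can be illustrated with a minimal in-memory sketch. This is not CentralAuth's code: `TtlStore`, `issue_token` and `consume_token` are hypothetical names, the `centralauth:token:` key prefix is made up for illustration, and the clock is injected so the example is deterministic. Consumption is modelled as a delete, standing in for the negative-TTL write described above.

```python
import secrets

TTL_DAY = 86400      # session lifetime, per the description above
TTL_TOKEN = 60       # one-minute lifetime for login and API tokens


class TtlStore:
    """Toy key-value store with per-key expiry (stand-in for redis_local)."""

    def __init__(self, clock):
        self._clock = clock          # callable returning the current time in seconds
        self._data = {}              # key -> (value, expiry timestamp)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)   # drop expired entries lazily
            return None
        return entry[0]

    def delete(self, key):
        self._data.pop(key, None)


def issue_token(store, payload):
    """Store payload under a random single-use token and return the token."""
    token = secrets.token_hex(16)
    store.set("centralauth:token:" + token, payload, TTL_TOKEN)  # hypothetical key name
    return token


def consume_token(store, token):
    """Return the payload once; later calls, or calls after expiry, return None."""
    key = "centralauth:token:" + token
    payload = store.get(key)
    store.delete(key)
    return payload
```

In this sketch a session written with `TTL_DAY` and a token written with `TTL_TOKEN` share one store, which mirrors the current single `$wgCentralAuthSessionCacheType` setting.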

In addition, incidental uses of BagOStuff instances within CentralAuth are:

SpecialCentralAutoLogin::getInlineScript() , for caching of minified javascript. This uses getLocalClusterInstance() and doesn't appear to need any changes.

Maintenance script populateListOfUsersToRename.php, which uses its own HasBagOStuff and doesn't need any changes.

Conclusion

Assuming that redis_local continues to represent an appropriate dc-local redis storage, no changes need to be made to CentralAuth when MainStash moves. If redis_local does change, we would need only a config change, to point $wgCentralAuthSessionCacheType to the appropriate dc-local redis. No change to the actual CentralAuth code would be necessary.

So although we discussed moving CentralAuth to kask in T254422: Move CentralAuth sessions from redis backend to kask, it appears that CentralAuth's needs can be satisfied by the same dc-local redis as ChronologyProtector, for less effort. In short, there appears to be nothing to do here.

If I am incorrect about that, please let me know so that we can make whatever changes are necessary.

(I'm tagging this with a bunch of stuff, for visibility. I'm probably overtagging, so feel free to untag anything not relevant.)

Event Timeline

For the record, I do not know CentralAuth well enough off the top of my head to know whether it (as-is) would be compatible with a dc-local store.

What characteristics of the dc-local store should we be aware of? Specifically, how will that differ from the redis store that CentralAuth is currently using?

For the purposes of production without multi-dc, as we have today, they don't differ meaningfully. They are in fact 100% identical. CentralAuth physically uses the old redis-sessions store, which as it stands is likely what we'd wire up to any ad-hoc dc-local needs for Redis such as ChronologyProtector and now CentralAuth.

However, thinking about multi-dc, I suspect CentralAuth has (like ChronologyProtector) been prepared for multi-dc since ~2015 and thus, like ChronologyProtector and user sessions, has been coded as if redis will somehow magically be a multi-dc friendly store with the specific characteristics we imagined it would eventually get. Those imaginary characteristics have been tweaked a bit and then embodied as Kask.

So by "making" CentralAuth use a dc-local redis in the scope of this multi-dc related task, I think, we need to treat this as if CentralAuth is migrating from Kask to dc-local Redis.

Put another way, we need to think about how CentralAuth is using this store, how that plays with multiple DCs serving read traffic, and whether that can be satisfied by a dc-local store as-is or whether something needs to happen in the code to accommodate that reduction in consistency and guarantees.


I think the task analysis here is accurate in the context of "main stash is moving to SQL" - CentralAuth should be updated to point directly to dc-local redis (instead of indirectly via "Main Stash"), no other changes are needed.

In the context of multi-dc however, we need to think not just about how things work today but how they will work in the future when we are actively serving traffic from multiple data centers. Two different requests from the same user might not go to the same DC in that case, and that changes everything.

Naike set the point value for this task to 1. (Dec 10 2020, 2:17 PM)

Based on conversation with @Krinkle and @Gilles this task is not blocking and what remains is documentation. We're going to try to redirect the documentation aspect through Clinic Duty as a learning opportunity.

In the task description, @BPirkle wrote:

[…]

  • To store sessions. These are set with a key 'centralauth:session' and TTL_DAY. Important note, the TTL for session data in native sessions we just deployed is 1 hour.

[…]

Assuming that redis_local continues to represent an appropriate dc-local redis storage, no changes need to be made to CentralAuth […]

I think this assessment is incorrect. We don't operate in multi-DC today, so what works today isn't an indicator of what is needed.

@Krinkle wrote:

In the context of multi-dc however, we need to think not just about how things work today but how they will work in the future when we are actively serving traffic from multiple data centers. Two different requests from the same user might not go to the same DC in that case, and thus changes everything.

When Redis was adopted by CentralAuth, it was with the expectation that it is replicated across data centers and thus, apart from replication lag, data written by either DC will be readable from both DCs after a short delay.

For years now, SRE has made clear that the Redis replication mechanism is unsuitable for this long-term, and is already cumbersome during switchovers, and has never been bi-directional. We just assumed we'd somehow make that work. We weren't entirely wrong, it was just realized by moving to Kask (for session storage), and the plan is to turn off replication for Redis in the multi-DC reality as its current configuration would lead to split-brain if we leave it replicating in one direction as it does today.

So given a multi-dc operation, what are we going to do with these CA session tokens? How are they going to be set upon login and then be readable from either data center? I don't see an answer to that here. From what I can tell, this will just break as-is. It is not solved, right?

For the core session store, we have solved for the replication lag with a "sticky DC" cookie (see T270225, and T91820). Thus we don't need Kask/Cassandra to generally wait for replication, because we pin the user to the same DC for a few seconds after a DB write to give replication a chance to catch up. However, this is seconds, not hours. (And we can't realistically do hours as otherwise there's little value in operating read traffic from multiple DCs if we pin all logged-in traffic to the primary).
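The sticky-DC idea can be sketched roughly as follows. This is an illustration only, not the CDN configuration: `choose_dc` and `cookies_after_response` are hypothetical helpers, the 10-second window is an assumed value (the text only says "a few seconds"), POST is used as a stand-in for "request that wrote to the DB", and the cookie value is modelled as the timestamp at which pinning ends.

```python
PRIMARY_DC = "eqiad"
STICKY_SECONDS = 10   # assumed value; the thread only says "a few seconds"


def choose_dc(method, cookies, nearest_dc, now):
    """Pick the data center that should serve a request.

    cookies maps cookie name -> value. The UseDC value is modelled here as
    the time at which pinning ends, which is an assumption of this sketch.
    """
    if method == "POST":
        return PRIMARY_DC                      # writes always go to the primary
    pinned_until = cookies.get("UseDC")
    if pinned_until is not None and now < pinned_until:
        return PRIMARY_DC                      # recent write: wait out replication
    return nearest_dc


def cookies_after_response(method, cookies, now):
    """Return the cookie jar after a response, adding the pin after a write."""
    updated = dict(cookies)
    if method == "POST":
        updated["UseDC"] = now + STICKY_SECONDS
    return updated
```

With a window measured in seconds, a user near Codfw is back on Codfw almost immediately, which is why this mechanism cannot cover data that must stay readable for hours.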

With regards to the TTL, I don't know if 1 hour could work for CentralAuth. I suspect it would not. We certainly don't want users to have to log in again after 1 hour of inactivity. I didn't realize it was 1 hour, but it may very well be the CentralAuth 1-day TTL that is making this work. Thus the only thing we lose after 1 hour is the temporary data for the user state on the local wiki. Their session cookie is linked with CentralAuth and a local session is naturally re-created on-demand upon seeing the user again after an hour has elapsed.

I imagine that we also actively renew and push back the "1 hour" whenever the user is active. Noting that this is likely specific to a given wiki, so the user may be active on enwiki for a while but show no activity on other wikis, but remains logically logged-in there if they go there.

Questions:

  • How do our local and global sessions work currently? What makes it so that if I come back after two days to a wiki I haven't been on in a few days that I'm still (properly) logged in (e.g. not just triggering auto-login).
  • What data powers the auto-login? From talking with @aaron, he suspects the auto-login is powered by a DB field and limited to one value for one device; but the other aspects of it work fine for days/weeks on a single device.
  • How are each of the CentralAuth store keys used? Are they all only set and read within POST requests to the primary DC and used no longer than the ~5 seconds we are sticky for? Assuming not, how will other DCs get the data?
  • If I live near Codfw and log in (handled by Eqiad), will CentralAuth need to read data on subsequent page views (handled by Codfw) that was written during the login process? Or more generally need to read data that was written during another web request that could be a POST request/edit handled by Eqiad?
    • In general the only expectation we have for dc-local Redis is that writes by the primary DC are seen by the primary DC during the first few seconds after the write. After that the data might not be visible since the user may generally be on Codfw, and dc-local Redis is standalone in each DC (will not have replication).
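To make the dc-local expectation concrete, here is a tiny model with one standalone store per DC and no replication. `DcLocalStores` is a hypothetical name and plain dicts stand in for Redis.

```python
class DcLocalStores:
    """One standalone dict per data center; nothing is copied between them."""

    def __init__(self, dcs):
        self._stores = {dc: {} for dc in dcs}

    def set(self, dc, key, value):
        self._stores[dc][key] = value

    def get(self, dc, key):
        return self._stores[dc].get(key)


stores = DcLocalStores(["eqiad", "codfw"])
stores.set("eqiad", "centralauth:session", "written during login")
seen_in_eqiad = stores.get("eqiad", "centralauth:session")   # the written value
seen_in_codfw = stores.get("codfw", "centralauth:session")   # None: never replicated
```

Any key that a later request may read from the other DC therefore needs either a replicated store or routing that keeps the reader on the DC that wrote it.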

I think this assessment is incorrect. We don't operate in multi-DC today, so what works today isn't an indicator of what is needed.

Yeah, you're right. At some point, the question shifted in my mind from "what do we need to do for multi-DC?" to "what do we need to do right now to avoid imminent short-term breakage?". The multi-DC question is important, relevant, and unanswered.

I have not put real thought into the larger question, or even carefully read your analysis. For the moment, I just wanted to acknowledge where the disconnect in our perspectives occurred.

  • How do our local and global sessions work currently? What makes it so that if I come back after two days to a wiki I haven't been on in a few days that I'm still (properly) logged in (e.g. not just triggering auto-login).

A session ID is a valid credential for login until the session expires. A token (in the sense of User::getToken(), not in the sense used elsewhere in this task) is a valid credential until it is explicitly reset. The correct token is stored in the database. When you log in with "remember me", you get the token in a cookie. If you log in without "remember me", the token is stored in the session and is used to validate the session ID, so that if the token is reset, the session will become invalid.

All of this essentially applies to both core and CentralAuth, since CentralAuth was developed by analogy with core. A user logging in with "remember me" will get four cookies set by the local wiki: core session, core token, CA session, CA token. The main difference between core and CentralAuth cookies is that CentralAuth sets cookies with a wildcard domain. CentralAuth sessions and tokens are global, so they are valid for login regardless of what domain they are sent to, whereas core session and token cookies are only valid on the one wiki.
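The relationship between the two credentials can be sketched as below. This is a simplified model, not MediaWiki or CentralAuth code: `AuthSketch` and its cookie names are hypothetical, and it keeps the token in the session in both login modes for brevity.

```python
import hmac


class AuthSketch:
    """Toy model of session-ID and token credentials."""

    def __init__(self):
        self.db_tokens = {}   # user -> current token; stands in for the database
        self.sessions = {}    # session ID -> {"user": ..., "token": ...}

    def login(self, user, session_id, remember_me):
        token = self.db_tokens[user]
        # The token is kept in the session so the session ID can be validated later.
        self.sessions[session_id] = {"user": user, "token": token}
        cookies = {"session": session_id}
        if remember_me:
            # "Remember me" additionally hands the token to the browser.
            cookies["user"] = user
            cookies["token"] = token
        return cookies

    def reset_token(self, user, new_token):
        self.db_tokens[user] = new_token

    def authenticate(self, cookies):
        session = self.sessions.get(cookies.get("session"))
        if session is not None:
            current = self.db_tokens.get(session["user"], "")
            if hmac.compare_digest(session["token"], current):
                return session["user"]
        # No usable session: fall back to the token cookie, if one was sent.
        user, token = cookies.get("user"), cookies.get("token")
        if user is not None and token is not None:
            if hmac.compare_digest(token, self.db_tokens.get(user, "")):
                return user
        return None
```

In this model a session outlives nothing once the token is reset, while a token cookie survives session expiry, which is what lets a returning user stay logged in after the session data is gone.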

  • What data powers the auto-login? From talking with @aaron, he suspects the auto-login is powered by a DB field and limited to one value for one device; but the other aspects of it work fine for days/weeks on a single device.

There are two kinds of auto-login, both implemented in Special:CentralAutoLogin:

  • When you log in explicitly, there is automatic login to a large number of 2LDs. This is done by 1x1 image requests.
  • When you visit a wiki for the first time without a valid login, or when you visit Special:Userlogin, a scripted auto-login is initiated. This is done by cross-site <script src="...">. If it's successful, the page view version will pop up a notification, and the Special:Userlogin version will redirect to the referring page. If it fails, the failure is cached in localStorage.

In both cases, there is a handshake sequence with login.wikimedia.org. A centrally logged in user has cookies valid for login.wikimedia.org. This is verified and then the user is redirected back to the local wiki with a token (in the short TTL sense) in the URL. The local wiki validates the URL token and then sends cookies, establishing a local session.

  • How are each of the CentralAuth store keys used? Are they all only set and read within POST requests to the primary DC and used no longer than the ~5 seconds we are sticky for? Assuming not, how will other DCs get the data?

The CentralAuth session is so similar to the core one that it should probably be in Kask, and be handled in the same way.

The whole auto-login sequence could be routed to the primary DC by path prefix. That's the simple solution.

The complex solution is to allow login.wikimedia.org requests to go to either DC. Then the redirect back to the local wiki would specify the DC in the URL. The frontend would route the subsequent request to the specified DC. Either way, the auto-login token would be stored in memcached with a short expiry.

Auto-login does not use POST requests. It's not useful to set a sticky DC cookie since the whole point is the lack of cookie sharing between the two domains.

I don't think it's desirable to replicate auto-login tokens. They are not a cache -- there is no way to deal with replication lag apart from waiting for it to be resolved.

  • If I live near Codfw and log in (handled by Eqiad), will CentralAuth need to read data on subsequent page views (handled by Codfw) that was written during the login process? Or more generally need to read data that was written during another web request that could be a POST request/edit handled by Eqiad?

Yes, codfw needs to read session data. As I say, it is analogous to core sessions. If the user logs in to en.wikipedia.org without "remember me", you ideally want a sticky DC cookie with an expiry of a few seconds for *.wikipedia.org, so that when the user goes to fr.wikipedia.org, their request is routed to eqiad until the central session is available on codfw. If the user shows up on codfw before session replication is complete, auto-login will be triggered.

@tstarling thank you for your in-depth response. Based on what you've just described, are the following items the minimum changes required to make CentralAuth work in an active-active setup?

  • All URIs involved in auto-login (what are they?) being routed to the primary DC
  • Issuing a sticky DC cookie with scope *.wikipedia.org when logging in without "remember me"
  • All URIs involved in auto-login (what are they?) being routed to the primary DC

The path prefixes are

  • /wiki/Special:CentralAutoLogin
  • /wiki/Special:CentralLogin

The names of these special pages are not localised -- there is a special arrangement to prevent localisation following T56195, which added a caching hack for one of these path prefixes. I don't think /w/index.php style URLs are used anywhere. I tested login and I don't see any, and the code always uses wfAppendQuery().

  • Issuing a sticky DC cookie with scope *.wikipedia.org when logging in without "remember me"

There are some details here to be figured out. Probably most callers of setCentralSession() will need to send such a cookie.

As an optimisation, special routing for those path prefixes could be skipped if there is no session or token cookie in the request. Anonymous auto-login is always going to fail and will account for most requests. The code would be the same as the existing special cases for session and token cookies in the Varnish and ATS configuration.
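A rough sketch of that routing rule, including the optimisation for cookie-less requests, might look like this. It is not the Varnish or ATS configuration: `route` and `has_login_cookie` are hypothetical, and matching cookie names by a `Session` or `Token` suffix is an assumption made for illustration.

```python
PRIMARY_DC = "eqiad"
PINNED_PREFIXES = (
    "/wiki/Special:CentralAutoLogin",
    "/wiki/Special:CentralLogin",
)


def has_login_cookie(cookie_names):
    """True if any cookie looks like a session or token cookie (assumed naming)."""
    return any(
        name.endswith("Session") or name.endswith("Token")
        for name in cookie_names
    )


def route(path, cookie_names, nearest_dc):
    """Send auto-login paths to the primary DC, but only for possibly logged-in users."""
    if path.startswith(PINNED_PREFIXES) and has_login_cookie(cookie_names):
        return PRIMARY_DC
    # Anonymous auto-login will fail anyway, so it can stay on the nearest DC.
    return nearest_dc
```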

There are some details here to be figured out. Probably most callers of setCentralSession() will need to send such a cookie.

On second thoughts, I think the simplest thing to do here is to not set that cookie. It's a can of worms we don't need to open. Triggering an auto-login is not so bad, and the user will have to be pretty fast to hit that case anyway. Session changes in core don't cause a UseDC cookie to be sent, so when core has a solution for this, we can apply it to CentralAuth as well.

All we need for this to work is:

  1. A new Kask instance for $wgCentralAuthSessionCacheType
  2. A patch to CentralAuth to use Memcached instead of the session store for auto-login tokens, and an associated configuration patch
  3. Path prefix special cases

Change 671032 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/extensions/CentralAuth@master] [WIP] Split token and session cache type

https://gerrit.wikimedia.org/r/671032

Writing that patch forced me to properly review all current usages of session storage. Foreign API tokens also need DC pinning or some other special solution.

  • Action API foreign tokens
    • Retrieved by /w/api.php with action=centralauthtoken. The client (ext.centralauth.ForeignApi.js) ensures that action=centralauthtoken is in the URL, not in POST data.
    • Consumed by setting a centralauthtoken parameter. The client ensures that the parameter is in the URL, not in POST data.
    • A path pattern like ^/w/api.php?(.*&|)(centralauthtoken=|action=centralauthtoken) would work.
  • REST API foreign tokens.
    • Retrieved with the action API as above.
    • Consumed by sending an Authorization header which starts with CentralAuthToken.

So it all seems to be doable in the CDN.
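As a sanity check of the matching rules listed above, here is a small sketch. The path pattern above is shorthand; read literally as a regular expression, its `.` and `?` would need escaping, which is done below. `needs_primary_dc` is a hypothetical helper, and the header lookup assumes the caller passes a dict keyed by the canonical `Authorization` name.

```python
import re

# Escaped form of the pattern sketched above: literal ".php", literal "?",
# then the parameter either first in the query string or preceded by "&".
FOREIGN_TOKEN_URL = re.compile(
    r"^/w/api\.php\?(.*&|)(centralauthtoken=|action=centralauthtoken)"
)


def needs_primary_dc(url, headers):
    """True if the request retrieves or consumes a foreign API token."""
    if FOREIGN_TOKEN_URL.match(url):
        return True                                  # action API, token in the URL
    auth = headers.get("Authorization", "")
    return auth.startswith("CentralAuthToken")       # REST API, token in the header
```

Because the client keeps both parameters in the URL rather than in POST data, a check of this shape only needs the request line and headers.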

Below from T270225 might be relevant, with regards to CentralAuth login flow:

The below is based on multi-dc strategy meeting notes (restricted) between SRE/PET/Perf, from a few months ago.

For the login request, which starts at a local wiki and is a local POST request, I believe the thinking was that the GET redirect to login.wikimedia.org works fine because we would pin that domain in its entirety to the primary DC (T91820; based on the assumption that requests to this domain are not used in the critical path except for requests that we'd want to route to the primary DC regardless). At the end of that redirect chain, it would end up on the local wiki. For db writes that would initially seem problematic as being a GET request, but the thinking was that the sticky DC request would last long enough to cover this chain, and so given it starts from a POST, we're good.

For background autologin from page views (not via login form), we use a redirect chain of Ajax requests, which similarly go through login.wikimedia.org. That chain, however, also ends up at the local wiki domain, via a GET request, and I'm not exactly sure what we thought about that. If I had to improvise now, I'd say we could have the centralauth JS code set a short-lived sticky DC cookie proactively on the local domain at the start of the chain. However, this would be a new idea and previously it didn't feel like our strategy was incomplete, so perhaps @aaron thought of another way already and I've just forgotten it?

[…] > All we need for this to work is:

  1. A new Kask instance for $wgCentralAuthSessionCacheType
  2. A patch to CentralAuth to use Memcached instead of the session store for auto-login tokens, and an associated configuration patch
  3. Path prefix special cases

1 and 3 make sense to me. I'd like to better understand 2, though. Given that point 2 makes sense to you, I'm probably wrong or too pessimistic, so feel free to contradict/enlighten me – I've generally thought of Memcached as not expected to always accept writes (docs). For example, nutcracker and mcrouter both induced artificial downtimes and TKOs (somewhat mitigated now by our gutter pool, though switching to/from the gutter pool is not centrally coordinated and has a capped TTL of 10s). And I thought it was normal that values could in some cases be discarded within a few minutes even, before first use, mainly because we (intentionally) operate Memc at full capacity so that lesser used stuff continuously drops out. More than anything else, I'm just asking to demystify this so that I can have more confidence in that. I am vaguely aware of the various different slabs and different storage areas Memcached allocates, but I've not come away from it thinking that it is "very likely" for single-use tokens to safely persist for N minutes in the Memc main cluster. Or did you envision a tiny separate memcached cluster for this?

Anyway, Memcached seems simple indeed, and no reason not to if that is good enough. If for some reason Memcached were not good enough, then here are some thoughts that you can otherwise ignore:

  • After chatting with Gilles, I understand the reason we don't want to store tokens in the session store (Kask/Cassandra) is because that would mean each token is its own Cassandra "document" and would outlive its TTL due to lazy garbage collection in a way that would operationally be problematic or risky. I don't know enough about that, but I can imagine that being a problem indeed. Would it help if the tokens were stored within CentralAuth's session blob? (Instead of as a dedicated key, like today). Core's session blobs seem to contain various key-value pairs and tokens as well.
  • On the other hand, given that Memc isn't replicated, it seems implied that these tokens are perhaps only needed within the same DC? I think the previous strategy didn't guarantee that, but I guess with the various additional route exceptions you laid out, that could perhaps be guaranteed indeed. If so, then another option might be to keep using dc-local redis, as originally proposed, but only for these short-lived tokens. That's what we ended up doing for ChronologyProtector (T254634).
  • Or, if we do need to replicate them or don't want the additional route exemptions, then aside from the "inside CentralAuth session blob" idea, another idea could be to store them in the "MainStash" (currently Redis-backed, will soon switch to the new x2 bi-di MySQL; T212129). I don't think we imagined the (new) MainStash holding rapidly changing values, and I'm not sure how that'll play with SqlBag's garbage collector, but it might be "fine" as another option to consider.

I am vaguely aware of the various different slabs and different storage areas Memcached allocates, but I've not come away from it thinking that it is "very likely" for single-use tokens to safely persist for N minutes in the Memc main cluster. Or did you envision a tiny separate memcached cluster for this?

That depends on your definition of "very likely" doesn't it? We can use Redis or Kask for this instead if you like. The point of splitting it is because the requirements are very different from login sessions. Although maybe not very different from CSRF tokens. In any case, multi-DC Cassandra seems like overkill for a token with a half-life of 250ms.

Anyway, Memcached seems simple indeed, and no reason not to if that is good enough. If for some reason Memcached were not good enough, then here are some thoughts that you can otherwise ignore:

  • After chatting with Gilles, I understand the reason we don't want to store tokens in the session store (Kask/Cassandra) is because that would mean each token is its own Cassandra "document" and would outlive its TTL due to lazy garbage collection in a way that would operationally be problematic or risky. I don't know enough about that, but I can imagine that being a problem indeed. Would it help if the tokens were stored within CentralAuth's session blob? (Instead of as a dedicated key, like today). Core's session blobs seem to contain various key-value pairs and tokens as well.

I think I might file a rant bug about just how bad session blobs are for consistency. The fact that we've always done this doesn't make it right.

  • On the other hand, given that Memc isn't replicated, it seems implied that these tokens are perhaps only needed within the same DC? I think the previous strategy didn't guarantee that, but I guess with the various additional route exceptions you laid out, that could perhaps be guaranteed indeed.

Yes, the theory is that it's easier to move requests to the primary DC than to correctly replicate data. Routing a request to the primary DC will have a latency impact comparable to using EACH_QUORUM or any other kind of backend cross-DC activity. We can solve a lot of problems this way. As long as we don't end up moving all logged-in page views back to the primary, the user will still win.

If so, then another option might be to keep using dc-local redis, as originally proposed, but only for these short-lived tokens. That's what we ended up doing for ChronologyProtector (T254634).

Yes, Redis is fine. Please approve my CentralAuth patch because I'm really not trying to impose a storage policy, I'm just trying to make these two use cases separately configurable.
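The idea of making the two use cases separately configurable can be sketched as follows. This is not the content of the patch: `build_stores` and the `session_cache_type` / `token_cache_type` keys are hypothetical names, and falling back to the session backend when no token backend is set is an assumption of this sketch.

```python
def build_stores(config, backends):
    """Resolve the session store and the token store from configuration.

    config maps setting name -> backend name; backends maps backend name ->
    store object. When no token backend is configured, the session backend
    is reused, which keeps single-store setups working unchanged.
    """
    session_type = config["session_cache_type"]
    token_type = config.get("token_cache_type", session_type)
    return backends[session_type], backends[token_type]


backends = {"kask": object(), "redis_local": object()}
session_store, token_store = build_stores(
    {"session_cache_type": "kask", "token_cache_type": "redis_local"},
    backends,
)
```

With a split like this, the choice between Memcached, dc-local Redis or Kask for short-lived tokens becomes a configuration decision rather than a code change.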

Change 671032 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Split token and session cache type

https://gerrit.wikimedia.org/r/671032