
Synchronize SUL2 and SUL3 central browser state
Closed, Resolved · Public

Description

SUL2 uses login.wikimedia.org; SUL3 will use a different central domain per T363695: Create a Wikimedia login domain that can be served by any wiki. That means that during rollout, when we switch a user from SUL2 to SUL3, they would lose their central session (though not any of their already existing local sessions). That is not a huge deal, but it would be nice to avoid it by copying the SUL2 domain session to the SUL3 domain. The same mechanism could also prevent the two domains from ending up with sessions for different users, and the weird behavior that would cause if, e.g., we don't roll out on all wikis at the same time (again, not a huge deal but nice to have).

There are two (hopefully) easy ways to do it:

  • Use central autologin - make sure that when we are doing edge login in SUL2 mode, the SUL3 domain is added to the list of autologin domains (and the relevant endpoint is enabled on the SUL3 SSO domain). Maybe the same in the other direction as well. Make sure edge login is triggered (or maybe just wait long enough).
  • Use central login - instead of triggering it after (SUL2) login, find a way to trigger it when the user has a local session but no SUL3 central session.

The first seems both simpler and safer, although more likely to be prevented by browser restrictions (as edge login uses subresource requests for cookie access while central login uses top-level ones).


Event Timeline

Restricted Application added a subscriber: Aklapper.

@polishdeveloper pointed out that we'll need this for temp users, otherwise we'd just discard all temp accounts when we switch.

@Tgr, it looks to me like solving T375788: Implement SUL3 central autologin will naturally resolve this task, since the SSO domain will be part of the list of domains we perform autologin on? SUL2 sessions will be copied to the SUL3 domain once it's added to the list.

DAlangi_WMF changed the task status from Open to In Progress. Oct 14 2024, 12:23 PM
DAlangi_WMF claimed this task.

It will solve it for future logins / signups. But we need some way to copy the login state for existing temp accounts. That might be as simple as automatically triggering edge login for temp accounts older than today, but something needs to be done.

DAlangi_WMF changed the task status from In Progress to Open. Feb 10 2025, 12:21 PM

There are actually three ways to synchronize state using the existing central login / central autologin infrastructure, without having to do something brand new:

  1. When doing central login, do it for both domains (login.wikimedia.org and auth.wikimedia.org) instead of just one; or, in the case of SUL3 login (which would normally have no separate central login), do a central login for login.wikimedia.org.
  2. Add the "passive" SUL domain (i.e. login.wikimedia.org in SUL3 mode, auth.wikimedia.org in SUL2 mode) to the list of edge login domains (CentralAuthHooks::getAutoLoginWikis()).
  3. Check both domains during autologin (including top-level autologin).

Option #2 is not terribly useful (it needs a successful edge login to take effect, so it's very slow to roll out) but also very trivial to do, and for the subset of users for whom an edge login does happen, it makes autologin slightly faster (no need to do two checks sequentially). So we might as well do it, but it's not enough on its own.
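A minimal sketch of what option #2 amounts to, assuming the edge login target list can be treated as a plain list of domain strings (the real return shape of CentralAuthHooks::getAutoLoginWikis() may differ, and the function name below is hypothetical). It only changes which targets an edge login visits, which is why it has no effect until an edge login actually happens:

```php
<?php
declare(strict_types=1);

// The two central domains discussed in this task.
const SUL2_CENTRAL_DOMAIN = 'login.wikimedia.org';
const SUL3_CENTRAL_DOMAIN = 'auth.wikimedia.org';

/**
 * Hypothetical option #2 helper: append the passive central domain to the
 * edge login targets. $edgeLoginTargets stands in for the list produced by
 * CentralAuthHooks::getAutoLoginWikis(); plain domain strings are an assumption.
 *
 * @param string[] $edgeLoginTargets
 * @return string[]
 */
function withPassiveCentralDomain(array $edgeLoginTargets, bool $useSul3): array
{
    // Passive domain: login.wikimedia.org in SUL3 mode, auth.wikimedia.org in SUL2 mode.
    $passive = $useSul3 ? SUL2_CENTRAL_DOMAIN : SUL3_CENTRAL_DOMAIN;
    if (!in_array($passive, $edgeLoginTargets, true)) {
        $edgeLoginTargets[] = $passive;
    }
    return $edgeLoginTargets;
}

// After a SUL2 login, auth.wikimedia.org is added to the edge login list.
echo implode(', ', withPassiveCentralDomain(['commons.wikimedia.org', 'www.wikidata.org'], false)), "\n";
// commons.wikimedia.org, www.wikidata.org, auth.wikimedia.org
```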

Option #1 would have to be done ahead of time, like #2 (since it only takes effect during central login), and it would have to be done for people who are not opted into SUL3 yet, which is not great from a risk management perspective. (The same is true for #2 but #2 is a very trivial change and #1 isn't.)

So I think the clear winner here is option #3, and we can probably do #2 in addition to that.
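To make option #3 concrete, here is a rough sketch of "check both domains during autologin". The domain names are the ones from this task; the function names and the $hasSession callback are hypothetical stand-ins (the real check is a cross-domain redirect or subresource request, not a function call), and checking the active domain before the passive one is an assumed order:

```php
<?php
declare(strict_types=1);

// The two central domains discussed in this task.
const SUL2_CENTRAL_DOMAIN = 'login.wikimedia.org';
const SUL3_CENTRAL_DOMAIN = 'auth.wikimedia.org';

/**
 * Hypothetical helper: the central domain matching the user's mode is
 * "active", the other one is "passive".
 *
 * @return string[] [active, passive]
 */
function centralDomains(bool $useSul3): array
{
    return $useSul3
        ? [SUL3_CENTRAL_DOMAIN, SUL2_CENTRAL_DOMAIN]
        : [SUL2_CENTRAL_DOMAIN, SUL3_CENTRAL_DOMAIN];
}

/**
 * Option #3 sketch: check the active central domain first, then fall back to
 * the passive one. Returns the first domain that has a central session, or
 * null if neither does.
 *
 * @param callable $hasSession Hypothetical stand-in for the real
 *   cross-domain session check; takes a domain, returns bool.
 */
function findCentralSession(bool $useSul3, callable $hasSession): ?string
{
    foreach (centralDomains($useSul3) as $domain) {
        if ($hasSession($domain)) {
            return $domain;
        }
    }
    return null;
}

// A user who was just switched to SUL3 but only has a SUL2 central session.
$sessions = [SUL2_CENTRAL_DOMAIN => true];
$found = findCentralSession(true, fn(string $d): bool => $sessions[$d] ?? false);
echo $found ?? 'none', "\n"; // login.wikimedia.org
```

Because the fallback runs during autologin rather than login, it reaches users who never trigger an edge login, which is the gap options #1 and #2 leave open.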

Change #1120594 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@master] [WIP] Add passive central domain to edge login list

https://gerrit.wikimedia.org/r/1120594

Change #1120594 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Add passive central domain to edge login list

https://gerrit.wikimedia.org/r/1120594

That implements option 2. I assume you're planning to work on option 3 next?

Yeah. This was more urgent because it runs during login, while option 3 runs during autologin.

Change #1124785 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.18] Add passive central domain to edge login list

https://gerrit.wikimedia.org/r/1124785

Change #1124785 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.18] Add passive central domain to edge login list

https://gerrit.wikimedia.org/r/1124785

rECAU7e294b3bd96b: Add passive central domain to edge login list doesn't seem to work, the assertIsLocalDomain() check gets confused somehow.

Change #1126696 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@master] Try both SUL2 and SUL3 central domain for autologin

https://gerrit.wikimedia.org/r/1126696

rECAU7e294b3bd96b: Add passive central domain to edge login list doesn't seem to work, the assertIsLocalDomain() check gets confused somehow.

Specifically, if the user did a SUL2 login, they will edge login on https://auth.wikimedia.org/XXX/wiki/Special:CentralAutoLogin/start?useformat=desktop&type=1x1&from=XXX&usesul3=0 (since the usesul3 parameter reflects the user's opt-in status). The usesul3 parameter doesn't have much effect on autologin, but it does affect central domain checks (since it does affect what's considered to be the central domain). So we either need usesul3=1 for the passive domain, or we need assertIsCentralDomain() to accept both central domains.

I got that backwards: usesul3=0 is correct. It means the SUL2 central domain should be used to copy the session from, and since this happens after a SUL2 login (which is why it's using usesul3=0), that's the right thing to do. The actual error is the header x-centralauth-status: Is central wiki, should be local during the first leg (https://auth.wikimedia.org/enwiki/wiki/Special:CentralAutoLogin/start?useformat=desktop&type=1x1&from=enwiki&usesul3=0), so it's actually the opposite: auth.wikimedia.org is seen as the central wiki, despite usesul3=0.

The logic check triggering that warning is

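```php
$loginWiki = $this->config->get( CAMainConfigNames::CentralAuthLoginWiki )
	?? $this->fallbackLoginWikiId;
// SUL2 mode: central iff the current wiki is the configured (or fallback) login wiki.
return ( !$this->sharedDomainUtils->isSul3Enabled( $request ) && WikiMap::getCurrentWikiId() === $loginWiki )
	// SUL3 mode: central iff the request is on the shared domain.
	|| ( $this->sharedDomainUtils->isSul3Enabled( $request ) && $this->sharedDomainUtils->isSharedDomain() );
```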

which is failing because 1) we have set $wgCentralAuthLoginWiki to null on the shared domain to prevent accidental autologins / edge logins (in hindsight maybe not the best idea), and 2) fallbackLoginWikiId is the wiki in the from parameter (the idea being that in SUL1 mode, when you don't have a central login wiki, you can use the from wiki as the source to copy the session from), which happens to be the same as the current wiki ID, since the idea of SUL3 authentication is that the local and shared domains have the same wiki ID.
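To see why the check misfires, here is the same boolean logic restated as a standalone function, with the injected services replaced by plain parameters (an assumption for illustration only; the function name is hypothetical). Plugging in the values from the failing first leg reproduces the "Is central wiki, should be local" result:

```php
<?php
declare(strict_types=1);

/**
 * Standalone restatement of the central-domain check quoted above. The real
 * code reads these values from config, the request, WikiMap and
 * SharedDomainUtils; here they are plain arguments.
 */
function isCentralDomain(
    ?string $configuredLoginWiki, // $wgCentralAuthLoginWiki
    string $fallbackLoginWikiId,  // wiki ID from the "from" parameter
    string $currentWikiId,
    bool $sul3Enabled,
    bool $isSharedDomain
): bool {
    $loginWiki = $configuredLoginWiki ?? $fallbackLoginWikiId;
    return (!$sul3Enabled && $currentWikiId === $loginWiki)
        || ($sul3Enabled && $isSharedDomain);
}

// First leg on auth.wikimedia.org/enwiki with from=enwiki&usesul3=0:
// the login wiki is null on the shared domain, so the fallback "enwiki"
// is used, it equals the current wiki ID, and the request counts as central.
var_dump(isCentralDomain(null, 'enwiki', 'enwiki', false, true)); // bool(true)

// Had the login wiki been set to loginwiki, the same request would be local.
var_dump(isCentralDomain('loginwiki', 'enwiki', 'enwiki', false, true)); // bool(false)
```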

The opposite scenario, when the user does a SUL3 login and gets sent to https://login.wikimedia.org/wiki/Special:CentralAutoLogin/start?useformat=desktop&type=1x1&from=enwiki&usesul3=1 to set a session on the SUL2 domain, works correctly.

Change #1127648 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Fix some SUL3 shared domain settings

https://gerrit.wikimedia.org/r/1127648

Change #1126696 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Try both SUL2 and SUL3 central domain for autologin

https://gerrit.wikimedia.org/r/1126696

Change #1127952 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.20] Try both SUL2 and SUL3 central domain for autologin

https://gerrit.wikimedia.org/r/1127952

Change #1127952 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.20] Try both SUL2 and SUL3 central domain for autologin

https://gerrit.wikimedia.org/r/1127952

Mentioned in SAL (#wikimedia-operations) [2025-03-17T14:19:55Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1127952|Try both SUL2 and SUL3 central domain for autologin (T375796)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-17T14:23:37Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1127952|Try both SUL2 and SUL3 central domain for autologin (T375796)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Change #1128502 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.20] Re-apply "Try both SUL2 and SUL3 central domain for autologin"

https://gerrit.wikimedia.org/r/1128502

Change #1128502 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.44.0-wmf.20] Re-apply "Try both SUL2 and SUL3 central domain for autologin"

https://gerrit.wikimedia.org/r/1128502

Mentioned in SAL (#wikimedia-operations) [2025-03-17T22:53:13Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1128502|Re-apply "Try both SUL2 and SUL3 central domain for autologin" (T375796)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-17T22:57:11Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1128502|Re-apply "Try both SUL2 and SUL3 central domain for autologin" (T375796)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-17T23:39:58Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1128502|Re-apply "Try both SUL2 and SUL3 central domain for autologin" (T375796)]] (duration: 46m 45s)

rECAU7e294b3bd96b: Add passive central domain to edge login list doesn't seem to work, the assertIsLocalDomain() check gets confused somehow.

This was fixed, but it's still not working: as @matmarex pointed out (I forget where), we don't really have a mechanism for doing an edge login on the passive central domain. Edge login is supposed to go

(current wiki) -> (target wiki)/Special:CentralAutoLogin/start -> (central wiki)/Special:CentralAutoLogin/checkLoggedIn -> (target wiki)/Special:CentralAutoLogin/createSession -> (central wiki)/Special:CentralAutoLogin/validateSession -> (target wiki)/Special:CentralAutoLogin/setCookies

so for edge login on the passive domain that's (e.g.)

https://en.wikipedia.org/ -> https://auth.wikimedia.org/enwiki/wiki/Special:CentralAutoLogin/start -> https://login.wikimedia.org/wiki/Special:CentralAutoLogin/checkLoggedIn -> https://auth.wikimedia.org/enwiki/wiki/Special:CentralAutoLogin/createSession -> https://login.wikimedia.org/wiki/Special:CentralAutoLogin/validateSession -> https://auth.wikimedia.org/enwiki/wiki/Special:CentralAutoLogin/setCookies

but the way that redirect chain remembers where to go is by passing a wikiid parameter to the steps that happen on the central domain, and there is no way to specify auth.wikimedia.org with a wiki ID. So the /start step will pass wikiid=enwiki, and thus in reality the fourth step of that redirect chain will be https://en.wikipedia.org/wiki/Special:CentralAutoLogin/createSession rather than https://auth.wikimedia.org/enwiki/wiki/Special:CentralAutoLogin/createSession; similarly, the final step will be https://en.wikipedia.org/wiki/Special:CentralAutoLogin/setCookies, and the cookies will be set on the current wiki rather than the shared domain.
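The redirect problem can be sketched as follows. The wiki ID to URL map and the function are hypothetical simplifications (query parameters are omitted and URLs are built by string concatenation); the point is only that the target-wiki steps after /start are reached via wikiid, which can only name a wiki's canonical URL:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wiki ID -> base URL map. Each wiki ID has exactly one
 * canonical URL, so "enwiki on the shared domain" cannot be expressed.
 */
const WIKI_BASE_URLS = [
    'enwiki' => 'https://en.wikipedia.org/wiki/',
    'loginwiki' => 'https://login.wikimedia.org/wiki/',
];

/**
 * Sketch of the five edge login hops after the current wiki. The target-wiki
 * steps after /start are reached via the wikiid parameter, so their base URL
 * comes from WIKI_BASE_URLS rather than from the URL /start ran on.
 *
 * @return string[]
 */
function edgeLoginChain(string $startBase, string $wikiId, string $centralBase): array
{
    $page = 'Special:CentralAutoLogin/';
    $returnBase = WIKI_BASE_URLS[$wikiId]; // resolved from wikiid=...
    return [
        $startBase . $page . 'start',
        $centralBase . $page . 'checkLoggedIn',
        $returnBase . $page . 'createSession',
        $centralBase . $page . 'validateSession',
        $returnBase . $page . 'setCookies',
    ];
}

// Intended: all target-wiki steps on auth.wikimedia.org/enwiki.
// Actual: /start passes wikiid=enwiki, so createSession and setCookies
// land on en.wikipedia.org instead.
$chain = edgeLoginChain(
    'https://auth.wikimedia.org/enwiki/wiki/',
    'enwiki',
    WIKI_BASE_URLS['loginwiki']
);
foreach ($chain as $url) {
    echo $url, "\n";
}
```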

(There's another bug here, although inconsequential, where the URL should be https://auth.wikimedia.org/loginwiki not https://auth.wikimedia.org/enwiki. That's a straightforward logic error in CentralDomainUtils::getUrl().)

I'm on the fence on whether this is worth fixing, or whether we should just remove the whole passive central domain concept. It was always meant to be a quick hack that's not strictly needed (as rECAUaabc92e7d26f: Try both SUL2 and SUL3 central domain for autologin is enough to synchronize between the two central domains) and is just a minor performance improvement. The fix doesn't seem hard though: I think we can just pass PASSIVE_CENTRAL_DOMAIN_ID as the wikiid parameter.
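For reference, the fix proposed here (ultimately not pursued, since the passive central domain logic was removed instead) might look roughly like this. The value of PASSIVE_CENTRAL_DOMAIN_ID below is a placeholder, the resolver function is hypothetical, and the shared-domain base uses the loginwiki path from the getUrl() note above:

```php
<?php
declare(strict_types=1);

// Placeholder value: the real constant's value is not given in this task.
const PASSIVE_CENTRAL_DOMAIN_ID = 'passive-central-domain';

// Hypothetical canonical base URLs per wiki ID.
const WIKI_BASE_URLS = [
    'enwiki' => 'https://en.wikipedia.org/wiki/',
    'loginwiki' => 'https://login.wikimedia.org/wiki/',
];

// Assumed shared-domain base for the passive domain (loginwiki path, not enwiki).
const SHARED_DOMAIN_BASE = 'https://auth.wikimedia.org/loginwiki/wiki/';

/**
 * Hypothetical resolver for the wikiid parameter: the special ID routes the
 * chain back to the shared domain instead of a wiki's canonical URL.
 */
function resolveReturnUrl(string $wikiId, string $step): string
{
    $base = $wikiId === PASSIVE_CENTRAL_DOMAIN_ID
        ? SHARED_DOMAIN_BASE
        : WIKI_BASE_URLS[$wikiId];
    return $base . 'Special:CentralAutoLogin/' . $step;
}

echo resolveReturnUrl(PASSIVE_CENTRAL_DOMAIN_ID, 'setCookies'), "\n";
// https://auth.wikimedia.org/loginwiki/wiki/Special:CentralAutoLogin/setCookies
echo resolveReturnUrl('enwiki', 'setCookies'), "\n";
// https://en.wikipedia.org/wiki/Special:CentralAutoLogin/setCookies
```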

Maybe I can write the patch for that last case, since I have it fresh in my mind, and to make sure I understand how to work with this system after all the recent changes.


Thanks, but I think we've run out of the time window where it would have value. rECAUaabc92e7d26f: Try both SUL2 and SUL3 central domain for autologin will be useful for up to a year, as users who have not done any login or other edge-login-triggering action since the SUL3 rollout do autologins. But once we finish the login rollout (which is two days away, if all goes well), the passive domain will always be loginwiki, and creating a session on it during edge login won't serve any purpose.

matmarex reassigned this task from matmarex to Tgr.

That makes sense. In this case I think this work is done.

Still need to remove the broken passive central domain edge login.

And we should do it quickly, because it might have contributed to T390514 (or at least removing it is an easy way to rule that out).

Change #1133262 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@master] SUL3: Remove passive central domain logic

https://gerrit.wikimedia.org/r/1133262

We might also want to swap the central autologin lookup order of loginwiki and auth.wikimedia.org at some point.


Created a stub task so we don't forget this: T391284: Swap order of central autologin lookup for loginwiki and shared domain.

As a further step, should SUL2 be disabled in Wikimedia completely?


I think we should keep it for a few weeks at least for ease of testing of changes. To some extent we'll need to keep it as long as central autologin falls back to loginwiki. Plus we need to keep B/C for the login/signup APIs for quite a long time; right now we are doing that via the SUL2 / SUL3 opt-in mechanism (there might be a better way, not sure).

There's also the question of what to keep supporting for local development setups and third-party wikis. I think we should support SUL2 and SUL3 (i.e. login locally vs. login on the central domain) for the next release (which is nowish) only, with the ability to switch via configuration between them, and support both an actual wiki and a shared domain as the central domain indefinitely (as setting up a shared domain is more complex).

Change #1133262 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] SUL3: Remove passive central domain logic

https://gerrit.wikimedia.org/r/1133262