SUL3 Phase 1: All new account creation on group 0 and group 1 wikis
Closed, ResolvedPublic

Description

Enable SUL3 for all new account creation on group 0 wikis and, if there's enough time, roll it out gradually to group 1 wikis; also opt in users who create a new account.
(See full rollout plan here.)

Related Objects

Status    Subtype     Assigned     Task
Resolved              Tgr
Resolved  BUG REPORT  Tgr
Resolved              Tgr
Declined  Feature     Tgr
Resolved              DAlangi_WMF
Resolved              ArielGlenn
Resolved              ArielGlenn
Resolved              DAlangi_WMF
Resolved              Tgr
Resolved              Tgr
Resolved  BUG REPORT  Tgr
Resolved  BUG REPORT  Tgr
Resolved              DAlangi_WMF
Resolved  BUG REPORT  DAlangi_WMF
Resolved              Tgr
Resolved  BUG REPORT  Tgr

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2025-02-26T14:58:36Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1120968|CentralAuth: Enable SUL3 signup on group 0 (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Other interesting things seen while testing:

The logs contain a number of "User::loadFromSession called before the end of Setup.php" warnings. The stack trace is Setup.php autocreate -> User::addToDatabase() -> SaveUserOptions hook -> GetBetaFeaturePreferences hook -> Vector's hook handler trying to get the active skin, which in turn depends on the user. This is probably not SUL3-specific, just a general autocreation issue. It is also probably harmless (at worst, some preference gets set to a wrong default). I wonder why User::addToDatabase() has to save the options, though.

There are also some "Expectation (readQueryRows <= 10000) by MediaWiki\Actions\ActionEntryPoint::execute not met" warnings. Apparently a known issue: T349511

There are also a number of unmet writes <= 0 and masterConns <= 0 expectations, which seem to come from all over the place (site stats, watchlist, Echo, CheckUser). They all happen on Special:CentralAutoLogin. I guess these are autocreation side effects and will be suppressed by the multi-dc rule update.
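
As a rough illustration of what these warnings mean: each expectation is a per-request ceiling on some database activity counter, and a warning is logged when the counter exceeds it. The sketch below is a toy Python model, not MediaWiki's actual profiler code. The `check_expectations` helper, the caller label, and the counter values are hypothetical; only the limit names and thresholds come from the messages above.

```python
def check_expectations(limits, counters, caller):
    """Return one warning string for every counter that exceeds its limit."""
    warnings = []
    for name, limit in limits.items():
        actual = counters.get(name, 0)
        if actual > limit:
            warnings.append(
                f"Expectation ({name} <= {limit}) by {caller} not met (actual: {actual})"
            )
    return warnings


# Limits as quoted in the log messages; the counter values are made up.
limits = {"readQueryRows": 10000, "writes": 0, "masterConns": 0}
counters = {"readQueryRows": 1200, "writes": 3, "masterConns": 1}

for line in check_expectations(limits, counters, "example-caller"):
    print(line)
```

With a limit of 0, any write or primary connection at all trips the warning, which would be consistent with autocreation (which has to write) on an endpoint that is expected to be read-only.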

Change #1123032 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] CentralAuth: Enable SUL3 signup on group 0 (attempt 2)

https://gerrit.wikimedia.org/r/1123032

Group 0 has about 250 signups per day (logstash). The smaller edge login wikis have dozens of autocreations per day. So an extra 250 autocreations per day is a bit of a flood, but not a huge deal for a couple of days. Also, these are hidden from the logs / recent changes by default. So on reflection I don't think T387357 is a blocker for group 0.

Change #1123032 merged by jenkins-bot:

[operations/mediawiki-config@master] CentralAuth: Enable SUL3 signup on group 0 (attempt 2)

https://gerrit.wikimedia.org/r/1123032

Mentioned in SAL (#wikimedia-operations) [2025-02-26T21:45:59Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1123032|CentralAuth: Enable SUL3 signup on group 0 (attempt 2) (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-02-26T21:49:03Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1123032|CentralAuth: Enable SUL3 signup on group 0 (attempt 2) (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-02-26T22:03:57Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1123032|CentralAuth: Enable SUL3 signup on group 0 (attempt 2) (T384007)]] (duration: 17m 57s)

Unfortunately, two of the edge login wikis post automated talk page messages on autocreate, so right after registration the user is faced with some confusing messages that link to other wikis the user probably doesn't recognize or care about (and where the user is not logged in).

[Screenshot: Screenshot from 2025-02-26 23-09-52.png]

That's poor UX, but I don't think it's worth an emergency revert. We can just disable it tomorrow if we decide we don't want this confusing behavior until autocreation is fixed.

In one of my tests I ended up logged out after a successful registration (but then top-level autologin worked). Not sure if that's a SUL3 bug that happens infrequently, or just generic authentication fragility. I thought I saw a log message about session metadata conflict, but now I can't find it.

> Unfortunately two of the edge login wikis post automated talk page messages on autocreate

Example Commons talk page message, example Incubator talk page message.
At least these messages don't show up in recent changes, so that doesn't get spammed on small wikis.

> In one of my tests I ended up logged out after a successful registration (but then top-level autologin worked). Not sure if that's a SUL3 bug that happens infrequently, or just generic authentication fragility. I thought I saw a log message about session metadata conflict, but now I can't find it.

Maybe related to T158365: Session "{session}": Metadata merge failed: {exception}

Yeah it was one of those. But that class of error is (AIUI) usually related to cookie conflicts, and this was in an incognito browser so I think that wouldn't have been possible.

Change #1123312 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 2)"

https://gerrit.wikimedia.org/r/1123312

Change #1123312 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 2)"

https://gerrit.wikimedia.org/r/1123312

Mentioned in SAL (#wikimedia-operations) [2025-02-27T21:13:47Z] <ladsgroup@deploy2002> Started scap sync-world: Backport for [[gerrit:1123312|Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 2)" (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-02-27T21:18:18Z] <ladsgroup@deploy2002> tgr, ladsgroup: Backport for [[gerrit:1123312|Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 2)" (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-02-27T21:25:57Z] <ladsgroup@deploy2002> Finished scap sync-world: Backport for [[gerrit:1123312|Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 2)" (T384007)]] (duration: 12m 10s)

Change #1123807 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] CentralAuth: Enable SUL3 signup on group 0 (attempt 3)

https://gerrit.wikimedia.org/r/1123807

Change #1123807 merged by jenkins-bot:

[operations/mediawiki-config@master] CentralAuth: Enable SUL3 signup on group 0 (attempt 3)

https://gerrit.wikimedia.org/r/1123807

Mentioned in SAL (#wikimedia-operations) [2025-03-04T22:26:38Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1123807|CentralAuth: Enable SUL3 signup on group 0 (attempt 3) (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-04T22:29:35Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1123807|CentralAuth: Enable SUL3 signup on group 0 (attempt 3) (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Still blocked on T387357: SUL3 signup results in autocreation on all edge login domains. Due to the cross-wiki nature of edge login, we need to wait until all wikis have the new version of the code.

Tgr renamed this task from "SUL3 Phase 1: All new account creation on group 0 wikis" to "SUL3 Phase 1: All new account creation on group 0 and group 1 wikis". (Mar 5 2025, 12:12 PM)
Tgr updated the task description.

Change #1124757 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] CentralAuth: Enable SUL3 signup on group 0 (attempt 4)

https://gerrit.wikimedia.org/r/1124757

Change #1124757 merged by jenkins-bot:

[operations/mediawiki-config@master] CentralAuth: Enable SUL3 signup on group 0 (attempt 4)

https://gerrit.wikimedia.org/r/1124757

Mentioned in SAL (#wikimedia-operations) [2025-03-05T16:56:35Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1124757|CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-05T16:59:34Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1124757|CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-05T17:20:48Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1124757|CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007)]] (duration: 24m 13s)

Change #1124865 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Clean up SUL3 config

https://gerrit.wikimedia.org/r/1124865

Change #1124866 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Roll out SUL3 signup to 1% of users on most group 1 wikis

https://gerrit.wikimedia.org/r/1124866

I think we broke autologin for SUL3 users. Edge login and top-level autologin start with a usesul3 URL parameter, but autologin doesn't (due to caching constraints). The idea was that the /start step adds that parameter (T375788#10270771), but we never actually implemented that, and in hindsight it's not a very good plan: the /start step is on the local wiki and does not know whether the user has SUL3 enabled, since the cookie that would tell that (sul3OptIn in that comment, but we ended up using UserName instead) is probably not set on that domain.

I don't think this is bad enough to roll back, but we should fix it soon. Since, per T375796: Synchronize SUL2 and SUL3 central browser state, we want to try both SUL2 and SUL3 autologin anyway, maybe we can just leave it at that. It would be nice to try SUL3 autologin first if the user is opted into SUL3, and vice versa, but that's probably not worth the effort of having to figure out how to split the cache.

That also means we should probably revert rECAU3b628d0b69b0: Point autologin resourceloader module URL to `/start` endpoint. We can just use checkLoggedIn, always use it in SUL2 mode (we can flip that once SUL3 is deployed to most users) and have it fall back to the other mode.
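
The fallback idea in this comment can be sketched roughly as follows. This is a toy Python model, not CentralAuth code: `try_autologin`, `autologin_with_fallback`, and the `central_sessions` dictionary are hypothetical names used only to show the control flow (try a fixed default mode, then the other one).

```python
from typing import Optional

MODES = ("sul2", "sul3")


def try_autologin(mode: str, central_sessions: dict) -> Optional[str]:
    """Hypothetical check: return the username if a central session exists in this mode."""
    return central_sessions.get(mode)


def autologin_with_fallback(central_sessions: dict, default_mode: str = "sul2") -> Optional[tuple]:
    """Try the default mode first, then fall back to the other mode."""
    order = [default_mode] + [m for m in MODES if m != default_mode]
    for mode in order:
        user = try_autologin(mode, central_sessions)
        if user is not None:
            return (mode, user)
    return None  # not centrally logged in under either mode


# A user who only has a SUL3 central session is still found via the fallback.
print(autologin_with_fallback({"sul3": "ExampleUser"}))
```

Because the starting mode is the same for every visitor in this sketch, nothing about the first request depends on the user; changing `default_mode` to "sul3" later would correspond to the "flip" mentioned above.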

Change #1124865 merged by jenkins-bot:

[operations/mediawiki-config@master] Clean up SUL3 config

https://gerrit.wikimedia.org/r/1124865

Mentioned in SAL (#wikimedia-operations) [2025-03-05T23:17:11Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1124860|Preserve usesul3 flag during autologin (T375788)]], [[gerrit:1124861|Preserve usesul3 flag during autologin (T375788)]], [[gerrit:1124865|Clean up SUL3 config (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-05T23:20:05Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1124860|Preserve usesul3 flag during autologin (T375788)]], [[gerrit:1124861|Preserve usesul3 flag during autologin (T375788)]], [[gerrit:1124865|Clean up SUL3 config (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-05T23:36:04Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1124860|Preserve usesul3 flag during autologin (T375788)]], [[gerrit:1124861|Preserve usesul3 flag during autologin (T375788)]], [[gerrit:1124865|Clean up SUL3 config (T384007)]] (duration: 18m 53s)

Change #1124866 merged by jenkins-bot:

[operations/mediawiki-config@master] Roll out SUL3 signup to 1% of users on most group 1 wikis

https://gerrit.wikimedia.org/r/1124866

> I think we broke autologin for SUL3 users.

It's working fine now, so I guess it was just a side effect of the bug that rECAUc50c27e300ea: Preserve usesul3 flag during autologin fixed.
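
For readers unfamiliar with this class of bug, a minimal sketch of "preserving a flag" across a redirect chain is below. It is a Python illustration using only the standard library; the helper name `carry_flag` and the example URLs are hypothetical, and the real fix is in the CentralAuth patch referenced above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def carry_flag(current_url: str, next_url: str, flag: str = "usesul3") -> str:
    """Copy `flag` from the current request URL onto the next-step URL, if present."""
    current_params = parse_qs(urlsplit(current_url).query, keep_blank_values=True)
    if flag not in current_params:
        return next_url  # nothing to preserve

    parts = urlsplit(next_url)
    next_params = parse_qs(parts.query, keep_blank_values=True)
    next_params[flag] = current_params[flag]
    return urlunsplit(parts._replace(query=urlencode(next_params, doseq=True)))


# Hypothetical URLs, for illustration only.
print(carry_flag(
    "https://wiki.example/start?usesul3=1",
    "https://login.example/checkLoggedIn?type=script",
))
```

If any step in the chain builds its redirect URL without a helper like this, the flag is silently dropped and later steps run in the wrong mode.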

Mentioned in SAL (#wikimedia-operations) [2025-03-05T23:39:42Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1124866|Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-05T23:42:41Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1124866|Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-06T00:08:55Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1124866|Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007)]] (duration: 29m 13s)
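
The thread does not show how the 1% (and later 10% and 50%) sample is selected. Purely as an assumption, one common way to do a percentage rollout is to hash a stable identifier into 100 buckets and enable the feature for the lowest N buckets. The `in_rollout` helper, the salt, and the identifiers below are hypothetical and are not taken from the config patches.

```python
import hashlib


def in_rollout(identifier: str, percent: int, salt: str = "sul3-signup") -> bool:
    """Deterministically place `identifier` in one of 100 buckets; enable the lowest `percent`."""
    digest = hashlib.sha256(f"{salt}:{identifier}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Made-up identifiers; count how many fall inside each rollout stage.
ids = [f"visitor-{i}" for i in range(10_000)]
for percent in (1, 10, 50, 100):
    enabled = sum(in_rollout(i, percent) for i in ids)
    print(percent, enabled)
```

A useful property of this scheme is that raising the percentage only adds identifiers: anyone enabled at 1% stays enabled at 10% and 50%, so a gradual rollout does not flip users back and forth.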

Change #1125130 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Enable SUL3 signup for 10% of group 1 users

https://gerrit.wikimedia.org/r/1125130

Change #1125134 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Enable SUL3 signup for all group 1 users

https://gerrit.wikimedia.org/r/1125134

Change #1125130 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable SUL3 signup for 10% of group 1 users

https://gerrit.wikimedia.org/r/1125130

Mentioned in SAL (#wikimedia-operations) [2025-03-06T16:03:32Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1125130|Enable SUL3 signup for 10% of group 1 users (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-06T16:06:15Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1125130|Enable SUL3 signup for 10% of group 1 users (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-06T16:17:42Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1125130|Enable SUL3 signup for 10% of group 1 users (T384007)]] (duration: 14m 10s)

> There's also a bunch of broken writes <= 0 and masterConns <= 0 expectations, they seem to be all over the place (site stats, watchlist, Echo, CheckUser).

Split to a separate task: T388165: "Expectation not met" warnings during SUL autologin autocreation

Change #1125134 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable SUL3 signup for 50% of group 1 users

https://gerrit.wikimedia.org/r/1125134

Mentioned in SAL (#wikimedia-operations) [2025-03-06T22:42:26Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1125134|Enable SUL3 signup for 50% of group 1 users (T384007)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-06T22:45:07Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1125134|Enable SUL3 signup for 50% of group 1 users (T384007)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-06T23:03:21Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1125134|Enable SUL3 signup for 50% of group 1 users (T384007)]] (duration: 20m 55s)

I saw two CentralAuthSessionProvider::provideSessionInfo: token mismatch messages while testing on mwdebug (1, 2). Should probably look into those later.

Change #1126131 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Enable SUL3 signup for all of group 1 and 1% of group 2 users

https://gerrit.wikimedia.org/r/1126131

Change #1126131 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable SUL3 signup for all of group 1 and 1% of group 2 users

https://gerrit.wikimedia.org/r/1126131

Mentioned in SAL (#wikimedia-operations) [2025-03-10T21:32:54Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1126131|Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-10T21:35:44Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1126131|Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-10T21:48:15Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1126131|Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218)]] (duration: 15m 21s)

Tgr claimed this task.

> I saw two CentralAuthSessionProvider::provideSessionInfo: token mismatch messages while testing on mwdebug (1, 2). Should probably look into those later.

Filed as T388476: CentralAuthSessionProvider::provideSessionInfo: token mismatch for {username}.