
"Keep me logged in" flag unreliable on the central domain
Open, Needs Triage, Public

Description

CentralAuth uses two primitives for single sign-on: central login (Special:CentralLogin), which happens at the end of login/signup and copies the user's session from the local domain where they logged in to the central domain; and central autologin (Special:CentralAutoLogin), which happens when you first visit a wiki where you are not logged in and tries to copy the session from the central domain to the local domain (i.e. log you in locally based on your previous central login). To a first approximation, central autologin uses subresource redirects, which makes it unreliable (Safari and Firefox often block cookies during such redirects, and Chrome might start doing so soon), while central login uses top-level redirects, so it's fairly reliable (with the possible exception of Safari, no browser messes with cookies on top-level redirects today).

Except, the final step in central login is actually a subresource request to Special:CentralAutoLogin/refreshCookies, which for weird internal reasons is necessary to set the user tokens on the central login domain. If this request fails (and it probably does in Firefox etc.), the central cookies will behave as if the "keep me logged in" checkbox ("remember me" session flag) hasn't been set: they will expire in 30 days instead of a year, and they will become invalid if the user doesn't make any request to the wiki farm for 24 hours (because the central session then expires in the session store).
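To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is only a model of the description, not CentralAuth code: the class, constant and method names are hypothetical, and the durations are simply the ones quoted in this task.

```python
from dataclasses import dataclass
from datetime import timedelta

# Durations as quoted in the task description; the names are hypothetical.
COOKIE_LIFETIME_REMEMBERED = timedelta(days=365)
COOKIE_LIFETIME_DEFAULT = timedelta(days=30)
SESSION_STORE_IDLE_LIMIT = timedelta(hours=24)


@dataclass
class CentralCookieState:
    remember_me_checked: bool        # user ticked "keep me logged in"
    refresh_cookies_succeeded: bool  # the /refreshCookies subresource request got through

    @property
    def effectively_remembered(self) -> bool:
        # The flag only takes effect centrally if the refresh step succeeded.
        return self.remember_me_checked and self.refresh_cookies_succeeded

    def cookie_lifetime(self) -> timedelta:
        if self.effectively_remembered:
            return COOKIE_LIFETIME_REMEMBERED
        return COOKIE_LIFETIME_DEFAULT

    def survives_idle(self, idle: timedelta) -> bool:
        """Is the central login still usable after `idle` without any request?"""
        if idle > self.cookie_lifetime():
            return False  # cookies themselves have expired
        if self.effectively_remembered:
            return True   # user tokens outlive the session store entry
        return idle < SESSION_STORE_IDLE_LIMIT


ok = CentralCookieState(remember_me_checked=True, refresh_cookies_succeeded=True)
broken = CentralCookieState(remember_me_checked=True, refresh_cookies_succeeded=False)
print(ok.survives_idle(timedelta(days=2)))      # True
print(broken.survives_idle(timedelta(days=2)))  # False
```

In this model the user ticked the checkbox in both cases; the only difference is whether the browser let the subresource request carry cookies.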

This has been known for a while but we didn't really care. With the central session being more crucial for temp users, maybe we should. We could change the final step of central autologin to be two top-level redirects instead of one subresource request - that would make login and signup slower by a few hundred milliseconds, but it would also make the central session more reliable.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Another thing we could do is automatically fix the central cookies during the next top-level autologin.

kostajh subscribed.

> This has been known for a while but we didn't really care. With the central session being more crucial for temp users, maybe we should. We could change the final step of central autologin to be two top-level redirects instead of one subresource request - that would make login and signup slower by a few hundred milliseconds, but it would also make the central session more reliable.

That seems like a good tradeoff.

> > This has been known for a while but we didn't really care. With the central session being more crucial for temp users, maybe we should. We could change the final step of central autologin to be two top-level redirects instead of one subresource request - that would make login and signup slower by a few hundred milliseconds, but it would also make the central session more reliable.
>
> That seems like a good tradeoff.

@Tgr @larissagaulia is this something your team could work on before our next round of pilot wiki deployments in February/March?

(moving back on workboard for request visibility)

In an ad-hoc test, the start/complete steps of central login take me 300ms each, and the refreshCookies step takes 200ms. No idea how representative these numbers are.

> We could change the final step of central autologin to be two top-level redirects instead of one subresource request

So that would add a refresh step that's between 200-300ms (I assume a top-level redirect is a bit slower than a subresource request because the browser has to unload the page etc), and a fast final step that just redirects the user back and otherwise doesn't do anything (~200ms? pretty much any index.php request takes at least that much for me), so maybe an extra half second during login.
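The estimate above can be written out as a small calculation. This is only a sketch using the ad-hoc timings from this comment; the function and constant names are hypothetical, and it assumes the current subresource request does not delay the user at all, so the two new top-level steps count in full.

```python
# Ad-hoc timings quoted in the comment above, in milliseconds.
START_MS = 300
COMPLETE_MS = 300
REFRESH_SUBRESOURCE_MS = 200  # current step; assumed not to block the user


def added_login_latency_ms(refresh_top_level_ms: int, final_redirect_ms: int) -> int:
    """Extra blocking time if refreshCookies becomes two top-level redirects."""
    return refresh_top_level_ms + final_redirect_ms


# Refresh step guessed at 200-300 ms, final redirect back at about 200 ms.
low = added_login_latency_ms(200, 200)
high = added_login_latency_ms(300, 200)
current_chain = START_MS + COMPLETE_MS

print(f"added: {low}-{high} ms")  # added: 400-500 ms
print(f"redirect chain: {current_chain} ms now, "
      f"{current_chain + low}-{current_chain + high} ms after")
```

With these numbers the upper bound matches the "extra half second" guess; real timings would need to be measured rather than taken from one ad-hoc test.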

> Another thing we could do is automatically fix the central cookies during the next top-level autologin.

That would be free performance-wise, but you would need a top-level autologin, i.e. it only helps if the user visits another wiki family within 24 hours and clicks "login" there.

> In an ad-hoc test, the start/complete steps of central login take me 300ms each, and the refreshCookies step takes 200ms. No idea how representative these numbers are.
>
> > We could change the final step of central autologin to be two top-level redirects instead of one subresource request
>
> So that would add a refresh step that's between 200-300ms (I assume a top-level redirect is a bit slower than a subresource request because the browser has to unload the page etc), and a fast final step that just redirects the user back and otherwise doesn't do anything (~200ms? pretty much any index.php request takes at least that much for me), so maybe an extra half second during login.

This seems worthwhile, if it means we have improved central session reliability.

> This seems worthwhile, if it means we have improved central session reliability.

The reliability of the central session itself would be improved. The reliability of accessing the central session wouldn't (i.e. the same browsers which currently fail to do the /refreshCookies step would still fail to do autologin), so I think without T355280 (Try to connect to central session before temp user creation) it wouldn't help temp users much.

(Top-level autologin would keep working for a year, rather than a day like now; but for top-level autologin you need to click on the login link. Normal users will do that when they see they aren't logged in; temp users probably wouldn't.)

DAlangi_WMF changed the task status from Open to In Progress. (Dec 16 2024, 1:27 PM)
DAlangi_WMF claimed this task.
DAlangi_WMF moved this task from Next to In progress on the MediaWiki-Platform-Team board.

Change #1104619 had a related patch set uploaded (by Gergő Tisza; author: Derick Alangi):

[mediawiki/extensions/CentralAuth@master] CentralAuthSessionProvider: Fix race condition in central login

https://gerrit.wikimedia.org/r/1104619

Change #1104619 abandoned by D3r1ck01:

[mediawiki/extensions/CentralAuth@master] CentralAuthSessionProvider: Fix race condition in central login

Reason:

MediaWiki CLI's `mysql-replica` service (even with its containers stopped) seems to have caused this somehow. Destroying the service entirely seems to have resolved this issue for me locally.

https://gerrit.wikimedia.org/r/1104619

DAlangi_WMF changed the task status from In Progress to Open. (Jan 7 2025, 3:56 PM)
DAlangi_WMF removed DAlangi_WMF as the assignee of this task.

I'll create a separate task about scheduling edge-login for temp accounts at least for SUL3 mode.