
Mitigate phase-out of third-party cookies in CentralAuth
Open, Needs Triage · Public

Description

tl;dr CentralAuth autologin (the ability to log in on one wiki and also be logged in everywhere else without having to enter credentials again on every site) has been significantly degraded for a non-trivial fraction of our users due to browsers' anti-tracking measures, and will probably be degraded for everyone by summer 2024.

Problem description

See T345245: Mitigate phase-out of third-party cookies across MediaWiki in production for context and links to relevant documentation.

CentralAuth uses two session mechanisms: cookies set on the shared domain of the wiki family (e.g. on wikipedia.org in the case of en.wikipedia.org) and cookies set on a designated central domain (login.wikimedia.org). Cookies from the first mechanism are treated as first-party cookies and are not affected. Cookies from the second mechanism are treated as third-party cookies and are at risk of breaking.

There used to be three workflows using the central domain:

  • After login, the user is redirected to the central domain and back, to invisibly set a session cookie on the central domain.
  • After login, a set of images is included in the page, one for each shared domain other than the current one (e.g. if you log in on en.wikipedia.org, the landing page will embed an image from meta.wikimedia.org, en.wikisource.org, en.wiktionary.org etc.), and each one will go through a redirect cycle of several steps which establishes the user's identity by going to the central domain, then returns to the initial domain and sets session cookies there, essentially logging in the user on all those domains in the background. (This is referred to as "edge login".)
  • If you arrive at a wiki where you are not logged in (either because it's too insignificant to be included in edge login, such as outreach.wikimedia.org, or because edge login didn't work, expired etc.), a similar redirect chain happens via a <script> tag; after a successful login it might reload the page, prompt the user via a notice to reload, or just kind-of simulate a reload by replacing the user menu.
  • A fourth mechanism was added by T326281: Attempt top-level central autologin when visiting the login page (to allow autologin when the browser blocks third-party cookies): when you are not logged in and visit the login page, the same redirect chain is attempted at the top level.
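
To make the edge login step above more concrete, here is a minimal sketch of how the set of embedded image URLs could be derived. The wiki list, the `Special:CentralAutoLogin/start` path and the `type=1x1` parameter are assumptions for illustration only, not a description of what CentralAuth actually emits.

```python
from typing import List
from urllib.parse import urlencode

# Hypothetical list with one representative wiki per shared domain.
EDGE_LOGIN_WIKIS = [
    "en.wikipedia.org",
    "meta.wikimedia.org",
    "en.wikisource.org",
    "en.wiktionary.org",
]


def parent_domain(host: str) -> str:
    """Return the last two labels, e.g. 'wikipedia.org' (good enough for *.org hosts)."""
    return ".".join(host.split(".")[-2:])


def edge_login_image_urls(current_host: str) -> List[str]:
    """Build one image URL for every shared domain other than the current one."""
    query = urlencode({"type": "1x1"})  # assumed parameter name and value
    return [
        f"https://{host}/wiki/Special:CentralAutoLogin/start?{query}"
        for host in EDGE_LOGIN_WIKIS
        if parent_domain(host) != parent_domain(current_host)
    ]
```

Each of these images is loaded in a third-party context from the point of view of the page that embeds it, which is why this flow depends on third-party cookie access to the central domain.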

The first and fourth flows are what browsers call "bounce tracking" or "redirect tracking"; this is mostly allowed today, although it is definitely in the sights of browser vendors. (It might cause issues in Safari, which does sometimes drastically reduce cookie lifetimes when it detects bounce tracking.) The other two are probably failing in Safari and Firefox, and will fail in Chrome (which is the majority of our traffic) starting mid-2024. (Although they have pushed back the deadline by multiple years already, so there is always the chance of that happening again.)

User impact

Users will have to log in once per "domain family" (wikipedia.org etc.), instead of just once. For most users this will probably mean three separate logins: Wikipedia, Commons and Wikidata. (This can sometimes be mitigated by the centralauthtoken API, although it's cumbersome to use.) Since a Wikimedia login with the "remember me" option checked lasts for a year, this is a small annoyance for most normal users. Temp users would be cut off from other projects entirely, though: autologin is the only login mechanism for them.
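
For readers unfamiliar with the centralauthtoken API, the sketch below shows roughly why it is cumbersome: a token is fetched from the wiki where the user is logged in, then attached to a request to a wiki on another domain. It only builds the request URLs; the `/w/api.php` path and the helper names are assumptions, and real cross-origin requests need additional parameters and headers not shown here.

```python
from urllib.parse import urlencode


def token_request_url(local_host: str) -> str:
    """Step 1: URL for fetching a short-lived token from the wiki where the user is logged in."""
    params = {"action": "centralauthtoken", "format": "json"}
    return f"https://{local_host}/w/api.php?{urlencode(params)}"


def foreign_request_url(foreign_host: str, token: str, **api_params: str) -> str:
    """Step 2: URL for one authenticated API request to a wiki on another domain."""
    params = dict(api_params, format="json", centralauthtoken=token)
    return f"https://{foreign_host}/w/api.php?{urlencode(params)}"
```

For example, `foreign_request_url("commons.wikimedia.org", token, action="query", meta="userinfo")` would identify the user on Commons for that one request only, which is the limitation the "more flexible" proposal in the mitigation options is about.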

Mitigation options

  • Accept the UX degradation and don't do anything.
  • Accept the UX degradation and don't do anything in general, but improve the most problematic special cases.
    • Turn the centralauthtoken API into something more flexible (e.g. token reusable in multiple requests, less strict time limits).
    • Provide an explicit login mechanism for temp users, maybe something like Facebook's "tap the notification in another device where you are logged in" feature. (With third-party cookie restrictions, there would be no way to detect whether the user is logging in from the same device, so this would also mean temp users could be logged in on multiple devices - not sure if that's a good thing or a bad thing.)
  • T335851: Investigate the Federated Credential Management browser API
  • T345589: Investigate the First-Party Sets / Related Website Sets browser API
  • Use popups instead of embedded resources, with requestStorageAccess(), which on some browsers (Firefox, at least) has more lax heuristics.
  • Replace the current cross-domain authentication flows with something that involves user interaction with the central domain (i.e. users would have to click on the login link, and then maybe click through an interstitial, but wouldn't have to enter their credentials again). Possibly do all that in a popup. There are a number of unrelated reasons why we might want to do this. We could use a more standard identity provider protocol, such as OpenID Connect.

Related Objects

Status | Assigned
Open | None
Open | None
Open | Tgr
Open | Tgr
Stalled | DAlangi_WMF
Open | ArielGlenn
Resolved | Tgr
Open | None
Open | None
Open | None
Open | None
Open | matmarex
Open | DAlangi_WMF
Open | None
Open | DAlangi_WMF
Open | None
Open | None
Open | None
Open | None
Resolved | Tgr
Duplicate | pmiazga
Resolved | Tgr
Open | None

Event Timeline


Replace the current cross-domain authentication flows with something that involves user interaction with the central domain (i.e. users would have to click on the login link, and then maybe click through an interstitial, but wouldn't have to enter their credentials again). Possibly do all that in a popup. There are a number of unrelated reasons why we might want to do this.

There are a number of unrelated issues with the current cross-domain login process that we could take the opportunity to fix, if we need to make significant changes to the code anyway because of third-party cookie deprecation:

We could probably solve most or all of these by replacing the current workflows with a single dedicated login domain which is always used for login, uses a more standard cross-domain communication process (e.g. OAuth) and is locked down to just a login / signup interface, instead of being a full-blown MediaWiki.

Tested by logging in on en.wikipedia.org, then visiting www.mediawiki.org (without "Keep me logged in" option):

Browser | Instant (<script>) autologin | Special:UserLogin (top-level) autologin
Chrome 116 / Win 11 | works
Chrome 116 / Win 11 & Ubuntu (incognito mode) | fails | works
Brave 1.57.62 / Win 11 | fails | works
Firefox 117 / Win 11 & Ubuntu | fails | works
Firefox 117 / Win 11 & Ubuntu (private mode) | fails | works
Edge 116 / Win 11 | works
Edge 116 / Win 11 (InPrivate window) | works
Edge 116 / Win 11 (with "Block third-party cookies" option) | fails | works
Opera 102 / Win 11 | works
Opera 102 / Win 11 (private mode) | fails | works
Epiphany 44.6 / Ubuntu (as an approximation of Safari 16.4) | fails | works
Epiphany 44.6 / Ubuntu (private mode) | fails | works

Copying over some relevant comments from other tasks (most of which I am merging into this one):

That said, what they call Network Partitioning is enabled for everyone, and might have performance implications - details are scarce, but it seems to suggest e.g. DNS cache and HTTP cache would be split by referrer domain.

(See also Chrome status, where network partitioning is in origin trial.)
cc @Krinkle - not sure how significant this is for the client-side performance of Wikimedia sites.

("Wikipedia SSO auto-login broken by state partitioning")

We've also had issues at Miraheze now that ETP is rolled out. I opened https://bugzilla.mozilla.org/show_bug.cgi?id=1774861 last night for Miraheze.

("Unable to login on miraheze wikis (mediawiki 1.38) with ETP - Standard Enabled")

[...] cross-domain SameSite handling is identical in Chrome with third-party cookie-blocking on or off (or at least it was as of 2020 July): T257803#6315123 [...] That means login problems affected by cookie blocking user preferences are not SameSite related.

(ref: T252236: Prepare CentralAuth (e.g. login.wikimedia.org) for requirement of SameSite=None cross-site cookies in Chrome)

Found this nice resource on storage access technical details, if anyone ever needs to work with that: https://github.com/rchild-okta/itp#storage-access


Can we start sketching out what SUL3 would look like? In my head I currently have an idea that the login link will send you to login.wikimedia.org with ?uselang and original URL preserved, then you log in, including 2FA, it redirects you back to where you came from with an OAuth style handshake. Then on another wiki family, you have to click login, but once you hit login.wikimedia.org it immediately redirects you back and you're now logged in. I'm not sure if there's a way to transparently do the wiki logins without redirecting back and forth...

The next step [after T326281] would be to disable local login entirely on every wiki that is not the central login wiki. We could either keep triggering the central autologin logic from SpecialUserLogin, or make CentralAuthPrimaryAuthenticationProvider return a REDIRECT response instead of trying to do password-based authentication (AuthManager already knows to automatically redirect instead of showing the login page if there is only one primary auth provider and it redirects), and manipulate the last leg of autologin so that it redirects back into the normal login process. Still seems fairly easy to do, but all clients which provide their own login UI (e.g. the WMF apps) would have to update their code, and loginwiki would have to be prepared to be a user-facing wiki.

I think the end goal should be for loginwiki to be an OIDC provider (standards are nice, makes it easier to understand and test, makes it easier to replace CentralAuth with something else), which it sort of already is via the OAuth extension (T254063: OAuth extension should support OpenID Connect) but we'd probably want CentralAuth to hook into OAuth and suppress the authorization dialog. The login page would then just perform an OIDC handshake and CentralAuthSessionProvider would probably store the OIDC bearer token in a cookie, instead of the current set of cookies.
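
As a rough illustration of the first leg of such an OIDC handshake, the sketch below builds an authorization code request with PKCE. The query parameters are the standard OIDC / RFC 7636 ones; the endpoint URL, client ID and helper names are hypothetical, since nothing in this thread specifies them.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Hypothetical endpoint; the thread does not name one.
AUTHORIZE_ENDPOINT = "https://login.wikimedia.org/oidc/authorize"


def pkce_challenge(verifier: str) -> str:
    """S256 code challenge: unpadded base64url of SHA-256(verifier)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorization_url(client_id: str, redirect_uri: str,
                            state: str, verifier: str) -> str:
    """URL the local wiki would redirect the browser to at the top level."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid",
        "state": state,
        "code_challenge": pkce_challenge(verifier),
        "code_challenge_method": "S256",
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


def start_login(client_id: str, redirect_uri: str):
    """Generate fresh state and verifier; both must be kept in the local session."""
    state = secrets.token_urlsafe(16)
    verifier = secrets.token_urlsafe(32)
    return build_authorization_url(client_id, redirect_uri, state, verifier), state, verifier
```

On return, the local wiki would compare the `state` value, exchange the code (together with the verifier) for tokens, and only then create its own session. That part is not shown here.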

It would be good to incorporate T248339: Decide how to deal with WebAuthn login/registration flow on Wikimedia wikis in future in any SUL3 plans as well, as another auth problem caused by having multiple login domains.

Legoktm already told me on IRC the proposed solution here wouldn't start logging logins on loginwiki, but worth pointing out in case anyone thought there should be a CU impact: loginwiki was intentionally excluded from the implementation of tracking logins in the CU table when we made that change a few years ago because it would effectively create global CU. While this has been a steward wishlist item for a while, it doesn't have global consensus and is controversial/would probably need a global RfC to expand the steward mandate. For those unaware, loginwiki currently only tracks account creations in CU, and it was intentionally left that way when other wikis started to track logins.

I belatedly realized that the current extent of browser limitations doesn't explain why edge login doesn't work in Chrome (and presumably Edge and Opera, although I haven't tested those) and so that might be an easy fix. Tracking that in T347889: Investigate why CentralAuth edge login fails in browsers that do not block third-party cookies.

Filed T348388: Use central login wiki for login (SUL3) for the likely next step of doing login at a single dedicated domain, which might or might not be necessary to mitigate third-party cookie blocking (in the unlikely case that the First-Party Sets spec gets adopted, we could probably keep the current system) but has so many security and usability benefits that it can be justified even if we assume CentralAuth in its current form keeps working indefinitely.

Chrome will disable third-party cookie access for 1% of users starting January 4th, 2024, i.e. Thursday next week. (They plan to ramp up to 100% in Q3 2024; hopefully we will have migrated away from our current login system by then.) Given that Chrome is something like 80% of our userbase, there will probably be some bugs filed about this. In the shorter term, we can't do much about it (and it has already been the situation for a while for Firefox and Safari users), just noting.

Tested by logging in on en.wikipedia.org, then visiting www.mediawiki.org (without "Keep me logged in" option):

Browser | Instant (<script>) autologin | Special:UserLogin (top-level) autologin
Chrome 116 / Win 11 | works
Chrome 116 / Win 11 & Ubuntu (incognito mode) | fails | works
Brave 1.57.62 / Win 11 | fails | works
Firefox 117 / Win 11 & Ubuntu | fails | works
Firefox 117 / Win 11 & Ubuntu (private mode) | fails | works
Edge 116 / Win 11 | works
Edge 116 / Win 11 (InPrivate window) | works
Edge 116 / Win 11 (with "Block third-party cookies" option) | fails | works
Opera 102 / Win 11 | works
Opera 102 / Win 11 (private mode) | fails | works
Epiphany 44.6 / Ubuntu (as an approximation of Safari 16.4) | fails | works
Epiphany 44.6 / Ubuntu (private mode) | fails | works

I re-tested this in preparation for another task, and found that cookies are sent to *.wikimedia.org domains regardless of whether third-party cookie blocking is enabled in the browser; so e.g. when logging in on en.wikipedia.org and visiting commons.wikimedia.org, I am logged in there. Tested this on both Chrome and Firefox, both worked. Doesn't seem to happen on any other domains, not even wikipedia.org. It's also happening on *.wikimedia.beta.wmflabs.org, but not e.g. *.wiktionary.beta.wmflabs.org.

No idea what's going on. The potential explanations I can think of are that we are accidentally setting cookies on .wikimedia.org at some *.wikimedia.org wiki (doesn't seem to be happening), that I have somehow configured a local override and forgot about it (can't find any trace of that) or that these domains ended up on some exception list that both Chrome and Firefox are using (sounds plausible for wikimedia.org, but for wmflabs?).

I guess the more likely explanation is that these browsers are using something looser than same-origin (same parent domain?) for third-party cookie blocking, so loginwiki and other wikimedia.org wikis are considered in the same security bucket.
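
If that guess is right, the difference is between comparing origins and comparing sites (registrable domains). A minimal sketch of the two comparisons, assuming a one-entry stand-in for the Public Suffix List; it is only meant to show why commons.wikimedia.org and login.wikimedia.org could land in the same bucket while en.wikipedia.org does not, and makes no claim about the wmflabs.org observations.

```python
from urllib.parse import urlsplit

# Assumption: a tiny stand-in for the Public Suffix List; only "org" is needed here.
PUBLIC_SUFFIXES = {"org"}


def origin(url: str):
    """Origin as a (scheme, host, port) tuple; port is None when not given explicitly."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def site(url: str) -> str:
    """Registrable domain: the longest matching public suffix plus one more label."""
    labels = urlsplit(url).hostname.split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


def same_site(a: str, b: str) -> bool:
    return site(a) == site(b)
```

Under this model, `same_site("https://commons.wikimedia.org/", "https://login.wikimedia.org/")` is true even though the two are different origins, so cookies on login.wikimedia.org would not count as third-party when visiting Commons.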

Verified for Firefox: I logged out, deleted all *wik*org cookies except for non-SUL sites (phab, etherpad and so on), set Enhanced Tracking Protection to "custom" and chose "All cross-site cookies" (see image below). After login at en.wikipedia.org, once the redirect/session creation chain for commons.wikimedia.org has finished, I have session cookies for commons with session id, UserId and UserName stored locally. When I go to visit commons.wikimedia.org, these cookies are sent to the web server and I am logged in as a result.
Firefox version: 121.0, linux.

Screenshot from 2024-01-17 13-32-27.png (659×873 px, 55 KB)

Next step is to test the relevant browser APIs:

API | spec investigation | test | browser | functionality
RWS | T345589 | T359926 | Chrome | cookie access, probably on par with current
FedCM | T335851 | T359947 | Chrome, Edge, Opera (maybe Firefox soon?) | browser-mediated identity checks
Storage Access | | T359948 | all modern | cookie access after user interaction

Also Chrome offers a deprecation trial to delay when they apply third-party cookie blocking, and we have no reason not to make use of that (we can opt out of it later if we want) so we'll apply: T359957: Enroll in Chrome third-party cookies deprecation trial

@Tgr Do we need a test for the top-level redirect chain as well? (i.e. OAuth-like, but using our existing code for the most part)

It sounds most like T359948: Test cross-domain cookie access with Storage Access API but there it says:

Maybe also check if we can avoid permission popups by relying on heuristics-based cookie blocking exceptions. […] But given that those exceptions are temporary, this might not be worth the effort.

Have any of Google, Apple or Mozilla announced an intent to deprecate or break OAuth-like redirects? This approach seems to me the most sustainable long-term. It would rely on proven technology, require no special vendor permission, and would show a UX that is least novel/surprising compared to other websites. I.e. the idea that you occasionally click "Log in" on an affiliated site and choose your identity on loginwiki and bounce back.

I worry about introducing permission prompts into the UX, in particular the wording they might use and how that reflects on Wikipedia, at a time when virtually no major consumer sites use any of these three approaches (afaik).

Have any of Google, Apple or Mozilla announced an intent to deprecate or break OAuth-like redirects?

I think for Google at least the intent is quite clear. (The other two in general communicate much less about this topic; Google has a whole website dedicated to its privacy plans.) Their recommended replacements for authentication are RWS, FedCM and using a single domain.

OAuth-like experiences using anything other than a top-level redirect fall under their heuristics-based exceptions policy, which they very clearly present as a temporary option. Top-level redirects fall under their bounce tracking mitigations policy, which is permanent (or at least I haven't seen it described as temporary so far).

Do we need a test for the top-level redirect chain as well?

You are right that this is another potentially viable option (besides RWS, FedCM, the Storage Access API, and - for a while - heuristics-based exceptions). I'm not sure what exactly we should be testing, though. We know it works (to some extent) - we are already using it in production. Bounce tracking mitigations were only rolled out in October 2023 (and then only for users with third-party cookie blocking, which had to be enabled manually until recently), so maybe it's working less well now and we didn't notice? I guess that's worth checking. But we can do that in production; I don't think a test site would have much point.

(An extra source of uncertainty is that there is overlap between the heuristics based exceptions and bounce tracking, so I don't think we have a reliable way to tell what will happen once those exceptions go away.)

at a time that virtually no major consumer sites use any of these three approaches

Google's "Sign in with Google" popup will transition to FedCM soon, that's used by lots of largish sites (Medium for example).

We could have a look at what other SSO services say publicly about how they will handle the Privacy Sandbox changes.

This approach [OAuth-like redirects] seems to me the most sustainable long-term.

I suppose that depends on your definition of sustainable.
RWS would probably give us more power than any other option, although only on certain browsers. FedCM would probably take the minimal amount of effort in the long term, but as you point out it's at the cost of some loss of control.

I don't think a test site would have much point.

On second thought, one thing we might want to test is how redirects behave within programmatically opened popups. Top-level redirects in the main browser window are hard to use because they disrupt client-side state. Opening a popup and doing the redirect there is much more versatile, but I'm not 100% sure that would still be considered "top-level".

I'll file a task about that.

With third-party cookie blocking enabled, after a central login, Chrome's Issues tab says this:

Chrome may soon delete state for intermediate websites in a recent navigation chain
In a recent navigation chain, one or more websites accessed some form of local storage without prior user interaction. If these websites don't get such an interaction soon, Chrome will delete their state.
1 potentially tracking website: wikimedia.org
Learn more: Bounce tracking mitigations

which suggests central login and top-level autologin wouldn't survive the rollout of cookie blocking, either. I haven't reviewed the linked spec in full, but I think the relevant part is #bounce-tracking-mitigations-timers, which basically says that if a domain has not received user interaction in the last 45 days and does not receive user interaction within 1 hour of the bounce tracking (i.e. doing the central login redirect chain), all cookies, cache and other stored data for login.wikimedia.org will get scrubbed. This is much more aggressive than e.g. the bounce tracking mitigations used by Firefox, and would render central login (in its current form) entirely useless.
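
To spell out the timer logic as I read it, here is a small model of the rule described above. It encodes only the two figures quoted in this comment (45 days and 1 hour) and is an assumption about how they combine, not a transcription of the spec; the function and constant names are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

INTERACTION_LIFETIME = timedelta(days=45)  # figure quoted in the comment above
GRACE_PERIOD = timedelta(hours=1)          # figure quoted in the comment above


def state_would_be_deleted(bounce_time: datetime,
                           last_interaction: Optional[datetime],
                           check_time: datetime) -> bool:
    """True if, at check_time, the bounce domain's stored state is eligible for deletion."""
    if check_time < bounce_time + GRACE_PERIOD:
        return False  # still inside the grace period after the bounce
    if last_interaction is None:
        return True  # the user never interacted with the central domain
    # An interaction protects the state if it happened within 45 days before
    # the bounce, or during the 1 hour grace period after it.
    protected_from = bounce_time - INTERACTION_LIFETIME
    protected_until = bounce_time + GRACE_PERIOD
    return not (protected_from <= last_interaction <= protected_until)
```

In the current central login flow the user never interacts with login.wikimedia.org directly, so under this model `last_interaction` is always `None` and the central session would be scrubbed an hour after every login.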