
Investigate the First-Party Sets / Related Website Sets browser API
Closed, Resolved · Public

Description

First-Party Sets (later renamed to Related Website Sets; spec, WICG explainer, chromedev explainer and developer guide, demo) is Google's proposed solution to anti-tracking features breaking certain cross-domain features on some sites. It allows a set of domains to indicate that they have the same owner, which browsers might use to relax anti-tracking protections between those sites. We should investigate it as a potential method for keeping CentralAuth working.


Summary as of 2023 September: only Chrome has plans to support the proposed standard in its current form. It has very limited utility for us (it's almost useful, but they limit sets to max three user-visible eTLD+1 domains, and we have over a dozen of them). We should probably reach out and make our use cases known, though.

Event Timeline


A First-Party Set is a collection of domains, exactly one of which is a "set primary"; the rest are either "service domains" or "associated domains" (or "ccTLDs", such as wikipedia.org vs. wikipedia.de; this category is not relevant to us). Domain owners need to apply by submitting a pull request for first_party_sets.JSON in GoogleChrome/first-party-sets (which is suspiciously sparse currently), with a turnaround time of 3-4 weeks. Per the submission guidelines, the first-party set must adhere to the following requirements:

  • Service domains must have a common owner with the set primary, and should not be the entry point to a user's journey on a site. Submitters must provide an explanation of how each service domain supports functionality or security needs.
  • Service domains must have X-Robots-Tag: noindex or none, and their top-level page must redirect to a different domain or return a 4xx or 5xx.
  • For associated domains, their affiliation with the set primary must be clearly presented to users. Submitters must provide an explanation of why users would expect their domains to be affiliated (e.g., an About page, header or footer, shared branding or logo, etc).
  • While one can list any number of associated domains, only the first three in the list can be interacted with.
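
The service-domain requirements above can be checked mechanically. A minimal sketch, assuming a hypothetical `TopLevelResponse` shape with lower-cased header names, and comparing plain hostnames rather than registrable domains (which is a simplification); none of these names come from the submission guidelines:

```typescript
// Hypothetical, minimal view of the response for a service domain's top-level page.
interface TopLevelResponse {
  status: number;
  headers: Record<string, string>; // header names assumed to be lower-cased
}

// Returns true if the response satisfies both service-domain rules:
// X-Robots-Tag is noindex or none, and the page redirects elsewhere or returns 4xx/5xx.
function meetsServiceDomainRules(domain: string, res: TopLevelResponse): boolean {
  const robots = (res.headers["x-robots-tag"] ?? "")
    .split(",")
    .map((value) => value.trim().toLowerCase());
  const notIndexed = robots.includes("noindex") || robots.includes("none");

  const isError = res.status >= 400 && res.status <= 599;

  let redirectsAway = false;
  const location = res.headers["location"];
  if (res.status >= 300 && res.status < 400 && location !== undefined) {
    // A relative Location resolves against the service domain itself.
    const target = new URL(location, `https://${domain}/`);
    redirectsAway = target.hostname !== domain;
  }

  return notIndexed && (isError || redirectsAway);
}
```

A domain whose top-level page returns 200 fails this check regardless of its headers, which is the reason a regular wiki cannot be listed as a service domain.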

requestStorageAccess() and requestStorageAccessFor() can then be used to get access to cookies (but only ones marked SameSite=None and Secure, and coming with appropriate CORS headers). For associated domains, the request will be auto-granted for the first three associated domains in the list, and auto-rejected for the rest. Non-service domains using requestStorageAccessFor() to access cookies on service domains will also be auto-granted.


I think we meet the guidelines - the set primary would be wikimediafoundation.org (being the only domain that all others are clearly affiliated with, via a link in the footer), and all other wikis would be associated domains. (Conceptually loginwiki is more of a service domain, but doesn't meet the requirements about the top-level page.) If we have non-wiki domains used for e.g. analytics, we can add those as service domains.

How much does it help us? First-party sets will probably be supported by Chrome (chromestatus; shipped in 115 but behind a feature flag for now, on hold because of some issues). Other browser vendors are opposed at the moment (Mozilla, Apple, Brave). There are two ways to use them: embed other.domain in current.domain as an iframe and call requestStorageAccess() from the iframe; or embed other.domain in current.domain by whatever means, and call requestStorageAccessFor('other.domain') at the top level. requestStorageAccessFor is Chrome-only as well (chromestatus; live); requestStorageAccess has semi-decent browser support (caniuse) but is not very useful without first-party sets (it needs to be initiated by a user gesture, shows an approval popup, and the permission expires after a few weeks).

So given that Chrome doesn't restrict third-party cookies yet and the other browsers don't support first-party sets, it isn't useful right now. For CentralAuth, once Chrome starts blocking third-party cookies, we could probably submit all our wikis as a first-party set, and precede autologin with a requestStorageAccessFor( 'login.wikimedia.org' ) call. It wouldn't save edge login (too many domains). For analytics purposes (e.g. unique device counting across all Wikimedia properties), a service domain would probably work.
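
A rough sketch of what preceding autologin with that call could look like. The helper name `requestLoginCookieAccess` and the `StorageAccessDocument` interface are made up for illustration, and the origin is passed as a full URL, which is an assumption about the argument format. This is not how CentralAuth is implemented today:

```typescript
// Subset of Document used here. requestStorageAccessFor() is Chrome-only,
// so it is declared optional and feature-detected before use.
interface StorageAccessDocument {
  requestStorageAccessFor?: (origin: string) => Promise<void>;
}

// Hypothetical helper: resolves to true if cookie access was granted, false otherwise.
async function requestLoginCookieAccess(
  doc: StorageAccessDocument,
  loginOrigin: string = "https://login.wikimedia.org",
): Promise<boolean> {
  if (typeof doc.requestStorageAccessFor !== "function") {
    return false; // unsupported browser: some fallback mechanism is needed
  }
  try {
    // Called as a method so that `this` stays bound to the document.
    await doc.requestStorageAccessFor(loginOrigin);
    return true;
  } catch {
    return false; // request rejected, e.g. the origin is not eligible
  }
}
```

On a real page the first argument would be `document` (possibly with a cast, if the TypeScript DOM typings in use do not declare the method), and the autologin request would only be attempted with third-party cookies once the promise resolves to true.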


The data submission would look something like this:

{
  "sets": [
    // ...sets by other orgs...
    {
      "contact": "info@wikimedia.org",
      "primary": "https://wikimediafoundation.org/",
      "associatedSites": ["https://login.wikimedia.org", "https://wikipedia.org", "https://wikidata.org", "https://commons.wikimedia.org", "https://meta.wikimedia.org", "https://wiktionary.org", ...],
      "rationaleBySite": {
        "https://wikipedia.org": "Has a Wikimedia Foundation logo and link in the footer",
        // ...repeat many times...
      }
    },
    // ...sets by other orgs...
  ]
}

We would have to put essentially the same data structure at /.well-known/first-party-set.json on wikimediafoundation.org, and { "primary": "https://wikimediafoundation.org" } on all other involved domains.
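
Since the per-domain files differ only by whether the domain is the primary, they could be generated from the set definition. A minimal sketch; the `SetEntry` type and the `wellKnownBody` function are hypothetical names, and the field list is limited to the fields shown in the submission above:

```typescript
// Shape of one set entry, limited to the fields used in the example submission.
interface SetEntry {
  contact?: string;
  primary: string;
  associatedSites?: string[];
  rationaleBySite?: Record<string, string>;
}

// Returns the JSON body to serve at /.well-known/first-party-set.json for `site`:
// the full entry on the primary, and only a pointer to the primary elsewhere.
function wellKnownBody(site: string, set: SetEntry): string {
  const body = site === set.primary ? set : { primary: set.primary };
  return JSON.stringify(body, null, 2);
}
```

The comparison is an exact string match, so the primary has to be written the same way (for instance with or without a trailing slash) wherever it appears.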

While one can list any number of associated domains, only the first three in the list can be interacted with.

A closer reading of the WICG spec suggests associated domains are more restricted than that. The rules for resolving requestStorageAccess[For] are laid out in the "eligible for same-party membership when embedded within" algorithm, and they basically say that both the top-level domain and the embedded domain must be in the first three items of the associated domain list (in effect, a first-party set can only define up to three associated domains). I think that makes first-party sets, as defined today, fairly useless for us.
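
To make the consequence concrete, here is a simplified model of that reading. It only covers associated domains (the primary and service domains are not modeled), the function names are made up, and it reflects my interpretation rather than the spec's actual algorithm:

```typescript
// True if `site` is one of the first `limit` entries of the associated list.
function withinAssociatedLimit(associatedSites: string[], site: string, limit: number = 3): boolean {
  const index = associatedSites.indexOf(site);
  return index !== -1 && index < limit;
}

// Under this reading, access is only auto-granted when BOTH the top-level site
// and the embedded site fall within the limit.
function bothEligible(associatedSites: string[], topLevel: string, embedded: string, limit: number = 3): boolean {
  return (
    withinAssociatedLimit(associatedSites, topLevel, limit) &&
    withinAssociatedLimit(associatedSites, embedded, limit)
  );
}

const associated = [
  "https://login.wikimedia.org",
  "https://wikipedia.org",
  "https://wikidata.org",
  "https://commons.wikimedia.org",
  "https://meta.wikimedia.org",
  "https://wiktionary.org",
];

console.log(bothEligible(associated, "https://wikipedia.org", "https://login.wikimedia.org")); // true
console.log(bothEligible(associated, "https://commons.wikimedia.org", "https://login.wikimedia.org")); // false
```

With the login domain taking one slot, only two content domains could ever reach it, no matter how many are listed.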

(But, given that the First-Party Sets spec authors are asking for feedback, including specifically on the three-domain limit, we might want to engage with them.)

Firefox has a sort of first-party sets list of its own: the disconnect.me entity list. It doesn't seem to do anything useful though. T276739: Get Wikimedia added to disconnect.me entity list has more background.

Tgr renamed this task from Investigate the First-Party Sets browser API to Investigate the First-Party Sets / Related Website Sets browser API. Apr 16 2024, 3:08 PM
Tgr updated the task description.

As a reference: First-Party Sets may not be that useful for some third-party wiki farms, so we also need to find another solution for them.

The explainer says "If your RPs are SameParty, you may be better served by First-Party Sets." Which would be nice if there actually was a first-party sets standard, but it doesn't seem to be making much progress.

First-Party Sets would also be useless to Miraheze, as custom domains could take weeks to be added and need manual work. It looks like sets have to be reviewed and shipped in a Chrome profile.


They have their own update mechanism (so not dependent on Chrome updates), but yeah, there is a manual review process, which is slow. But you don't need to add all your domains to the set (it has a limit of five domains anyway), just the login domain.

As a reference: First-Party Sets may not be that useful for some third-party wiki farms, so we also need to find another solution for them.

In general, CentralAuth is a Wikimedia-specific extension and we don't support third-party use (we'll try not to break it unnecessarily, but we won't go out of our way to build alternative mechanisms for sites which differ in some relevant way from the Wikimedia cluster).
Related Website Sets is (at the moment, at least) Chrome-only though, so of course there needs to be some fallback mechanism.

I think the above counts as an investigation.

RWS seems almost useful, but the five-domain limitation and the way it's integrated with the Storage Access API make it not actually useful. Those are relatively minor details, so it may be worth checking from time to time for changes (e.g. if the upcoming HTTP headers support for the Storage Access API gets integrated). But for now, I don't think there's anything actionable left here.