
Get Wikimedia added to disconnect.me entity list
Closed, Declined · Public

Description

As I wrote at https://bugzilla.mozilla.org/show_bug.cgi?id=1696095#c6 , one option for resolving the parent task, at least for Firefox, is to get ourselves on the disconnect.me list of co-owned domains and to have Firefox respect it when doing first-party state partitioning.

Change requests can be sent to https://github.com/disconnectme/disconnect-tracking-protection/issues and are sometimes acted upon. To file a request, I need a canonical list of registrable domains that we are using. Judging by the existing entity list, they are mostly looking for domains that are actually used rather than parked domains.

Following is the list of files in operations/dns/zones that look like domain names and are not symlinks to parking or ncredir-parking (a sketch of one way to reproduce this listing follows the list):

mediawiki.org
wikibooks.org
wikidata.org
wikimedia.cloud
wikimediacloud.org
wikimedia.community
wikimediafoundation.org
wikimedia.org
wikinews.org
wikipedia.org
wikiquote.org
wikisource.org
wikiversity.org
wikivoyage.org
wikiworkshop.org
wiktionary.org
wmftest.com
wmftest.net
wmftest.org
wmfusercontent.org
w.wiki
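
(For reviewability, here is a minimal sketch of one way to reproduce such a listing. The zones path and the parking symlink targets are taken from the description above; this is not necessarily the exact method used.)

```typescript
import { readdirSync, lstatSync, readlinkSync } from 'node:fs';
import { basename, join } from 'node:path';

const zonesDir = 'operations/dns/zones'; // path as given in the description

const domains = readdirSync(zonesDir).filter((name) => {
  // keep only file names that look like registrable domain names
  if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(name)) return false;
  const path = join(zonesDir, name);
  // drop zones that are just symlinks to the parking templates
  if (lstatSync(path).isSymbolicLink()) {
    const target = basename(readlinkSync(path));
    if (target === 'parking' || target === 'ncredir-parking') return false;
  }
  return true;
});

console.log(domains.sort().join('\n'));
```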

Can someone please review this list and the method I used? It might not be easy to change after I file the request.

Event Timeline

What are the criteria for inclusion (or exclusion, beyond just being a symlink, etc.)?

There are a couple in there, like wikimediacloud.org and wikimedia.cloud, which I'm guessing are Cloud Services-related domains (but conversely, toolforge isn't listed, for example).

And then there's wikimedia.community, which is just a 301 to www.wikimedia.org.

I think the criteria should be:

  • Should a cross-domain request from origin domain X to Wikimedia be sent with Wikimedia cookies?
  • Or vice versa? Should a cross-domain request from a user script on a wiki to domain X be sent with domain X's cookies?
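
(To make the first bullet concrete, a hypothetical example. example.toolforge.org is made up, and the note about MediaWiki's origin parameter is a summary, not the point of the sketch.)

```typescript
// Hypothetical script on https://example.toolforge.org. credentials:
// 'include' asks the browser to attach the user's Wikipedia cookies;
// whether any are attached depends on the wiki's CORS policy (the
// `origin` parameter must name an origin the wiki allows) and, under
// first-party state partitioning, on whether the two domains are
// treated as one entity.
const resp = await fetch(
  'https://en.wikipedia.org/w/api.php?action=query&meta=userinfo&format=json' +
    '&origin=' + encodeURIComponent('https://example.toolforge.org'),
  { credentials: 'include' }
);
console.log(await resp.json());
```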

toolforge.org should probably be included. It's not on my list because it has its own authoritative DNS. If we're worried about Cloud Services being used to host unauthorized trackers for wiki readers, then maybe those domains should be left off the list. I'm not sure how much concern there is about that kind of abuse on Toolforge. The worst-case scenario is not very bad.

wmftest is used for Vagrant and similar local setups; it's basically just a fancy alias for localhost, to enable Host-header-based wiki matching. So it isn't really owned by us and should not be part of the list.
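
(To illustrate what Host-header-based matching means here: a toy sketch, not the actual Vagrant setup, and the hostnames are made up. Every name under the wmftest domains resolves to 127.0.0.1, so the wiki is selected purely by the Host header the browser sends.)

```typescript
import { createServer } from 'node:http';

// Toy vhost dispatcher: pick a wiki based only on the Host header.
const wikis: Record<string, string> = {
  'en.wiki.local.wmftest.net': 'enwiki',
  'de.wiki.local.wmftest.net': 'dewiki',
};

createServer((req, res) => {
  const host = (req.headers.host ?? '').split(':')[0];
  res.end(`serving ${wikis[host] ?? 'no such wiki'}\n`);
}).listen(8080);
```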

wmfusercontent.org is used for serving untrusted content (currently, Phabricator uploads) while taking advantage of cross-domain protections. It does not need cross-domain cookies today. I guess some hypothetical future use could involve private content with cookies needed for authentication, but it seems unlikely.

w.wiki is a URL shortener. Does a top-level request to w.wiki, which then gets redirected to e.g. wikipedia.org, count as a cross-domain request? If cookies were stripped from that, that would be disruptive. (The current Firefox feature probably wouldn't do that since the request is triggered by user interaction.)
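
(One way to observe that hop, as a Node sketch; the short code 'ABC' is made up.)

```typescript
// Look at the w.wiki redirect without following it, to see the
// cross-domain hop in question.
const res = await fetch('https://w.wiki/ABC', { redirect: 'manual' });
console.log(res.status, res.headers.get('location'));
// e.g.: 301 https://en.wikipedia.org/wiki/...
```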

wikiworkshop.org seems like a one-off static website for some conference; it can probably be omitted.

wikimedia.community is a redirect to www.wikimedia.org, which is a static site, so no cookies are needed. It can probably be omitted.

The remainder are either standard project domains which need to be in the equivalence set for SUL to work, or Cloud domains, for which the same consideration applies as for Toolforge (although I think wikimedia.cloud is supposed to be internal?). There's also wmflabs.org, although the plan is to move away from it; and https://wikitech-static.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/DNS_domain_usage mentions wmcloud.org as the wmflabs.org replacement.

On toolforge.org, WMCS controls the front proxy and TLS, so that content on the domain can be reasonably assured to be Toolforge-hosted, and so that we can strip out information, do blocking, and so forth. That doesn't say much about cookie handling and the like by a hosted application. We can shut down anyone who violates the terms of use (https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use), but there is obviously some risk that someone violates it before it comes to WMCS's attention. However, there is at least some assurance with that domain that we control termination and that bad actors can be stopped. I hope that helps the discussion.

wmcloud.org can be managed entirely by the tenant or by WMCS if they are using the supported proxy frontend. There's less technical restriction on that domain, but it is still bound by the TOU. wikimedia.cloud is internal (so no worries) as is wikimediacloud.org for now. The latter could end up somewhat more public in usage in the future for certain services, but there are no concrete plans to do that now.

wmflabs.org is still in use (Quarry, for instance), though deprecated, and has the exact same level of trust and so forth as wmcloud.org.

Looking at the disconnect.me info, I know that nearly all authentication to cloud apps is done via wiki OAuth of one kind or another. Does that require being considered a co-owned domain?

OAuth should not require any cookie communication in either direction. I honestly cannot think of any reason that a tool on Toolforge should receive a cookie from *.wikimedia.org or vice versa.

The OAuth approval workflow does rely on cookies on the Wikimedia side, but that happens in the top-level browsing context so AIUI it wouldn't be affected.

I vaguely recall a discussion about how to make AJAX requests from a Wikimedia gadget to a Toolforge tool so that the tool can verify the user's identity; logging in on the tool domain via OAuth and then relying on that session cookie was one solution (roughly the pattern sketched below). That would break if Toolforge and the project wikis end up in separate entity sets. It's not a common requirement, though, and it could probably be handled in different ways.
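
(A minimal sketch of that pattern; the tool domain and endpoint are made up.)

```typescript
// Gadget code running on en.wikipedia.org. The tool has previously set
// its own session cookie (established via OAuth) on example.toolforge.org.
// With partitioned state and no shared entity, the browser would attach
// a partitioned (effectively empty) cookie jar instead, and the identity
// check on the tool side would fail.
const r = await fetch('https://example.toolforge.org/api/whoami', {
  credentials: 'include', // rely on the tool's own session cookie
});
console.log(await r.json());
```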

Also the argument in the upstream bug report was that there's no privacy boundary between the various Wikimedia servers, and that would be hard to argue for Toolforge or Cloud VPS.

Let's leave this until we have some sort of confirmation that it would actually help us. I reviewed the Disconnect Firefox extension source code; entities.json doesn't actually seem to be used there. But it looks like everything in entities.json has a corresponding entry in services.json, and everything in services.json is categorized as some sort of tracker. So the reason we're not in entities.json is that we're not considered to be a tracker. Adding Wikimedia would probably give people the option to block Wikimedia in the extension configuration. There are a number of open or closed bug reports against the list from companies asking to be removed, and from users asking for trackers to be added, but nobody is asking for non-trackers to be added to the list. So, maybe it is harmful.
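
(A sketch of that cross-check, in case anyone wants to re-run it. The raw URLs and the JSON layouts are assumptions from a quick look at the repo, not guaranteed to be stable.)

```typescript
// Re-check the claim that every entities.json resource also appears
// somewhere in services.json. Assumed layouts:
//   entities.json: { entities: { name: { properties: [], resources: [] } } }
//   services.json: { categories: { cat: [ { name: { url: [domains] } } ] } }
const base =
  'https://raw.githubusercontent.com/disconnectme/disconnect-tracking-protection/master/';

const entities = (await (await fetch(base + 'entities.json')).json()) as {
  entities: Record<string, { resources: string[] }>;
};
const services = (await (await fetch(base + 'services.json')).json()) as {
  categories: Record<string, Array<Record<string, Record<string, unknown>>>>;
};

// Collect every domain that services.json lists under some tracker category.
const tracked = new Set<string>();
for (const entries of Object.values(services.categories))
  for (const entry of entries)
    for (const urls of Object.values(entry))
      for (const domains of Object.values(urls))
        if (Array.isArray(domains)) for (const d of domains) tracked.add(d);

for (const [name, { resources }] of Object.entries(entities.entities)) {
  const missing = resources.filter((d) => !tracked.has(d));
  if (missing.length) console.log(`${name}: not in services.json:`, missing);
}
```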

Per https://github.com/mozilla-services/shavar-prod-lists#disconnect-entitylistjson:

[Extended Tracking Protection] classifies a resource as a tracking resource when it is present on blocklist and loaded as a third-party. The Entity list is used to allow third-party subresources that are wholly owned by the same company that owns the top-level website that the user is visiting. For example, if abcd.com owns efgh.com and efgh.com is on the blocklist, it will not be blocked on abcd.com. Instead, efgh.com will be treated as first party on abcd.com, since the same company owns both. But since efgh.com is on the blocklist it will be blocked on other third-party domains that are not all owned by the same parent company.

So it's basically a tracker exemption list, only relevant for tracker companies wanting to make sure their own websites do not break when blocklisted.
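
(For reference, if a request were filed anyway, an entry in the format quoted above would presumably look something like this. It's shown as a TypeScript literal; the domain selection follows the discussion in this task and is illustrative only, not anything Disconnect has confirmed.)

```typescript
// Hypothetical Wikimedia entry in the disconnect-entitylist.json format:
// "properties" are the sites users visit, "resources" are the domains
// to treat as first party on them.
const wikimediaEntity = {
  Wikimedia: {
    properties: ['wikipedia.org', 'wikimedia.org'],
    resources: [
      'wikipedia.org', 'wikimedia.org', 'mediawiki.org', 'wikidata.org',
      'wiktionary.org', 'wikibooks.org', 'wikinews.org', 'wikiquote.org',
      'wikisource.org', 'wikiversity.org', 'wikivoyage.org',
      'wikimediafoundation.org', 'wmfusercontent.org', 'w.wiki',
    ],
  },
};
```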