
http://wm-bot.wmflabs.org/browser/ should not load assets from external 3rd party domains
Closed, ResolvedPublic

Description

http://wm-bot.wmflabs.org/browser/ is loading assets from external web sites:

  1. fonts.googleapis.com
  2. code.jquery.com
  3. camo.githubusercontent.com

These are all violations of the Terms of Use for wmflabs.org (that Wikimedia website is usually called "Labs" or "Toolforge"), since there is no interstitial warning you that your IP address is being sent to a third party.
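One way to audit a page for this kind of leak is to list every host that a browser would contact automatically while rendering it. The following is only a minimal sketch, not the tool anyone in this task used: the class and function names are made up for illustration, and it treats every `<link href>` as an asset, which is a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AssetHostCollector(HTMLParser):
    """Collect hostnames of assets a browser would typically fetch on its own."""

    # (tag, attribute) pairs treated as automatic requests.
    # Simplification: every <link href> is counted, not only stylesheets.
    ASSET_ATTRS = {("script", "src"), ("img", "src"), ("link", "href")}

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.ASSET_ATTRS and value:
                host = urlparse(value).hostname
                if host:  # relative URLs have no hostname and stay local
                    self.hosts.add(host)


def third_party_hosts(html, own_host):
    """Return the sorted asset hosts in `html` that differ from `own_host`."""
    collector = AssetHostCollector()
    collector.feed(html)
    return sorted(h for h in collector.hosts if h != own_host)
```

Calling `third_party_hosts(page_source, "wm-bot.wmflabs.org")` on a saved copy of the page would then return the external hosts that need attention, without having to load the page in a browser.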

We should probably host these files ourselves, since many other wmflabs projects depend on them. One could download these files from these URLs, place these files somewhere in the code repository and refer to them using a relative path, so that these files are not loaded from jquery.com and other websites anymore, but from some Wikimedia server on which this PHP code lives.

Note that we have our own CDN for cloud services purposes at https://tools.wmflabs.org/cdnjs/

Event Timeline

Restricted Application added a subscriber: Aklapper.Apr 26 2016, 5:10 AM
yuvipanda triaged this task as High priority.Apr 26 2016, 5:10 AM
scfc assigned this task to Petrb.Apr 26 2016, 4:00 PM

I'm confused, if the page is only getting assets, why would any data (IP or otherwise) be sent? I guess I just have never learned about this before.

bd808 added a comment.Apr 28 2016, 5:45 PM

I'm confused, if the page is only getting assets, why would any data (IP or otherwise) be sent? I guess I just have never learned about this before.

Any HTTP request to a server provides that server with the IP address of the requesting web browser. That's just how TCP/IP networking works. Additionally, interacting with a server trivially provides the opportunity for Set-Cookie headers to be sent to the browser, which might be used for additional tracking beyond the IP address and other passive device fingerprinting.

The subtle thing about these interactions is that the web browser performs the requests automatically when they are for image, CSS, or JavaScript assets. There is no opportunity (other than via browser privacy enhancements like ad blockers, which may or may not be available) for the end user to decide whether or not to interact with the 3rd party server. There is additional discussion of the broad topic happening in T129936.

We've just now discussed this some more. Here's my understanding:

  1. Everyone agrees that leaking user IPs to these third party domains is bad, a potential privacy violation.
  2. In the past, such behavior has been actively discouraged by Labs admins. But,
  3. Our current Terms of Use, as written, is not explicit about whether doing this is actually a violation of the Terms of Use.
  4. Ergo, there's going to be an RFC about clarifying some of the Terms of Use.

Whatever the legal status under 3) and 4), it would still be good citizenship to properly inform your users about the use of these external services. And inasmuch as it's quite likely that 4) will result in an explicit ban on this behavior... today is as good as tomorrow for disclosure.

You could also just use tools.wmflabs.org/cdnjs to load the assets from instead.

Note that the ambiguity exists only for Labs projects - not for tools projects, since projects on tools abide explicitly by the Wikimedia Privacy policy.

Petrb added a comment.Oct 30 2017, 1:27 PM

I am really too busy to deal with something so trivial and unimportant, but I suppose we could turn this into a GCI task and hope for some student to fix it:

What this is all about, in a nutshell: there are some PHP pages, with source at https://github.com/benapetr/wikimedia-bot/tree/master/public_html/logs, which are publicly visible at http://wm-bot.wmflabs.org/browser/ (don't click if you don't want to expose your IP to Google).

The "privacy issue" we are having here is that anyone who accesses the page right now is exposing their IP address to Google's CDN (as if everyone on the planet weren't doing it already). For some people within our movement this is a massive deal. Fix is simple: change the loading of external assets from Google's CDN to our own, or simply host them locally.

Petrb lowered the priority of this task from High to Low.Nov 17 2017, 8:02 PM

Fix is simple: change the loading of external assets from Google's CDN to our own, or simply host them locally.

What exactly is a contributor supposed to do? What's "our own" here?

$:acko\> grep -r googlefonts .
./public_html/logs/index.php:googlefonts_init($html);
$:acko\> grep -r jquery .
./public_html/logs/index.php:$html->ExternalCss[] = "http://code.jquery.com/ui/1.10.3/themes/smoothness/jquery-ui.css";
./public_html/logs/index.php:$html->ExternalJs[] = "http://code.jquery.com/jquery-1.9.1.js";
./public_html/logs/index.php:$html->ExternalJs[] = "http://code.jquery.com/ui/1.10.3/jquery-ui.js";
$:acko\> grep -r githubusercontent .
$:acko\> grep -r camo .
Petrb added a comment.Nov 24 2017, 1:41 PM

I suppose that we host these files ourselves, since many other wmflabs projects depend on them. If not, then by "host them locally" I mean download them from these URLs, place them somewhere in the repo and refer to them using a relative path, so that they aren't loaded from jquery.com but from whatever server this PHP code lives on.
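The "host them locally" option can be sketched as a mapping from each external URL found by the grep above to a relative path inside the repository. This is only an illustration under stated assumptions: the `vendor` directory name and both function names are hypothetical, and the actual download and the edit to index.php are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The three URLs reported by `grep -r jquery .` in public_html/logs/index.php.
EXTERNAL_ASSETS = [
    "http://code.jquery.com/ui/1.10.3/themes/smoothness/jquery-ui.css",
    "http://code.jquery.com/jquery-1.9.1.js",
    "http://code.jquery.com/ui/1.10.3/jquery-ui.js",
]


def local_path(url, vendor_dir="vendor"):
    """Map an external asset URL to a relative path under `vendor_dir`.

    The host name and URL path are kept so that files referenced by
    relative URL from a stylesheet can keep the same layout.
    `vendor_dir` is a hypothetical directory name, not one from the repo.
    """
    parsed = urlparse(url)
    return str(PurePosixPath(vendor_dir, parsed.hostname, parsed.path.lstrip("/")))


def plan_downloads(urls, vendor_dir="vendor"):
    """Return {external URL: relative local path} without fetching anything."""
    return {url: local_path(url, vendor_dir) for url in urls}
```

Each key in the result of `plan_downloads(EXTERNAL_ASSETS)` is a file to download once, and each value is the relative path that would replace the corresponding `ExternalCss` or `ExternalJs` entry.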

Our own CDN for cloud services purposes is https://tools.wmflabs.org/cdnjs/

Aklapper renamed this task from http://wm-bot.wmflabs.org/browser/ is loading assets from multiple 3rd party domains to http://wm-bot.wmflabs.org/browser/ should not load assets from external 3rd party domains.Nov 29 2017, 1:39 PM
Aklapper updated the task description.
Aklapper updated the task description.Nov 29 2017, 1:41 PM

Pull request merged.

Framawiki closed this task as Resolved.Mar 24 2018, 10:57 PM
Framawiki reassigned this task from Petrb to eflyjason.
Framawiki added a subscriber: Petrb.