
Enable $wgCrossSiteAJAXdomains for wikimedia sites
Closed, ResolvedPublic

Description

Enable $wgCrossSiteAJAXdomains for wikimedia sites. It would be useful to be able to access the api across wikimedia domains through js gadgets.

Setting it to something like
$wgCrossSiteAJAXdomains = array(
	'/http:\/\/[a-z\-]{2,}\.wikipedia\.org/',
	'/http:\/\/[a-z\-]{2,}\.wikinews\.org/',
	'/http:\/\/[a-z\-]{2,}\.wiktionary\.org/',
	'/http:\/\/[a-z\-]{2,}\.wikibooks\.org/',
	'/http:\/\/[a-z\-]{2,}\.wikiversity\.org/',
	'/http:\/\/[a-z\-]{2,}\.wikisource\.org/',
	'/http:\/\/[a-z\-]{2,}\.wikiquote\.org/',
	'/http:\/\/(?!upload)[a-z\-]{2,}\.wikimedia\.org/'
);

Note: you might want to check the last one. I assume allowing cross site access to upload.wikimedia.org = Bad Thing. I don't know if allowing access to anything.wikimedia.org but upload.wikimedia.org is ok.
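
As a minimal sketch (illustrative only, not MediaWiki's actual matching code), checking an Origin value against regex entries like the ones above could look like this. The patterns are anchored deliberately: an unanchored `/http:\/\/[a-z\-]{2,}\.wikipedia\.org/` would also accept hostnames that merely *begin* with an allowed domain, such as `http://en.wikipedia.org.attacker.com`.

```javascript
// Sketch of matching an Origin value against whitelist regexes
// (hypothetical helper; not MediaWiki's implementation). Anchors matter:
// without ^ and $, "http://en.wikipedia.org.attacker.com" would match,
// because the allowed domain appears as a prefix of the longer hostname.
var patterns = [
  /^http:\/\/[a-z\-]{2,}\.wikipedia\.org$/,
  /^http:\/\/(?!upload)[a-z\-]{2,}\.wikimedia\.org$/
];

function originAllowed(origin) {
  return patterns.some(function (re) {
    return re.test(origin);
  });
}
```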

Cheers,
Bawolff


Version: unspecified
Severity: enhancement

Details

Reference
bz20814

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:49 PM
bzimport set Reference to bz20814.
bzimport added a subscriber: Unknown Object (MLST).
Bawolff created this task. Sep 25 2009, 8:24 PM

We want to be more restrictive for *.wikimedia.org , because there's a bunch of untrusted subdomains in there. We should explicitly list the ones we own.

Note: I found another way to do what I wanted without this enabled ( http://en.wiktionary.org/w/api.php?action=parse&prop=text&page=Wikimedia&format=xml&xslt=MediaWiki:extractFirst.xsl ), so I don't really need it. But it would probably still be useful to have it enabled.

aokomoriuta wrote:

How about enabling it on only the main wikis?
I mean wn, wikt, wb, wv, wp, ws, wq, plus commons and meta.

$wgCrossSiteAJAXdomains = array(
'/http:\/\/[a-z\-]{2,}\.wikinews\.org/',
'/http:\/\/[a-z\-]{2,}\.wiktionary\.org/',
'/http:\/\/[a-z\-]{2,}\.wikibooks\.org/',
'/http:\/\/[a-z\-]{2,}\.wikiversity\.org/',
'/http:\/\/[a-z\-]{2,}\.wikipedia\.org/',
'/http:\/\/[a-z\-]{2,}\.wikisource\.org/',
'/http:\/\/[a-z\-]{2,}\.wikiquote\.org/',
'/http:\/\/(commons|meta)\.wikimedia\.org/' );

Also don't forget the secure subdomain. The better scripts don't hard-code the domain but use wgServer/wgScript.


Such as https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page

This would break squid caching. I don't see a "Vary: Origin" header, so whichever subdomain requests a given cacheable object first will have an Access-Control-Allow-Origin header sent back with the origin subdomain in it. The header will be cached, so subsequent requests from different domains will be denied by the client.

Vary: Origin would be a disaster for caching anyway, since there are hundreds of internal domains, and external domains could potentially send this header as well.

As for the code in api.php: the Origin header is a whitespace-separated list of origins. Running an unanchored case-sensitive regex against the whole string is not appropriate. Section 5.1 of the July 2010 CORS spec gives the correct algorithm:

http://www.w3.org/TR/2010/WD-cors-20100727/#resource-requests

One possible way to support CORS would be to require that the origin be specified in a URL parameter. If the URL parameter matches the Origin header, then the access control header can be sent with Vary: Origin. If it doesn't match, a 403 can be sent with CC: no-cache. If the URL parameter is missing, no Vary header or access control header is sent. This means that caching will only be broken to the extent necessary to support the feature.
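
The decision logic described above can be sketched as a pure function from request to response (names invented for illustration; this is not MediaWiki's actual code):

```javascript
// Hypothetical sketch of the cache-friendly CORS handshake proposed above.
// originParam is the origin= URL parameter, originHeader is the Origin
// request header, and isTrusted is the whitelist check.
function corsHeaders(originParam, originHeader, isTrusted) {
  if (originParam === undefined) {
    // No origin= parameter: serve a normal cacheable response,
    // with no Vary or access control headers at all.
    return { status: 200, headers: {} };
  }
  if (originParam !== originHeader || !isTrusted(originParam)) {
    // Mismatch (or untrusted origin): deny, and keep the denial
    // out of shared caches.
    return { status: 403, headers: { 'Cache-Control': 'no-cache' } };
  }
  // Match: grant access. Vary: Origin is tolerable here because the origin
  // is already part of the URL, so cache entries differ per origin anyway.
  return {
    status: 200,
    headers: {
      'Access-Control-Allow-Origin': originParam,
      'Vary': 'Origin'
    }
  };
}
```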

Another way to do it would be to implement the whole feature in Squid. A custom response header from MediaWiki, similar to X-Vary-Options, would specify the complete list of allowable domains. Then Squid would handle setting the correct access control headers in a post-cache step.

(In reply to comment #0)

Setting it to something like
$wgCrossSiteAJAXdomains = array( '/http:\/\/[a-z\-]{2,}\.wikipedia\.org/',
'/http:\/\/[a-z\-]{2,}\.wikinews\.org/',
'/http:\/\/[a-z\-]{2,}\.wiktionary\.org/',
'/http:\/\/[a-z\-]{2,}\.wikibooks\.org/',
'/http:\/\/[a-z\-]{2,}\.wikiversity\.org/',
'/http:\/\/[a-z\-]{2,}\.wikipedia\.org/',
[..]

Just for the record, not all subdomains are 2 characters. There are longer ones as well (nds, be-x-old, etc.). Although *.wikimedia.org is a problem, I think * is fine for the sister projects, right? At least longer than {2}

(In reply to comment #7)

Just for the record, not all subdomains are 2 characters. There are longer ones as well (nds, be-x-old, etc.)

{2,} means 2 or more characters, so be-x-old would be fine.
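
As a quick check of that quantifier (illustrative only):

```javascript
// {2,} is an open-ended quantifier: two *or more* characters from the
// class, so multi-letter codes like "nds" and "be-x-old" match too.
var subdomain = /^[a-z\-]{2,}$/;
```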

(In reply to comment #8)

{2,} means 2 or more characters, so be-x-old would be fine.

Sorry, my bad. Why this restriction, though? What about AJAX niftiness in a future version of m.wikipedia.org? I'm just a little unsure why there's a limit/minimum put in there.

Honestly, it was such a long time ago that I posted comment 0 that I can't remember whether there was any reason for it, or whether it was just an automatic "language codes are at least 2 letters" kind of thing.

jeluf wrote:

According to Tim's comment, this is not just a configuration request but requires coding first => removed "shell" keyword

  • Bug 30802 has been marked as a duplicate of this bug.

(In reply to comment #6)

One possible way to support CORS would be to require that the origin be
specified in a URL parameter. If the URL parameter matches the Origin header,
then the access control header can be sent with Vary: Origin. If it doesn't
match, a 403 can be sent with CC: no-cache. If the URL parameter is missing, no
Vary header or access control header is sent. This means that caching will only
be broken to the extent necessary to support the feature.

That's what I ended up doing, and I also fixed the Origin-header-can-contain-spaces issue.

The bulk of the changes are in https://gerrit.wikimedia.org/r/9624 . There are three smaller changes leading up to it as well; you can view them all at https://gerrit.wikimedia.org/r/#/q/project:mediawiki/core+branch:master+topic:apicors,n,z

If this passes muster, we can enable CORS on the live site once these changes are deployed.

(In reply to comment #13)

If this passes muster, we can enable CORS on the live site once these changes
are deployed.

It seems these changes have now been deployed, so next Tuesday I'll take a stab at enabling CORS for Wikimedia domains.

(In reply to comment #14)

It seems these changes have now been deployed, so next Tuesday I'll take a stab at enabling CORS for Wikimedia domains.

It slipped to Wednesday instead of Tuesday, but this is now done! CORS is now working for me; tested by pasting the following code snippet into the JS console on English Wikipedia:

$.ajax( {
	'url': 'https://www.mediawiki.org/w/api.php',
	'data': {
		'action': 'query',
		'meta': 'userinfo',
		'format': 'json',
		'origin': 'https://en.wikipedia.org'
	},
	'xhrFields': {
		'withCredentials': true
	},
	'success': function( data ) {
		alert( 'Foreign user ' + data.query.userinfo.name +
			' (ID ' + data.query.userinfo.id + ')' );
	},
	'dataType': 'json'
} );

He7d3r added a comment. Sep 5 2012, 6:27 PM

Should this code be working also on pt.wikipedia? (it isn't)

He7d3r added a comment. Sep 5 2012, 6:29 PM

(In reply to comment #16)

Should this code be working also on pt.wikipedia? (it isn't)

Specifically:

XMLHttpRequest cannot load https://www.mediawiki.org/w/api.php?action=query&meta=userinfo&format=json&origin=https%3A%2F%2Fen.wikipedia.org. Origin https://pt.wikipedia.org is not allowed by Access-Control-Allow-Origin.

(In reply to comment #17)

You have to set the origin= query parameter correctly. Your URL contained &origin=https%3A%2F%2Fen.wikipedia.org; that needs to be &origin=https%3A%2F%2Fpt.wikipedia.org instead (this corresponds to the 'origin': 'https://en.wikipedia.org' line in my snippet).

He7d3r added a comment. Sep 5 2012, 6:38 PM

(In reply to comment #15)

It slipped to Wednesday instead of Tuesday, but this is now done!

For the record: it was done on gerrit change Id715c280.

(In reply to comment #18)

You have to set the origin= query parameter correctly. Your URL contained
&origin=https%3A%2F%2Fen.wikipedia.org , that needs to be
&origin=https%3A%2F%2Fpt.wikipedia.org instead (this corresponds to the
'origin': 'https://en.wikipedia.org' line in my snippet).

Got it! Sorry for the mistake.

BTW: I tried to use

'origin': mw.config.get( 'wgServer' )

which corresponds to

'origin': "//pt.wikipedia.org"

and it didn't work.

(In reply to comment #19)

For the record: it was done on gerrit change Id715c280.

Yes, I forgot to mention that. Thanks!

Got it! Sorry for the mistake.
BTW: I tried to use

'origin': mw.config.get( 'wgServer' )

which corresponds to

'origin': "//pt.wikipedia.org"

and it didn't work.

Yeah, unfortunately the origin parameter requires that the protocol be specified correctly. It seems like something like 'origin': document.location.protocol + '//' + document.location.hostname should work.
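
A minimal sketch of that workaround (using a plain object in place of document.location so it can run outside a browser):

```javascript
// Build a fully-qualified origin from the page's own location, since a
// protocol-relative wgServer value like "//pt.wikipedia.org" is rejected
// by the origin= parameter check.
function fullOrigin(loc) {
  return loc.protocol + '//' + loc.hostname;
}
```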

This is great news! But when I try the exact code as in comment 15, I get an empty 403 Forbidden showing in Firebug. Any idea what could be happening?

This is great news!

Just added "wikificator" gadget (search article in wikipedia and create internal links) to ru-wikisource, and it works!

Sergey

(In reply to comment #22)

Just added "wikificator" gadget (search article in wikipedia and create
internal links) to ru-wikisource, and it works!

For those interested, it is available here:
https://ru.wikisource.org/wiki/Special:PrefixIndex/MediaWiki:Gadget-wikilinker

(In reply to comment #21)

This is great news! But when I try the exact code as in comment 15, I get an
empty 403 Forbidden showing in Firebug. Any idea what could be happening?

You have to adapt the 'origin' parameter to whatever the origin domain is. I was testing on English Wikipedia using HTTPS, so my example has 'origin': 'https://en.wikipedia.org'; you'll need to change that as appropriate.

Sorry, but I'm not getting it working with the origin parameter changed: http://brett-zamir.me/testCORS.html . I am in China, so I don't know if network issues here could be different, but the page I just listed is returning an error alert for me (I only changed the original code to update the origin and to add an error callback).

TheDJ added a comment. Sep 7 2012, 6:10 AM

@Brett, that's because that server is not enabled in $wgCrossSiteAJAXdomains. If it were, it would be a security risk. You can only do this between trusted sites (in this case, Wikimedia sites) that you are logged into.

Sorry to be so clueless here and not noticing the original comment about this--but what is the harm in providing some read-only access to other domains? JSONP is already exposed, so why is this not being exposed openly?

(In reply to comment #27)

Sorry to be so clueless here and not noticing the original comment about
this--but what is the harm in providing some read-only access to other domains?
JSONP is already exposed, so why is this not being exposed openly?

JSONP is exposed, but locked down, and uses the browser's same-origin policy as part of the protection against CSRF. It would probably be possible to implement read-only CORS from non-Wikimedia domains, but that would be scary, easy to get wrong, and would remove a layer of protection that we currently have.

For the list of whitelisted origin domains (i.e. the list of domains from which you can make cross-domain AJAX requests to a WMF wiki), see https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=blob;f=wmf-config/CommonSettings.php;h=8a8952eeeb75a6a4b7133abc8a3c536d8ba24141;hb=HEAD#l764 . All wikis accept these cross-domain requests, except private wikis (i.e. wikis where people without accounts cannot read pages).

Tgr added a comment. Sep 7 2012, 10:33 PM

(In reply to comment #24)

You have to adapt the 'origin' parameter to whatever the origin domain is. I
was testing on English Wikipedia using HTTPS, so my example has 'origin':
'https://en.wikipedia.org', you'll need to change that as appropriate.

Why is it necessary to specify the origin in the URL? Couldn't you just use the Origin: header?

(In reply to comment #29)

Why is it necessary to specify the origin in the URL? Couldn't you just use the Origin: header?

It's necessary to make Squid caching continue to work. Not including the origin in the URL causes cache pollution. The origin parameter is actually validated against the Origin header too, and if they don't match, a 403 is served (with no-cache headers, of course).

(In reply to comment #27)

Sorry to be so clueless here and not noticing the original comment about
this--but what is the harm in providing some read-only access to other domains?
JSONP is already exposed, so why is this not being exposed openly?

For read-only access, use JSONP. JSONP works across any domain and is not affected by the same-origin policy because it doesn't use XHR requests, but regular script requests (through a callback parameter). The API automatically puts itself in read-only anonymous user mode when accessing it through JSONP.
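
For illustration, a read-only JSONP request boils down to an ordinary script URL with a callback parameter; a hypothetical URL builder might look like this (helper name invented, not part of any real API):

```javascript
// Hypothetical helper: build a JSONP request URL. The server wraps its
// JSON response in a call to the named callback function, so the request
// is an ordinary <script> load: no XHR, and hence no same-origin
// restriction, is involved.
function jsonpUrl(apiBase, params, callbackName) {
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return apiBase + '?' + query + '&callback=' + callbackName;
}
```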

For pure JSON, write access is allowed, so the origin has to be trusted.

@Krinkle: Thanks, but it would really be nice to have the error checking of CORS. I presume Roan knows what he is talking about, but if what you say is true, that the "API automatically puts itself in read-only anonymous user mode when accessing it through JSONP", then wouldn't this mode just need to be switched on in the case of cross-domain CORS?

Btw, should this discussion be tracked in the likes of Bug 30802 since getting off topic here?

(In reply to comment #32)

@Krinkle: Thanks, but it would really be nice to have the error checking of
CORS. I presume Roan knows what he is talking about, but if it is true what you
say that the "API automatically puts itself in read-only anonymous user mode
when accessing it through JSONP", then wouldn't this mode just need to be
switched on in the case of cross-domain CORS?

No, not at all. That would make cross-domain CORS pretty much useless.

The API allows trusted interaction through all modes except JSONP. So when one server communicates with another server from PHP, it will be possible to authenticate and do things.

And if two web sites communicate within the browser, it is also allowed, but only when both ends trust each other. Otherwise there would be a major security leak. Just imagine what would happen if someone embedded some JavaScript on a site somewhere that makes an AJAX request to the API to get a token and then edit a page. If you were to visit that site (perhaps from a link in a chat application, Twitter, or e-mail, possibly even masked by a genuine-looking redirect), then the second you visited it you'd suddenly (without knowing it) be making an edit on Wikipedia. Why? Because that AJAX request was made in your browser, and you're still logged in, of course.

That's why

  • JSON cross-origin requests are only allowed if both ends trust each other.
  • JSONP requests are always allowed because they are unauthenticated.

You may wonder why it's not possible to cheat. The reason is that JSON (not JSONP) can only be read if the XHR allows one to read the response. And one can't make an edit without a token, which can only be sent if it was received first. So just making the request is not enough; the response needs to be read and then sent back. That is basically the security model.

JSONP, on the other hand, works with a callback, which means it is unrestricted. Any function from anywhere can be named and is then invoked.

@Krinkle: Thanks, but I'm well familiar with JSONP itself, though I am not familiar with MediaWiki's implementation. I was simply suggesting that MediaWiki apply the same level of access to untrusted CORS as to JSONP. The error detection and security-risk avoidance of CORS relative to JSONP (particularly useful for non-Wikimedia sites) would make it a better choice, if not also for its slightly more streamlined API.