
Enable cross-domain API requests in API's JSON responses
Closed, ResolvedPublic

Description

I was hoping that the response from a GET request to Wikipedia's API[1] would include a CORS "Access-Control-Allow-Origin: *" header, so that it could be accessed by a client-side script running on any domain.

I ended up using the JSONP response as a workaround, but this is less secure than cross-origin JSON, and shouldn't really be necessary now that browsers support CORS headers.

Would it be possible to add an "Access-Control-Allow-Origin: *" header to the API's JSON responses?

[1] https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmlimit=max&cmtype=subcat&format=json&cmtitle=Category:Set_theory


Version: unspecified
Severity: enhancement

Details

Reference
bz60835

Event Timeline

Anomie added a comment.Feb 4 2014, 6:29 PM

I may be misremembering, but I believe that "Access-Control-Allow-Origin: *" would allow any random external site to fetch the CSRF tokens and such. The JSONP method explicitly disables any token fetching, and also treats the request as being from an anonymous user regardless of any login cookies.

If your external site wants to interact with the API in a way JSONP doesn't allow, you should probably look into OAuth.

(In reply to comment #3)

> I may be misremembering, but I believe that "Access-Control-Allow-Origin: *"
> would allow any random external site to fetch the CSRF tokens and such. The
> JSONP method explicitly disables any token fetching, and also treats the
> request as being from an anonymous user regardless of any login cookies.
> If your external site wants to interact with the API in a way JSONP doesn't
> allow, you should probably look into OAuth.

Correct. CORS from untrusted domains is not secure, since that would make anti-csrf tokens useless.

If you have a specific domain you want to add to the whitelist, we could discuss the merits of it individually, but * is definitely not possible.

The thing is, cross-domain XMLHttpRequests that receive "Access-Control-Allow-Origin: *" responses are not allowed[1] to contain authentication information (cookies or HTTP authentication). They are therefore always anonymous, so there are no anti-CSRF tokens to be stolen!

If desired, it is possible to make credentials available in the request, by setting xhr.withCredentials to true in the request, setting "Access-Control-Allow-Credentials: true" in the response, and setting "Access-Control-Allow-Origin" to something other than "*" in the response. By default, though, the requests are anonymous.

If xhr.withCredentials is set to true and the server returns "Access-Control-Allow-Origin: *", the browser refuses to allow the response to be read, so - as far as I can tell - there is no danger of tokens being stolen.

[1] https://developer.mozilla.org/en/docs/HTTP/Access_control_CORS#Requests_with_credentials
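
The rule described above can be modelled in a few lines. This is only a rough sketch of how a CORS-conforming browser decides whether a page may read a cross-origin response; the function name and parameters are hypothetical, and preflights and other header checks are ignored.

```python
from typing import Optional


def browser_exposes_response(
    page_origin: str,
    with_credentials: bool,
    allow_origin: Optional[str],
    allow_credentials: Optional[str] = None,
) -> bool:
    """Simplified model (an assumption, not browser source code) of whether
    a cross-origin response may be read by the requesting page."""
    if allow_origin is None:
        # No CORS header at all: the same-origin policy blocks the read.
        return False
    if not with_credentials:
        # Anonymous request: the wildcard or an exact origin match suffices.
        return allow_origin == "*" or allow_origin == page_origin
    # Credentialed request: the wildcard is rejected, the origin must match
    # exactly, and the server must send Access-Control-Allow-Credentials: true.
    return allow_origin == page_origin and allow_credentials == "true"
```

Under this model, a request with xhr.withCredentials set to true and a response of "Access-Control-Allow-Origin: *" is never readable, which is the case the comment above relies on.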

I've set up a demonstration which shows that cross-domain requests are forbidden from using withCredentials=true when Access-Control-Allow-Origin is set to "*".

This page sets a cookie for the subdomain www.macropus.org, then uses it to fetch a token from the same subdomain, which succeeds (the cookie is sent even without withCredentials=true, as it's a request to the same subdomain):
http://www.macropus.org/2014/mediawiki-cors/

This page, on a different subdomain, is able to make an unauthenticated request to the other subdomain (as the response has an "Access-Control-Allow-Origin: *" header), but is forbidden from making a request with credentials:
http://git.macropus.org/mediawiki-cors-test/

The (very simple) code is here: https://github.com/hubgit/mediawiki-cors-test

If there's a flaw in the security logic, it would be very useful to know about; on the other hand, I hope this will be persuasive enough to re-open this ticket.

Note that for the white-listed domains, the response can still include a specific origin and so allow credentials to be sent; the "Access-Control-Allow-Origin: *" response would just be for non-whitelisted domains.

For information, on every public wiki hosted in the WMF cluster, $wgCrossSiteAJAXdomains contains the following domains:

'*.wikipedia.org',
'*.wikinews.org',
'*.wiktionary.org',
'*.wikibooks.org',
'*.wikiversity.org',
'*.wikisource.org',
'wikisource.org',
'*.wikiquote.org',
'*.wikidata.org',
'*.wikivoyage.org',
'www.mediawiki.org',
'm.mediawiki.org',
'wikimediafoundation.org',
'advisory.wikimedia.org',
'auditcom.wikimedia.org',
'boardgovcom.wikimedia.org',
'board.wikimedia.org',
'chair.wikimedia.org',
'chapcom.wikimedia.org',
'collab.wikimedia.org',
'commons.wikimedia.org',
'donate.wikimedia.org',
'exec.wikimedia.org',
'grants.wikimedia.org',
'incubator.wikimedia.org',
'internal.wikimedia.org',
'login.wikimedia.org',
'meta.wikimedia.org',
'movementroles.wikimedia.org',
'office.wikimedia.org',
'otrs-wiki.wikimedia.org',
'outreach.wikimedia.org',
'quality.wikimedia.org',
'searchcom.wikimedia.org',
'spcom.wikimedia.org',
'species.wikimedia.org',
'steward.wikimedia.org',
'strategy.wikimedia.org',
'checkuser.wikimedia.org',
'usability.wikimedia.org',
'wikimania????.wikimedia.org',
'wikimaniateam.wikimedia.org'

This allows requests from one wiki to another.

This is disabled on private wikis hosted in the WMF cluster.
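
As a rough illustration of how such wildcard entries could be matched against a requesting host, here is a small sketch. It assumes `*` matches any run of characters and `?` exactly one character, which is how Python's `fnmatchcase` treats them; it is not MediaWiki's actual matcher, and the function name and shortened pattern list are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# A few entries taken from the list above (shortened for the example).
SAMPLE_PATTERNS = [
    "*.wikipedia.org",
    "wikisource.org",
    "www.mediawiki.org",
    "wikimania????.wikimedia.org",
]


def host_is_whitelisted(host: str, patterns: Sequence[str] = SAMPLE_PATTERNS) -> bool:
    """Return True if the host matches any whitelist pattern.

    Assumption: '*' matches any run of characters, '?' exactly one.
    """
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

With these assumptions, `en.wikipedia.org` and `wikimania2014.wikimedia.org` match, while a host such as `git.macropus.org` does not.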

Anomie added a comment.Feb 4 2015, 4:17 PM

Regarding the clarification of intent provided in T88532, I could see the addition of support to the API for specifying "origin=*" meaning to return Access-Control-Allow-Origin: * (without any other CORS-related headers, particularly without Access-Control-Allow-Credentials) while also performing the same forcing of anonymous responses and refusal to provide tokens that are done for JSONP (see also Gerrit change 180430).

@csteipp, what do you think about that idea?

My attempt to more clearly phrase this task:

To allow client-side JavaScript applications to fetch information from MediaWiki APIs, add the following header to API responses, allowing the response to be read by an application running on a different domain:

Access-Control-Allow-Origin: *

In the current documentation for CORS usage in cross-site requests, it states:

"If the CORS origin check passes, MediaWiki will include the Access-Control-Allow-Credentials: true header in the response, so authentication cookies may be sent."

What it should also say -- once this is implemented -- is that if the CORS origin check doesn't pass, MediaWiki will not include the Access-Control-Allow-Credentials: true header in the response, so authentication cookies may not be sent, but MediaWiki will still include the Access-Control-Allow-Origin: * header so that unauthenticated requests can be accessed from any origin.

Notes:

eaton.alf reopened this task as Open.Feb 4 2015, 4:21 PM
eaton.alf set Security to None.

> I could see the addition of support to the API for specifying "origin=*" meaning to return Access-Control-Allow-Origin: *

This could be good, except it makes clients add an extra parameter -- it would be most straightforward to return Access-Control-Allow-Origin: * when no origin parameter is specified.

Anomie added a comment.Feb 4 2015, 4:38 PM

> I could see the addition of support to the API for specifying "origin=*" meaning to return Access-Control-Allow-Origin: *
> This could be good, except it makes clients add an extra parameter -- it would be most straightforward to return Access-Control-Allow-Origin: * when no origin parameter is specified.

We need to have some trigger for "force anonymous response and no-tokens", we can't just blindly allow all origins because of the no-token part.

We might be able to use the presence of an Origin header without the origin parameter as such a trigger, but RFC 6454 § 7.3 explicitly allows the agent to give an Origin header even for same-origin requests so I'd be wary of doing that.

> We need to have some trigger for "force anonymous response and no-tokens", we can't just blindly allow all origins because of the no-token part.

Could you elaborate on why that would be a problem?

Aklapper triaged this task as Low priority.Feb 4 2015, 11:36 PM
Annevk added a subscriber: Annevk.Mar 10 2015, 10:53 AM

Anomie, you're wrong. Unless the domains in question are behind a firewall (e.g. intranet or home network) adding Access-Control-Allow-Origin: * has absolutely no negative consequences. It enables the same kind of thing possible with curl. No need to worry about cookies or HTTP authentication.

> Anomie, you're wrong. Unless the domains in question are behind a firewall (e.g. intranet or home network) adding Access-Control-Allow-Origin: * has absolutely no negative consequences. It enables the same kind of thing possible with curl. No need to worry about cookies or HTTP authentication.

Sorry, but that is not correct. Allowing browser access from arbitrary external sites would basically nullify our protection against CSRF attacks.

eaton.alf added a comment.EditedMar 10 2015, 2:34 PM

As I understand it, the CSRF protection involves sending a token to authenticated users, who must be sending requests from origins that are in the whitelist (i.e. Wikimedia sites that send credentials and are allowed to make edits) as those are the only origins for which the Access-Control-Allow-Credentials: true header is added to responses.

What I don't understand is how that relates to requests from any other origin, which are guaranteed to be anonymous. What's the harm in adding Access-Control-Allow-Origin: * to those responses?

Anomie, no it would not. I recommend studying e.g. https://annevankesteren.nl/2015/02/same-origin-policy or https://annevankesteren.nl/2012/12/cors-101 or maybe even reading the specification at https://fetch.spec.whatwg.org/ itself.

CSRF is about a request vulnerability. Adding a CORS header on a response is neither going to make you vulnerable, nor protected from such a vulnerability. CORS is about sharing the data in a response (if we ignore CORS preflights, which are not relevant here).

Specifying * only allows the data to be read and only from requests that include neither cookies nor HTTP authentication information associated with the user. This is essentially the same as if you did curl from a public server.

I've run into this today. I inferred from the documentation on MediaWiki that I would get Access-Control-Allow-Origin: <value of origin header>, since allowing all domains seems like a reasonably sensible option when doing GET queries, but, as discussed here, I didn't. If we are really at the stage of not trusting the browsers to implement the standard correctly (as far as I know they all do), it would be possible to reject requests that are sent with the Cookie header.

While we're on the subject, what's the point in the origin GET parameter anyway? Why not just use the value of the Origin header? They're checked to make sure they're identical anyway so why do they both have to be present?

Another point to make here is that JSONP is less secure, since anyone with control over the MediaWiki site can then make the users on my site execute arbitrary JavaScript.

TheDJ added a subscriber: TheDJ.

I had informally asked Chris to review the security aspects of this after Anne and Frankie left their comments, but I don't think it was ever properly on the radar. So I'm adding Security-Team-Reviews to this, in hopes that at least the request is tracked.

So hereby the review request:
With the current CORS implementation that we have in core (significantly different from 2-3 years ago when this was last investigated), plus the comments of Anne, is there any reason why we should not remove the origin restriction on anonymous wildcard access?

Second, with the changed implementation of the origin checks (basically denying multiple origins in the origin header), do we still actually need that origin param on the API request?

Restricted Application added a subscriber: Matanya.Jul 15 2015, 6:55 PM
dpatrick added a project: Security-Team.
dpatrick moved this task from Backlog to In Progress on the Security-Team board.

Allowing read-only access for XMLHttpRequests to the API by using "Access-Control-Allow-Origin: *" (and "Access-Control-Allow-Credentials: false" as defense in depth) makes sense as long as our infrastructure can handle the additional load from increased API use and pre-flighting.

When considering exposure of user credentials as the primary threat, users using older browsers that don't implement CORS are protected by the same-origin policy. Users using browsers which fully and correctly implement CORS are protected by nature of their adherence to the spec. Our main concern is users using browsers with broken CORS implementations which, specifically, violate the CORS spec by passing cookies when "Access-Control-Allow-Origin: *" is set by the server, and for which the CORS behavior supersedes same origin behavior. I did not encounter any such implementations in my testing, which included the two most recent versions of each major browser listed at http://caniuse.com/#search=CORS. Further automated testing using BrowserStack may be done to achieve coverage closer to that reflected in our actual user statistics (https://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm), but this initial data is heartening.

Additionally, browsers which support CORS should automatically send the "Origin:" header, so I believe that the separate "origin" request parameter can be removed from, or at least deprecated in, the API (re. https://phabricator.wikimedia.org/T62835#1122434 and https://phabricator.wikimedia.org/T62835#1454829).

The reason for the "origin" request parameter was concern that the impact of varying all API responses on the Origin header would be disastrous for caching, see T22814#248552. @tstarling or someone else familiar with the caching should be asked if that concern still applies to our current varnish caching setup.

Also, are there any interesting browsers that violate the spec by using a cache for "Access-Control-Allow-Origin: *" if the same URL is hit from different origins?

For more defense in depth, BTW, we'd probably want to add the CORS check to ApiBase::lacksSameOriginSecurity() to additionally force an anonymous user and prevent certain other actions (login, account creation, token fetch) if we're returning "Access-Control-Allow-Credentials: false".

If there's a way we can log a request that is coming from a non-whitelisted domain, and includes any MediaWiki session cookies, that would be helpful.

csteipp removed dpatrick as the assignee of this task.Jan 11 2016, 7:49 PM
Restricted Application added a subscriber: JEumerus.Jan 11 2016, 7:49 PM
He7d3r added a subscriber: He7d3r.Jan 19 2016, 4:19 PM
brion added a subscriber: brion.Apr 1 2016, 1:27 PM

Ran into this again at Jerusalem hackathon, trying to do a wikidata demo. Can't XHR from off-domain JS, have to use JSONP still. :( Anything still blocking this?

Krenair renamed this task from Enable cross-domain Wikipedia API requests in API's JSON responses to Enable cross-domain API requests in API's JSON responses.Apr 1 2016, 2:12 PM
Anomie added a comment.Apr 1 2016, 2:17 PM

> Ran into this again at Jerusalem hackathon, trying to do a wikidata demo. Can't XHR from off-domain JS, have to use JSONP still. :( Anything still blocking this?

I'm still waiting on a reply to T62835#1794676, mainly the first paragraph.

brion added a comment.Apr 1 2016, 2:20 PM

> I'm still waiting on a reply to T62835#1794676, mainly the first paragraph.

Isn't caching of API requests pretty much completely broken to begin with? There's no way to purge API requests when their data is updated, so request bodies are going to be either uncached or cached too long.

Anomie added a comment.Apr 1 2016, 2:28 PM

It does rely on user-requested time-based caching rather than purging like index.php endpoints do, but it's not "completely broken". OTOH, I believe it would be very possible to write an "action=pagedata" endpoint that would be purgeable where action=query isn't. It might even be possible for action=query in the future, see T122867 for details.

But I believe the concern in T22814#248552 was that varying all API requests on Origin would fragment the cache so severely that it would negatively impact the whole caching system.

brion added a comment.Apr 1 2016, 2:36 PM

Looks like API requests by default are not cached; the caller must opt in to caching by setting maxage and/or smaxage on the URL. So, 'Origin' would only have to be added to the 'Vary' header -- and would only affect caching -- for GET/HEAD requests where maxage/smaxage are included in the URL. I have no idea how much of our API traffic is included in that subset, or how much of that subset is web browser traffic (which would include an Origin header) versus bot traffic (which would probably not).
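
To make that subset concrete, here is a minimal sketch of the check being described: a request would only matter for an Origin-varied cache if it is a GET or HEAD and opts in to caching via maxage or smaxage. The function name is hypothetical, and this reflects only the reasoning in the comment above, not the actual MediaWiki or Varnish behaviour.

```python
from urllib.parse import parse_qs, urlsplit


def would_need_vary_on_origin(method: str, url: str) -> bool:
    """Return True if the request opts in to caching (sketch, see assumptions above)."""
    if method.upper() not in ("GET", "HEAD"):
        return False
    # parse_qs drops parameters with empty values by default.
    params = parse_qs(urlsplit(url).query)
    return "maxage" in params or "smaxage" in params
```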

Tgr added a subscriber: Tgr.Apr 8 2016, 6:46 AM

For requests which we want to make available from any domain (I guess those would be the requests which don't need write permissions?), we would set Access-Control-Allow-Origin: * anyway, so there is no need to vary on Origin, whether it's cached or not.

It would vary on the session cookie though, since some read modules return private data if the user is logged in and has sufficient permissions. The same is already true for JSONP so as far as I can see that would have no adverse effect on caching performance.

Tgr added a comment.Apr 8 2016, 6:56 AM
In T62835#2189281, @Tgr wrote:

> For requests which we want to make available from any domain (I guess those would be the requests which don't need write permissions?), we would set Access-Control-Allow-Origin: * anyway, so there is no need to vary on Origin, whether it's cached or not.

...except that's only true for non-authenticated requests since for authenticated the standard requires Access-Control-Allow-Origin: <actual origin>. But the cache is already varied on the session cookie, so even if it is not varied on origin, that would work out, right? (Also, varnish could be hacked to replace * with the actual origin.)

> Ran into this again at Jerusalem hackathon, trying to do a wikidata demo. Can't XHR from off-domain JS, have to use JSONP still. :( Anything still blocking this?

I also ran into this recently and would love to see it closed!

Annevk removed a subscriber: Annevk.Apr 8 2016, 7:28 AM
Anomie added a comment.Apr 8 2016, 3:17 PM
In T62835#2189299, @Tgr wrote:
In T62835#2189281, @Tgr wrote:

> For requests which we want to make available from any domain (I guess those would be the requests which don't need write permissions?), we would set Access-Control-Allow-Origin: * anyway, so there is no need to vary on Origin, whether it's cached or not.

> ...except that's only true for non-authenticated requests since for authenticated the standard requires Access-Control-Allow-Origin: <actual origin>. But the cache is already varied on the session cookie, so even if it is not varied on origin, that would work out, right?

No, it wouldn't. Consider if someone does the same CORS request to Commons from enwiki and dewiki, the Access-Control-Allow-Origin must differ.

> (Also, varnish could be hacked to replace * with the actual origin.)

That would be a horrible idea, IMO.

Tgr added a comment.Apr 8 2016, 4:57 PM

> No, it wouldn't. Consider if someone does the same CORS request to Commons from enwiki and dewiki, the Access-Control-Allow-Origin must differ.

Why? If it's an unauthenticated request, just set *. If it's an authenticated request, it won't be cached anyway (downstream it might be, but still varied on the session cookie, which is never the same for two different domains).

Anomie added a comment.Apr 8 2016, 5:13 PM

Reviewing this whole bug, we seem to have two different requests that seem to be being conflated. Some of that confusion may be my fault in the recent revival of attention to this task.

  1. Allow for CORS requests from any domain, returning Access-Control-Allow-Origin: * and Access-Control-Allow-Credentials: false and internally making ApiBase::lacksSameOriginSecurity() return true.
  2. Remove the need for clients to specify the origin URL parameter when intending to do a CORS request.

Number 1 I think could be done now, I don't see objections raised to it. And that seems to be what @brion needs. Number 2 is what is blocked on making sure it won't blow up our caching infrastructure to vary every request on the Origin header.

So perhaps we should move forward on #1 here and someone can file a separate task for #2.
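
Request number 1 can be sketched as a small decision function. This is a simplified model under stated assumptions, not the code in the patch: the function name is hypothetical, whitelist patterns are matched with Python's `fnmatchcase`, and a mismatch between the origin parameter and the Origin header simply yields no CORS headers here.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional, Sequence, Tuple


def cors_decision(
    origin_param: Optional[str],
    origin_header: Optional[str],
    whitelist: Sequence[str],
) -> Tuple[Dict[str, str], bool]:
    """Return (CORS response headers, force_anonymous) for one API request."""
    if origin_param is None:
        # The client did not ask for CORS handling.
        return {}, False
    if origin_param == "*":
        # Anonymous CORS from anywhere: no credentials, and the request is
        # treated as lacking same-origin security (anonymous user, no tokens).
        headers = {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Credentials": "false",
        }
        return headers, True
    # Specific origin: the parameter must equal the Origin header ...
    if origin_header is None or origin_param != origin_header:
        return {}, False
    # ... and the host must be on the whitelist for credentials to be allowed.
    host = origin_header.split("://", 1)[-1]
    if any(fnmatchcase(host, pattern) for pattern in whitelist):
        headers = {
            "Access-Control-Allow-Origin": origin_header,
            "Access-Control-Allow-Credentials": "true",
        }
        return headers, False
    return {}, False
```

The second return value stands in for what the thread describes as making ApiBase::lacksSameOriginSecurity() return true.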

Change 282391 had a related patch set uploaded (by Anomie):
API: Allow anonymous CORS from anywhere, when specifically requested

https://gerrit.wikimedia.org/r/282391

Anomie added a comment.Apr 8 2016, 5:16 PM
In T62835#2191089, @Tgr wrote:

> No, it wouldn't. Consider if someone does the same CORS request to Commons from enwiki and dewiki, the Access-Control-Allow-Origin must differ.

> Why? If it's an unauthenticated request, just set *. If it's an authenticated request, it won't be cached anyway (downstream it might be, but still varied on the session cookie, which is never the same for two different domains).

The session cookie for the two CORS requests to Commons will be the same, despite the different Origin headers. And it's not guaranteed that it won't be marked as cacheable in our varnish, depending on just what the request is.
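
A toy cache illustrates the concern. This is only a sketch: the function, the URL, and the cookie value are hypothetical, and real Varnish behaviour is more involved. It shows that if authenticated responses are keyed on URL and session cookie alone, the second origin receives the first origin's Access-Control-Allow-Origin value.

```python
from typing import Dict, Tuple

Cache = Dict[Tuple[str, ...], Dict[str, str]]


def cached_cors_response(
    cache: Cache, url: str, session_cookie: str, origin: str, vary_on_origin: bool
) -> Dict[str, str]:
    """Return the (possibly cached) response headers for a credentialed CORS request."""
    key = (url, session_cookie, origin) if vary_on_origin else (url, session_cookie)
    if key not in cache:
        # Cache miss: the backend echoes the requesting origin back.
        cache[key] = {"Access-Control-Allow-Origin": origin}
    return cache[key]


# Hypothetical request to Commons, made from enwiki and then from dewiki.
url = "https://commons.wikimedia.org/w/api.php?action=query&meta=userinfo&format=json"
shared: Cache = {}
cached_cors_response(shared, url, "session=abc", "https://en.wikipedia.org", False)
stale = cached_cors_response(shared, url, "session=abc", "https://de.wikipedia.org", False)
# stale carries https://en.wikipedia.org, so a browser on dewiki would refuse it.
```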

Any progress here? I would love to see this issue get fixed. Thanks!

Anomie added a comment.Jun 8 2016, 7:01 PM

See T62835#2191138 for some clarification of the task. A patch has been submitted based on that clarification, but no one has been brave enough to merge it.

This question hasn't been answered:

> If there's a way we can log a request that is coming from a non-whitelisted domain, and includes any MediaWiki session cookies, that would be helpful.

> This question hasn't been answered:
>
> If there's a way we can log a request that is coming from a non-whitelisted domain, and includes any MediaWiki session cookies, that would be helpful.

Yes, such a thing is possible, although it's a bit ugly. https://gerrit.wikimedia.org/r/294348

Change 282391 merged by jenkins-bot:
API: Allow anonymous CORS from anywhere, when specifically requested

https://gerrit.wikimedia.org/r/282391

Anomie closed this task as Resolved.Jul 11 2016, 3:21 PM
Anomie claimed this task.

Marking this bug as resolved, since unauthenticated cross-domain API requests are now possible. This should be deployed to WMF wikis with 1.28.0-wmf.10, see https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap for the schedule.

Again, if someone wants to follow up on the tangent about the need for the 'origin' URL parameter, file a separate task for that.

TheDJ added a comment.Jul 12 2016, 9:17 AM

Note, we should document this on the mw.org CORS page

Hello,

		if ( $request->getVal( 'origin' ) === '*' ) {
			$this->lacksSameOriginSecurity = true;
			return true;
		}

This doesn't seem to solve the problem for client applications running in a web browser, where the application has no control over the 'Origin' header that the browser includes in the request. Why not just check whether the request has credentials and, if it does NOT, include Access-Control-Allow-Origin: *?

Tgr added a comment.Nov 19 2016, 7:16 PM

That line checks the origin URL parameter, not the header.

elf-pavlik added a comment.EditedNov 19 2016, 9:11 PM

My bad! I understood from reading this thread that Access-Control-Allow-Origin: * gets included only for requests with header Origin: * and didn't verify it before posting that comment.

After your comment and re-reading https://www.mediawiki.org/wiki/Manual:CORS#Description I understood that it talks about the query string parameter. I edited that page to state even more clearly that it has nothing to do with the Origin header of the HTTP request.
https://www.mediawiki.org/w/index.php?title=Manual:CORS&diff=prev&oldid=2289620
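
For anyone landing here later, a minimal sketch of what the client has to do: add origin=* to the query string of the API URL. The helper name is hypothetical; the query is the one from the task description.

```python
from typing import Mapping
from urllib.parse import urlencode


def build_anonymous_cors_url(api_base: str, params: Mapping[str, str]) -> str:
    """Append origin=* (the URL parameter, not the HTTP header) to an API query."""
    query = dict(params)
    query["origin"] = "*"
    # Keep '*' and ':' unescaped so the URL stays readable.
    return api_base + "?" + urlencode(query, safe="*:")


url = build_anonymous_cors_url(
    "https://en.wikipedia.org/w/api.php",
    {
        "action": "query",
        "list": "categorymembers",
        "cmlimit": "max",
        "cmtype": "subcat",
        "format": "json",
        "cmtitle": "Category:Set_theory",
    },
)
```

A browser script would then request this URL without credentials; the browser still sets the Origin header itself.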

sbassett moved this task from Waiting to Done on the Security-Team board.Jun 11 2019, 6:03 PM