Page MenuHomePhabricator

Provide browser-based apps an alternative way to specify the user agent in RESTBase APIs
Closed, ResolvedPublic

Description

Per the Wikimedia user agent policy, users of Wikimedia APIs must set an informative user agent. Normally this is done via the User-Agent HTTP header, but browser-based applications are not allowed to change that for security reasons. The policy says such applications should provide a custom Api-User-Agent header. The action API accepts such headers but the REST APIs do not.

Wikimedia REST APIs should, at a minimum, whitelist such a custom header in their CORS rules, and ideally log its contents instead of, or next to, the default user agent header.

Event Timeline

Tgr raised the priority of this task from to Needs Triage.
Tgr updated the task description. (Show Details)
Tgr added a project: RESTBase.
Tgr subscribed.
GWicke triaged this task as Medium priority.Oct 12 2016, 4:55 PM
GWicke added a project: Services (next).
GWicke set Security to None.

Unfortunately, if we require clients to set Api-User-Agent header, that would break CORS for redirect pages ( see https://phabricator.wikimedia.org/T148368#2732596 ). Currently a cross-domain requests to RESTBase doesn't require a preflight, so all works, but if we require the Api-User-Agent header to be set, preflight becomes required.

So, supporting this is possible for cases of same-domain requests (like from VisualEditor), but the usability would be quite limited. Do you think we still wanna implement it just for VE?

The latest CORS spec does support redirects (see https://github.com/whatwg/fetch/issues/204 for history). You should keep the Api-User-Agent header optional for backwards compatibility (and to not inconvenience clients which do supply a useful User-Agent), but you could make it possible - adding it to the CORS whitelist does not harm clients which don't try to set it.

Also, if the API does the redirect (as opposed to e.g. a varnish redirect because the request went to the wrong domain), you can just avoid redirecting CORS preflight requests (you can still redirect the real request afterwards).

@Tgr, as of today we do allow (but don't require) Api-User-Agent.

Thanks for the pointer to the change in the fetch spec. I'm glad that they fixed this. It'll take a while until a significant portion of deployed clients have picked up that change, but at least it's moving in the right direction.

Also, if the API does the redirect (as opposed to e.g. a varnish redirect because the request went to the wrong domain), you can just avoid redirecting CORS preflight requests (you can still redirect the real request afterwards).

Those responses would normally not be cacheable, which we'd like to avoid.

Also, create a PR to log the Api-User-Agent if it's provided.

PR merged, moving to 'done' pending deployment

Functionally this is now complete. To make this feature discoverable, we should mention the header in the api docs.

@Tgr, as of today we do allow (but don't require) Api-User-Agent.

Sure, what I mean is that it is effectively not allowed for clients doing CORS requests (since our CORS responses do not allow that header).

Those responses would normally not be cacheable, which we'd like to avoid.

As far as I can see there would be no difference. The browsers will cache the preflight response if the Access-Control-Max-Age header is set, and presumably all requests would need the same CORS rules so on our side it's trivially cacheable (even the API allows authenticated access, the credentials are not sent with the preflight request, so no cache splitting needed). For the rest of the requests after the preflight, the cache semantics would not be any different from the non-CORS scenario.

@Tgr, as of today we do allow (but don't require) Api-User-Agent.

Sure, what I mean is that it is effectively not allowed for clients doing CORS requests (since our CORS responses do not allow that header).

This is no longer true for the REST API:

curl -v https://en.wikipedia.org/api/rest_v1/page/html/Boston 2>&1 | grep access-control-allow-headers
< access-control-allow-headers: accept, content-type, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding

Those responses would normally not be cacheable, which we'd like to avoid.

As far as I can see there would be no difference. The browsers will cache the preflight response if the Access-Control-Max-Age header is set, and presumably all requests would need the same CORS rules so on our side it's trivially cacheable (even the API allows authenticated access, the credentials are not sent with the preflight request, so no cache splitting needed). For the rest of the requests after the preflight, the cache semantics would not be any different from the non-CORS scenario.

If we avoid using HTTP redirects for legacy CORS clients, and directly return the content of the redirect target instead, then that response can not be edge-cached without introducing major cache invalidation headaches. I'm not sure if this is indeed what you had in mind, though. Could you elaborate how you would combine caching of responses with support for legacy CORS clients with broken redirect behavior?

< access-control-allow-headers: accept, content-type, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding

Oh, I see. Thanks!

Could you elaborate how you would combine caching of responses with support for legacy CORS clients with broken redirect behavior?

As I understand (did not test though) this is how a preflight CORS request can be redirected without breaking old clients:

> OPTIONS /alias HTTP 1.1
> Access-Control-Request-Method: GET
> Access-Control-Request-Headers: Api-User-Agent
> Origin: http://example.com

< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: POST, GET, OPTIONS
< Access-Control-Allow-Headers: Api-User-Agent
< Access-Control-Max-Age: 300

> GET /alias HTTP 1.1
> Origin: http://example.com
> Api-User-Agent: foo

< HTTP/1.1 301 Found
< Location: /canonical

> OPTIONS /canonical HTTP 1.1
> Access-Control-Request-Method: GET
> Access-Control-Request-Headers: Api-User-Agent
> Origin: http://example.com

< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: POST, GET, OPTIONS
< Access-Control-Allow-Headers: Api-User-Agent
< Access-Control-Max-Age: 300

> GET /canonical HTTP 1.1
> Origin: http://example.com
> Api-User-Agent: foo

< HTTP/1.1 200 OK
< ...

It is somewhat wasteful in that there is an extra preflight round but it should work, and all responses should be cacheable: the OPTIONS responses are completely static (do not depend upon the URL), and the GET responses are no different from non-CORS.

Credentialed CORS requests are less straightforward because Access-Control-Allow-Origin: * is not allowed, but it's probably not hard to add edge logic to whatever handles the caching (varnish, I suppose) to replace * with the request origin without polluting the cache.

@Tgr Where did you get this network dump from?

Non of the browsers I've tested do the second OPTIONS request, they just refuse to execute a redirected request.

Uh, yeah, my bad. I misread the (old) spec. Looks like there is no way to make it work then, before browsers get updated for the newest Fetch spec.

GWicke claimed this task.

Resolving, per discussion and PRs. The documentation updates will go out with the next RESTBase deploy.