
Client Developer has a cookie-free API call
Open, Medium, Public

Description

"As a Client Developer, I want to avoid getting Set-Cookie headers, or having to provide Cookie headers, in my API requests, so that I can concentrate on OAuth 2.0 as the sole way to authorize my app."

RESTful API servers are usually cookie-free; client developers don't have to keep track of cookies in their clients, and can use other authentication mechanisms, like OAuth 2.0, for their authorization.

MediaWiki and other parts of our stack can add cookies even for API endpoints, so it would be beneficial to enforce this cookie-free discipline at the gateway level.
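As a rough illustration of what "cookie-free at the gateway level" could mean, here is a minimal Python sketch that drops `Cookie` and `Set-Cookie` headers from a list of header pairs. The function name `strip_cookie_headers` and the `(name, value)` header representation are assumptions made for this example; this is not the gateway's actual filter API or configuration.

```python
from typing import Iterable, List, Tuple

Header = Tuple[str, str]

# Header names are case-insensitive in HTTP, so compare in lowercase.
COOKIE_HEADERS = {"cookie", "set-cookie"}


def strip_cookie_headers(headers: Iterable[Header]) -> List[Header]:
    """Return the headers with every Cookie / Set-Cookie entry removed.

    Hypothetical helper: a gateway would apply the same rule to request
    headers (Cookie) and to response headers (Set-Cookie).
    """
    return [(name, value) for name, value in headers
            if name.lower() not in COOKIE_HEADERS]
```

The same function covers both directions: applied to a request it removes `Cookie`, and applied to a response it removes `Set-Cookie`.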

See sub-tasks for done criteria.

Event Timeline

I noticed recently when using the Core REST API that my client received these cookies:

curl -v https://en.wikipedia.org/w/rest.php/v1/page/Sandbox
# ... other output ...
set-cookie: WMF-Last-Access=23-Jul-2020;Path=/;HttpOnly;secure;Expires=Mon, 24 Aug 2020 12:00:00 GMT
set-cookie: WMF-Last-Access-Global=23-Jul-2020;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Mon, 24 Aug 2020 12:00:00 GMT
set-cookie: GeoIP=CA:QC:Montreal:45.53:-73.59:v4; Path=/; secure; Domain=.wikipedia.org

I'm not actually sure where these come from. If they're upstream of the gateway, it would be cool to filter them out if we can. If they're downstream of the gateway, it would be interesting to know how important they are and if they could be left out of requests that are specifically for the API.
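To check a response for cookies without reading the raw `curl -v` output, something like the following sketch could be used. It takes response headers as `(name, value)` pairs and returns the names of any cookies being set. The helper name `set_cookie_names` is hypothetical, and the `content-type` entry in the sample is an assumed placeholder; the three `set-cookie` values are copied from the output above.

```python
from typing import Iterable, List, Tuple


def set_cookie_names(headers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the cookie name from each Set-Cookie header, in order."""
    names = []
    for name, value in headers:
        if name.lower() == "set-cookie":
            # The cookie name is everything before the first "=".
            names.append(value.split("=", 1)[0].strip())
    return names


sample_headers = [
    ("content-type", "application/json"),  # assumed, for illustration only
    ("set-cookie", "WMF-Last-Access=23-Jul-2020;Path=/;HttpOnly;secure;"
                   "Expires=Mon, 24 Aug 2020 12:00:00 GMT"),
    ("set-cookie", "WMF-Last-Access-Global=23-Jul-2020;Path=/;"
                   "Domain=.wikipedia.org;HttpOnly;secure;"
                   "Expires=Mon, 24 Aug 2020 12:00:00 GMT"),
    ("set-cookie", "GeoIP=CA:QC:Montreal:45.53:-73.59:v4; Path=/; secure; "
                   "Domain=.wikipedia.org"),
]

print(set_cookie_names(sample_headers))
# ['WMF-Last-Access', 'WMF-Last-Access-Global', 'GeoIP']
```

A cookie-free endpoint would return an empty list here.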

eprodromou reassigned this task from eprodromou to hnowlan. Jul 27 2020, 1:00 PM
eprodromou triaged this task as Medium priority.

Should Envoy be concerned with managing cookies at all? If cookies are being set, then that seems like a problem with MediaWiki or other upstream services. I think Envoy should not bother managing cookies at all; it's up to the upstream services to ignore cookies if cookies are for some reason being set.

WDoranWMF moved this task from Ready to Doing on the Platform Team Workboards (Green) board.
WDoranWMF added a subscriber: hnowlan.
eprodromou updated the task description.
eprodromou added a subscriber: WDoranWMF.

I split this into two tasks, T259294 and T259296.

I'm going to figure out how important the cookie headers that are coming from the MediaWiki REST API are; if they're not important, we can do the second ticket, but that shouldn't block the first.

OK, I'm learning important things about cookies! The WMF-Last-Access and WMF-Last-Access-Global cookies are set by Varnish and/or ATS for unique device tracking, documentation here:

https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution

The GeoIP cookie is for Analytics; documented here.

https://wikitech.wikimedia.org/wiki/Geolocation

All of them are set by Varnish or the traffic layer, so cookie-filtering in Envoy won't affect them. I'm going to check with the Traffic team whether they make sense for API traffic (seems like no to me, but I could be wrong), and, if not, whether it's possible to leave them off for API Gateway requests.

WDoranWMF added subscribers: Nuria, Ottomata. Edited Jul 31 2020, 12:38 PM

> OK, I'm learning important things about cookies! The WMF-Last-Access and WMF-Last-Access-Global cookies are set by Varnish and/or ATS for unique device tracking, documentation here:
>
> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution
>
> The GeoIP cookie is for Analytics; documented here.
>
> https://wikitech.wikimedia.org/wiki/Geolocation
>
> All of them are set by Varnish or the traffic layer, so cookie-filtering in Envoy won't affect them. I'm going to check with the Traffic team whether they make sense for API traffic (seems like no to me, but I could be wrong), and, if not, whether it's possible to leave them off for API Gateway requests.

Can we also check with Analytics? @Nuria and @Ottomata, are these cookies useful for requests to the MW REST API, and/or would they be in the future?

> The GeoIP cookie is for Analytics; documented here.

I don't think the GeoIP cookie is for analytics; that documentation just says that we geolocate IPs in a geocoded_data field on some Hive tables. I didn't know there was a GeoIP cookie! I think Traffic might use that for GeoDNS routing, but I'm not sure.

> [is WMF-Last-Access] useful for requests to the MW REST API, and/or would they be in the future?

I'm going to guess not. These are used for counting unique devices, and I don't know if we include API requests in those counts. @Nuria and @mforns or maybe @Milimetric would know better.

eprodromou added a subscriber: Brandon.

I've pinged @Brandon and he thought it was OK to remove these cookies for API Gateway calls. I've started the ticket T260943 to track progress.

> I've pinged @Brandon and he thought it was OK to remove these cookies for API Gateway calls. I've started the ticket T260943 to track progress.

well you pinged the wrong brandon though...

Brandon removed a subscriber: Brandon. Aug 21 2020, 12:45 AM
Restricted Application added a project: Operations. Aug 21 2020, 4:38 AM
CDanis added a subscriber: BBlack. Aug 21 2020, 4:49 AM

Ha! Sorry about that, @Brandon!

Nuria added a comment. Aug 24 2020, 8:57 PM

@eprodromou In order to know how/if removal of cookies will affect metrics, we would need to run some tests on our unique devices calculations. The GeoIP cookie is not used for analytics at all, but it is used, I think, for routing of requests. Traffic can speak to that.

I'm moving this to blocked until we work out how to get the cookies out of the API requests in Varnish.

Joe added a subscriber: Joe. Sep 30 2020, 3:37 PM

Those cookies are harmless cookies we set at the edge cache for all requests.

We could add an exception for the API gateway, but it would add more complexity to our already complex Varnish business logic, and AIUI this is an optimization rather than a necessity.

I'll let Traffic folks comment further, but I'd be inclined to avoid special-casing the API gateway here.