AQS 2.0 Response Headers - resolve differences with production
Closed, ResolvedPublic

Description

Background/Goal

When executed locally, the new AQS 2.0 services are currently not sending the same response headers as the existing production service does when executed from our production infrastructure. Notably, there are differences in security- and caching-related headers, as well as some others.

Determine which of these headers will be the responsibility of the service and which will be the responsibility of another layer (Load Balancer, Varnish, etc. I don't think the prod endpoints are going through the API Gateway at this time). Revise as needed to make the new service header output match production.

If we can make this easier (both for ourselves now and also for any future services) by pushing some of the service changes to the scaffold/servicelib level, we should do that as part of this task. Keep in mind that even if all services need to send the same set of headers, different services may need to send different values for them. So consider making scaffold/servicelib changes configurable.
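As a rough illustration of what "configurable" could mean here, the sketch below merges per-service overrides onto a shared set of default headers. It is written in Python only for brevity; `DEFAULT_HEADERS` and `build_headers` are hypothetical names and not part of the scaffold, servicelib, or any existing library. The default values are taken from the production values listed in this task.

```python
from typing import Mapping, Optional

# Hypothetical shared defaults (assumption: a subset of the production
# values listed in this task, used purely for illustration).
DEFAULT_HEADERS = {
    "x-content-type-options": "nosniff",
    "x-frame-options": "SAMEORIGIN",
    "referrer-policy": "origin-when-cross-origin",
    "cache-control": "s-maxage=14400, max-age=14400",
}


def build_headers(overrides: Optional[Mapping[str, Optional[str]]] = None) -> dict:
    """Return the shared defaults with per-service overrides applied.

    Header names are matched case-insensitively. An override value of
    None removes the header, so a service can opt out of a default.
    """
    merged = dict(DEFAULT_HEADERS)
    for name, value in (overrides or {}).items():
        key = name.lower()
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```

With this shape, every service sends the same set of headers by default, while a service that needs a different cache lifetime passes only that one override.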

Unit tests should be created for any service changes. Once the scope of service-level changes is complete, we should consider whether a separate QA ticket is needed for related integration testing.

I compared the following requests:

Here are the related headers that I see the existing production service sending, as compared to the new AQS 2.0 services (I checked via Device Analytics, presumably the other services are the same):

header | production value
accept-ch | Sec-CH-UA-Arch,Sec-CH-UA-Bitness,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-UA-Platform-Version
accept-ranges | bytes
access-control-allow-headers | accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
access-control-allow-methods | GET,HEAD
access-control-allow-origin | *
access-control-expose-headers | etag
cache-control | s-maxage=14400, max-age=14400
content-security-policy | default-src 'none'; frame-ancestors 'none'
nel (Network Error Logging) | { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy | interest-cohort=(),ch-ua-arch=(self "intake-analytics.wikimedia.org"),ch-ua-bitness=(self "intake-analytics.wikimedia.org"),ch-ua-full-version-list=(self "intake-analytics.wikimedia.org"),ch-ua-model=(self "intake-analytics.wikimedia.org"),ch-ua-platform-version=(self "intake-analytics.wikimedia.org")
referrer-policy | origin-when-cross-origin
report-to | { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
strict-transport-security | max-age=106384710; includeSubDomains; preload
vary | Accept-Encoding
x-content-security-policy | default-src 'none'; frame-ancestors 'none'
x-content-type-options | nosniff
x-frame-options | SAMEORIGIN
x-webkit-csp | default-src 'none'; frame-ancestors 'none'
x-xss-protection | 1; mode=block

The only headers in this list currently provided by the AQS 2.0 services are x-frame-options (which uses the value "deny" instead of the production value) and x-xss-protection (which uses a value similar but not identical to production's).
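A comparison like this can be automated. The following is a minimal sketch, assuming both sets of headers have already been captured as plain dictionaries; `diff_headers` is a hypothetical helper, and the sample values are a small subset of those reported in this task.

```python
def diff_headers(expected, actual):
    """Compare two header mappings, ignoring the case of header names.

    Returns (missing, mismatched): the sorted names present in `expected`
    but absent from `actual`, and a dict mapping each name whose value
    differs to an (expected_value, actual_value) pair.
    """
    exp = {name.lower(): value for name, value in expected.items()}
    act = {name.lower(): value for name, value in actual.items()}
    missing = sorted(set(exp) - set(act))
    mismatched = {
        name: (exp[name], act[name])
        for name in sorted(exp)
        if name in act and exp[name] != act[name]
    }
    return missing, mismatched


# Sample data: a subset of the production values and the local service output.
production = {
    "x-content-type-options": "nosniff",
    "x-frame-options": "SAMEORIGIN",
    "x-xss-protection": "1; mode=block",
}
local = {
    "X-Frame-Options": "deny",
    "X-Xss-Protection": "1; mode-block",
}

missing, mismatched = diff_headers(production, local)
print("missing:", missing)
print("mismatched:", mismatched)
```

Values are compared exactly, so small differences such as "mode=block" versus "mode-block" are reported rather than hidden.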

The only header in this list that I found in the production AQS codebase was cache-control. It is possible I missed something. But if not, that suggests another layer is providing the rest. If I'm right about that, then there may be very little to do for this task. But we should be sure.

Acceptance Criteria

  • 1. the common function is implemented
  • 2. tasks for individual services are created

Tasks to be performed as part of acceptance criteria

This is just a list of the same headers as the above table, but in checkbox form, so that they can be marked off as we verify for each that either another layer provides them (and we therefore have no changes to make) or any necessary changes are made.

And don't forget:

  • integration testing implications are considered and any necessary QA task is created

Artifacts & Resources

As a reminder, what we were previously calling "Unique Devices" is now called "Device Analytics" and its code is in gerrit, not gitlab:

The other services remain in gitlab. However, the service scaffolding and servicelib repositories are in gerrit.

Event Timeline

SGupta-WMF changed the task status from Open to In Progress. Feb 2 2023, 7:38 AM

@BPirkle From what I can say from my initial analysis, we can have some response headers returned from the code and some of them from the API gateway, on the basis of domain matching / URL pattern matching for AQS 2.0. But what I understand is that the API gateway might take a significant amount of time to be live, so we need to add everything to the code as of now.
Also, since the discussion for routers is ongoing and this is AQS specific, I am thinking of adding these in aqsassist. Let me know how you feel about that.

what I understand is that the API gateway might take a significant amount of time to be live, so we need to add everything to the code as of now.

I'd qualify that a bit. The API Gateway itself is already live and receiving production traffic (for example this url goes through the gateway). However, its support of a lot of things is still formative, and this may be one of them.

I'm not sure exactly how the production AQS 2.0 endpoints are going to be exposed, and if there is an opportunity for other layers in our infrastructure to provide these headers. I'm curious if we might get some of them for free from existing infrastructure, or if they are all our responsibility.

@hnowlan , do you have any direction here? Or if you're not the right person who do you suggest we ask? (Traffic? Infrastructure? Performance?) We'll eventually need a header-by-header understanding, but I'm not (yet) asking for that if you don't have it readily at hand. I'm more just asking for advice on how we should best find this out.

Also, since the discussion for routers is ongoing and this is AQS specific, I am thinking of adding these in aqsassist

aqsassist sounds like a great idea. Let's hear from Hugh before we actually code this, though, to get a better understanding of what part is the service's responsibility.

@hnowlan Hey Hugh, waiting for your response on this. I have started coding this, and this is blocking further progress. Thanks.

If we plan on exposing these endpoints to the public internet, the API Gateway is a good place to do it - that is only on the condition that we are happy exposing it on api.wikimedia.org. If that doesn't work or we wish to preserve existing URL paths then the gateway probably isn't the place for it and we will need to route traffic via the ATS edge. And yes absolutely, the API gateway has been live and operational for years now!
We should make a decision on this as soon as we can and capture the decision itself somehow.

A number of the headers shown above come from the edge. The following specifically will not need to be set manually by AQS:

  • nel
  • report-to

Many headers can be set by our varnish config, but in this case more of the headers appear to be set by the RESTBase codebase than by the AQS one. This library alone is responsible for many of the above listed headers, but there are many other headers being set by the other libraries and in many other places in the restbase codebase. It might just be quicker to check out a RESTBase URI and work backwards from there via something like curl -v -o /dev/null "https://ga.wikipedia.org/api/rest_v1/page/mobile-html/Sliabh".

If we plan on exposing these endpoints to the public internet, the API Gateway is a good place to do it - that is only on the condition that we are happy exposing it on api.wikimedia.org. If that doesn't work or we wish to preserve existing URL paths then the gateway probably isn't the place for it and we will need to route traffic via the ATS edge.

We are obligated to maintain the existing urls for AQS 2.0.

Hopefully that's not forever - I'd love to be able to deprecate and retire the RESTBase-style urls eventually. But that time frame will be measured in at least months and more likely years.

@BPirkle I went through the restbase and AQS 1.0 code, and it seems like most of the headers are set from the restbase code and just a handful in the AQS 1.0 code. Should we go ahead and replicate these in the AQS 2.0 code and rely on the ATS edge for the others?

Yes.

I know we've already talked about it in earlier comments, but let's do as much as possible in aqsassist, so that it easily applies across all the AQS 2.0 services.

We may later find out that we want some of it in the Go servicelib/scaffolding, but we're not there yet, and it should be straightforward to move things from aqsassist upwards later on.

@BPirkle Apologies, my previous comment was not framed properly. I just meant that we can replicate the headers coming from the code in our codebase, which we decided is aqsassist, and the others could come from the ATS edge.

@BPirkle Added the following headers to the aqsassist code:
access-control-allow-headers
access-control-allow-methods
access-control-allow-origin
access-control-expose-headers
content-security-policy
referrer-policy
x-content-type-options
x-frame-options
x-xss-protection
cache-control
content-type

Remaining ones to be set from the ATS edge:

accept-ch
accept-ranges
nel
permissions-policy
report-to
strict-transport-security
vary
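The split above can be turned into a quick check of a service response. This is a minimal sketch, assuming the two lists in this comment are the agreed division of responsibility; `SERVICE_HEADERS`, `EDGE_HEADERS`, and `missing_service_headers` are hypothetical names, not part of aqsassist.

```python
# Headers the service code is expected to set (from the list in this comment).
SERVICE_HEADERS = [
    "access-control-allow-headers",
    "access-control-allow-methods",
    "access-control-allow-origin",
    "access-control-expose-headers",
    "content-security-policy",
    "referrer-policy",
    "x-content-type-options",
    "x-frame-options",
    "x-xss-protection",
    "cache-control",
    "content-type",
]

# Headers expected to come from the ATS edge, so a local run will not show them.
EDGE_HEADERS = [
    "accept-ch",
    "accept-ranges",
    "nel",
    "permissions-policy",
    "report-to",
    "strict-transport-security",
    "vary",
]


def missing_service_headers(response_headers):
    """Return the service-level headers absent from a response.

    `response_headers` is any mapping of header names to values; names
    are compared case-insensitively. Edge-level headers are ignored.
    """
    present = {name.lower() for name in response_headers}
    return [name for name in SERVICE_HEADERS if name not in present]
```

When run against a local service, an empty result would mean every service-level header is present; the edge-level headers can only be verified once traffic goes through the production edge.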

Please review the MR: https://gitlab.wikimedia.org/frankie/aqsassist/-/merge_requests/9

I will start in-service implementation and unit test cases in service for these, Thanks!

@BPirkle Please check @hnowlan's comment on the MR. I am amending the code and unit tests as per that.

I believe I have the most up-to-date file changes for Device Analytics. I checked which headers AQS 2.0 returns using the code below, and it returns:

Headers returned:

{'Server': 'fasthttp', 'Date': 'Mon, 20 Mar 2023 04:11:05 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '1022', 'X-Xss-Protection': '1; mode-block', 'X-Frame-Options': 'deny'}

code used to return headers:

import requests

# Local Device Analytics (AQS 2.0) endpoint under test
base_url = "http://localhost:8089/metrics/unique-devices/en.wikipedia.org/all-sites/daily/20200220/20200225"

# Request headers sent with the call
headers = {
    "accept": "application/json",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
}

response = requests.get(base_url, headers=headers)

# Print the response headers returned by the service
print(response.headers)

@SGupta-WMF Based on the tickets, it should be returning a lot more than this. Or is there something I am missing?

@Emeka-okechukwu Hi Emeka, it seems like we have not updated the services to add this new function for returning headers. Will create corresponding tasks for each service and you can test them accordingly. Thanks.

@SGupta-WMF thanks for the response. I will move this ticket back to in progress

JArguello-WMF updated the task description.
JArguello-WMF moved this task from Blocked/Paused to Done on the API Platform (Sprint 06) board.

The aqsassist change has been merged. Is this ready for QA now?

@BPirkle As decided earlier, the QA would be on implementation tasks of respective services which are WIP. We can mark this done.

@JArguello-WMF We can resolve this / mark this as done.