Page MenuHomePhabricator

Page analytics Implement header function
Closed, ResolvedPublic

Description

Ke task of Epic Page Analytics service - T288296
Background/Goal

Read T325827 for context

When executed locally, the new AQS 2.0 services are currently not sending the same response headers as the existing production service are when executed from our production infrastructure. Notably, there are differences in security- and caching-related headers, as well as some others. So as a part the above mentioned task a function has been created in aqassist to return some of these headers from code>

Acceptance Criteria
The service should be updated to implement the aqassist function, unit tests for the same should be added

Event Timeline

Change 918553 had a related patch set uploaded (by Fgoodwin; author: Fgoodwin):

[generated-data-platform/aqs/page-analytics@main] Request headers: add relevant data to security header in accordance with previous API

https://gerrit.wikimedia.org/r/918553

Change 918553 merged by Santiago Faci:

[generated-data-platform/aqs/page-analytics@main] Request headers: add relevant data to security header in accordance with previous API

https://gerrit.wikimedia.org/r/918553

Once security headers are set, we have found some errors that we should fix before moving on this ticket. Some of them were already present and some others are new because some data has changed in the last version of the aqs-docker-test-env:

  1. There are some errors related to some data that is currently missing in the new version of aqs-docker-test-env. Some specific dates are not valid and we should use new ones for some unit tests
    • To fix that, we should look for date values that provides data and use them in the URL we are using for the use case
  2. There are some errors related to a structure (httperr) where detail field is declared as an array. But the actual response body for these errors is returning an string to provide the detail
    • In some of the test files there are two structures (httperr and httperrA). All tests are using httperr where detail is declared as an array but in these cases we should use httperrA because it's the one that declares detail as a string. We can also remove the unused httperr structure because it's something we won't need anymore.
  3. There are one error related to some specific data that never existed in the aqs-docker-test-env. We should add it.
    • For this case we need to add some data with monthly granularity to the local_group_default_T_pageviews_per_article_flat dataset. It would be enough if we add a set of similar data with monthly granularity because it's what this test case try to test.

@FGoodwin @Sfaci I have made some changes for

here are some errors related to a structure (httperr) where detail field is declared as an array. But the actual response body for these errors is returning an string to provide the detail
In some of the test files there are two structures (httperr and httperrA). All tests are using httperr where detail is declared as an array but in these cases we should use httperrA because it's the one that declares detail as a string. We can also remove the unused httperr structure because it's something we won't need anymore.

in my MR
https://gerrit.wikimedia.org/r/c/generated-data-platform/aqs/page-analytics/+/919472

I just merged the MR that @SGupta-WMF mentioned, so if you rebase you'll get those changes.

Change 923569 had a related patch set uploaded (by Fgoodwin; author: Fgoodwin):

[generated-data-platform/aqs/page-analytics@main] Test: fixes failing test issue regarding data type

https://gerrit.wikimedia.org/r/923569

HI @FGoodwin Please confirm if this is ready for code review.

Change 923569 abandoned by Sg912:

[generated-data-platform/aqs/page-analytics@main] Test: fixes failing test issue regarding data type

Reason:

https://gerrit.wikimedia.org/r/923569

I do believe my repo is updated for page analytics

@Sfaci according to T325827 the below are the header keys to be added:

  • accept-ch
  • accept-ranges
  • access-control-allow-headers
  • access-control-allow-methods
  • access-control-allow-origin
  • access-control-expose-headers
  • cache-control
  • content-security-policy
  • nel
  • permissions-policy
  • referrer-policy
  • report-to
  • strict-transport-security
  • vary
  • x-content-type-options
  • x-frame-options
  • x-xss-protection

When I check page analytics locally I see the below in the header

{
   "Access-Control-Allow-Headers":"accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding",
   "Access-Control-Allow-Methods":"GET,HEAD",
   "Access-Control-Allow-Origin":"*",
   "Access-Control-Expose-Headers":"etag",
   "Cache-Control":"s-maxage=14400, max-age=14400",
   "Content-Security-Policy":"default-src 'none'; frame-ancestors 'none'",
   "Content-Type":"application/json; charset=utf-8",
   "Referrer-Policy":"origin-when-cross-origin",
   "X-Content-Type-Options":"nosniff",
   "X-Frame-Options":"deny",
   "X-Xss-Protection":"1; mode=block",
   "Date":"Mon, 07 Aug 2023 03:46:47 GMT",
   "Transfer-Encoding":"chunked"
}

I want to confirm if the missing header keys from the list not necessary ?

As far as I know, that full headers list doesn't mean "all the headers we have to add locally". I think it means something like "all the header that every service has to have". Some of them has to be added locally and others by the rest of the software layers.

In the task you mentioned T325827: AQS 2.0 Response Headers - resolve differences with production, a lot of headers are mentioned in the description but, during the conversation, for example Hugh commented that some of these headers are provided for the ATS Edge (Apache Traffic Server?). Later Surbhi list all the headers we have to add locally. What we did was to create a aqsassist function to provide all these local headers.

At this moment, that function is adding the following ones:

var HeadersToAdd = map[string]string{
	"access-control-allow-headers":  "accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding",
	"access-control-allow-methods":  "GET,HEAD",
	"access-control-allow-origin":   "*",
	"access-control-expose-headers": "etag",
	"cache-control":                 "s-maxage=14400, max-age=14400",
	"referrer-policy":               "origin-when-cross-origin",
	"x-content-type-options":        "nosniff",
	"x-xss-protection":              "1; mode=block",
	"x-frame-options":               "deny",
	"content-type":                  "application/json; charset=utf-8",
	"content-security-policy":       "default-src 'none'; frame-ancestors 'none'",
}

And we are adding 'Etag' header as well but maybe that improvement is not ready in the code that is available in the page-analytics' repo (we are currently working on this service). That was added later and didn't exist when this task was created, and maybe page-analytics is using a previous version of the aqsassist function.

That's why the acceptance criteria for this task is:

Acceptance Criteria
The service should be updated to implement the aqassist function, unit tests for the same should be added

And that is done, so I think the task can be considered as done as well. Anyway, if you prefer, we can keep this task until the entire service is ready. That way, when we push all the code, you could test it properly even with the new 'Etag' header.

@SGupta-WMF could you please review my previous comment and correct me if I'm wrong? You probably have more context about this topic

Hi @Sfaci @EChukwukere-WMF So as a part of header implementation , some of the headers come from code and others from infrastructure ( which can be anything ATS edge / API gateway ) . Since we are testing locally these headers should be present (coming from code base)
access-control-allow-headers
access-control-allow-methods
access-control-allow-origin
access-control-expose-headers
content-security-policy
referrer-policy
x-content-type-options
x-frame-options
x-xss-protection
cache-control
content-type
etag

The other headers are not to be tested on local

@SGupta-WMF @Sfaci thanks for the clarification.

Hence from my local it seems the etag key is missing in the headers,, unless there is an update I am not aware of

{
   "Access-Control-Allow-Headers":"accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding",
   "Access-Control-Allow-Methods":"GET,HEAD",
   "Access-Control-Allow-Origin":"*",
   "Access-Control-Expose-Headers":"etag",
   "Cache-Control":"s-maxage=14400, max-age=14400",
   "Content-Security-Policy":"default-src 'none'; frame-ancestors 'none'",
   "Content-Type":"application/json; charset=utf-8",
   "Referrer-Policy":"origin-when-cross-origin",
   "X-Content-Type-Options":"nosniff",
   "X-Frame-Options":"deny",
   "X-Xss-Protection":"1; mode=block",
   "Date":"Mon, 07 Aug 2023 03:46:47 GMT",
   "Transfer-Encoding":"chunked"
}

"Etag" header is not added yet in the code that is available in the repo. It's already added in the code I'm working on but I had to pause some related task to prioritize the test environments.

In a few days I hope to push an interesting MR with this and other improvements for page-analytics.

Ok I will put this back in progress then.

Change 947334 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[generated-data-platform/aqs/page-analytics@main] unit tests and some other improvements: This MR includes unit tests and some other improvements (more details here: https://phabricator.wikimedia.org/T342028):

https://gerrit.wikimedia.org/r/947334

Change 947334 merged by Santiago Faci:

[generated-data-platform/aqs/page-analytics@main] unit tests and some other improvements: This MR includes unit tests and some other improvements (more details here: https://phabricator.wikimedia.org/T342028):

https://gerrit.wikimedia.org/r/947334

Test status : QA PASS

code level headers from aqassist code :-

access-control-allow-headers
access-control-allow-methods
access-control-allow-origin
access-control-expose-headers
content-security-policy
referrer-policy
x-content-type-options
x-frame-options
x-xss-protection
cache-control
content-type

header from page analytics service:

{
   "Access-Control-Allow-Headers":"accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding",
   "Access-Control-Allow-Methods":"GET,HEAD",
   "Access-Control-Allow-Origin":"*",
   "Access-Control-Expose-Headers":"etag",
   "Cache-Control":"s-maxage=14400, max-age=14400",
   "Content-Security-Policy":"default-src 'none'; frame-ancestors 'none'",
   "Content-Type":"application/json; charset=utf-8",
   "Etag":"W/\"20-da39a3ee5e6b4b0d3255bfef95601890afd80709\"",
   "Referrer-Policy":"origin-when-cross-origin",
   "X-Content-Type-Options":"nosniff",
   "X-Frame-Options":"deny",
   "X-Xss-Protection":"1; mode=block",
   "Date":"Thu, 14 Sep 2023 19:22:45 GMT",
   "Transfer-Encoding":"chunked"
}

This matches what is expected from the headers keys for the Page analytics service