Page MenuHomePhabricator

haproxy: capture x-wmf-* headers in webrequest data set
Closed, ResolvedPublic

Description

The webrequest analytics data set should include ratelimit information exposed by the REST gateway using the x-wmf-* headers (see T417780). In particular, the following headers will be exposed:

  • x-wmf-ratelimit-class
  • x-wmf-user-id postponed until we have clarity on privacy concerns

From this slack thread :
On the HAProxy / HaproxyKafka side this could be summarized (roughly) as:

  • Read response headers (coming from API) and save it to a variable
  • Edit the log format to include the new variable
  • Remove the response header to avoid sending it to the client
  • Edit HaproxyKafka (edit/recompile/repackage/distribute across all nodes) to correctly read the new log format (read the new fields and include them in the document sent to Kafka)

There is a concern about sending a user identifier in webrequest (user_id).
As of now no user identifying cookie flows in this dataset
It would be good to have a confirmation from the security team that this is ok.

Event Timeline

Let's postpone capturing x-wmf-user-id. x-wmf-ratelimit-class is the urgent one.

Note that x-wmf-user-id still needs to be tripped!

In the merged task, I was proposing not creating a new header, but instead adding to the existing X-Requestctl header using the same convention we use for haproxy applied rules, so something like X-Requestctl: rgw:${x-wmf-ratelimit-class}.

In the merged task, I was proposing not creating a new header, but instead adding to the existing X-Requestctl header using the same convention we use for haproxy applied rules, so something like X-Requestctl: rgw:${x-wmf-ratelimit-class}.

I don't think this is a good idea.. X-Requestctl reports the requestctl rules that matched against a request, "polluting" it with rest gateway rate-limit classes could be confusing

Change #1243870 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] cache::haproxy: save x-wmf-ratelimit-class content for webrequest

https://gerrit.wikimedia.org/r/1243870

@daniel it could be helpful to have some (also fake) example of values for X-WMF-Ratelimit-Class headers, to do some optimization in the pipeline, do you have some ?

Thanks to the @JAllemandou summary, I've wrote down some simple notes to understand the amount of work and the missing pieces, please have a look and eventually comment/fix:

Informationsourceheader nameX-Analitics key nameNotesNeed to be removed at edge?
Ratelimit ClassBackendX-WMF-Ratelimit-Classrl_classCRYes
x-trusted-requestHAProxyX-Trusted-Requesttrusted_reqCRNo (request header)
x-provenance-X-Provenance-NO NEED TO ADD IT TO X-AnalyticsNo
Auth TypeBackend??Will be set in X-Analytics directly in the backend responseNo (not a separate header)
X-Is-BrowserHAProxyX-Is-Browserx_is_browserAlready added to X-Analytics by Varnish when foundNo (request header)

IIUC at the moment the area where I can proceed is the X-Trusted-Request header that needs to be "capture" in the requests by Varnish and appended to X-Analytics (key should be x_trusted_request=... to stick with the current nomenclature.

Change #1247034 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] varnish: add headers to x-analytics

https://gerrit.wikimedia.org/r/1247034

Change #1243870 abandoned by Fabfur:

[operations/puppet@production] cache::haproxy: save x-ratelimit-class content for webrequest

Reason:

superseded by 1247034

https://gerrit.wikimedia.org/r/1243870

Can be set in X-Analytics directly in the backend response ?

Yes, the plan is to do that for now: T418606: Include session type in x-analytics header

Will probably have to be rethought once we start caching authenticated responses.

Change #1247034 merged by Fabfur:

[operations/puppet@production] varnish: add trusted_req and rl_class fields to x-analytics

https://gerrit.wikimedia.org/r/1247034

The patch has been merged and will be propagated shortly to all cache hosts. This means that the headers X-WMF-Ratelimit-Class and X-Trusted-Request will soon be present in X-Analytics

With T417780 deployed, you should be seeing data come in now. @Fabfur can you confirm?

Hi @daniel I can confirm I see the x_is_browser, trusted_req and rl_class fields in webrequest (text), good job!

Hi @daniel I can confirm I see the x_is_browser, trusted_req and rl_class fields in webrequest (text), good job!

Excellent! I'll close this ticket, then.