The webrequest analytics dataset should include rate-limit information exposed by the REST gateway using the x-wmf-* headers (see T417780). In particular, the following headers will be exposed:
- x-wmf-ratelimit-class
- x-wmf-user-id: postponed until we have clarity on privacy concerns
From this Slack thread:
On the HAProxy / HaproxyKafka side this could be summarized (roughly) as:
- Read the response headers (coming from the API) and save each value to a variable
- Edit the log format to include the new variable
- Remove the response header to avoid sending it to the client
- Edit HaproxyKafka (edit/recompile/repackage/distribute across all nodes) to correctly read the new log format (read the new fields and include them in the document sent to Kafka)
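As a rough illustration of the last step only, the parsing change could look something like the sketch below. It is written in Python purely for readability and is not HaproxyKafka code. The tab-separated layout, the field names in `FIELDS`, the `-` placeholder for a missing value, and the `anon` class value are all assumptions made for the example; none of them come from this task.

```python
import json

# Hypothetical log layout: tab-separated fields, with the new rate-limit
# class appended as the last field. The real log format and the real
# HaproxyKafka field names are not specified in this task.
FIELDS = ["http_status", "uri_path", "ratelimit_class"]

# Assumed placeholder logged when the API did not send the header.
MISSING = "-"


def parse_log_line(line: str) -> dict:
    """Split one log line into a dict keyed by FIELDS."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Normalise the placeholder so downstream consumers see a null.
    if record["ratelimit_class"] == MISSING:
        record["ratelimit_class"] = None
    return record


def to_kafka_document(line: str) -> str:
    """Serialise the parsed record as the JSON document sent to Kafka."""
    return json.dumps(parse_log_line(line), sort_keys=True)


if __name__ == "__main__":
    # "anon" is an invented class value used only for this example.
    print(to_kafka_document("200\t/w/rest.php/v1/page/Earth\tanon\n"))
```

Appending the new field at the end of the line, rather than inserting it in the middle, would keep the existing field positions stable while the change is rolled out across nodes.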
There is a concern about sending a user identifier in webrequest (user_id).
As of now, no user-identifying cookie flows into this dataset.
It would be good to have confirmation from the security team that this is OK.