Add HAProxy termination state field to webrequest
Closed, Resolved, Public

Description

While discussing T382571 with @Fabfur, we realized that the HAProxy termination state field could be useful for analysis in webrequest.
This task is about adding the field in three places:

  • The haproxykafka logging app
  • The webrequest table in Hadoop
  • webrequest_sample in Druid + Turnilo

Documentation on the HAProxy termination state field: https://wikitech.wikimedia.org/wiki/HAProxy/session_states
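
For readers unfamiliar with the field, here is a minimal Python sketch of how a termination state value could be decoded for analysis. It assumes the value follows HAProxy's documented layout: the first character is the event that ended the session, the second is the phase the session was in, and any further characters are cookie-related flags. The function name and the wording of the descriptions are illustrative, and the tables cover only some of the codes; the linked page is the authoritative list.

```python
# Partial lookup tables paraphrased from the HAProxy documentation on
# session state at disconnection. Not exhaustive.
TERMINATION_CAUSE = {
    "C": "TCP session aborted by the client",
    "S": "TCP session aborted or refused by the server",
    "P": "session aborted prematurely by the proxy",
    "L": "session handled locally by the proxy",
    "R": "resource exhausted on the proxy",
    "I": "internal error in the proxy",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal termination",
}

SESSION_PHASE = {
    "R": "waiting for a complete request from the client",
    "Q": "waiting in the queue for a connection slot",
    "C": "waiting for the connection to the server to establish",
    "H": "waiting for complete response headers from the server",
    "D": "data transfer phase",
    "L": "transmitting the last data to the client",
    "T": "request tarpitted",
    "-": "normal completion after end of data transfer",
}


def decode_termination_state(state: str) -> dict:
    """Split a termination state string into cause, phase and cookie flags.

    Hypothetical helper for illustration; unknown codes are reported
    as such rather than raising.
    """
    if len(state) < 2:
        raise ValueError(f"termination state too short: {state!r}")
    cause, phase = state[0], state[1]
    return {
        "cause": TERMINATION_CAUSE.get(cause, f"unknown ({cause})"),
        "phase": SESSION_PHASE.get(phase, f"unknown ({phase})"),
        # Anything after the first two characters is kept as-is.
        "cookie_flags": state[2:],
    }
```

For example, `decode_termination_state("cD--")` reports a client-side timeout during the data transfer phase, with `"--"` left over as cookie flags.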

Event Timeline

Given that it's just 4 bytes more, I think we can add this. I would do it after we complete the migration, though: the migration already changes how we manage this kind of data, so at first it's better to stick as closely as possible to a 1:1 migration.

Change #1136845 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] cache: add termination status to haproxy log format

https://gerrit.wikimedia.org/r/1136845

Change #1136845 abandoned by Fabfur:

[operations/puppet@production] cache: add termination state to haproxy log format

Reason:

not needed, termination_state field was already present in log format

https://gerrit.wikimedia.org/r/1136845

@JAllemandou I've prepared this patch for haproxykafka to add the termination_state field to the list of fields sent to webrequest. What's needed on your side to have it indexed?

Thanks @Fabfur! Once the data flows in, we need a schema change and a job modification on our side to make it appear in the data.
Generating it will not break anything on our end :)
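
One way to picture why an extra field is harmless before the schema change is a reader that selects fields by name and ignores the rest. The sketch below is illustrative Python, not the actual webrequest job: the field names in `KNOWN_FIELDS`, the sample records and the `project` helper are all hypothetical.

```python
import json

# Hypothetical subset of the fields a downstream reader knows about.
KNOWN_FIELDS = ("hostname", "uri_path", "http_status")


def project(record_json: str, fields=KNOWN_FIELDS) -> dict:
    """Keep only the named fields; anything else in the record is ignored."""
    record = json.loads(record_json)
    return {name: record.get(name) for name in fields}


# The same request, before and after the producer starts sending the new field.
before = '{"hostname": "cp1112", "uri_path": "/wiki/Main_Page", "http_status": "200"}'
after = (
    '{"hostname": "cp1112", "uri_path": "/wiki/Main_Page", '
    '"http_status": "200", "termination_state": "--"}'
)

# The reader's output is unchanged until its field list is extended.
unchanged = project(before) == project(after)
extended = project(after, KNOWN_FIELDS + ("termination_state",))
```

Under that assumption the new field becomes visible only once the reader's field list (here, the schema) is extended, which matches the schema change and job modification described above.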

Mentioned in SAL (#wikimedia-operations) [2025-04-29T09:45:18Z] <fabfur> uploading haproxykafka 0.3.7 to reprepro (T387454)

Mentioned in SAL (#wikimedia-operations) [2025-04-29T13:36:47Z] <fabfur> depooling cp1112 to test new haproxykafka version behavior (T387454)

Mentioned in SAL (#wikimedia-operations) [2025-04-29T13:44:45Z] <fabfur> [correcting] cp1112 has NOT been depooled (T387454)

Mentioned in SAL (#wikimedia-operations) [2025-04-29T13:55:50Z] <fabfur> upgrading haproxykafka on A:cp (T387454)

Mentioned in SAL (#wikimedia-operations) [2025-04-29T14:01:09Z] <fabfur> haproxykafka upgraded and restarted on A:cp (T387454)

@JAllemandou the change has been deployed in production: all haproxykafka instances on cache hosts are now sending the termination_state field too. Let us know if we can help with the schema change needed to have it tracked downstream (e.g. could you point us to the repository that contains the schema?)

Hi @Fabfur , sorry for the late answer :)
I'm adding a PR for you to validate. There are multiple places where the code needs to be adapted.

Change #1140566 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Add HAProxy termination_state to webrequest

https://gerrit.wikimedia.org/r/1140566

Change #1140566 merged by Joal:

[analytics/refinery@master] Add HAProxy termination_state to webrequest

https://gerrit.wikimedia.org/r/1140566

Change #1142593 had a related patch set uploaded (by Joal; author: Joal):

[operations/puppet@production] Add termination_state field to turnilo webrequest

https://gerrit.wikimedia.org/r/1142593

Change #1142593 merged by Brouberol:

[operations/puppet@production] Add termination_state field to turnilo webrequest

https://gerrit.wikimedia.org/r/1142593