Implement agent.ua_string as contextual attribute
Closed, Resolved | Public | 5 Estimated Story Points

Description

User-Agent string should be a contextual attribute that Experimentation Lab client libraries can populate when requested.

When we start managing contextual attributes via xLab rather than stream configs, it will be easy to turn UA collection on/off, without it being strongly coupled to the stream config.
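As a rough illustration of "populate when requested", the sketch below adds only the contextual attributes that a stream's configuration asks for. The registry, the function names, and the `ctx` keys are hypothetical and are not taken from the Metrics Platform client libraries; only the attribute name `agent_ua_string` and the field `agent.ua_string` come from this task.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: name as it would appear in `provide_values`
# -> (dotted event field, function reading the value from the client context).
ATTRIBUTE_PROVIDERS: Dict[str, Tuple[str, Callable[[Dict[str, Any]], Any]]] = {
    "agent_ua_string": ("agent.ua_string", lambda ctx: ctx.get("user_agent")),
}


def set_dotted(event: Dict[str, Any], path: str, value: Any) -> None:
    """Set a nested field such as 'agent.ua_string', creating parents as needed."""
    *parents, leaf = path.split(".")
    node = event
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def add_requested_values(
    event: Dict[str, Any], provide_values: List[str], ctx: Dict[str, Any]
) -> Dict[str, Any]:
    """Populate only the attributes the configuration requested."""
    for name in provide_values:
        provider = ATTRIBUTE_PROVIDERS.get(name)
        if provider is None:
            continue  # unknown attribute names are ignored in this sketch
        path, read = provider
        value = read(ctx)
        if value is not None:
            set_dotted(event, path, value)
    return event
```

With this shape, turning UA collection off is a configuration change (dropping `agent_ua_string` from the requested list) rather than a code change.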

Context

Some thoughts by @Ottomata:

In T382173: Enable Event Platform streams to opt out of collecting User-Agent data, we added a producers.eventgate.enrich_fields_from_http_headers stream config setting. This setting instructs eventgate to enrich event data with HTTP request headers before producing the event.

This will be useful especially in cases where the header values are not available to the client (e.g. set by the server side CDN).

However, in cases where the data is available to the client, there are several advantages to setting the data in the event on the client side, under configuration control, instead of on the server side (in eventgate).

  1. eventgate is agnostic to the semantics of the events it produces. It is not 'wiki aware'. It requests global stream config from https://meta.wikimedia.org/w/api.php?action=streamconfigs. If there are per-wiki settings (via per-wiki overrides in mediawiki-config), those settings will only be available from the wiki's API endpoint, e.g. https://en.wikipedia.org/w/api.php?action=streamconfigs. MediaWiki clients have this per-wiki configuration automatically available to them.
  2. The desired data, e.g. the client's user-agent, might not always be in the headers of the POST request to eventgate. When MediaWiki PHP POSTs the event, it makes an HTTP POST request to eventgate that is distinct from the request the user's client originally made to MediaWiki. E.g. the Growth team's HomepageVisit instrumentation is sent from MW PHP after a user visits the MW homepage. To work around this, EventLogging manually sets the event's http.request_headers['user-agent'] field to the current MW HTTP request's 'User-Agent' header. This is a bit awkward, because MW is acting as a proxy for the real client (the user's browser that made the original HTTP request). Which request is http.request_headers meant to represent? As is, it might contain headers from multiple requests, with no way to tell which header came from which request. Does this matter?

We should add client-specific configuration (to EventStreamConfig or elsewhere, e.g. MPIC contextual attributes?) that allows clients to be configured to set specific event fields.

Ideally this would be user-agent agnostic, and instead control setting headers in fields, like the EventGate configuration. If this were done in EventStreamConfig, perhaps a producers.mediawiki_client.enrich_fields_from_http_headers setting?
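To make the idea concrete, here is a minimal sketch of what client-side enrichment driven by such a setting could look like. The shape of the mapping (event field path to header name) and the function name are assumptions for illustration; the task does not specify the format of a producers.mediawiki_client.enrich_fields_from_http_headers setting.

```python
from typing import Any, Dict, Mapping, Tuple

FieldPath = Tuple[str, ...]


def enrich_fields_from_http_headers(
    event: Dict[str, Any],
    headers: Mapping[str, str],
    field_to_header: Mapping[FieldPath, str],
) -> Dict[str, Any]:
    """Copy selected request headers into event fields.

    `field_to_header` is a hypothetical config: each key is the path of an
    event field, each value is the name of the header to copy into it.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for path, header_name in field_to_header.items():
        value = lowered.get(header_name.lower())
        if value is None:
            continue  # header absent: leave the field unset
        node = event
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # Do not overwrite a value the instrument has already set.
        node.setdefault(path[-1], value)
    return event
```

Because the client runs inside the original MediaWiki request, the `headers` it sees belong to the user's request, which sidesteps the proxy ambiguity described in point 2. An empty mapping disables enrichment entirely.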

This was also discussed in Slack.

Notes

This would be a great first task for an engineer onboarding to the Experiment Platform team and learning the client library codebase and the greater system.

Acceptance criteria

// Possible source columns for the user agent string; we could add more.
val possibleSourceColumnNames = Seq("http.request_headers.`user-agent`", "agent.ua_string")

[x] https://wikitech.wikimedia.org/wiki/Metrics_Platform/Contextual_attributes updated

  • agent_ua_string is a value that can be included in provide_values array when configuring a stream (no specific action needed here)
  • xLab has been updated to consider agent.ua_string as a contextual attribute (deployed)
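The two candidate source columns listed in the criteria above suggest a first-match lookup when parsing the user agent downstream. The sketch below shows that idea on a plain nested dictionary. It is an assumption about the selection logic, which this task does not show, and the helper names are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, Optional, Sequence, Tuple

FieldPath = Tuple[str, ...]

# Mirrors the two candidates named in the acceptance criteria.
POSSIBLE_SOURCE_FIELDS: Sequence[FieldPath] = (
    ("http", "request_headers", "user-agent"),
    ("agent", "ua_string"),
)


def get_path(event: Any, path: FieldPath) -> Optional[Any]:
    """Return the nested value at `path`, or None if any step is missing."""
    node = event
    for key in path:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def find_user_agent(
    event: Mapping, candidates: Sequence[FieldPath] = POSSIBLE_SOURCE_FIELDS
) -> Optional[str]:
    """Return the first non-empty user agent among the candidate fields."""
    for path in candidates:
        value = get_path(event, path)
        if value:
            return value
    return None
```

Under this assumption, events that carry only agent.ua_string (because header enrichment is turned off for the stream) would still yield a user agent to parse.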

Event Timeline

Ottomata renamed this task from Instruments should control setting or user-agent and other headers in event data to Instruments should control setting of user-agent and other headers in event data. (Jan 30 2025, 2:46 PM)
Ottomata updated the task description.
mpopov renamed this task from Instruments should control setting of user-agent and other headers in event data to Implement agent.ua_string as contextual attribute. (Apr 11 2025, 4:12 PM)
mpopov triaged this task as Low priority.
mpopov removed a project: Product-Analytics.
mpopov updated the task description.
mpopov added a project: good first task.

@Ottomata: I discussed this with @phuedx and updated the task description based on the outcome of our discussion.

Thank you for tagging this task with good first task for Wikimedia newcomers!

Newcomers often may not be aware of things that may seem obvious to seasoned contributors, so please take a moment to reflect on how this task might look to somebody who has never contributed to Wikimedia projects.

A good first task is a self-contained, non-controversial task with a clear approach. It should be well-described with pointers to help a completely new contributor; for example, it should clearly point to the codebase URL and provide clear steps to help a contributor get set up for success. We've included some guidelines at https://phabricator.wikimedia.org/tag/good_first_task/ !

Thank you for helping us drive new contributions to our projects <3

JVanderhoop-WMF raised the priority of this task from Low to High. (Aug 21 2025, 3:49 PM)
JVanderhoop-WMF set the point value for this task to 5.
JVanderhoop-WMF moved this task from Incoming to Backlog on the Test Kitchen board.

@Sfaci: let me know if/when you need a code review on that MR, I'd be happy to do that

Thanks @mpopov! I was just finishing preparing the MR for review right before our last meeting. The MR is now waiting for review. It's not a draft anymore, but CodeReviewBot hasn't updated it.

sfaci updated https://gitlab.wikimedia.org/repos/data-engineering/metrics-platform/-/merge_requests/97

Added agent.ua_string as a contextual attributes for JS and PHP client libraries

phuedx merged https://gitlab.wikimedia.org/repos/data-engineering/metrics-platform/-/merge_requests/97

Added agent.ua_string as a contextual attributes to JS and PHP client libraries

Change #1185083 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[mediawiki/extensions/EventLogging@master] lib: Update lib/metrics-platform to f1a18553

https://gerrit.wikimedia.org/r/1185083

Change #1186049 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[analytics/refinery/source@master] Added `agent.ua_string` as a possible source when parsing user agent

https://gerrit.wikimedia.org/r/1186049

Sfaci updated the task description.

Change #1185083 merged by jenkins-bot:

[mediawiki/extensions/EventLogging@master] lib: Update lib/metrics-platform to f1a18553

https://gerrit.wikimedia.org/r/1185083

Change #1190213 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[mediawiki/vendor@master] Upgrading wikimedia/metrics-platform (4.2.2 => 4.2.3)

https://gerrit.wikimedia.org/r/1190213

Change #1190213 merged by jenkins-bot:

[mediawiki/vendor@master] Upgrading wikimedia/metrics-platform (4.2.2 => 4.2.3)

https://gerrit.wikimedia.org/r/1190213

Change #1186049 merged by jenkins-bot:

[analytics/refinery/source@master] Added `agent.ua_string` as a possible source when parsing user agent

https://gerrit.wikimedia.org/r/1186049

Change #1190667 had a related patch set uploaded (by Phuedx; author: Santiago Faci):

[mediawiki/extensions/EventLogging@wmf/1.45.0-wmf.19] lib: Update lib/metrics-platform to f1a18553

https://gerrit.wikimedia.org/r/1190667

Change #1190667 merged by jenkins-bot:

[mediawiki/extensions/EventLogging@wmf/1.45.0-wmf.19] lib: Update lib/metrics-platform to f1a18553

https://gerrit.wikimedia.org/r/1190667

Mentioned in SAL (#wikimedia-operations) [2025-09-23T14:03:20Z] <lucaswerkmeister-wmde@deploy1003> Started scap sync-world: Backport for [[gerrit:1190667|lib: Update lib/metrics-platform to f1a18553 (T385180)]], [[gerrit:1190679|lib: Update metrics-platform to fc7678c10a1f (T401380)]], [[gerrit:1190647|ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851)]]

Mentioned in SAL (#wikimedia-operations) [2025-09-23T14:09:06Z] <lucaswerkmeister-wmde@deploy1003> phuedx, lucaswerkmeister-wmde: Backport for [[gerrit:1190667|lib: Update lib/metrics-platform to f1a18553 (T385180)]], [[gerrit:1190679|lib: Update metrics-platform to fc7678c10a1f (T401380)]], [[gerrit:1190647|ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-09-23T14:22:14Z] <lucaswerkmeister-wmde@deploy1003> Finished scap sync-world: Backport for [[gerrit:1190667|lib: Update lib/metrics-platform to f1a18553 (T385180)]], [[gerrit:1190679|lib: Update metrics-platform to fc7678c10a1f (T401380)]], [[gerrit:1190647|ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851)]] (duration: 18m 54s)

phuedx updated the task description.
phuedx updated the task description.

Change #1191393 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[operations/deployment-charts@master] xLab: Deploying v1.0.5 release to staging

https://gerrit.wikimedia.org/r/1191393

Change #1191393 merged by jenkins-bot:

[operations/deployment-charts@master] xLab: Deploying v1.0.5 release to staging

https://gerrit.wikimedia.org/r/1191393

Change #1191449 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[operations/deployment-charts@master] xLab: Deploying v1.0.5 release to production

https://gerrit.wikimedia.org/r/1191449

Change #1191449 merged by jenkins-bot:

[operations/deployment-charts@master] xLab: Deploying v1.0.5 release to production

https://gerrit.wikimedia.org/r/1191449

sorry Santi - this was a victim of a bulk edit action, thanks for fixing.