In {T382173}, we added a `producers.eventgate.enrich_fields_from_http_headers` stream config setting. This setting instructs eventgate to enrich event data with HTTP request headers before producing the event.
This is especially useful when the header values are not available to the client (e.g. headers set by the server-side CDN).
However, when the data is available to the client, there are several reasons to have the client set it in the event, driven by configuration, instead of doing so on the server side (eventgate):
1. eventgate is agnostic to the semantics of the events it produces. It is not 'wiki aware'. It requests the global stream config from https://meta.wikimedia.org/w/api.php?action=streamconfigs. If there are per-wiki settings (via per-wiki overrides in mediawiki-config), those settings are only available from that wiki's API endpoint, e.g. https://en.wikipedia.org/w/api.php?action=streamconfigs. MediaWiki clients have this per-wiki configuration automatically available to them.
2. The desired data, e.g. the client's user-agent, might not always be in the headers of the POST request to eventgate. When MediaWiki PHP produces an event, it makes its own HTTP POST request to eventgate, which is distinct from the HTTP request the original user client made to MediaWiki. E.g. the Growth team's [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/GrowthExperiments/+/refs/heads/master/includes/EventLogging/SpecialHomepageLogger.php#120 | HomepageVisit instrumentation is sent from MW PHP ]] after a user visits the MW homepage. To work around this, EventLogging manually sets the event's `http.request_headers['user-agent']` field to the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EventLogging/+/c13e4c3ebef6719669544c312f5acbdc53e8a7df/includes/EventSubmitter/EventBusEventSubmitter.php#83 | current MW HTTP request's 'User-Agent' header ]]. This is a bit awkward, because MW is acting as a proxy for the real client (the user's browser that made the original HTTP request). Which request is `http.request_headers` meant to represent? As is, it might contain headers from multiple requests, with no way to tell which header came from which request. Does this matter?
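To make point 1 concrete, here is a minimal sketch (not actual EventStreamConfig code) of why a consumer that only reads the global config can miss per-wiki settings. Everything in it is hypothetical and for illustration only: the stream name `example.stream`, the shape and values of the settings, the use of `False` to disable enrichment, and the `effective_stream_config` helper. Real overrides live in mediawiki-config and are resolved by MediaWiki.

```python
from copy import deepcopy

# Hypothetical global stream config, i.e. what eventgate would see
# when it asks meta.wikimedia.org. The setting's shape is an assumption.
GLOBAL_CONFIG = {
    "example.stream": {
        "producers": {
            "eventgate": {
                "enrich_fields_from_http_headers": {
                    "http.request_headers.user-agent": "user-agent",
                },
            },
        },
    },
}

# Hypothetical per-wiki override, visible only via that wiki's own endpoint.
PER_WIKI_OVERRIDES = {
    "enwiki": {
        "example.stream": {
            "producers": {
                "eventgate": {"enrich_fields_from_http_headers": False},
            },
        },
    },
}


def deep_merge(base, override):
    """Return a new dict: base with override applied recursively."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def effective_stream_config(wiki):
    """Stream config as a client running on `wiki` would see it."""
    return deep_merge(GLOBAL_CONFIG, PER_WIKI_OVERRIDES.get(wiki, {}))
```

In this sketch, a client on `enwiki` sees enrichment disabled for the stream, while a consumer that only reads `GLOBAL_CONFIG` still sees it enabled.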
We should add client-specific configuration (to EventStreamConfig or elsewhere, e.g. [[ https://wikitech.wikimedia.org/wiki/Metrics_Platform/MPIC | MPIC ]] contextual attributes?) that allows clients to be configured to set specific event fields.
Ideally this would not be specific to user-agent, and would instead control which HTTP headers are copied into which event fields, like the EventGate configuration does. If this were done in EventStreamConfig, perhaps a `producers.mediawiki_client.enrich_fields_from_http_headers` setting?
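A rough sketch of what a client could do with such a setting, written in Python only to keep it language-neutral (the real clients are JS and MW PHP). It assumes the setting is a map of dotted event field path to HTTP header name; that shape, the helper names, and the example event are assumptions for illustration, not an existing API.

```python
def set_nested(event, dotted_path, value):
    """Set event['a']['b'] = value for dotted_path 'a.b', creating dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    target = event
    for key in parents:
        target = target.setdefault(key, {})
    target[leaf] = value


def enrich_fields_from_http_headers(event, stream_settings, request_headers):
    """Copy configured headers of the ORIGINAL client request into the event.

    Mutates `event` in place and returns it. Reads the hypothetical
    producers.mediawiki_client.enrich_fields_from_http_headers setting,
    assumed to map event field path -> header name.
    """
    mapping = (
        stream_settings.get("producers", {})
        .get("mediawiki_client", {})
        .get("enrich_fields_from_http_headers")
    )
    if not mapping:
        # Setting absent, empty, or disabled: leave the event untouched.
        return event

    # HTTP header names are case-insensitive, so compare in lower case.
    headers = {name.lower(): value for name, value in request_headers.items()}
    for field_path, header_name in mapping.items():
        value = headers.get(header_name.lower())
        if value is not None:
            # This sketch overwrites any value already present in the field.
            set_nested(event, field_path, value)
    return event


# Usage with made-up settings and a made-up event:
settings = {
    "producers": {
        "mediawiki_client": {
            "enrich_fields_from_http_headers": {
                "http.request_headers.user-agent": "user-agent",
            },
        },
    },
}
event = {"$schema": "/example/1.0.0", "meta": {"stream": "example.stream"}}
enrich_fields_from_http_headers(event, settings, {"User-Agent": "ExampleBrowser/1.0"})
```

Because the client passes in the headers of the request it is handling, MW PHP would record the browser's headers rather than those of its own POST to EventGate. Whether a value already present in the event should be preserved instead of overwritten is left open here.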
This was also [[ https://wikimedia.slack.com/archives/C05ERLBF0E7/p1737990045128109 | discussed in Slack ]].
== Done is
[] MediaWiki instrumentation event producing clients (JS, MW PHP, etc.) can configurably set user-agent (and other data?) in event data before POSTing to EventGate.