It looks like this was inherited from EventLogging for compatibility, but it'd be great if we could phase it out sooner rather than later, so that we don't accumulate technical debt that needs additional work and migration in the future.
In the EventLogging system, we were actually very close to removing it. We used a hashed IP for any legacy use cases and for compat, but to my knowledge all uses of it have long since been migrated.
For example, for events that need to be associated by country, instrumentations generally send the Geo.country value directly as part of the schema. (Schema example, Instrumentation JS example).
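A minimal sketch of that pattern, assuming the Geo object exposes a `country` code as described above. The `buildEvent` helper and the `country_code` field name are hypothetical, and the call that actually submits the event is left out:

```javascript
// Hypothetical helper: attach the coarse country code to the event payload
// on the client, so that no IP address is needed downstream.
function buildEvent( geo, data ) {
	// Geo may be missing or incomplete; fall back to null in that case.
	var country = geo && typeof geo.country === 'string' ? geo.country : null;
	return Object.assign( {}, data, { country_code: country } );
}

// In a browser, the first argument would be the Geo object.
var event = buildEvent( { country: 'NL' }, { action: 'click' } );
```

The point of the design is that the event carries only the coarse value the analysis needs, chosen by the instrumentation itself.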
In fact, while I didn't realise this until now, it seems we actually did remove it from the EL infra in 2016:
- T128408: Clean up Client IP and hashing related code on Eventlogging {oryx} {mole}
- T118595#2534744
- T126366
- etherpad notes
I guess it slipped back in at some point to facilitate compatibility for derivative fields, which in the new system are generated in the eventgate-wikimedia layer. It was probably just natural/easier to pass the varnishkafka intake on to the stream consumers unchanged.
Would be great to phase this out again and document easy ways to achieve any related use cases as they come up.
Outcome criteria
The "default" output of EventGate streams as seen in Kafka (such as those consumed by Logstash for client-errors, and by webperf/navtiming in the future) no longer contains any http.client_ip fields.
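One way to check this criterion against sample events could look like the sketch below. It assumes the field is nested as `http.client_ip` in the consumed JSON; the `hasClientIp` function and the sample events are hypothetical:

```javascript
// Hypothetical check: does a consumed event still carry http.client_ip?
function hasClientIp( event ) {
	return Boolean(
		event &&
		event.http &&
		Object.prototype.hasOwnProperty.call( event.http, 'client_ip' )
	);
}

// Made-up sample events, for illustration only.
var before = { http: { client_ip: '192.0.2.1', method: 'POST' } };
var after = { http: { method: 'POST' } };
```

Here `hasClientIp( before )` is true and `hasClientIp( after )` is false; the criterion is met when no event read from the stream matches.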