The existing webrequest streams are not technically 'Event Platform' streams. Converting them into Event Platform streams would allow us to consume webrequest from Kafka using Flink with the tooling we are developing as part of {T308356}, or with any other Event Platform tooling. This would be nice for {T310997}.
It would also mean that the webrequest Kafka topics are automatically documented in DataHub.
For webrequest to be supported by Event Platform, we need the following:
[] An event schema declared that matches the webrequest fields: [[ https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/983898 | patch ]]
[] The following fields added to webrequest's output format:
- `$schema`
- `meta.stream` (this can simply be set to 'webrequest')
-- Possibly also: `meta.dt`, `meta.request_id`, `meta.id`, etc. To be discussed.
[] A 'webrequest' stream declared in event stream config, with its composite topics explicitly declared: [[ https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/983905 | patch ]]
[] Gobblin Hadoop ingestion uses the event stream config to select the topics to ingest, instead of a topic wildcard: [[ https://gerrit.wikimedia.org/r/c/analytics/refinery/+/983926 | patch ]]
Once these are done, we should be able to treat webrequest like any other event stream.
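As a rough illustration of the output format change listed above, here is a minimal Python sketch of a webrequest record gaining the `$schema` and `meta.stream` fields. The schema URI `/webrequest/1.0.0`, the helper names, and the sample record fields are assumptions made for illustration only; the real values come from the schema and stream config patches linked above, and the real change would be made wherever webrequest's output format is produced, not in Python.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical schema URI; the real one is defined by the schema patch.
WEBREQUEST_SCHEMA_URI = "/webrequest/1.0.0"
# Stream name suggested in the task for meta.stream.
WEBREQUEST_STREAM = "webrequest"


def to_event_platform(record, include_optional_meta=False):
    """Return a copy of a webrequest record with Event Platform fields added."""
    event = dict(record)  # shallow copy, the input record is left untouched
    event["$schema"] = WEBREQUEST_SCHEMA_URI
    meta = dict(event.get("meta", {}))
    meta["stream"] = WEBREQUEST_STREAM
    if include_optional_meta:
        # Fields the task lists as "possibly also"; still to be discussed.
        meta.setdefault("id", str(uuid.uuid4()))
        meta.setdefault(
            "dt", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        )
    event["meta"] = meta
    return event


def is_event_platform_event(event):
    """Check only for the two fields the task marks as required."""
    meta = event.get("meta")
    return (
        isinstance(event.get("$schema"), str)
        and isinstance(meta, dict)
        and isinstance(meta.get("stream"), str)
    )


if __name__ == "__main__":
    # Illustrative record; the field names here are placeholders.
    sample = {"hostname": "cp1001", "uri_path": "/wiki/Main_Page", "http_status": "200"}
    print(json.dumps(to_event_platform(sample, include_optional_meta=True), indent=2))
```

The check covers only `$schema` and `meta.stream`, since the other `meta` fields are still open for discussion.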
---
We are planning to do this as part of {T351117}, and in doing so create a new event platform webrequest stream, allowing us to eventually decommission the old one.