This task may become the parent of several other tasks related to the Flink application.
As defined in the Relative Trending design document, the first step of the process is to create a Flink application that converts the webrequest_frontend_text Kafka topic into the webrequest.page_view topic.
This conversion is already defined and performed in batch, and the streaming process should stay as close as possible to it.
We don't want to duplicate code, so, whenever possible, this application will use the same UDFs and libraries used in batch. Some investigation into how to share those functions may be required.
The input throughput is around 105k messages/second. Because of the high traffic and the initial requirements, the application will be stateless and won't implement any complex bot detection rules.
We also need to analyze the throughput of the destination topic to decide the number of partitions.
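Once the output throughput has been measured, the partition count can be estimated with simple arithmetic. The sketch below is only an illustration: the 105k messages/second input rate comes from this task, while `PAGE_VIEW_FRACTION`, `PER_PARTITION_RATE`, the `headroom` factor and the `partitions_needed` helper are hypothetical placeholders that must be replaced with measured values.

```python
import math


def partitions_needed(output_msgs_per_sec, per_partition_msgs_per_sec, headroom=2.0):
    """Return the smallest partition count that carries the output rate.

    headroom is a safety multiplier for traffic peaks (assumed value, not measured).
    """
    if output_msgs_per_sec < 0:
        raise ValueError("output_msgs_per_sec must not be negative")
    if per_partition_msgs_per_sec <= 0:
        raise ValueError("per_partition_msgs_per_sec must be positive")
    if headroom <= 0:
        raise ValueError("headroom must be positive")
    required = output_msgs_per_sec * headroom / per_partition_msgs_per_sec
    # A topic always has at least one partition.
    return max(1, math.ceil(required))


# Input rate stated in this task.
INPUT_RATE = 105_000
# Hypothetical: share of webrequest messages that survive as page views.
PAGE_VIEW_FRACTION = 0.10
# Hypothetical: sustained messages/second one partition can handle.
PER_PARTITION_RATE = 5_000

if __name__ == "__main__":
    output_rate = INPUT_RATE * PAGE_VIEW_FRACTION
    print(partitions_needed(output_rate, PER_PARTITION_RATE))
```

The real numbers should come from measuring the batch output volume and from the consumer and broker limits of the cluster. The helper only makes the calculation explicit.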
The task is done when:
- A new stream for webrequest.page_view is defined
- A new schema for webrequest.page_view is defined
- The Kafka topic is using the right number of partitions
- A Flink application is deployed in K8s, reading from webrequest_frontend_text and writing into webrequest.page_view
Tasks that are required but outside of the MVP:
- Monitoring and alerting of the Flink application
- Schema and application are productionized and running in v1
- TBD: Should we convert webrequest_frontend_text into a regular stream with its own schema?


