Kafka's default partitioning hashes the message key, so messages with the same key always go to the same partition. If we key page content change events by page, consumers will be able to consume a given page's messages in order.
We should produce page content change events to Kafka with wiki_id and page_id in the message key. Keying events this way will also allow us to one day maintain a backfilled, compacted Kafka topic with current page state (T324108).
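As a minimal sketch, keyed production with the plain kafka-clients producer API could look like the following. The topic name, serializers, and the JSON key/value formats are illustrative assumptions, not the job's actual configuration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedPageEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key carries wiki_id and page_id, so all changes for a page
            // land in the same partition and are consumable in order.
            String key = "{\"wiki_id\":\"enwiki\",\"page_id\":12345}";
            String value = "{\"wiki_id\":\"enwiki\",\"page_id\":12345,\"rev_id\":67890}";
            // Topic name is a placeholder, not the real stream config.
            producer.send(new ProducerRecord<>("mediawiki.page_content_change", key, value));
        }
    }
}
```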
For enrichment jobs, we could consider automating this: if an incoming Kafka message has a key, we could default to partitioning the Flink stream and keying the Kafka sink output by that source key. To do this, the Java EventDataStreamFactory would need to support a key and value Kafka message deserializer, rather than only a value deserializer. This could be done as part of this task, or perhaps as a follow-up. Either way:
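A hedged sketch of what such a key and value deserializer could look like, written against Flink's KafkaRecordDeserializationSchema directly; the KeyedEvent holder type and the string handling here are illustrative assumptions, not EventDataStreamFactory's actual API:

```java
import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Hypothetical holder for the deserialized key and value.
class KeyedEvent {
    final String key;    // e.g. JSON string containing wiki_id and page_id
    final String value;  // the event payload
    KeyedEvent(String key, String value) { this.key = key; this.value = value; }
}

// Deserializes both the Kafka record key and value, instead of value only.
class KeyAndValueDeserializer implements KafkaRecordDeserializationSchema<KeyedEvent> {
    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<KeyedEvent> out) {
        String key = record.key() == null
            ? null
            : new String(record.key(), StandardCharsets.UTF_8);
        String value = new String(record.value(), StandardCharsets.UTF_8);
        out.collect(new KeyedEvent(key, value));
    }

    @Override
    public TypeInformation<KeyedEvent> getProducedType() {
        return TypeInformation.of(KeyedEvent.class);
    }
}
```

With the key surfaced like this, the job could key the stream (and later the sink) by the incoming source key without re-deriving it from the event body.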
Done is:
- mw-page-content-change-enrich produces Kafka messages with wiki_id and page_id in the Kafka message key.
- The Kafka producer uses key hash partitioning. Hashing the key is the default behavior of Kafka's built-in partitioner for non-null keys, but we should double check that the producer config doesn't override it (see the sketch after this list).
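For reference, a small sketch of how the Java client's built-in partitioner assigns keyed records (partition = toPositive(murmur2(serializedKey)) % numPartitions); the partition count and key here are illustrative:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyPartitionCheck {
    public static void main(String[] args) {
        int numPartitions = 12; // assumed partition count, illustrative only
        String key = "{\"wiki_id\":\"enwiki\",\"page_id\":12345}";
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // murmur2 hash of the serialized key, mapped to a non-negative int,
        // then modulo the partition count: the same key always yields the
        // same partition, so per-page ordering holds as long as the
        // partition count doesn't change. (Null-key records are handled
        // differently, via sticky partitioning in modern clients.)
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("key " + key + " -> partition " + partition);
    }
}
```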