The topic mediawiki.content_history_reconcile.v1 in Kafka Jumbo has a single partition.
Every month, the Airflow DAG mw_content_reconcile_mw_content_history_monthly emits a large batch of messages to this topic.
Because it sends a lot of messages in a short period of time, it triggers the alert MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag.
Nothing is actually wrong: the application eventually catches up on the pending messages and the alert resolves on its own, but this can take hours.
To speed up the process, we want to increase the number of partitions, as well as the number of parallel workers in the Flink application processing these messages.
Task is done if:
- eqiad.mediawiki.content_history_reconcile.v1 and codfw.mediawiki.content_history_reconcile.v1 have more partitions.
- The Flink application can parallelize the work with the new partitions.
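
As a rough illustration of why both criteria matter, here is a minimal sketch of the expected catch-up time. It assumes that each Kafka partition is read by at most one Flink source subtask and that the pipeline does not redistribute records after the source, so the useful parallelism is the smaller of the partition count and the Flink parallelism. The function names, backlog size, and per-worker throughput are hypothetical and not measured from the real application.

```python
def effective_parallelism(partitions: int, flink_parallelism: int) -> int:
    """Number of workers that actually consume in parallel.

    Assumption: one partition is read by at most one source subtask,
    so extra subtasks beyond the partition count stay idle.
    """
    if partitions < 1 or flink_parallelism < 1:
        raise ValueError("partitions and flink_parallelism must be >= 1")
    return min(partitions, flink_parallelism)


def estimated_catch_up_seconds(
    backlog_messages: int,
    per_worker_rate: float,  # messages per second per worker (hypothetical)
    partitions: int,
    flink_parallelism: int,
) -> float:
    """Rough time to drain a backlog, ignoring new incoming messages."""
    if per_worker_rate <= 0:
        raise ValueError("per_worker_rate must be positive")
    workers = effective_parallelism(partitions, flink_parallelism)
    return backlog_messages / (per_worker_rate * workers)


if __name__ == "__main__":
    backlog = 1_000_000  # hypothetical monthly burst size
    rate = 50.0          # hypothetical messages/s per worker
    for partitions, parallelism in [(1, 1), (1, 4), (4, 1), (4, 4)]:
        hours = estimated_catch_up_seconds(backlog, rate, partitions, parallelism) / 3600
        print(f"partitions={partitions} parallelism={parallelism}: ~{hours:.2f} h")
```

Under these assumptions, raising only the partition count or only the Flink parallelism leaves the catch-up time unchanged; the two need to be increased together.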