https://wikitech.wikimedia.org/wiki/Event_Platform/SPIKE/Should_we_enable_compression_on_kafka_jumbo summarizes the findings of this spike.
In T344688: Increase Max Message Size in Kafka Jumbo we increased the maximum record size to 10MB. This was required to support applications that produce large payloads (e.g. MW event enrichment).
Since our events are stored as JSON, we could benefit from message compression.
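As a rough illustration of why JSON payloads compress well, here is a self-contained sketch using Python's stdlib gzip. The payload and ratio are illustrative, not measurements from jumbo (JSON is text-heavy and repeats field names across records, so real event streams should also compress substantially):

```python
import gzip
import json

# Hypothetical event batch: repeated field names and repetitive text
# make JSON a good candidate for compression.
events = [
    {"$schema": "/example/schema/1.0.0",          # placeholder schema
     "meta": {"stream": "example.stream", "domain": "en.wikipedia.org"},
     "page_id": i,
     "comment": "example revision comment " * 10}
    for i in range(100)
]
raw = json.dumps(events).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(raw) / len(compressed)
print(f"raw={len(raw)}B compressed={len(compressed)}B ratio={ratio:.1f}x")
```

The same effect applies regardless of whether the compression happens in the producer client or on the broker; only where the CPU cost is paid differs.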
This SPIKE aims to answer the following questions:
- What are the key metrics for topics that store large messages (total size, message-size percentiles, throughput, retention)?
- Should we enable broker-side compression on jumbo?
- Or should we instead let producers handle compression?
Producer-side compression would also save network bandwidth between producers and brokers. Happy to move this to a dedicated phab task if needed.
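If we go the producer-side route, the change amounts to a single client setting. A minimal sketch of the relevant producer configuration, using standard Kafka/librdkafka property names (the broker host and tuning values below are placeholders, not our actual settings):

```python
# Producer-side compression: the client compresses record batches before
# sending, saving network bandwidth as well as disk on the broker.
producer_conf = {
    "bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092",  # placeholder host
    "compression.type": "snappy",  # or gzip / lz4 / zstd
    "linger.ms": 50,               # waiting a little yields bigger, better-compressing batches
    "batch.size": 1_048_576,       # 1 MiB batches for large JSON events (illustrative)
}
# With a client library such as confluent-kafka this dict would be passed
# directly to the Producer constructor, e.g. Producer(producer_conf).
print(producer_conf["compression.type"])
```

One design note: compression in Kafka operates on whole record batches, so producer-side settings like `linger.ms` and `batch.size` directly affect the achievable ratio.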
MirrorMaker
I see we have snappy compression enabled for MirrorMaker producers: https://github.com/wikimedia/operations-puppet/blob/9bcf0640550b2eae76d144af64288649c1000799/modules/confluent/manifests/kafka/mirror/instance.pp#L89. How would this work in practice when an application writes directly into one of the two DCs? Note that with the broker/topic `compression.type` left at its default (`producer`), the broker retains whatever codec the producer used, so MirrorMaker-replicated copies would be snappy-compressed while directly produced messages would keep the application's (possibly absent) compression.
Compression at topic level
While compression could be set at the topic level, topic configs are not kept under version control, which made enough folks uneasy that we avoid this path whenever possible.
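For reference, a topic-level change would be a one-off invocation of Kafka's stock kafka-configs.sh tool, sketched below as an argument list (the broker host and topic name are placeholders). This is exactly the kind of out-of-band, unversioned cluster state the concern above is about:

```python
# Hypothetical: setting compression.type on a single topic via kafka-configs.sh.
# Because this mutates live cluster state with no trace in version control,
# it is the path we would rather avoid.
cmd = [
    "kafka-configs.sh",
    "--bootstrap-server", "kafka-jumbo1001.eqiad.wmnet:9092",  # placeholder host
    "--alter",
    "--entity-type", "topics",
    "--entity-name", "eqiad.example_stream",  # placeholder topic
    "--add-config", "compression.type=snappy",
]
print(" ".join(cmd))
```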