
[SPIKE] Should we enable compression on kafka jumbo?
Closed, Resolved · Public

Description

https://wikitech.wikimedia.org/wiki/Event_Platform/SPIKE/Should_we_enable_compression_on_kafka_jumbo summarizes the learnings from this spike.

In T344688: Increase Max Message Size in Kafka Jumbo we increased the max records size to 10MB. This was required to support applications that produce large payloads (e.g. MW event enrichment).

Since our events are stored as JSON, we could benefit from message compression.

This SPIKE aims to answer the following questions:

  • What are the key metrics for topics that store large messages (total size, message size percentiles, throughput, retention)?
  • Should we enable compression on jumbo?
  • Or should we instead let producers take care of it?

The latter would be nice since it saves some bandwidth. Happy to move this to a dedicated phab if needed.
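To get a feel for how much redundancy JSON events carry, here is a minimal sketch using zlib from the Python standard library as a stand-in (snappy trades compression ratio for speed, so its numbers will differ; the event payload below is made up, not a real schema instance):

```python
import json
import zlib

# A made-up event payload; real page_content_change events are much
# larger and carry long wikitext/HTML bodies, which are also redundant.
event = {
    "$schema": "/mediawiki/page/change/1.0.0",
    "meta": {"stream": "mediawiki.page_content_change.v1"},
    "page_title": "Example",
    "body": "Lorem ipsum dolor sit amet. " * 200,
}
raw = json.dumps(event).encode("utf-8")
compressed = zlib.compress(raw)
ratio = len(raw) / len(compressed)
print(f"{len(raw)} -> {len(compressed)} bytes (ratio {ratio:.1f}x)")
```

The exact ratio depends heavily on the payload; highly repetitive bodies like the one above compress far better than typical events would.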

MirrorMaker

I see we have snappy compression enabled for MirrorMaker producers https://github.com/wikimedia/operations-puppet/blob/9bcf0640550b2eae76d144af64288649c1000799/modules/confluent/manifests/kafka/mirror/instance.pp#L89. How would this work in practice when an application writes directly into one of the two DCs?

Compression at topic level

While compression could be set at the topic level, topic configs are not kept under version control, which made enough folks uneasy that we avoid this path whenever possible.

Event Timeline

I'd strongly suggest using snappy compression on the producer side: it saves a ton of network bandwidth and doesn't add any extra load on the brokers (the compression work is spread among producers and consumers instead). We use snappy everywhere; a notable example is Varnishkafka (big batches, a ton of traffic per second, etc.).

Ack. I see that eventgate also defaults to snappy compression: https://github.com/wikimedia/operations-deployment-charts/blob/70a93ac772120617979ad3773adcfa983b52f767/charts/eventgate/values.yaml#L147.

Is there any reason for not enabling compression by default cluster-wide? @JAllemandou mentioned an issue with a compressed topic that caused an outage a while back, but I could not find an incident report.

Anyway, snappy sounds good to me, and compression between producers and brokers will indeed save bandwidth.
We can enable it for the Python/Flink producers.
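For the Python producers this is essentially a one-line config change. A sketch assuming the confluent-kafka client; the broker address and batching values below are placeholders for illustration, not our actual settings:

```python
# Producer config sketch: compression happens client-side, so brokers
# receive (and store) already-compressed batches.
producer_conf = {
    "bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092",  # placeholder
    "compression.type": "snappy",
    # Larger, slightly delayed batches compress better; these values are
    # illustrative, not a tuning recommendation.
    "linger.ms": 20,
    "batch.size": 1_048_576,
}

# With confluent-kafka installed, the config is passed straight through:
# from confluent_kafka import Producer
# producer = Producer(producer_conf)
# producer.produce("mediawiki.page_content_change.v1", value=payload)
```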

Some rough numbers from 7 days of mediawiki.page_content_change.v1: the topic has a size in the neighborhood of 762GiB, and messages have a median size of 380KB and a max of 1MB.

Quartile breakdown:

stat     size (bytes)
mean          414,036
std           149,341
min                14
25%           317,121
50%           387,923
75%           462,711
max         1,032,399

For comparison, mediawiki.page_change.v1 is in the neighborhood of 20-24GiB, with significantly smaller records:

stat     size (bytes)
mean           11,830
std             2,519
min             5,576
25%            10,159
50%            11,955
75%            13,253
max            20,642

mediawiki.page_change.v1 is produced via eventgate, so I assume its payloads are already snappy compressed.
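Back-of-envelope on what compression could save for the large topic, assuming a purely illustrative ~2x snappy ratio on these JSON payloads (not measured):

```python
topic_gib = 762          # 7 days of mediawiki.page_content_change.v1
assumed_ratio = 2.0      # illustrative snappy ratio on JSON; not measured
stored_gib = topic_gib / assumed_ratio
saved_gib = topic_gib - stored_gib
print(f"~{stored_gib:.0f} GiB stored, ~{saved_gib:.0f} GiB saved per week")
```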

One thing worth noting is that, afaics, mediawiki.page_content_change.v1 is a topic with only one partition, so it would be better to spread the load across multiple brokers. Given the number of brokers in jumbo I'd suggest 3-6 partitions.
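Rough per-partition throughput if the same volume were spread out, derived from the 762GiB / 7 days figure above:

```python
topic_bytes = 762 * 1024**3          # 7 days of traffic
seconds = 7 * 24 * 3600
mib_per_s = topic_bytes / seconds / 1024**2
for partitions in (1, 3, 6):
    print(f"{partitions} partition(s): ~{mib_per_s / partitions:.2f} MiB/s each")
```

With a single partition one broker absorbs the whole ~1.3 MiB/s (plus replication traffic); six partitions cut the per-leader share to roughly a fifth of that.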

@elukey you are right. We have some topic partitioning work scheduled post release (T338231: [Event Platform] mw-page-content-change-enrich should (re)produce kafka keys), so this would definitely be in scope.
I don't have the rights to alter topic configs though (https://wikitech.wikimedia.org/wiki/Kafka/Administration#Alter_topic_partitions_number). Is that something you could help me with (when the time comes)?

cc @pfischer since we recently discussed Flink and Kafka partitioning practices.