Per today's Flink K8s sync, we are requesting permission to use the kafka-main cluster to transport CirrusSearch updates for the Search Update Pipeline redesign.
Quoting @dcausse from the Search Update Pipeline design doc: *
Kafka will hold most of the content required to perform the updates. This means that we will create (at the output of the preparation step) events that may be relatively large (up to 4 MB uncompressed). Data shows the uncompressed payload sizes are:
- average: around 20 kB
- p95: under 100 kB
- p99: fluctuating between 1 MB and 4 MB
The kafka-main clusters are configured to accept a record batch size of up to 4 MiB, and we do not plan to change this limit. Compression will likely be snappy, so we can expect roughly a 2:1 compression ratio. There are concerns about using Kafka this way, but given this pipeline's access pattern (read everything, rarely filter), it seems legitimate. An alternative would be access to a generic content store, but there does not yet seem to be any consensus on that subject.
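To make the headroom in the quoted numbers concrete, here is a small back-of-the-envelope sketch. It assumes the doc's payload sizes ("Mb") are decimal megabytes, the broker limit is binary MiB, and the 2:1 snappy ratio holds; none of these figures are measurements of the actual pipeline.

```python
# Illustrative arithmetic only; sizes and the 2:1 snappy ratio are taken
# from the design doc quote above, not measured here.
MiB = 1024 * 1024          # broker batch limit is quoted in binary units
MB = 1000 * 1000           # payload sizes assumed to be decimal megabytes

BROKER_BATCH_LIMIT = 4 * MiB       # kafka-main record batch size limit
P99_UNCOMPRESSED = 4 * MB          # worst-case event size from the doc
ASSUMED_SNAPPY_RATIO = 2.0         # hedged estimate, not a measurement

p99_compressed = P99_UNCOMPRESSED / ASSUMED_SNAPPY_RATIO
headroom = BROKER_BATCH_LIMIT - p99_compressed
print(f"p99 compressed ~= {p99_compressed / MiB:.2f} MiB, "
      f"headroom ~= {headroom / MiB:.2f} MiB under the 4 MiB batch limit")
```

So even a worst-case p99 event should compress to well under half the configured batch limit, which is why no limit change is being requested.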
@dcausse, @pfischer, et al.: feel free to review this and add/change as needed before sending to Service Ops. - Reviewed and ready to send over.
*This content is currently private, but I don't think it contains any sensitive information. I'll ask for approval to make it public.
