EventGate should support producing keyed messages for Kafka partitioning
Closed, Resolved · Public

Description

This will be needed to produce the page state change stream (T308017) to the proper Kafka partitions, keyed by wiki_db and page_id.

Kafka partitioning could be handled by modifying eventgate-wikimedia's Kafka produce function.

Specifying the key could be done either by

A. Augmenting the EventGate API to allow for providing the key directly
B. Using stream config to set the fields in the value that should be used for the key.
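
To make option B concrete, here is a minimal sketch of deriving a key from configured fields of the event value. It is an illustration only, not eventgate-wikimedia code: the function name `build_message_key`, the flat list of field names, and the JSON serialization of the key are all assumptions.

```python
import json
from typing import Any, Mapping, Sequence


def build_message_key(event: Mapping[str, Any], key_fields: Sequence[str]) -> bytes:
    """Build a deterministic message key from selected top-level event fields.

    Hypothetical helper: the field list stands in for whatever the stream
    config would declare as the key fields.
    """
    key = {}
    for field in key_fields:
        if field not in event:
            raise KeyError(f"event is missing key field {field!r}")
        key[field] = event[field]
    # Sort keys and use compact separators so that equal keys always
    # serialize to identical bytes, regardless of field order in the config.
    return json.dumps(key, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Made-up sample event; only wiki_db and page_id end up in the key.
event = {"wiki_db": "enwiki", "page_id": 12345, "comment": "typo fix"}
print(build_message_key(event, ["wiki_db", "page_id"]))
```

With this approach producers keep sending plain events, and the key is derived on the server from the stream's configuration rather than supplied through the API.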

Event Timeline

A question to answer: Do message keys need schemas? Probably yes...but this might be pretty annoying to accomplish.

Change 851129 had a related patch set uploaded (by Ottomata; author: Ottomata):

[eventgate-wikimedia@master] WIP - support producing keyed messages

https://gerrit.wikimedia.org/r/851129

B. Using stream config to set the fields in the value that should be used for the key.

I think this is the better way to go about this. WIP in https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/851129/

@phuedx , want to check in with you about this, and see if you have any thoughts.

See the commit message of https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/851129 for the proposed semantics of a new top-level stream config setting, key_fields.

Generally speaking, this sounds good to me. Are you open to bikeshedding on the name key_fields? As I understand it, key here is referring to the key passed to the partitioner. However, if I was glancing over the stream configs, I might also read key as important. Is there a name that we could use to make it clear that these fields are to be passed to the partitioner?

Are you open to bikeshedding on the name key_fields

So open.

make it clear that these fields are to be passed to the partitioner?

The default librdkafka partitioner will use the key for Kafka partitioning, but it could be configured to do otherwise. Perhaps we should make it clear that this is about the Kafka message key?
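
For context on why the key determines the partition, here is a rough sketch of hash-based partitioning. As far as I know, librdkafka's default `consistent_random` partitioner hashes non-empty keys with CRC32; the function below only illustrates that idea and is not guaranteed to pick the same partition numbers as librdkafka. The name `partition_for_key` is made up for this example.

```python
import zlib


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition by hashing it (illustrative only)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # The same key bytes always hash to the same value, so the same key
    # always lands on the same partition while the partition count is fixed.
    return zlib.crc32(key) % num_partitions


key = b'{"page_id":12345,"wiki_db":"enwiki"}'
first = partition_for_key(key, 12)
second = partition_for_key(key, 12)
print(first == second)  # True: partitioning is deterministic per key
```

Note that changing the number of partitions changes the mapping, so per-key ordering is only preserved while the partition count stays the same.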

Let's see:

  • kafka_message_key_fields
  • kafka_key_fields

Which do you prefer?

Although, I don't love putting 'kafka' in the name here. Who knows, maybe one day we won't be using Kafka for this.

Maybe just message_key_fields? Or...primary_key_fields? Hm.

message_key_fields sounds good to me.

Change 851129 merged by Ottomata:

[eventgate-wikimedia@master] Support producing keyed messages

https://gerrit.wikimedia.org/r/851129

Change 861446 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/mediawiki-config@master] beta - set message_key_fields on stream rc0.mediawiki.page_change

https://gerrit.wikimedia.org/r/861446

Change 861446 merged by Ottomata:

[operations/mediawiki-config@master] beta - set message_key_fields on stream rc0.mediawiki.page_change

https://gerrit.wikimedia.org/r/861446

Change 861448 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/mediawiki-config@master] rc0.mediawiki.page_change stream - produce with keyed message

https://gerrit.wikimedia.org/r/861448

Change 861448 merged by jenkins-bot:

[operations/mediawiki-config@master] rc0.mediawiki.page_change stream - produce with keyed message

https://gerrit.wikimedia.org/r/861448

Change 861451 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] eventgate - bump version to get keyed message support

https://gerrit.wikimedia.org/r/861451

Change 861451 merged by Ottomata:

[operations/deployment-charts@master] eventgate - bump version to get keyed message support

https://gerrit.wikimedia.org/r/861451

Deployed.

I also merged stream config changes to configure message_key_fields for the rc0.mediawiki.page_change stream and, in beta, verified that messages with the same key were consistently produced to the same topic partition.
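
A check along these lines could be sketched as follows. This is not the actual test that was run: the helper name `keys_on_multiple_partitions` is hypothetical, and the (key, partition) pairs are made-up sample data standing in for records consumed from the topic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def keys_on_multiple_partitions(
    records: Iterable[Tuple[bytes, int]],
) -> Dict[bytes, Set[int]]:
    """Return every key that was observed on more than one partition."""
    seen: Dict[bytes, Set[int]] = defaultdict(set)
    for key, partition in records:
        seen[key].add(partition)
    return {key: parts for key, parts in seen.items() if len(parts) > 1}


# Made-up (key, partition) observations, not real output from beta.
observed = [
    (b'{"page_id":1,"wiki_db":"enwiki"}', 2),
    (b'{"page_id":1,"wiki_db":"enwiki"}', 2),
    (b'{"page_id":7,"wiki_db":"dewiki"}', 0),
]
violations = keys_on_multiple_partitions(observed)
print(violations)  # an empty dict means every key stayed on one partition
```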