
EventStreams: kafka key should be serialized as a string
Closed, Resolved (Public)

Description

In the WMF EventStreams API, if a Kafka message has a key, it is included in the outgoing event at meta.key (not part of the original event schema).

However, the key is being serialized as a JavaScript Buffer. For example:

"key": {
   "type": "Buffer",
   "data": [
     123,
     34,
     119,
     105,
     107,
     105,
     95,
     105,
     100,
     34,
     58,
     34,
     99,
     111,
     109,
     109,
     111,
     110,
     115,
     119,
     105,
     107,
     105,
     34,
     44,
     34,
     112,
     97,
     103,
     101,
     95,
     105,
     100,
     34,
     58,
     49,
     48,
     49,
     51,
     49,
     51,
     53,
     50,
     54,
     125
   ]
}
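
For reference, the bytes in the `data` array above are just the UTF-8 encoding of the key string. A minimal sketch of decoding them on the consumer side, using only Node's built-in `Buffer` (the variable names are illustrative, not taken from the EventStreams code):

```javascript
// The serialized key exactly as it appears in the example event above.
const serializedKey = {
    type: 'Buffer',
    data: [
        123, 34, 119, 105, 107, 105, 95, 105, 100, 34,
        58, 34, 99, 111, 109, 109, 111, 110, 115, 119,
        105, 107, 105, 34, 44, 34, 112, 97, 103, 101,
        95, 105, 100, 34, 58, 49, 48, 49, 51, 49,
        51, 53, 50, 54, 125,
    ],
};

// Buffer.from() accepts an array of byte values; decode the bytes as UTF-8.
const keyString = Buffer.from(serializedKey.data).toString('utf8');

console.log(keyString);
// prints: {"wiki_id":"commonswiki","page_id":101313526}
```

So the key here is a JSON string containing `wiki_id` and `page_id`, which is what clients would expect to see at meta.key instead of the byte array.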

I think the way to fix this is to properly deserialize the key in the first place.

However, there is no guarantee that the key will be a JSON string; that is just a convention we use at WMF. We should add safeguards too, just in case.
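
One way such a safeguard could look is sketched below. This is an assumption for illustration, not the code from the merge request: `deserializeKafkaKey` is a hypothetical helper that decodes the key as UTF-8, tries to parse it as JSON, and falls back to the plain string when parsing fails.

```javascript
/**
 * Hypothetical helper (not from the EventStreams codebase).
 * Converts a raw Kafka message key into a JSON-serializable value.
 *
 * @param {Buffer|string|null|undefined} key raw key from the Kafka message
 * @return {*} parsed JSON value, a plain string, or undefined if there is no key
 */
function deserializeKafkaKey(key) {
    // Messages without a key: return undefined so no meta.key is set.
    if (key === null || key === undefined) {
        return undefined;
    }

    // Decode Buffers as UTF-8. Invalid byte sequences do not throw;
    // Node replaces them with U+FFFD.
    const keyString = Buffer.isBuffer(key) ? key.toString('utf8') : String(key);

    try {
        // WMF convention: keys are JSON strings.
        return JSON.parse(keyString);
    } catch (e) {
        // Not JSON: fall back to the decoded string rather than failing.
        return keyString;
    }
}

// Usage with a JSON key and a non-JSON key.
console.log(deserializeKafkaKey(Buffer.from('{"wiki_id":"commonswiki"}')));
console.log(deserializeKafkaKey(Buffer.from('plain-key')));
```

Whether meta.key should carry the parsed object or the raw string is a design choice; if a string is preferred in all cases, the `JSON.parse` step can be dropped and `keyString` returned directly.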

An example of how this is exposed to external users of EventStreams (like Wikimedia Enterprise) is in T373644.

Details

Related Changes in GitLab:
Title: kafka key should be properly serialized
Reference: repos/data-engineering/eventstreams!15
Author: otto
Source Branch: T373689_kafka_key_deserialization
Dest Branch: master

Event Timeline

Ottomata moved this task from Incoming (new tickets) to Backlog on the Data-Engineering board.

@tchin I kept answering some questions about eventstreams and was embarrassed we hadn't fixed this yet.

Submitted an MR.
https://gitlab.wikimedia.org/repos/data-engineering/eventstreams/-/merge_requests/15

Maybe we can get this rolled out with your work for T361769: Migrate and re-deploy eventstreams using service-utils :)

Ottomata triaged this task as Medium priority. Jan 22 2025, 4:08 PM