
Eventstreams 'assignments' logstash field type
Closed, Resolved, Public

Description

logstash / opensearch has been complaining about a type mismatch for the field assignments. Investigating further, it does indeed look like eventstreams changes the field type depending on the operation.

For example, this document was indexed with id 6h_n1pUBuXzFNByTSDF_ (logstash):

assignments: eqiad.mediawiki.recentchange, codfw.mediawiki.recentchange
msg: Bulding assignments from passed in assignments

While for "msg"=>"Final resolved Kafka assignments" (logstash dlq), the assignments field is an array of objects:

"assignments"=>[{"topic"=>"eqiad.mediawiki.recentchange", "offset"=>-1, "partition"=>0}, {"topic"=>"codfw.mediawiki.recentchange", "offset"=>-1, "partition"=>0}],
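To make the conflict concrete, here is a sketch that uses a simplified model of dynamic field mapping (an assumption for illustration, not OpenSearch's actual algorithm): a string or array of strings maps to a keyword-like type, while an array of objects maps to an object type. Whether the first log line carried a single string or an array of topic names is not visible in the report; an array is assumed, and `kindOf`/`kindsSeen` are hypothetical helpers.

```typescript
// Simplified model (assumption, not OpenSearch's real mapping code):
// strings or arrays of strings -> "keyword", arrays of plain objects -> "object".
type FieldKind = "keyword" | "object" | "other";

function kindOf(value: unknown): FieldKind {
  const items: unknown[] = Array.isArray(value) ? value : [value];
  if (items.every((v) => typeof v === "string")) return "keyword";
  if (items.every((v) => typeof v === "object" && v !== null && !Array.isArray(v))) {
    return "object";
  }
  return "other";
}

// Collect the kinds seen for one field across documents; more than one kind
// means the index mapping will reject some of them.
function kindsSeen(docs: Record<string, unknown>[], field: string): Set<FieldKind> {
  const kinds = new Set<FieldKind>();
  for (const doc of docs) {
    if (field in doc) kinds.add(kindOf(doc[field]));
  }
  return kinds;
}

// The two shapes from the task description (first one assumed to be an array).
const docs = [
  {
    msg: "Bulding assignments from passed in assignments",
    assignments: ["eqiad.mediawiki.recentchange", "codfw.mediawiki.recentchange"],
  },
  {
    msg: "Final resolved Kafka assignments",
    assignments: [
      { topic: "eqiad.mediawiki.recentchange", offset: -1, partition: 0 },
      { topic: "codfw.mediawiki.recentchange", offset: -1, partition: 0 },
    ],
  },
];

const seen = kindsSeen(docs, "assignments");
console.log(Array.from(seen)); // [ 'keyword', 'object' ]
```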

cc @Ottomata

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Bump kafka-sse and conform logs to ecs | repos/data-engineering/eventstreams!25 | tchin | kafkasse-0.5.3 | master
Nest assignment log under label | repos/data-engineering/kafkasse!5 | tchin | assignment-label | master
Bump KafkaSSE to use Winston | repos/data-engineering/eventstreams!21 | tchin | bump-kafka-sse | master
Switch from bunyan to winston | repos/data-engineering/kafkasse!2 | tchin | use-winston | master

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2025-03-27T09:29:34Z] <godog> silence LogstashKafkaConsumerLag and LogstashIndexingFailures for today for 1d - T390140

Change #1131672 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] logstash: stringify 'assignments' from eventstreams

https://gerrit.wikimedia.org/r/1131672

Curious! I wonder if this changed recently, since @tchin replaced service-runner with service-utils and (started?) producing ECS logs.

Either way, makes sense. Hm. We could just log assignments as an object in the first case too, setting topic but not the other details.

Or, perhaps better: don't set assignments in log data for the first message. It can be read out of the message string just fine.
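Both suggestions can be sketched in TypeScript. The helper names and the reworded log message are hypothetical; only the topic/partition/offset shape comes from the logged data.

```typescript
// Shape of one resolved assignment, as seen in the
// "Final resolved Kafka assignments" log line.
interface Assignment {
  topic: string;
  partition?: number;
  offset?: number;
}

// Option 1 (hypothetical helper): log the passed-in topics as objects too,
// so both log lines share one shape; partition and offset are just absent.
function topicsToAssignments(topics: string[]): Assignment[] {
  return topics.map((topic) => ({ topic }));
}

// Option 2 (hypothetical helper): leave structured assignments out of the
// first log line and keep the topic names only in the message text.
function buildingLogLine(topics: string[]): { msg: string } {
  return { msg: `Building assignments from passed in topics: ${topics.join(", ")}` };
}

const topics = ["eqiad.mediawiki.recentchange", "codfw.mediawiki.recentchange"];
console.log(topicsToAssignments(topics));
console.log(buildingLogLine(topics).msg);
```

Option 2 avoids the mapping question entirely, since no structured field is emitted for that line.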

Ottomata renamed this task from Eventstreams 'assignments' field type to Eventstreams 'assignments' logstash field type. Mar 27 2025, 1:11 PM

Ah, the issue is in KafkaSSE, which still assumes bunyan. I'll have to standardize the logging inside that lib.

Alternatively, a really quick fix is to not pass the new logger from eventstreams to KafkaSSE; KafkaSSE then creates its own bunyan logger, and the logs won't show up as errors in logstash anymore.
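One way to standardize the logging during a migration is a thin shim that accepts bunyan-style calls, `log.info(fields, msg)`, and forwards them in winston-style order, `logger.info(msg, meta)`. The interfaces here are minimal stand-ins written for this example, not the real bunyan or winston type definitions, and `bunyanCompat` is a hypothetical name.

```typescript
type Fields = Record<string, unknown>;

// Minimal stand-in for a winston-style logger: message first, metadata second.
interface WinstonStyleLogger {
  info(message: string, meta?: Fields): void;
}

// Hypothetical shim: accepts bunyan-style calls, log.info(fields, msg) or
// log.info(msg), and forwards them in winston-style argument order.
function bunyanCompat(logger: WinstonStyleLogger) {
  return {
    info(fieldsOrMsg: Fields | string, msg?: string): void {
      if (typeof fieldsOrMsg === "string") {
        logger.info(fieldsOrMsg);
      } else {
        logger.info(msg ?? "", fieldsOrMsg);
      }
    },
  };
}

// Usage with a stub logger that just records its calls.
const calls: Array<[string, Fields | undefined]> = [];
const stub: WinstonStyleLogger = {
  info: (message, meta) => {
    calls.push([message, meta]);
  },
};
const log = bunyanCompat(stub);
log.info(
  { assignments: [{ topic: "eqiad.mediawiki.recentchange", partition: 0, offset: -1 }] },
  "Final resolved Kafka assignments",
);
```

Per the linked merge requests, KafkaSSE itself was eventually switched from bunyan to winston, which removes the need for a shim like this.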

Hm, this message doesn't look like it should be an error? Pretty normal operation?

Standardizing the logger in KafkaSSE too sounds better :)

not pass in the new logger from eventstreams to KafkaSSE

Would this mean that logs emitted from KafkaSSE would not be associated with the eventstreams service? Or would the fact that they come from the same service container be enough?

Whatever you decide is best will be fine! TY!

Change #1131672 merged by Cwhite:

[operations/puppet@production] logstash: stringify 'assignments' from eventstreams

https://gerrit.wikimedia.org/r/1131672

We've rolled out a logstash filter that matches events with name KafkaSSE and casts the assignments field to a string. It can be removed once it is no longer needed.
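The filter's logic amounts to the following TypeScript sketch. The real change is Logstash configuration in operations/puppet (Gerrit change 1131672); the exact string form it produces is not shown in this task, so `JSON.stringify` is an assumption, as is `name` being a top-level string field. `stringifyKafkaSseAssignments` is a hypothetical name.

```typescript
type LogEvent = Record<string, unknown>;

// Hypothetical equivalent of the filter: only events named "KafkaSSE" that
// carry an assignments field are touched; everything else passes through.
function stringifyKafkaSseAssignments(event: LogEvent): LogEvent {
  if (event["name"] !== "KafkaSSE" || !("assignments" in event)) return event;
  const value = event["assignments"];
  return {
    ...event,
    // Already-string values are kept; anything else is serialized (assumed JSON).
    assignments: typeof value === "string" ? value : JSON.stringify(value),
  };
}

const out = stringifyKafkaSseAssignments({
  name: "KafkaSSE",
  msg: "Final resolved Kafka assignments",
  assignments: [{ topic: "eqiad.mediawiki.recentchange", offset: -1, partition: 0 }],
});
console.log(out["assignments"]);
// [{"topic":"eqiad.mediawiki.recentchange","offset":-1,"partition":0}]
```

After this step both log lines index assignments as a string, so the mapping no longer conflicts.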

Change #1167638 had a related patch set uploaded (by TChin; author: TChin):

[operations/deployment-charts@master] [eventstreams] Bump version 0.16.0

https://gerrit.wikimedia.org/r/1167638

Change #1167638 merged by jenkins-bot:

[operations/deployment-charts@master] [eventstreams] Bump version 0.16.0

https://gerrit.wikimedia.org/r/1167638

Change #1171588 had a related patch set uploaded (by TChin; author: TChin):

[operations/deployment-charts@master] [eventstreams] Bump version 0.17.0

https://gerrit.wikimedia.org/r/1171588

Change #1171588 merged by jenkins-bot:

[operations/deployment-charts@master] [eventstreams] Bump version 0.17.0

https://gerrit.wikimedia.org/r/1171588

assignments is now stringified in KafkaSSE, but in the logs I see that assignments is in normalized.dropped.no_such_field. Is there something I'm missing? @colewhite

When this task was filed, eventstreams used the legacy logstash format. Now that eventstreams is writing ECS, the field is dropped because it does not exist in the schema.

Within ECS, we can either rely on event.original, amend the schema, or use another field.
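A toy model of this behaviour: top-level keys outside the schema are removed, and their names are recorded under `normalized.dropped.no_such_field`. The allowed-field set is a tiny illustrative subset of ECS, and recording dropped keys as a list of names is an assumption about how the pipeline populates that field; `enforceSchema` is a hypothetical helper.

```typescript
type Doc = Record<string, unknown>;

// Tiny illustrative subset of ECS top-level fields (not the full schema).
const ALLOWED = new Set(["@timestamp", "ecs", "labels", "log", "message", "service"]);

// Toy schema enforcement: keep known top-level keys, drop the rest, and
// record the dropped key names under normalized.dropped.no_such_field.
function enforceSchema(doc: Doc): Doc {
  const kept: Doc = {};
  const dropped: string[] = [];
  for (const key of Object.keys(doc)) {
    if (ALLOWED.has(key)) {
      kept[key] = doc[key];
    } else {
      dropped.push(key);
    }
  }
  if (dropped.length > 0) {
    kept["normalized"] = { dropped: { no_such_field: dropped } };
  }
  return kept;
}

const out = enforceSchema({
  message: "Final resolved Kafka assignments",
  assignments: '[{"topic":"eqiad.mediawiki.recentchange","offset":-1,"partition":0}]',
});
console.log(JSON.stringify(out));
// {"message":"Final resolved Kafka assignments","normalized":{"dropped":{"no_such_field":["assignments"]}}}
```

Stringifying the value does not help here: the key itself is unknown, so it is dropped regardless of its type.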

another field: https://doc.wikimedia.org/ecs/#field-labels ?

labels.assignments: <json assignments string> ?
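ECS `labels` holds flat key/value pairs whose values are stored as keywords, so a JSON string fits there. A minimal sketch of emitting the assignments that way, assuming a record with only `message` and `labels`; `assignmentLog` is a hypothetical helper, not KafkaSSE's API.

```typescript
interface Assignment {
  topic: string;
  partition: number;
  offset: number;
}

// Hypothetical helper: carry the assignments as a JSON string under labels,
// since ECS labels must not contain nested objects.
function assignmentLog(message: string, assignments: Assignment[]) {
  return {
    message,
    labels: { assignments: JSON.stringify(assignments) },
  };
}

const record = assignmentLog("Final resolved Kafka assignments", [
  { topic: "eqiad.mediawiki.recentchange", partition: 0, offset: -1 },
]);
console.log(record.labels.assignments);
// [{"topic":"eqiad.mediawiki.recentchange","partition":0,"offset":-1}]
```

Consumers that need the structure back can call `JSON.parse(record.labels.assignments)`.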

Was dropping fields that are not in the schema a recent change? The ECS docs say:

If your events have additional data that cannot be mapped to ECS, you can simply add them to your events, using custom field names.

Wow, good catch! That may be true upstream, but it is not true for our usage; cf. Wikitech for more info.

I've updated the docs to reflect this. Thanks!

Change #1178550 had a related patch set uploaded (by TChin; author: TChin):

[operations/deployment-charts@master] [eventstreams] Bump version 0.18.0

https://gerrit.wikimedia.org/r/1178550

Change #1178550 merged by jenkins-bot:

[operations/deployment-charts@master] [eventstreams] Bump version 0.18.0

https://gerrit.wikimedia.org/r/1178550