Page MenuHomePhabricator

mediawiki-event-enrichment taskmanager crashes at startup
Closed, ResolvedPublic

Description

Processing page_change with v1.23.0 is resulting in deserialization errors in the eqiad/codfw deployment that are preventing the application from starting.

The same deployment started up fine in staging, but no data was processed (kafka topics are empty) and the issue was not detected.

This was the cause of the HA restore issue reported in https://phabricator.wikimedia.org/T340059#8988977.

Workaround: rolling back to v1.21.0 fixed the issue.

Error
message
Failed to deserialize consumer record due to","error.stack_trace":"java.io.IOException: Failed to deserialize consumer record due to\n\tat org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:56)\n\tat org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:33)\n\tat org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:144)\n\tat org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:417)\n\tat org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)\n\tat org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)\n\tat org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550)\n\tat org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)\n\tat org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839)\n\tat org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788)\n\tat org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)\n\tat org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)\n\tat org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)\n\tat org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator\n\tat org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:92)\n\tat org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
Impact
  • critical, application won't start.
Notes
  • rolling back to v1.21.0 fixed the issue

Details

TitleReferenceAuthorSource BranchDest Branch
eventutilities: version bumprepos/data-engineering/mediawiki-event-enrichment!66gmodenaT341096-version-bump-imagemain
Add schema version param to all flink source methodsrepos/data-engineering/eventutilities-python!77ottoflink_source_schema_versionmain
Customize query in GitLab

Event Timeline

Change 935700 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] mw-page-content-change-enrich revert docker image.

https://gerrit.wikimedia.org/r/935700

Change 935700 merged by jenkins-bot:

[operations/deployment-charts@master] mw-page-content-change-enrich revert docker image.

https://gerrit.wikimedia.org/r/935700

@gmodena and I debugged this today, and realized it was because we never implemented support for specifying the schema versions used by the Kafka source, and always use the latest. This conflicts with the RowTypeInfo we use when reading from the source into a DataStream.

To fix: kafkaSouceBuilder in wikimedia-event-utilities java should support specifying a schema version, and eventutilities-python should pass the schema version in.

Change 935814 had a related patch set uploaded (by Ottomata; author: Ottomata):

[wikimedia-event-utilities@master] Add EventDataStreamFactory#kafkaSourceBuilder with schemaVersion param

https://gerrit.wikimedia.org/r/935814

Change 935814 merged by jenkins-bot:

[wikimedia-event-utilities@master] Add schemaVersion param to EventDataStreamFactory source builder methods

https://gerrit.wikimedia.org/r/935814