
Q3- Q4: Snapshots service is failing to decode some Kafka messages
Open, Needs Triage · Public · BUG REPORT

Description

What happens?:

  • While the Snapshots DAG is running, the service fails to decode some Kafka messages (see the logs under "Other information" below).

What should have happened instead?:

  • All messages should be decoded successfully.

Other information (browser name/version, screenshots, etc.):
Snapshots DAG logs:
[2023-11-21, 00:25:41 UTC] {snapshots.py:126} INFO - Finished an export job for project: zhwikivoyage in namespace: 10 with total: 53 and errors: 18
[2023-11-21, 00:34:02 UTC] {snapshots.py:126} INFO - Finished an export job for project: zhwiki in namespace: 10 with total: 21356 and errors: 4465
[2023-11-21, 04:57:16 UTC] {snapshots.py:126} INFO - Finished an export job for project: ckbwiki in namespace: 0 with total: 51991 and errors: 6246
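To gauge how widespread the failures are, the DAG summary lines above can be turned into per-job error rates. The following is a minimal Go sketch that assumes the snapshots.py log format stays exactly as shown. `ExportSummary`, `parseSummary`, and the regular expression are hypothetical helpers written for this illustration. They are not part of the Snapshots service.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// summaryRe matches the hypothetical "Finished an export job" format inferred
// from the three DAG log lines in this report; other formats are not covered.
var summaryRe = regexp.MustCompile(`project: (\S+) in namespace: (\d+) with total: (\d+) and errors: (\d+)`)

// ExportSummary holds the counts reported for one export job.
type ExportSummary struct {
	Project   string
	Namespace int
	Total     int
	Errors    int
}

// ErrorRate returns the fraction of messages in the job that failed to decode.
func (s ExportSummary) ErrorRate() float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Errors) / float64(s.Total)
}

// parseSummary extracts an ExportSummary from one DAG log line.
// It returns false if the line does not look like a job summary.
func parseSummary(line string) (ExportSummary, bool) {
	m := summaryRe.FindStringSubmatch(line)
	if m == nil {
		return ExportSummary{}, false
	}
	// m[2], m[3], m[4] hold namespace, total, and errors.
	nums := make([]int, 3)
	for i, field := range m[2:] {
		n, err := strconv.Atoi(field)
		if err != nil {
			return ExportSummary{}, false
		}
		nums[i] = n
	}
	return ExportSummary{Project: m[1], Namespace: nums[0], Total: nums[1], Errors: nums[2]}, true
}

func main() {
	lines := []string{
		"[2023-11-21, 00:25:41 UTC] {snapshots.py:126} INFO - Finished an export job for project: zhwikivoyage in namespace: 10 with total: 53 and errors: 18",
		"[2023-11-21, 00:34:02 UTC] {snapshots.py:126} INFO - Finished an export job for project: zhwiki in namespace: 10 with total: 21356 and errors: 4465",
		"[2023-11-21, 04:57:16 UTC] {snapshots.py:126} INFO - Finished an export job for project: ckbwiki in namespace: 0 with total: 51991 and errors: 6246",
	}
	for _, line := range lines {
		if s, ok := parseSummary(line); ok {
			fmt.Printf("%s (namespace %d): %d of %d failed, %.1f%%\n",
				s.Project, s.Namespace, s.Errors, s.Total, 100*s.ErrorRate())
		}
	}
}
```

Run against the three lines above, this prints error rates of 34.0% (zhwikivoyage), 20.9% (zhwiki), and 12.0% (ckbwiki). The failures are therefore a substantial share of each job, not isolated messages.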

Service logs example:
2023/11/21 00:25:40 export.go:225: avro unmarshal error for id: zhwikivoyage_namespace_10 with offset: 257 with error: Namespace: avro: decode union type: unknown union type
2023/11/21 00:32:59 export.go:225: avro unmarshal error for id: zhwiki_namespace_10 with offset: 191803 with error: URL: avro: ReadSTRING: invalid string length
2023/11/21 04:57:11 export.go:225: avro unmarshal error for id: ckbwiki_namespace_0 with offset: 88906 with error: WatchersCount: avro: ReadInt: int overflow
2023/11/21 04:57:11 export.go:225: avro unmarshal error for id: ckbwiki_namespace_0 with offset: 88929 with error: URL: avro: ReadSTRING: invalid string length
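The three error types in these logs match the way the Avro binary encoding fails on bytes that are corrupted or misaligned. Under the Avro specification, ints and longs are zig-zag encoded variable-length integers. A string is a long length followed by that many bytes. A union value starts with a long branch index. The sketch below is a hand-written decoder for just those rules. It is not the service's code and not the API of the Avro library it uses. The error values are named after the log messages for illustration only. It shows how a stray byte can produce "int overflow", "invalid string length", or "unknown union type".

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"math"
)

// Illustrative error values, named after the messages in the service logs.
var (
	errIntOverflow     = errors.New("int overflow")
	errInvalidStrLen   = errors.New("invalid string length")
	errUnknownUnionIdx = errors.New("unknown union type")
)

// readZigZag decodes an Avro zig-zag varint of at most maxBytes bytes and
// returns the value plus the number of bytes consumed.
func readZigZag(buf []byte, maxBytes int) (int64, int, error) {
	var u uint64
	for i := 0; i < len(buf); i++ {
		if i == maxBytes {
			return 0, 0, errIntOverflow // too many continuation bytes
		}
		b := buf[i]
		u |= uint64(b&0x7f) << (7 * uint(i))
		if b&0x80 == 0 { // high bit clear: last byte of the varint
			return int64(u>>1) ^ -int64(u&1), i + 1, nil
		}
	}
	return 0, 0, io.ErrUnexpectedEOF
}

// readInt decodes an Avro int (32-bit, at most 5 varint bytes).
func readInt(buf []byte) (int32, int, error) {
	v, n, err := readZigZag(buf, 5)
	if err != nil {
		return 0, 0, err
	}
	if v < math.MinInt32 || v > math.MaxInt32 {
		return 0, 0, errIntOverflow
	}
	return int32(v), n, nil
}

// readString decodes an Avro string: a long length followed by that many bytes.
func readString(buf []byte) (string, int, error) {
	length, n, err := readZigZag(buf, 10)
	if err != nil {
		return "", 0, err
	}
	if length < 0 || length > int64(len(buf)-n) {
		return "", 0, errInvalidStrLen // negative, or longer than the remaining input
	}
	end := n + int(length)
	return string(buf[n:end]), end, nil
}

// readUnionIndex decodes the branch index that prefixes every Avro union value.
func readUnionIndex(buf []byte, branches int) (int, int, error) {
	idx, n, err := readZigZag(buf, 10)
	if err != nil {
		return 0, 0, err
	}
	if idx < 0 || idx >= int64(branches) {
		return 0, 0, errUnknownUnionIdx
	}
	return int(idx), n, nil
}

func main() {
	// Well-formed string "abc": length 3 zig-zag encodes to 0x06.
	s, _, err := readString([]byte{0x06, 'a', 'b', 'c'})
	fmt.Println(s, err) // abc <nil>

	// A corrupted length byte 0x05 decodes to -3.
	_, _, err = readString([]byte{0x05, 'a', 'b', 'c'})
	fmt.Println(err) // invalid string length

	// Five bytes with the continuation bit set would need a sixth byte,
	// which exceeds the 5-byte maximum for a 32-bit varint.
	_, _, err = readInt([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0x01})
	fmt.Println(err) // int overflow

	// Index 2 (byte 0x04) in a two-branch union such as ["null", "string"].
	_, _, err = readUnionIndex([]byte{0x04}, 2)
	fmt.Println(err) // unknown union type
}
```

Once a single byte is wrong or a read starts at the wrong position, every following field is misread. This is one plausible reason why a single message can fail on different fields (`Namespace`, `URL`, `WatchersCount`) depending on where the damage lands.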

Event Timeline

Protsack.stephan renamed this task from Snapshots service failing to unmarshal Kafka messages to Snapshots service is failing to decode some Kafka messages. Nov 21 2023, 1:39 PM

This ticket will be closed. After investigation, we determined that our state does not correctly match the expected state, so we will re-ingest the data to clean it up.

Could you explain a bit more what this means, please?

We generate the snapshots from a store of articles in Kafka. That store is now corrupted.

We will refresh the store with an updated version of the data.

The planned work has been added as subtasks of this ticket.

JArguello-WMF renamed this task from Snapshots service is failing to decode some Kafka messages to Q3- Q4: Snapshots service is failing to decode some Kafka messages. Wed, Apr 3, 4:26 PM
JArguello-WMF removed the point value for this task.