More than 2,500 articles in the zhwiki_namespace_10 snapshot are corrupted (they cannot be parsed as JSON). We need to identify the root cause before attempting a fix.
**To do**
[] Diagnose whether the snapshot process works correctly once it has the JSON article.
[] Diagnose whether the basic Avro -> Golang struct instance -> JSON process works correctly.
[] Get a list of corrupted articles from several zhwiki snapshots. Corrupted articles are those that cannot be read as JSON.
[] Try to find these articles in the Kafka topic and check whether they are corrupted there.
[] Run bulk ingestion in dev and check whether the articles from the list are now fixed in the dev topic.
[] Create a snapshot in dev for zhwiki_namespace_10 and try to get the list of corrupted articles (not readable as JSON).
**Acceptance criteria**
[] Root cause identified
[] Fix hypothesis verified