More than 2,500 articles in the zhwiki_namespace_10 snapshot are corrupted (they cannot be parsed as JSON). We need to identify the root cause before attempting a fix.
Refer to `Investigations/Investigation: Unparseable zhwiki articles in snapshots` for details.
**To do**
[x] Diagnose whether the snapshot process works correctly after it receives the JSON article. -> No problem here
[x] Diagnose whether the basic Avro -> Golang struct instance -> JSON process works correctly. -> No problem here
[x] Get a list of corrupted articles in several zhwiki snapshots. Corrupted articles are the ones we cannot read as JSON.
[x] Try to find these articles in the Kafka topic and check whether they are already corrupted there.
[] Run bulk ingestion in dev and check whether the articles from the list are now fixed in the dev topic.
[] Create a snapshot in dev for zhwiki_namespace_10 and try to get the list of corrupted articles (not readable as JSON).
**Acceptance criteria**
[] Root cause identified
[] Fix hypothesis verified