In T367570 (Spike: Figure feasibility to emit (wiki_db, revision_id) pairs), we found some surprising data quality issues:
In this spike we should figure out the root causes of some of these issues.
The code that generated these figures is at: https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/blob/main/notebooks/Can_we_emit_wiki_db__revision_id_pairs_for_reconciliation.ipynb?ref_type=heads
A temporary table with all of these missing and/or bad rows is at xcollazo.missing_or_innaccurate_rows

