Recently we discussed whether Dumps 2.0 should do reconciliation directly from the Analytics replicas, or whether it should delegate reconciliation elsewhere by just emitting (wiki_db, revision_id) pairs.
In this spike we want to see how much context we would miss, if any, when trying to regenerate revisions that are missing, or to correct revisions that have bad metadata.
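To make the "just emit (wiki_db, revision_id) pairs" option concrete, here is a minimal sketch of what the emitting side might look like. Everything in it is an assumption for illustration: the `RevisionKey` type, the `find_revisions_to_reconcile` function, and the `page_id` metadata field are hypothetical and not taken from the actual Dumps 2.0 code or table schema.

```python
from typing import Dict, List, NamedTuple, Tuple


class RevisionKey(NamedTuple):
    """Hypothetical key type: the (wiki_db, revision_id) pair to emit."""
    wiki_db: str
    revision_id: int


def find_revisions_to_reconcile(
    source: Dict[RevisionKey, dict],
    target: Dict[RevisionKey, dict],
) -> Tuple[List[RevisionKey], List[RevisionKey]]:
    """Compare source-of-truth metadata against the target table.

    Returns (missing, mismatched):
      missing    - keys present in source but absent from target
      mismatched - keys present in both whose metadata differs
    """
    missing = sorted(k for k in source if k not in target)
    mismatched = sorted(
        k for k in source if k in target and source[k] != target[k]
    )
    return missing, mismatched


# Illustrative data only; the metadata field names are assumptions.
source = {
    RevisionKey("enwiki", 1): {"page_id": 10},
    RevisionKey("enwiki", 2): {"page_id": 11},
    RevisionKey("dewiki", 5): {"page_id": 7},
}
target = {
    RevisionKey("enwiki", 1): {"page_id": 10},
    RevisionKey("enwiki", 2): {"page_id": 99},  # bad metadata
}

missing, mismatched = find_revisions_to_reconcile(source, target)
```

Note that the emitted pairs carry only the key. Whatever consumes them has to look up everything else about the revision, which is exactly the context this spike is trying to measure.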
Context:
DDL for wikitext_raw_rc2: https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/blob/main/hql/create-mediawiki_build_dumps_from_events_merge_into.hql
page_change schema: https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/page/change/1.1.0.yaml
