See T368753 for details.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | xcollazo | T358877 Dumps 2.0 Phase II: Production intermediate table milestone
Open | | None | T358373 [Dumps 2] Reconciliation mechanism to detect and fetch missing/mismatched revisions
In Progress | | xcollazo | T368753 Implement production mechanism that emits (wiki_db, revision_id) pairs for missing or inaccurate rows
Resolved | | xcollazo | T368754 Production PySpark job that can run consistency checks for wmf_dumps.wikitext_raw
Event Timeline
xcollazo updated https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/30
Draft: Job to emit reconciliation events
xcollazo merged https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/30
Job to do consistency check