In {T367570} we built a proof of concept (PoC) of a mechanism to emit `(wiki_db, revision_id)` pairs. In this task we should bring that code to production quality.
Output:
[x] A PySpark job that can run checks similar to those in T367570, but that is properly parameterized. Done via T368754.
[x] This job should have a `sink_table` parameter. Final table name: `wmf_dumps.wikitext_inconsistent_rows`.
[x] Think about and define the DDL of `wmf_dumps.wikitext_inconsistent_rows` so that it is also usable for data quality metrics.
[ ] A separate job that reads from `wmf_dumps.wikitext_inconsistent_rows` and calls EventGate. Done via T368755.
[x] An Airflow job that orchestrates all of this. Core of work done via T368756.
[ ] Figure out a performant way to read all data from the `revision` table via Spark (T372677).
[ ] Add a new hourly Spark MERGE INTO job that consumes the `page_content_late_change` Hive table (T368746).
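One possible shape for the sink table's DDL. Only `wiki_db` and `revision_id` are given by this task; the remaining columns, the Iceberg format, and the partitioning are assumptions meant to make the table usable for data quality metrics (which check failed, and when):

```sql
-- Sketch only: columns beyond wiki_db/revision_id are assumptions.
CREATE TABLE IF NOT EXISTS wmf_dumps.wikitext_inconsistent_rows (
    wiki_db             STRING    COMMENT 'Wiki database name, e.g. enwiki',
    revision_id         BIGINT    COMMENT 'Revision flagged as inconsistent',
    inconsistency_type  STRING    COMMENT 'Assumed: which consistency check failed',
    detected_at         TIMESTAMP COMMENT 'Assumed: when the check job detected the row'
)
USING ICEBERG                     -- assumed table format
PARTITIONED BY (wiki_db);         -- assumed partitioning
```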
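The orchestration could chain the two jobs in one DAG. This is a configuration sketch, not the DAG implemented in T368756: the DAG id, schedule, operator choice, and application file names are all assumptions:

```python
# Sketch only: ids, schedule, and artifacts are hypothetical.
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
import pendulum

with DAG(
    dag_id="wikitext_inconsistent_rows",  # hypothetical DAG id
    schedule="@daily",                    # hypothetical cadence
    start_date=pendulum.datetime(2024, 7, 1, tz="UTC"),
    catchup=False,
) as dag:
    run_checks = SparkSubmitOperator(
        task_id="run_consistency_checks",
        application="consistency_checks.py",  # hypothetical artifact
        application_args=["--sink_table", "wmf_dumps.wikitext_inconsistent_rows"],
    )
    emit_events = SparkSubmitOperator(
        task_id="emit_eventgate_events",
        application="emit_events.py",         # hypothetical artifact
        application_args=["--source_table", "wmf_dumps.wikitext_inconsistent_rows"],
    )
    # Run the checks first, then notify EventGate about what was found.
    run_checks >> emit_events
```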
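The hourly late-change job could be a Spark SQL `MERGE INTO`. In this sketch only the source table name `page_content_late_change` comes from the task; the target table name and the join/column choices are assumptions:

```sql
-- Sketch only: target table name and columns are assumptions.
MERGE INTO wmf_dumps.wikitext_target t       -- hypothetical target table
USING page_content_late_change s             -- hourly source named in this task
ON  t.wiki_db = s.wiki_db
AND t.revision_id = s.revision_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```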
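A minimal sketch of how the parameterized job's CLI could look. Only the `sink_table` parameter comes from this task; the `--wiki_db` flag, function names, and the placeholder check query are hypothetical:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI for the consistency-check job.

    `--sink_table` is the parameter named in this task; `--wiki_db` is a
    hypothetical extra flag shown only to illustrate proper parameterization.
    """
    parser = argparse.ArgumentParser(
        description="Emit inconsistent (wiki_db, revision_id) rows to a sink table"
    )
    parser.add_argument(
        "--sink_table",
        required=True,
        help="Fully qualified sink table, e.g. wmf_dumps.wikitext_inconsistent_rows",
    )
    parser.add_argument(
        "--wiki_db",
        default=None,
        help="Optional: restrict the checks to a single wiki",
    )
    return parser


def main() -> None:
    # pyspark is imported lazily so the module can be imported (e.g. for
    # argument-parsing tests) on machines without Spark installed.
    from pyspark.sql import SparkSession

    args = build_parser().parse_args()
    spark = SparkSession.builder.appName("wikitext_consistency_checks").getOrCreate()

    # Placeholder: the real job would run the T367570-style checks and keep
    # only the mismatching (wiki_db, revision_id) rows.
    inconsistent = spark.sql("SELECT ...")  # hypothetical check query

    # Append results to the parameterized sink table (DataFrameWriterV2 API).
    inconsistent.writeTo(args.sink_table).append()


if __name__ == "__main__":
    main()
```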
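The follow-up job could map each row of `wmf_dumps.wikitext_inconsistent_rows` to an event and POST batches to EventGate. This is a sketch: the `$schema` URI, stream name, and endpoint URL are placeholders, not the real values from T368755:

```python
import json
import urllib.request


def row_to_event(wiki_db: str, revision_id: int) -> dict:
    """Build an EventGate-style event for one inconsistent row.

    The $schema and meta.stream values below are placeholders.
    """
    return {
        "$schema": "/placeholder/schema/1.0.0",
        "meta": {"stream": "placeholder.stream.name"},
        "wiki_db": wiki_db,
        "revision_id": revision_id,
    }


def post_events(events: list, eventgate_url: str) -> None:
    """POST a batch of events to an EventGate endpoint supplied by the caller."""
    body = json.dumps(events).encode("utf-8")
    request = urllib.request.Request(
        eventgate_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```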