Since making the new MW Content Tables production quality, we have marked the old snapshot-based tables as deprecated:
| Table | Deprecation date |
| --- | --- |
| wmf.mediawiki_wikitext_history | 2025-01-29 |
| wmf.mediawiki_wikitext_current | 2025-04-30 |
A cursory search shows that the history table is referenced only by our own DPE code.
The current table is referenced by our own code plus one deprecated project.
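The cursory search above could be repeated against a local checkout with a minimal sketch like the following. The function name `find_references`, the list of file suffixes, and the choice to match the unqualified table names (so that references without the `wmf.` prefix are also caught) are assumptions for illustration, not existing tooling.

```python
from pathlib import Path

# Unqualified names of the two deprecated tables listed above.
DEPRECATED_TABLES = (
    "mediawiki_wikitext_history",
    "mediawiki_wikitext_current",
)

# Hypothetical set of file types worth scanning; adjust per repository.
DEFAULT_SUFFIXES = (".py", ".sql", ".hql", ".pp", ".scala", ".yaml", ".yml")


def find_references(root, suffixes=DEFAULT_SUFFIXES):
    """Return {table_name: [(file_path, line_number), ...]} for files under root."""
    hits = {table: [] for table in DEPRECATED_TABLES}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # Unreadable files are skipped rather than aborting the scan.
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            for table in DEPRECATED_TABLES:
                if table in line:
                    hits[table].append((str(path), number))
    return hits
```

Calling `find_references("path/to/checkout")` on each candidate repository gives a per-table list of file and line hits. A plain substring match can over-report (for example, comments or changelogs), so the hits still need a manual review.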
In this task we should:
- Figure out who owns the generate_anchor_dictionary_spark.py code, and get a commitment from the owners to migrate it.
- Remove code that generates the content of both tables
- Remove code that imports from dumps servers to HDFS at modules/profile/manifests/analytics/refinery/job/import_mediawiki_dumps.pp
- Remove data purge code at modules/profile/manifests/analytics/refinery/job/data_purge.pp
- Remove airflow jobs
- Remove tables
- Remove any remaining data from the raw imports at /mnt/hdfs/wmf/data/raw/mediawiki/dumps
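Before the final cleanup step, it may help to see what is still left under the raw imports path. The following is a read-only sketch that deletes nothing. It assumes the path is readable through the local HDFS mount from wherever it is run, and `summarize_remaining` is a hypothetical helper name.

```python
from pathlib import Path

# Raw imports path named in the task; assumed reachable via the local mount.
RAW_DUMPS_ROOT = "/mnt/hdfs/wmf/data/raw/mediawiki/dumps"


def summarize_remaining(root=RAW_DUMPS_ROOT):
    """Return {entry_name: (file_count, total_bytes)} for each top-level entry.

    This only reads metadata; it never removes anything.
    """
    summary = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to report if the directory is absent or already removed.
        return summary
    for entry in sorted(root_path.iterdir()):
        if entry.is_dir():
            files = [p for p in entry.rglob("*") if p.is_file()]
        else:
            files = [entry]
        summary[entry.name] = (len(files), sum(p.stat().st_size for p in files))
    return summary
```

An empty result means there is nothing left to remove. Walking a large directory tree through a mount can be slow, so this is meant as a one-off check rather than something to schedule.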