So far, as part of T352650, we have concentrated mainly on the SQL/XML dumps because of their size and complexity.
Now that we have a framework in place for running MediaWiki dumps with Airflow and Kubernetes and publishing the results, we can move on to the other types of dumps.
This ticket tracks the work of migrating these dump types; sub-tickets may be created as needed.
- Adds/changes dumps - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1325
- Wikidata entity dumps - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1413
- Categories RDF dumps - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1408
- Other miscellaneous dumps
  - CirrusSearch dumps - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1445
  - Content Translation dumps - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1362
  - MediaInfo dumps - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1413
  - Page titles - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1337
  - Media titles - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1438
  - Short URL mappings - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1462
