In T346278, we implemented a basic Airflow job that triggers the XML dumps for simplewiki.
In this task, we should expand that work to:
- Use dynamic task mapping to generate dump jobs for all currently open, public wikis (example implementation).
- Schedule two runs per month: one on the 1st for the 'full' XML dumps (two jobs: all revisions and current revisions), and another around the 15th for the 'partial' dumps (current revisions only).
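A minimal sketch of how the job fan-out could be computed. It rests on several assumptions that are not part of the task: a hypothetical `Wiki` record with `is_closed`/`is_private` flags (the real list of open, public wikis would come from wherever the example implementation gets it), hypothetical dump-type names `all_revisions`/`current_revisions`, and cron schedules of midnight on the 1st and on the 15th. Only the planning logic is shown, so it runs without Airflow installed. The resulting list of dicts is the kind of input a TaskFlow task could be mapped over with `expand_kwargs()` (Airflow 2.4 and later).

```python
from dataclasses import dataclass


# Hypothetical record for one wiki; field names are assumptions, not an
# existing Wikimedia schema.
@dataclass(frozen=True)
class Wiki:
    dbname: str
    is_closed: bool
    is_private: bool


# Assumed cron expressions: midnight on the 1st and on the 15th of each month.
SCHEDULES = {
    "full": "0 0 1 * *",
    "partial": "0 0 15 * *",
}

# Dump jobs per run, as described in the task (identifiers are hypothetical).
DUMP_TYPES = {
    "full": ("all_revisions", "current_revisions"),
    "partial": ("current_revisions",),
}


def open_public_wikis(wikis):
    """Keep only wikis that are neither closed nor private."""
    return [w for w in wikis if not w.is_closed and not w.is_private]


def build_dump_jobs(wikis, run_type):
    """Return one kwargs dict per (wiki, dump type) pair for a mapped task.

    In a DAG, a TaskFlow task such as a hypothetical
    `run_dump(wiki, dump_type)` could consume this list via
    `run_dump.expand_kwargs(jobs)`, creating one task instance per dict.
    """
    if run_type not in DUMP_TYPES:
        raise ValueError(f"unknown run type: {run_type!r}")
    return [
        {"wiki": w.dbname, "dump_type": dump_type}
        for w in open_public_wikis(wikis)
        for dump_type in DUMP_TYPES[run_type]
    ]


if __name__ == "__main__":
    # "closedwiki" and "privatewiki" are made-up examples.
    wikis = [
        Wiki("simplewiki", is_closed=False, is_private=False),
        Wiki("closedwiki", is_closed=True, is_private=False),
        Wiki("privatewiki", is_closed=False, is_private=True),
    ]
    print(build_dump_jobs(wikis, "partial"))
    # [{'wiki': 'simplewiki', 'dump_type': 'current_revisions'}]
```

Keeping the wiki filtering and the per-run job list in plain functions makes them testable outside the scheduler. With dynamic task mapping, the number of task instances then follows the length of this list at runtime instead of being fixed when the DAG file is parsed.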
Out of scope:
Moving output from HDFS to the servers from which the dumps are distributed. We will figure that out separately.