Add support for generating training/test/evaluation datasets for
revert risk models using Airflow DAGs:
- discuss and decide on output format
- store in /wmf/data/research/datasets/
- integrate with DataHub for discoverability
- decide whether this job should run on a schedule (e.g. monthly)
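
To make the storage and scheduling questions more concrete, here is a minimal sketch of how a monthly run might lay out its output and assign rows to splits. Only the base path comes from this task; the `revert_risk` directory name, the `snapshot=YYYY-MM` partition naming, the 80/10/10 proportions, and both function names are assumptions for illustration. The helpers are plain Python that an Airflow task could call, not an existing API.

```python
import hashlib
from datetime import date

# Base path from the task description; the layout below it is an assumption.
BASE_PATH = "/wmf/data/research/datasets"

# Hypothetical split proportions in percent; the task does not specify any.
SPLITS = (("train", 80), ("test", 10), ("eval", 10))


def snapshot_path(run_date: date, dataset: str = "revert_risk") -> str:
    """Return the output directory for one monthly run (assumed partition naming)."""
    return f"{BASE_PATH}/{dataset}/snapshot={run_date:%Y-%m}"


def assign_split(revision_id: int) -> str:
    """Deterministically map a revision ID to a split by hashing it into 100 buckets."""
    digest = hashlib.sha256(str(revision_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    upper = 0
    for name, percent in SPLITS:
        upper += percent
        if bucket < upper:
            return name
    raise ValueError("split percentages must sum to 100")


if __name__ == "__main__":
    print(snapshot_path(date(2024, 1, 1)))
    print(assign_split(12345))
```

Hashing the revision ID keeps a given revision in the same split across reruns, which could matter if the job ends up running monthly and the snapshots overlap.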