This task is done when we have a single-command ETL for converting a new XML dump into a queryable set of Hive tables on the Altiscale "Research" cluster.
See current work here: https://github.com/wikimedia-research/research-cluster
See old notes here: https://etherpad.wikimedia.org/p/research_cluster_loading
Full process: [XML Dump] --> [JSON files] --> [Hive Table] --> [Metadata Hive Table]
[XML Dump] --> [JSON files] is handled by dump2revdocs.py
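A minimal sketch of what this step does, assuming an uncompressed pages-meta-history XML dump and streaming one JSON object per revision to stdout. The actual implementation and field selection live in dump2revdocs.py in the repo linked above; the fields below are illustrative only.
```python
# Sketch of [XML Dump] --> [JSON files]; not the real dump2revdocs.py.
import json
import sys
import xml.etree.ElementTree as ET


def strip_ns(tag):
    """Drop the MediaWiki XML namespace prefix from an element tag."""
    return tag.rsplit('}', 1)[-1]


def revision_docs(xml_file):
    """Stream <revision> elements and yield one flat dict per revision."""
    page = {}
    for event, elem in ET.iterparse(xml_file, events=('end',)):
        tag = strip_ns(elem.tag)
        if tag == 'title':
            # <title> closes before the page's revisions, so this is the
            # title of the page the following revisions belong to.
            page['page_title'] = elem.text
        elif tag == 'revision':
            doc = dict(page)
            for child in elem:
                ctag = strip_ns(child.tag)
                if ctag in ('id', 'timestamp', 'comment', 'text'):
                    doc[ctag] = child.text
            yield doc
            elem.clear()  # free memory for already-processed revisions


if __name__ == '__main__':
    # Usage: python dump2revdocs_sketch.py < dump.xml > revdocs.json
    for doc in revision_docs(sys.stdin.buffer):
        sys.stdout.write(json.dumps(doc) + '\n')
```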
[Hive Table] & [Metadata Hive Table] are handled by a set of HiveQL scripts (a sketch of how the steps could be chained into a single command follows below)
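A hypothetical single-command driver tying the steps together: convert the dump, push the JSON files to HDFS, and create an external Hive table over them. All paths, table names, the assumption that dump2revdocs.py reads XML on stdin and writes JSON lines to stdout, and the choice of JSON SerDe (a SerDe such as org.apache.hive.hcatalog.data.JsonSerDe must be available on the cluster) are assumptions; the actual HiveQL scripts, including the one that derives the metadata table, live in the research-cluster repo linked above.
```python
# Hypothetical driver for the full pipeline; not the repo's actual tooling.
import subprocess
import sys

HQL = """
CREATE DATABASE IF NOT EXISTS wikidumps;
CREATE EXTERNAL TABLE IF NOT EXISTS wikidumps.revdocs (
    page_title STRING,
    id BIGINT,
    `timestamp` STRING,
    comment STRING,
    text STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '{hdfs_dir}';
"""


def run(cmd, **kwargs):
    """Run a shell command, echoing it to stderr first."""
    print('+ ' + ' '.join(cmd), file=sys.stderr)
    subprocess.check_call(cmd, **kwargs)


def load_dump(xml_path, local_json, hdfs_dir):
    # 1. [XML Dump] --> [JSON files]
    with open(xml_path, 'rb') as xml_in, open(local_json, 'wb') as json_out:
        run(['python', 'dump2revdocs.py'], stdin=xml_in, stdout=json_out)
    # 2. Push the JSON revision docs to HDFS
    run(['hdfs', 'dfs', '-mkdir', '-p', hdfs_dir])
    run(['hdfs', 'dfs', '-put', '-f', local_json, hdfs_dir])
    # 3. [Hive Table]: external table over the JSON files; the
    #    [Metadata Hive Table] would be derived by a further HiveQL script.
    run(['hive', '-e', HQL.format(hdfs_dir=hdfs_dir)])


if __name__ == '__main__':
    # Usage: python load_dump.py enwiki-latest-pages-meta-history.xml
    load_dump(sys.argv[1], 'revdocs.json', '/user/research/revdocs')
```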