Study how Sanitarium and the other tools work together to move data from the production wikis to labsdb. Work with Ops / DBAs to migrate that process to a clean, maintainable one.
Once that is done, the following can be handled in separate tasks:
- refactor the loading of labsdb to use the new sanitized data
- refactor our history reconstruction to use the sanitized data (ideally through Tungsten, either as it reads data from the MySQL binlog or as it loads from its staging tables, so that we have more real-time data than we can get with Sqoop)
- refactor the dumps process
References:
T103011
https://wikitech.wikimedia.org/wiki/MariaDB/Sanitarium_and_Labsdbs
T138450
T143955
See also:
Tungsten MySQL Replicator
labsdb auditor (great work by @yuvipanda that should help with a whitelist: https://github.com/wikimedia/operations-software-labsdb-auditor/). Yuvi, your thoughts on this are welcome here.
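To make the whitelist idea concrete, here is a minimal sketch of whitelist-based sanitization in Python. It is not how Sanitarium or the labsdb auditor actually work. The table and column entries, the `PUBLIC_COLUMNS` name, and the `sanitize_row` helper are all hypothetical and only illustrate the approach: anything not explicitly whitelisted is dropped.

```python
# Hypothetical whitelist: table name -> columns considered safe to publish.
# These entries are illustrative only, NOT the real labsdb whitelist.
PUBLIC_COLUMNS = {
    "page": {"page_id", "page_namespace", "page_title"},
    "user": {"user_id", "user_name"},
}


def sanitize_row(table, row):
    """Return a copy of `row` with only whitelisted columns.

    Returns None when the table itself is not whitelisted, so that
    unknown tables are excluded by default.
    """
    allowed = PUBLIC_COLUMNS.get(table)
    if allowed is None:
        return None
    return {col: val for col, val in row.items() if col in allowed}


if __name__ == "__main__":
    raw = {"user_id": 1, "user_name": "Example", "user_email": "x@example.org"}
    print(sanitize_row("user", raw))
```

The default-deny behaviour is the point of the sketch: a new private column or table added in production stays out of the sanitized copy until someone adds it to the whitelist.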