We need our //golden// data retriever codebase to be more modular and to better support adding new modules, which will be very helpful when we start calculating ZRR for "well-behaved searches" (see T150370 & T150901). Testing whether a new script works and backfilling missing data are both a huge pain right now.
After talking with Analytics, Chelsy and I decided to migrate our codebase to use their [[ https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater | Reportupdater ]] infrastructure as it seems to meet our needs. This will require the following steps:
- [ ] Rewrite as many EventLogging (EL) based scripts as possible to be pure SQL
- [ ] Rewrite current pure-R scripts to be shell scripts + R and use [[ https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater#Script_conventions | Reportupdater conventions ]]
- [ ] Update column names in current datasets
- [ ] Prepare dashboards for new formats/naming conventions
- [ ] Deploy dashboards after Reportupdater-based refactor of golden has completed at least one successful run