We'll import the `mediawiki.revision-score` stream in two steps:
== Daily job to archive recent scores
Each `mediawiki.revision-score` event can include scores from several models. These scores are flattened into a minimal table, `ores.revision_score`, partitioned by wiki, model, and model_version so that stale scores can be efficiently purged as new model versions are released.
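The flattening step can be sketched as follows. This is a minimal illustration, not the production job: the field names (`database`, `rev_id`, `scores`, `model_version`, `prediction`, `probability`) are a simplified stand-in for the real event schema.

```python
def normalize_event(event):
    """Yield one flat row per model score in a revision-score event.

    Each output row carries the partition keys (wiki, model, model_version)
    so rows land in the right partition of ores.revision_score.
    """
    for model, score in event["scores"].items():
        yield {
            "wiki": event["database"],
            "rev_id": event["rev_id"],
            "model": model,
            "model_version": score["model_version"],
            "prediction": score["prediction"],
            "probability": score["probability"],
        }

# A single event scoring one revision with two models (illustrative shape):
event = {
    "database": "enwiki",
    "rev_id": 123,
    "scores": {
        "damaging": {"model_version": "0.5.0", "prediction": "false",
                     "probability": {"true": 0.1, "false": 0.9}},
        "goodfaith": {"model_version": "0.5.0", "prediction": "true",
                      "probability": {"true": 0.95, "false": 0.05}},
    },
}
rows = list(normalize_event(event))  # one row per model
```

The point of one row per (wiki, model, model_version) is that retiring a model version becomes a cheap partition drop rather than a row-level delete.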
Streams arrive from both the eqiad and codfw datacenters, so the same score can appear in both (sometimes with slightly different values), though mostly only in unusual situations like a datacenter switchover. In normal operation the imbalance is large: so far in 2019 we've seen 16.5M scores from eqiad and only about 500 from codfw. We'll accept this small potential for duplication in the normalized scores table, and instead select distinct rows when building the monthly snapshots in the next step.
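The de-duplication applied at snapshot time can be sketched like this: keep one row per logical score key (wiki, rev_id, model, model_version), regardless of which datacenter emitted it. The row shapes and the `datacenter` field are illustrative.

```python
def distinct_scores(rows):
    """De-duplicate score rows on their logical key, keeping the first seen."""
    seen = set()
    out = []
    for row in rows:
        key = (row["wiki"], row["rev_id"], row["model"], row["model_version"])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"wiki": "enwiki", "rev_id": 1, "model": "damaging",
     "model_version": "0.5.0", "datacenter": "eqiad", "probability": 0.90},
    # The codfw copy of the same score may differ slightly in value;
    # only one copy survives de-duplication.
    {"wiki": "enwiki", "rev_id": 1, "model": "damaging",
     "model_version": "0.5.0", "datacenter": "codfw", "probability": 0.91},
]
unique = distinct_scores(rows)  # the redundant codfw row is dropped
```

In the actual monthly job this would be a `SELECT DISTINCT` (or window-function pick) over the partition keys rather than an in-memory pass.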
== Monthly job to produce dump-ready snapshots
The normalized tables will be joined with `mediawiki_history` to produce monthly snapshots of the scores together with wiki, page, revision, and user metadata. The snapshots will contain only metadata visible to an anonymous user, making them suitable for public dump files.
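The shape of that join can be sketched as below: attach page and user metadata from a `mediawiki_history`-like table to each score row on (wiki, rev_id), carrying over only fields an anonymous reader could see. The field names here are illustrative, not the real `mediawiki_history` schema.

```python
def snapshot(scores, history):
    """Join score rows to revision metadata on (wiki, rev_id)."""
    by_rev = {(h["wiki"], h["rev_id"]): h for h in history}
    out = []
    for s in scores:
        h = by_rev.get((s["wiki"], s["rev_id"]))
        if h is None:
            continue  # no matching revision metadata in this month's history
        out.append({
            **s,
            "page_title": h["page_title"],
            "page_namespace": h["page_namespace"],
            # Only publicly visible user fields; suppressed or deleted
            # fields must be excluded before anything reaches a dump file.
            "user_text": h["user_text"],
        })
    return out

scores = [{"wiki": "enwiki", "rev_id": 1,
           "model": "damaging", "prediction": "false"}]
history = [{"wiki": "enwiki", "rev_id": 1, "page_title": "Example",
            "page_namespace": 0, "user_text": "ExampleUser"}]
snap = snapshot(scores, history)
```

In production this would be a Hive/Spark join against the monthly `mediawiki_history` snapshot; the sketch only shows which side contributes which columns.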
---
Document the new tables on wikitech, e.g. https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly