
Backfill ORES Hadoop scores with historical data
Open, Lowest, Public

Description

Run a maintenance script that backfills batches of old revisions until we have the complete set of scores in Hive. Hitting the /precache endpoint is simple and should work, but we may need to optimize it to avoid the extra traffic that would pass through changeprop.
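As a rough illustration of the batching step, the loop below walks a range of revision IDs, skips revisions that already have a score, and groups the rest into fixed-size batches. This is a minimal sketch, assuming a set of already-scored revision IDs is available. `backfill_batches`, the `already_scored` set (standing in for a Hive lookup) and the default batch size of 50 are all hypothetical. Sending each batch to /precache or another scoring endpoint is deliberately left out.

```python
from typing import Iterable, Iterator, List, Set


def backfill_batches(
    rev_ids: Iterable[int],
    already_scored: Set[int],
    batch_size: int = 50,  # hypothetical default, not an ORES limit
) -> Iterator[List[int]]:
    """Yield batches of revision IDs that still lack a score.

    Hypothetical helper: `already_scored` stands in for whatever
    lookup against Hive a real maintenance script would use.
    """
    batch: List[int] = []
    for rev_id in rev_ids:
        if rev_id in already_scored:
            continue  # skip revisions that already have a score
        batch.append(rev_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly partial, batch


# Example: revisions 2 and 5 are already scored.
print(list(backfill_batches(range(1, 8), {2, 5}, batch_size=2)))
# → [[1, 3], [4, 6], [7]]
```

Because it is a generator, the script can stop and resume between batches without first materializing the full list of missing revisions.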

We might want to implement a new URL parameter to prevent these scores from diluting the Redis or MediaWiki caches of recent revisions.

Event Timeline

awight created this task. Nov 16 2018, 11:39 PM
awight updated the task description. Nov 16 2018, 11:43 PM
fdans moved this task from Incoming to Radar on the Analytics board. Nov 19 2018, 5:16 PM
Ladsgroup triaged this task as Normal priority. Nov 28 2018, 6:35 AM
Ladsgroup moved this task from Untriaged to New development on the Scoring-platform-team board.
Ladsgroup raised the priority of this task from Normal to Needs Triage.

Important change of plans: we're discussing backfilling, and it may be best to allow mismatched model versions in the dumps for now. In other words, go ahead and backfill with whatever the current model version is. Normalized data will continue to be segregated by model version, but the monthly "current" and "historical" dumps will patch together whatever scores are available, simply taking the newest model version used to score each revision.
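The "newest model version per revision" rule for the dumps can be sketched as a single pass that keeps, for each revision, the score with the highest model version. This is a minimal sketch under stated assumptions: the `(rev_id, model_version, score)` row layout, the dotted numeric version format and the function names are hypothetical, not the actual Hive schema. Versions are compared as integer tuples rather than strings so that "0.10.0" ranks above "0.9.0".

```python
from typing import Dict, Iterable, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "0.4.0" into a comparable tuple.

    Assumes purely numeric dotted versions.
    """
    return tuple(int(part) for part in version.split("."))


def newest_scores(
    rows: Iterable[Tuple[int, str, float]],
) -> Dict[int, Tuple[str, float]]:
    """Keep, for each revision, the score from the newest model version.

    Each row is (rev_id, model_version, score); this layout is hypothetical.
    """
    best: Dict[int, Tuple[str, float]] = {}
    for rev_id, version, score in rows:
        current = best.get(rev_id)
        # Replace the stored score only if this row's model version is newer.
        if current is None or version_key(version) > version_key(current[0]):
            best[rev_id] = (version, score)
    return best


rows = [(10, "0.4.0", 0.12), (10, "0.5.0", 0.09), (11, "0.3.1", 0.80)]
print(newest_scores(rows))
# → {10: ('0.5.0', 0.09), 11: ('0.3.1', 0.8)}
```

Revision 11 keeps its older-model score because nothing newer exists for it. That is the "patch together whatever scores are available" behavior, while the normalized per-version data stays untouched.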

Ladsgroup added a subscriber: Ladsgroup.

Since there's no assignee here, I'll move it to the backlog; feel free to fix.

awight removed a subscriber: awight. Mar 21 2019, 4:04 PM
Harej triaged this task as Lowest priority. Tue, Mar 26, 9:19 PM