
Purge ORES scores from Hadoop and begin backfill when model version changes
Open, Lowest, Public

Description

When a model is updated, existing scores are deprecated. New scores should be backfilled using the updated model.

I can't imagine how to update scores in place, so I think we have to purge all scores for the old model. We need to either avoid breaking any research queries running on that table, or agree on some kind of social protocol for downtime.

Event Timeline

Indeed, in Hadoop there is no such thing as 'in place'. The way to go could be to use the model version as a partition key. You'd backfill a new version while keeping the old ones (and therefore not breaking queries on the old ones). Then the decision is how many versions to keep, and whether to keep them updated or not.
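To make the partition-key idea concrete, here is a minimal sketch of how versioned partitions and a retention rule might look. Everything named here is an assumption for illustration: the base path, the `wiki` and `model` partition keys, the dotted numeric version format, and the helper names are all hypothetical, and this is not the actual ORES table layout.

```python
from pathlib import PurePosixPath

# Hypothetical base location; the real table path is not given in this task.
BASE = PurePosixPath("/example/ores/scores")


def partition_path(wiki, model, model_version):
    """Build a Hive-style partition directory keyed by model version.

    The wiki and model keys are assumed; only model_version comes from
    the suggestion above.
    """
    return BASE / f"wiki={wiki}" / f"model={model}" / f"model_version={model_version}"


def version_key(version):
    """Sort key for dotted numeric versions such as '0.4.0' (assumed format)."""
    return tuple(int(part) for part in version.split("."))


def versions_to_drop(existing, keep):
    """Return the versions that fall outside the `keep` most recent ones."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(existing, key=version_key)
    return ordered[:-keep]
```

With this layout, a backfill for a new model version only writes a new partition directory, so queries pinned to an older `model_version` keep working until that partition is dropped. Calling `versions_to_drop(existing, keep=2)` would list the partitions that are candidates for purging once the new backfill is complete.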

Ladsgroup triaged this task as Medium priority. Nov 28 2018, 6:35 AM
Ladsgroup raised the priority of this task from Medium to Needs Triage.
Ladsgroup moved this task from Unsorted to New development on the Machine-Learning-Team board.
Harej triaged this task as Lowest priority. Mar 26 2019, 9:18 PM