
[Spec] Add score comparison check to deployment checklist
Open, Low, Public

Description

We should store the scores generated for a fixed set of revisions with every model that gets committed to a model repo (e.g. editquality-modeling, articlequality-modeling). Using this dataset, we can:

  • Detect improvements/regressions in predictions over time
  • Get an early warning when something is wrong with the model or its environment
  • Demonstrate improvement on known poor predictions

Once we have T160224 ("Store docker images in a repo that replicate the train/test/deploy environment for models"), we'll also be able to generate scores for past model versions.

Gist of a plan:

  1. Figure out a workflow with the revscoring score utility that will store a set of scores in some sane way.
  2. Compare scores across model changes to get a sense for what types of score changes are OK.
  3. Write a `revscoring reflect <score-files>` utility that takes a set of constraints on how much the most recent scores may vary from the previous scores without raising an error. (The error could then be used as a deployment check, and maybe as a Travis check too.)
  4. Load historic scores into a public database for auditing and analysis (probably way out of scope, but good to think about)
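To make step 3 more concrete, here is a minimal Python sketch of the kind of check `revscoring reflect` might perform. It is not revscoring code: the JSON-lines score format (one `{"rev_id": ..., "score": {"prediction": ..., "probability": {...}}}` record per line), the function names `load_scores`, `compare_scores` and `main`, and the example thresholds (maximum per-label probability shift of 0.1, maximum prediction flip rate of 5%) are all hypothetical placeholders that step 2 would need to replace with real values.

```python
import json
import sys


def load_scores(lines):
    """Parse JSON-lines score records (hypothetical format) into {rev_id: score}.

    Each non-empty line is assumed to look like:
    {"rev_id": 123, "score": {"prediction": true, "probability": {"true": 0.7, "false": 0.3}}}
    """
    scores = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        scores[record["rev_id"]] = record["score"]
    return scores


def compare_scores(previous, current, max_prob_delta=0.1, max_flip_rate=0.05):
    """Check current scores against previous ones; return a list of violations.

    An empty list means the new scores stay within the (assumed) constraints.
    """
    shared = sorted(set(previous) & set(current))
    if not shared:
        return ["no revisions in common between score sets"]

    errors = []
    flips = 0
    for rev_id in shared:
        old, new = previous[rev_id], current[rev_id]
        # Count revisions whose predicted label changed between model versions.
        if old["prediction"] != new["prediction"]:
            flips += 1
        # Flag any label whose probability moved more than the allowed delta.
        for label, old_p in old["probability"].items():
            new_p = new["probability"].get(label, 0.0)
            delta = abs(new_p - old_p)
            if delta > max_prob_delta:
                errors.append(f"rev {rev_id}: P({label}) moved by {delta:.3f}")

    flip_rate = flips / len(shared)
    if flip_rate > max_flip_rate:
        errors.append(
            f"prediction flip rate {flip_rate:.1%} exceeds {max_flip_rate:.1%}"
        )
    return errors


def main(argv):
    """Compare two score files; return a process exit code (0 = OK, 1 = error)."""
    previous_path, current_path = argv
    with open(previous_path) as f:
        previous = load_scores(f)
    with open(current_path) as f:
        current = load_scores(f)
    errors = compare_scores(previous, current)
    for error in errors:
        print(error, file=sys.stderr)
    return 1 if errors else 0
```

A deployment or CI script could call `sys.exit(main(sys.argv[1:]))` with the previous and current score files, so that any constraint violation produces a non-zero exit status and fails the check. Reporting every violation, rather than stopping at the first one, makes it easier to judge whether a regression is broad or confined to a few revisions.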

Event Timeline

Halfak created this task. Mar 11 2017, 1:15 AM
Restricted Application added a subscriber: Aklapper. Mar 11 2017, 1:15 AM
Halfak updated the task description. Mar 11 2017, 1:23 AM
Halfak renamed this task from Add score comparison check to deployment checklist to [Spec] Add score comparison check to deployment checklist. Mar 16 2017, 2:24 PM
Halfak triaged this task as Low priority. Mar 16 2017, 2:37 PM
Halfak updated the task description.