Add more values to test_stats
Closed, Resolved, Public

Description

To https://ores.wikimedia.org/scores/enwiki/damaging/?model_info=test_stats etc., please add:

  • recall_at_precision(min_precision=0.6)
  • recall_at_precision(min_precision=0.75)
  • recall_at_precision(min_precision=0.99)
  • recall_at_precision(min_precision=0.995)

Event Timeline

Catrope created this task. Mar 29 2017, 9:55 PM

It looks like filter_rate is not the same as precision. Is there a way to find out what the precision is at the threshold produced by filter_rate_at_recall(min_recall=N)? Ditto for filter_rate_at_fpr(max_fpr=N).

Halfak added a comment. Apr 4 2017, 2:14 PM

OK. This can be done. Adding a precision_at_recall metric would be one way we could kludge this in.

"filter_rate_at_recall" is a process-optimizing metric, whereas "precision_at_recall" is not. The process doesn't care how often a positive prediction is a true positive (precision). It cares what proportion of recent_changes items do not need to be reviewed (filter_rate). Rather, it seems that "precision_at_recall" provides important information to a user so that they can set expectations.
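
To make the distinction concrete, here is how the three quantities are usually written in terms of confusion-matrix counts. This is a sketch under an assumption: the thread does not spell out the formulas, so filter rate is taken here to mean the fraction of items that fall below the threshold and therefore do not need review. TP, FP, TN and FN are the usual true/false positive/negative counts.

```latex
\begin{aligned}
\text{precision}   &= \frac{TP}{TP + FP} \\
\text{recall}      &= \frac{TP}{TP + FN} \\
\text{filter rate} &= \frac{TN + FN}{TP + FP + TN + FN}
\end{aligned}
```

Under these definitions, filter rate depends on how many items are flagged in total, while precision depends on how many of the flagged items are correct, which is why one cannot be read off the other.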

Just thinking about this now, it seems we should probably have a basic set of test statistics at any threshold that we're optimizing for. "precision" seems like useful information but doesn't get reported for all thresholds.

We should probably have all 4 of the following fields reported for each threshold statistic:

  • threshold
  • precision
  • recall
  • filter rate
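
A rough sketch of what reporting all four fields together could look like. This is not the ORES or revscoring implementation; the names `ThresholdStats`, `stats_at_threshold` and the standalone `filter_rate_at_recall` function are hypothetical, and it assumes filter rate means the fraction of items scored below the threshold.

```python
from typing import NamedTuple, Optional, Sequence


class ThresholdStats(NamedTuple):
    """The four fields proposed above, reported together (hypothetical type)."""
    threshold: float
    precision: float
    recall: float
    filter_rate: float


def stats_at_threshold(scores: Sequence[float], labels: Sequence[bool],
                       threshold: float) -> ThresholdStats:
    """Compute precision, recall and filter rate when flagging score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    # Assumed convention: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumed definition: share of items that do not need review.
    filter_rate = (tn + fn) / total if total else 0.0
    return ThresholdStats(threshold, precision, recall, filter_rate)


def filter_rate_at_recall(scores: Sequence[float], labels: Sequence[bool],
                          min_recall: float) -> Optional[ThresholdStats]:
    """Pick the threshold with the highest filter rate whose recall >= min_recall."""
    candidates = [stats_at_threshold(scores, labels, t) for t in sorted(set(scores))]
    eligible = [s for s in candidates if s.recall >= min_recall]
    if not eligible:
        return None
    return max(eligible, key=lambda s: (s.filter_rate, s.threshold))


if __name__ == "__main__":
    # Made-up scores and labels, purely for illustration.
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    labels = [False, False, False, False, True, False, False, True, True, True]
    print(filter_rate_at_recall(scores, labels, min_recall=0.75))
```

Because the chosen threshold comes back with its precision attached, the question of what precision holds at the filter_rate_at_recall threshold is answered by the same result, without a separate metric.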

Catrope updated the task description. Apr 4 2017, 9:52 PM

I've removed the recall asks for now because it doesn't look like we'll need them in the short term. That said, precision_at_recall() would still be quite helpful. Having all 4 fields reported for every statistic would also be tremendously helpful.

Catrope updated the task description. Apr 4 2017, 10:20 PM
Catrope closed this task as Resolved. Apr 8 2017, 12:48 AM
Catrope claimed this task.