Include information about useful model thresholds in ORES
Closed, Resolved (Public)

Description

Models will change. This means that the appropriate thresholds for a given task will change too. Our users should have a convenient way to identify useful scoring thresholds, e.g. the threshold at which 90% of damaging edits will be caught (for RC patrolling) and the threshold at which the false-positive rate falls below 10% (for an anti-vandal bot).

Right now, revscoring provides a flexible set of test_statistics that can capture these thresholds. We need to standardize on how we'll provide them through the ORES API.
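
For illustration, here is a minimal sketch of how the two example thresholds from the description could be derived from a labeled test set. It is not revscoring's implementation: the function names, the sample data, and the convention that an edit is flagged when its score is greater than or equal to the threshold are all assumptions made for this example.

```python
# Hypothetical sketch, not revscoring code. Assumes an edit is predicted
# positive (e.g. "damaging") when its score is >= the threshold.

def confusion_rates(scores, labels, threshold):
    """Return (recall, fpr) for the given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


def threshold_at_recall(scores, labels, min_recall):
    """Highest observed score whose recall is at least min_recall
    (the RC patrolling case). Returns None if no score qualifies."""
    for t in sorted(set(scores), reverse=True):
        recall, fpr = confusion_rates(scores, labels, t)
        if recall >= min_recall:
            return {"threshold": t, "recall": recall, "fpr": fpr}
    return None


def recall_at_fpr(scores, labels, max_fpr):
    """Lowest observed score whose false-positive rate is at most max_fpr
    (the anti-vandal bot case). Because recall never increases as the
    threshold rises, this is also the qualifying threshold with the best
    recall. Returns None if no score qualifies."""
    for t in sorted(set(scores)):
        recall, fpr = confusion_rates(scores, labels, t)
        if fpr <= max_fpr:
            return {"threshold": t, "recall": recall, "fpr": fpr}
    return None


# Made-up sample data: model scores and true labels (1 = damaging).
SCORES = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
LABELS = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
```

Calling `threshold_at_recall(SCORES, LABELS, 0.75)` or `recall_at_fpr(SCORES, LABELS, 0.2)` returns a dict with the same three keys (`threshold`, `recall`, `fpr`) that appear in the API output quoted in the comments. Only observed scores are tried as candidate thresholds, which keeps the sketch short but means a threshold above every score is never considered.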

Event Timeline

Halfak claimed this task.

So, just to be sure, what is the meaning of this?

"damaging": {
    ...
    "recall_at_fpr(max_fpr=0.1)": {
      "fpr": 0.0,
      "recall": 0.004,
      "threshold": 1.0
    },
...
"goodfaith": {
    ...
    "recall_at_fpr(max_fpr=0.1)": {
      "fpr": 0.062,
      "recall": 1.0,
      "threshold": 0.0
    },

https://ores.wmflabs.org/v2/scores/ptwiki/

Does it mean that we need to consider everything as "non-damaging" if we want to avoid more than 10% false positives? And similarly, we need to treat everything as good-faith unless we accept more than 10% false positives?