Include information about useful model thresholds in ORES
Closed, ResolvedPublic

Assigned To

Authored By

	Halfak
	Mar 8 2016, 3:46 PM

Description

Models will change, which means that the appropriate thresholds for a given task will change too. Our users should have a straightforward way to identify useful scoring thresholds: for example, the threshold at which 90% of damaging edits will be caught (for RC patrolling), and the threshold at which the false-positive rate falls below 10% (for an anti-vandal bot).

Right now, revscoring provides a flexible set of test_statistics that can capture these thresholds. We need to standardize on how we'll provide them through the ORES API.
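To make the two kinds of threshold above concrete, here is a minimal sketch in plain Python. It is not revscoring's implementation. The function names, the toy scores and labels, and the convention that an edit is flagged when its score is greater than or equal to the threshold are all assumptions made for illustration. Only the output keys ("threshold", "recall", "fpr") mirror the fields that appear in the ORES output quoted later in this task.

```python
def stats_at(scores, labels, threshold):
    """Recall and false-positive rate when flagging every score >= threshold.

    Assumption: "flagged" means score >= threshold; labels are True for
    the positive class (e.g. damaging) and False otherwise.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    return {
        "threshold": threshold,
        "recall": tp / pos if pos else 0.0,
        "fpr": fp / neg if neg else 0.0,
    }


def _candidates(scores):
    # Every observed score, plus the two extremes, is a candidate threshold.
    return sorted(set(scores) | {0.0, 1.0})


def recall_at_fpr(scores, labels, max_fpr):
    """Lowest threshold (so highest recall) whose FPR stays <= max_fpr."""
    for t in _candidates(scores):  # ascending: recall only falls as t rises
        stats = stats_at(scores, labels, t)
        if stats["fpr"] <= max_fpr:
            return stats
    return None  # no threshold satisfies the constraint


def threshold_at_recall(scores, labels, min_recall):
    """Highest threshold (so lowest FPR) whose recall stays >= min_recall."""
    for t in reversed(_candidates(scores)):  # descending: FPR only rises
        stats = stats_at(scores, labels, t)
        if stats["recall"] >= min_recall:
            return stats
    return None


# Hypothetical toy data: 4 positive (damaging) and 8 negative examples.
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [False, False, False, False, False, True,
          False, False, True, False, True, True]

bot_stats = recall_at_fpr(scores, labels, max_fpr=0.1)
patrol_stats = threshold_at_recall(scores, labels, min_recall=0.9)
print(bot_stats)
print(patrol_stats)
```

On this toy data the anti-vandal-bot style constraint (FPR at most 10%) forces a high threshold and gives up half of the recall, while the patrolling style constraint (recall at least 90%) forces a low threshold and accepts a much higher false-positive rate. That trade-off is the reason a single fixed threshold does not suit both tasks.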

Event Timeline

Halfak created this task. Mar 8 2016, 3:46 PM

Restricted Application added a subscriber: Aklapper. Mar 8 2016, 3:46 PM

He7d3r subscribed. Mar 9 2016, 5:41 PM

This is now available as of https://github.com/wiki-ai/ores/pull/130

Halfak closed this task as Resolved. Mar 21 2016, 3:54 PM

Halfak claimed this task.

So, just to be sure, what is the meaning of this?

"damaging": {
    ...
    "recall_at_fpr(max_fpr=0.1)": {
      "fpr": 0.0,
      "recall": 0.004,
      "threshold": 1.0
    },
...
"goodfaith": {
    ...
    "recall_at_fpr(max_fpr=0.1)": {
      "fpr": 0.062,
      "recall": 1.0,
      "threshold": 0.0
    },

https://ores.wmflabs.org/v2/scores/ptwiki/

Does it mean that we need to treat everything as "non-damaging" if we want to keep false positives at or below 10%? And similarly, that we need to treat everything as good-faith unless we accept more than 10% false positives?
