Automatically adjust ORES threshold settings when ORES models are updated
Closed, Resolved · Public

Description

@Halfak mentioned recently that the ORES filter thresholds may need to be adjusted every time the ORES models are updated. That would apply both to the filters on Recent Changes (and other pages in future) and to the standard filter ranges that we're building into ReviewStream. These would presumably need to be adjusted individually for each wiki on which we implement these systems.

Clearly, if we understand this correctly, it's not something we want to have to manually manage. We need to look into this, see if it is indeed an issue, and devise a system.

Event Timeline

jmatazzoni created this task.
jmatazzoni updated the task description.
Halfak added a comment. Dec 3 2016, 8:41 PM

Here's an example of how to get evaluation statistics for a model:
https://ores.wikimedia.org/v2/scores/enwiki/damaging/?model_info=test_stats

{
  "scores": {
    "enwiki": {
      "damaging": {
        "info": {
          "test_stats": {
            "accuracy": 0.892,
            "filter_rate_at_recall(min_recall=0.75)": {
              "filter_rate": 0.869,
              "recall": 0.752,
              "threshold": 0.492
            },
            "filter_rate_at_recall(min_recall=0.9)": {
              "filter_rate": 0.753,
              "recall": 0.902,
              "threshold": 0.173
            },
            "precision": 0.227,
            "precision_recall": {
              "auc": 0.464
            },
            "recall": 0.745,
            "recall_at_fpr(max_fpr=0.1)": {
              "fpr": 0.0,
              "recall": 0.072,
              "threshold": 0.959
            },
            "roc": {
              "auc": 0.908
            },
            "table": {
              "false": {
                "false": 3400,
                "true": 388
              },
              "true": {
                "false": 39,
                "true": 114
              }
            }
          }
        },
        "version": "0.1.2"
      }
    }
  }
}
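As a rough illustration of how a client might read thresholds out of a response like the one above, here is a minimal Python sketch. The helper names (`extract_thresholds`, `model_version`) are hypothetical and are not part of ORES or the ORES extension; the sample document is a trimmed copy of the response shown above.

```python
import json

# Trimmed copy of the test_stats response shown above.
SAMPLE = json.loads("""
{
  "scores": {
    "enwiki": {
      "damaging": {
        "info": {
          "test_stats": {
            "accuracy": 0.892,
            "filter_rate_at_recall(min_recall=0.75)": {
              "filter_rate": 0.869, "recall": 0.752, "threshold": 0.492
            },
            "filter_rate_at_recall(min_recall=0.9)": {
              "filter_rate": 0.753, "recall": 0.902, "threshold": 0.173
            },
            "recall_at_fpr(max_fpr=0.1)": {
              "fpr": 0.0, "recall": 0.072, "threshold": 0.959
            },
            "roc": {"auc": 0.908}
          }
        },
        "version": "0.1.2"
      }
    }
  }
}
""")


def extract_thresholds(doc, wiki, model):
    """Hypothetical helper: map each statistic that reports a
    threshold to that threshold value."""
    stats = doc["scores"][wiki][model]["info"]["test_stats"]
    return {
        name: value["threshold"]
        for name, value in stats.items()
        # Scalar stats (accuracy) and stats without a threshold (roc) are skipped.
        if isinstance(value, dict) and "threshold" in value
    }


def model_version(doc, wiki, model):
    """Hypothetical helper: read the model version from the same document."""
    return doc["scores"][wiki][model]["version"]


if __name__ == "__main__":
    for name, threshold in extract_thresholds(SAMPLE, "enwiki", "damaging").items():
        print(name, threshold)
    print("version", model_version(SAMPLE, "enwiki", "damaging"))
```

Keeping the version lookup next to the threshold lookup makes it easy to store both together, so a later version change can be detected.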

Note that a version number is returned with every request. E.g. here's a request to score an edit: https://ores.wikimedia.org/v2/scores/enwiki/damaging/749355744

{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "749355744": {
            "prediction": true,
            "probability": {
              "false": 0.2941371620173937,
              "true": 0.7058628379826063
            }
          }
        },
        "version": "0.1.2"
      }
    }
  }
}

Any time that version number changes, thresholds should be updated. You can probably get away with only updating the thresholds when we change the minor version, but no guarantees.
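The version check described above could be sketched as follows. This is only an illustration: `needs_threshold_refresh` is a hypothetical name, and the assumption that the version string is a dotted `major.minor.patch` triple (as in "0.1.2") is inferred from the examples in this thread, not from any documented guarantee.

```python
def parse_version(version):
    """Split a dotted version string such as "0.1.2" into integers.
    Assumes every component is numeric."""
    return tuple(int(part) for part in version.split("."))


def needs_threshold_refresh(cached_version, live_version, minor_only=False):
    """Return True when stored thresholds should be re-fetched.

    cached_version: version stored with the thresholds, or None if
                    nothing has been stored yet.
    live_version:   version returned by the latest ORES response.
    minor_only:     if True, ignore changes past the first two
                    components (the "probably fine, no guarantees" shortcut).
    """
    if cached_version is None:
        return True
    if not minor_only:
        # Safe default: any version change triggers a refresh.
        return cached_version != live_version
    return parse_version(cached_version)[:2] != parse_version(live_version)[:2]
```

Defaulting `minor_only` to False follows the safer reading of the comment above: refresh on any change unless the caller explicitly opts into the shortcut.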

Change 332825 had a related patch set uploaded (by Sbisson):
[WIP] Fetch thresholds from live stats for filters

https://gerrit.wikimedia.org/r/332825

@Halfak I've tried to represent the models (damaging, goodfaith) and the levels we filter on (likelygood, maybebad, ...) in terms of the new test_stats. You can see it here. It would be really helpful if you could take a quick look, a sanity check really, at how we're using the stats. I don't want to take too much of your time. If something looks wrong I'll rework it asap. Thanks!

@Halfak We talked about using the v2 API to get the updated thresholds, but I see that the API we use to get the list of models (https://ores.wikimedia.org/scores/enwiki/) also happens to contain all the test stats we need (min_precision: 0.15, 0.45, 0.9, 0.98).

I'm trying to close the loop on this and I'm wondering whether I should update my code to use the v1 response format, or whether the rest of the extension is being updated to work entirely with v2. The URL is a config option, and having the code manipulate it to use both v1 and v2 at the same time is quite unpleasant. Am I missing something?

Thanks!

It looks like both URLs [1][2] return the same test stats. Is that the case? If so, is there any reason why we should use the second URL (v2)?

[1] https://ores.wikimedia.org/scores/enwiki/?model_info=test_stats
[2] https://ores.wikimedia.org/v2/scores/enwiki/?model_info=test_stats

Halfak added a comment. Mar 3 2017, 4:06 PM

We have no intention of removing the v1 functionality, so there's no harm in using that if you'd prefer. I'd suggest switching the URL structure to explicitly request the v1 interface, though (e.g. [1] is equivalent to [2]). The nice thing about the v2 interface is that it always returns the same structured document, whereas the v1 interface returns a few different document structures.

  1. https://ores.wikimedia.org/scores/enwiki/?model_info=test_stats
  2. https://ores.wikimedia.org/v1/scores/enwiki/?model_info=test_stats
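One way to keep the configured URL free of version juggling is to store only the base URL and add the interface version explicitly when building the request. The sketch below is an assumption about how that could look, not how the extension does it; `stats_url` and its parameters are hypothetical, and the path layout is copied from the URLs quoted in this thread.

```python
from urllib.parse import urlencode


def stats_url(base_url, wiki, api_version="v1"):
    """Build a test_stats URL with an explicit interface version.

    base_url:    configured ORES host, e.g. "https://ores.wikimedia.org"
    wiki:        wiki database name, e.g. "enwiki"
    api_version: "v1" or "v2"; always written into the path so the
                 response format does not depend on a server-side default.
    """
    base = base_url.rstrip("/")
    query = urlencode({"model_info": "test_stats"})
    return f"{base}/{api_version}/scores/{wiki}/?{query}"
```

For example, `stats_url("https://ores.wikimedia.org", "enwiki")` yields URL 2 from the list above.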

Change 332825 merged by jenkins-bot:
[mediawiki/extensions/ORES] Fetch thresholds from live stats for filters

https://gerrit.wikimedia.org/r/332825

https://ores.wikimedia.org/v1/scores/enwiki/?model_info=test_stats can be used to view the current ORES model versions and thresholds.

QA recommendation: Resolve.

Quiddity removed a subscriber: Quiddity. Mar 9 2017, 2:16 AM
jmatazzoni closed this task as Resolved. Mar 9 2017, 2:26 AM
Restricted Application added a project: artificial-intelligence. Jul 3 2017, 5:50 PM