
Make presence and targets of ORES filters configurable
Closed, Resolved, Public

Description

Joe and I discussed how we should handle different ORES model characteristics on different wikis and we settled on the following:

  • Make configurable which of "May have problems", "Likely have problems" and "Very likely have problems" are available (i.e. allow a config that only enables 2 of these 3), and the same for goodfaith
    • This implies creating "Very likely bad faith", which doesn't currently exist
    • We probably want to standardize on maybebad/likelybad/verylikelybad rather than maybebad/bad for internal names
  • Allow the threshold configuration for each of these filters to be set to either a numerical threshold value (e.g. 0.123) or a property name from the ORES model_stats API (e.g. 'recall_at_precision(min_precision=0.99)' or a simplified version that gets expanded to that)
  • Set the defaults to:
    • verylikelygood: recall_at_precision(min_precision=0.995)
    • maybebad: filter_rate_at_recall(min_recall=0.9)
      • Ideally this'd be max(filter_rate_at_recall(min_recall=0.9), recall_at_precision(min_precision=0.15)) (for damaging; min() for goodfaith), but that's getting a bit complex so we might want to hold off on implementing that at first
    • likelybad: recall_at_precision(min_precision=0.6)
    • verylikelybad: recall_at_precision(min_precision=0.9)
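To make the proposal above concrete, here is a minimal sketch of how a per-filter setting could be resolved when it is either a literal number or a model_stats property name. It is written in Python for illustration only and is not the extension's code. The function names, the `model_stats` mapping (statistic name to threshold score) and the use of `None` to mean "filter not offered on this wiki" are all assumptions made for this sketch.

```python
from typing import Dict, Mapping, Optional, Union

# Defaults as proposed in the task description.
DEFAULT_THRESHOLDS = {
    "verylikelygood": "recall_at_precision(min_precision=0.995)",
    "maybebad": "filter_rate_at_recall(min_recall=0.9)",
    "likelybad": "recall_at_precision(min_precision=0.6)",
    "verylikelybad": "recall_at_precision(min_precision=0.9)",
}

# A setting is a number, a statistic name, or None (hypothetical convention
# for "this filter is disabled on this wiki").
ThresholdSetting = Union[int, float, str, None]


def resolve_threshold(
    setting: ThresholdSetting, model_stats: Mapping[str, float]
) -> Optional[float]:
    """Turn one configured setting into a numeric threshold, or None."""
    if setting is None:
        return None  # filter is not offered
    if isinstance(setting, bool):
        raise TypeError("threshold must be a number or a statistic name")
    if isinstance(setting, (int, float)):
        if not 0.0 <= setting <= 1.0:
            raise ValueError(f"threshold {setting} is outside [0, 1]")
        return float(setting)
    # Otherwise treat the setting as a statistic name and look it up.
    # `model_stats` is a hypothetical, already-fetched mapping; the real
    # ORES response format is not specified in this task.
    if setting not in model_stats:
        raise KeyError(f"model stats have no entry for {setting!r}")
    return float(model_stats[setting])


def resolve_filters(
    config: Mapping[str, ThresholdSetting], model_stats: Mapping[str, float]
) -> Dict[str, float]:
    """Resolve every enabled filter; disabled filters are left out."""
    resolved = {}
    for name, setting in config.items():
        threshold = resolve_threshold(setting, model_stats)
        if threshold is not None:
            resolved[name] = threshold
    return resolved
```

With this shape, a wiki that only wants two of the three "problems" filters would set the third to `None`, and a wiki whose model lacks a given statistic could fall back to a hard-coded number such as 0.123.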

Event Timeline

@Catrope, here is the generalized filter language. One new filter was added: Very likely bad faith. With that exception, only the description texts changed, not the filter names.

Contribution quality predictions


**Very likely good**
Highly accurate at finding almost all problem-free edits.
**May have problems**
Finds most flawed or damaging edits but with lower accuracy.
**Likely have problems**
With medium accuracy, finds more problem edits than the “Very likely” filter but fewer than “May.”
**Very likely have problems**
Highly accurate at finding the most obvious flawed or damaging edits.

User intent predictions

**Very likely good faith**
Highly accurate at finding almost all good-faith edits.
**May be bad faith**
Finds most bad-faith edits but with lower accuracy.
**Likely bad faith**
With medium accuracy, finds more bad-faith edits than the “Very likely” filter but fewer than “May.”
**Very likely bad faith**
Highly accurate at finding the most obvious bad-faith edits.
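The "finds more than Very likely but fewer than May" wording follows from the thresholds being nested. The sketch below illustrates this in Python, assuming each "bad" filter matches every edit whose score is at or above its threshold. That matching rule, the threshold values and the scores are all made up for illustration and are not taken from any wiki's configuration.

```python
from typing import Dict, List, Mapping, Sequence

# Made-up thresholds, ordered from least to most strict.
EXAMPLE_THRESHOLDS = {"maybebad": 0.15, "likelybad": 0.6, "verylikelybad": 0.9}


def matching_filters(score: float, thresholds: Mapping[str, float]) -> List[str]:
    """Names of the filters whose minimum threshold this score meets."""
    return [name for name, minimum in thresholds.items() if score >= minimum]


def count_matches(
    scores: Sequence[float], thresholds: Mapping[str, float]
) -> Dict[str, int]:
    """How many of the given scores each filter would match."""
    return {
        name: sum(1 for score in scores if score >= minimum)
        for name, minimum in thresholds.items()
    }


# Made-up scores for four edits.
example_scores = [0.05, 0.2, 0.65, 0.95]
example_counts = count_matches(example_scores, EXAMPLE_THRESHOLDS)
```

Under these assumptions, an edit matched by a stricter filter is always matched by the looser ones too, so the looser filter can only return the same number of edits or more.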

Change 348496 had a related patch set uploaded (by Sbisson):
[mediawiki/extensions/ORES@master] Make filters thresholds more configurable

https://gerrit.wikimedia.org/r/348496

Change 348496 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Make filters thresholds more configurable

https://gerrit.wikimedia.org/r/348496

Change 349108 had a related patch set uploaded (by Catrope):
[operations/mediawiki-config@master] Add b/c for ORES config format change

https://gerrit.wikimedia.org/r/349108

Change 349146 had a related patch set uploaded (by Catrope):
[operations/mediawiki-config@master] Set ORES thresholds in new format for all enabled wikis

https://gerrit.wikimedia.org/r/349146

Change 349108 merged by jenkins-bot:
[operations/mediawiki-config@master] Add b/c for ORES config format change

https://gerrit.wikimedia.org/r/349108

Change 349146 merged by jenkins-bot:
[operations/mediawiki-config@master] Set ORES thresholds in new format for all enabled wikis

https://gerrit.wikimedia.org/r/349146

Mentioned in SAL (#wikimedia-operations) [2017-04-27T23:22:47Z] <catrope@naos> Synchronized wmf-config/InitialiseSettings.php: Set ORES thresholds in new format for all enabled wikis (T162760) (duration: 00m 53s)

(1) @SBisson

May be bad faith
Finds most bad-faith edits but with lower accuracy.

Presently (1.30.0-wmf.1 and betalabs), the description has an extra 'a' ("with a lower accuracy").


(2) The current levels set for the damaging (or goodfaith) filters can be viewed with var_dump($stats->getThresholds('damaging')).

QA Recommendation: Resolve

@SBisson disregard my previous comment about the wording of the filters' descriptions. I confirmed with @jmatazzoni that the 'a' should be added, so the wording should be as follows:

Contribution quality predictions
May have problems
Finds most flawed or damaging edits but with a lower accuracy.

User intent predictions
May be bad faith
Finds most bad-faith edits but with a lower accuracy.