Joe and I discussed how we should handle different ORES model characteristics on different wikis and we settled on the following:
- Make it configurable which of "May have problems", "Likely have problems" and "Very likely have problems" are available (i.e. allow a config that enables only 2 of these 3), and do the same for goodfaith
  - This implies creating "Very likely bad faith", which doesn't currently exist
  - We probably want to standardize on maybebad/likelybad/verylikelybad rather than maybebad/bad for internal names
- Allow the threshold configuration for each of these filters to be set to either a numerical threshold value (e.g. 0.123) or a property name from the ORES model_stats API (e.g. 'recall_at_precision(min_precision=0.99)' or a simplified version that gets expanded to that)
- Set the defaults to:
  - verylikelygood: recall_at_precision(min_precision=0.995)
  - maybebad: filter_rate_at_recall(min_recall=0.9)
    - Ideally this'd be max(filter_rate_at_recall(min_recall=0.9), recall_at_precision(min_precision=0.15)) (for damaging; min() for goodfaith), but that's getting a bit complex so we might want to hold off on implementing that at first
  - likelybad: recall_at_precision(min_precision=0.6)
  - verylikelybad: recall_at_precision(min_precision=0.9)
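
To make the proposal above concrete, here is a rough Python sketch of how the threshold config could be resolved. It is only an illustration under assumptions, not the actual implementation: `resolve_threshold` and `resolve_filters` are hypothetical names, `model_stats` is assumed to be a flat mapping from stat name to threshold value (the real model_stats response may be shaped differently), and using `None` to disable a filter is just one possible convention.

```python
from typing import Dict, Mapping, Optional, Union

# A threshold is either a literal number or the name of a model statistic.
ThresholdSpec = Union[float, str]

# Defaults proposed in the list above (damaging model).
DEFAULT_THRESHOLDS: Dict[str, Optional[ThresholdSpec]] = {
    "verylikelygood": "recall_at_precision(min_precision=0.995)",
    "maybebad": "filter_rate_at_recall(min_recall=0.9)",
    "likelybad": "recall_at_precision(min_precision=0.6)",
    "verylikelybad": "recall_at_precision(min_precision=0.9)",
}


def resolve_threshold(spec: ThresholdSpec, model_stats: Mapping[str, float]) -> float:
    """Return a numeric threshold for one filter.

    Numbers are used as-is; strings are looked up in model_stats
    (hypothetical flat mapping of stat name -> threshold value).
    """
    if isinstance(spec, bool):
        raise TypeError("threshold must be a number or a stat name")
    if isinstance(spec, (int, float)):
        return float(spec)
    if spec not in model_stats:
        raise ValueError(f"model does not provide stat {spec!r}")
    return float(model_stats[spec])


def resolve_filters(
    config: Mapping[str, Optional[ThresholdSpec]],
    model_stats: Mapping[str, float],
) -> Dict[str, float]:
    """Resolve every enabled filter; a value of None means "not available"."""
    return {
        name: resolve_threshold(spec, model_stats)
        for name, spec in config.items()
        if spec is not None
    }
```

A wiki that only wants two of the three "bad" levels could then override the defaults with something like `{**DEFAULT_THRESHOLDS, "likelybad": None, "maybebad": 0.123}`. The max()/min() combination for maybebad is deliberately left out here, matching the suggestion to hold off on it at first.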