
Formalize fallback rules for automatically determining ORES threshold levels
Open, Needs Triage, Public

Description

We have some standards for how we define ORES thresholds, assuming the models perform ideally. When models aren't so good, which might be the case for new languages or could happen due to a regression in an existing model, we need a strategy for how to degrade the threshold levels used on-wiki. @Catrope currently uses https://jsfiddle.net/catrope/50n1ekgu/ to generate https://docs.google.com/spreadsheets/d/1c8NMubBO0AS5KOFt5og0gz3dJK0IjEB3FwYOWEOXQ2g/edit?usp=sharing, then applies savage eyeball metrics to decide whether the models are satisfactory, and how to fudge the levels if they are not. For example, we might only have one level, "likely damaging", and no others, since the granularity isn't good enough to warrant extra levels.

It would be nice to formalize these rules so that we can one day automate the process or at least pass it on to additional humans.

Event Timeline

The defaults are as follows, but I eyeball the results and adjust where needed. For example, if the 90% precision level has a low recall (lower than 8% or so), I'll choose the 80% or 75% precision level instead. One formalism I do use: "maybe bad" is 15% precision or 90% recall, whichever of those two produces a narrower threshold range.

Defaults:

  • maybebad: 15% precision
  • likelybad: 60% precision
  • verylikelybad: 90% precision
  • likelygood: 99.5% precision
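
Here's a minimal sketch of those rules as code, to make the fallback logic concrete. The data shape and function names are hypothetical (not part of any existing tool), and it assumes each candidate level comes with the precision, recall, and score cut-off that ORES reports for a "maximum recall @ precision >= X" query. It also interprets "narrower threshold range" for "maybe bad" as the option with the higher score cut-off, i.e. the one that flags fewer edits.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    precision_target: float  # precision the threshold was optimized for
    precision: float         # precision actually achieved
    recall: float            # recall at that threshold
    threshold: float         # model score cut-off

MIN_RECALL = 0.08  # below ~8% recall, fall back to a lower precision target


def pick_verylikelybad(candidates: dict[float, Candidate]) -> Optional[Candidate]:
    """Try the 90% precision level first; fall back to 80%, then 75%,
    if the recall at that level is too low."""
    for target in (0.90, 0.80, 0.75):
        c = candidates.get(target)
        if c is not None and c.recall >= MIN_RECALL:
            return c
    return None  # model too weak: drop this level entirely


def pick_maybebad(at_15_precision: Candidate, at_90_recall: Candidate) -> Candidate:
    """'maybe bad' is 15% precision or 90% recall, whichever produces the
    narrower threshold range (here: the higher score cut-off)."""
    return max(at_15_precision, at_90_recall, key=lambda c: c.threshold)
```

The other levels (likelybad, likelygood) would follow the same pattern as pick_verylikelybad, each with its own precision target and fallback chain; formalizing exactly which fallbacks are acceptable per level is the open question in this task.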

The jsfiddle broke because it made lots of requests in parallel and triggered 429s (rate limiting). Here's a new one that does work: https://jsfiddle.net/catrope/7hfg3drv/
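
For reference, the same data-gathering step can be done sequentially with a pause between requests, which is how the new fiddle avoids the 429s. The sketch below is a rough Python equivalent, not the fiddle itself; the exact model_info query syntax is an assumption, so check the ORES API docs for the current form.

```python
import time
import requests

ORES = "https://ores.wikimedia.org/v3/scores"
# Precision targets for maybebad, likelybad, verylikelybad, likelygood
TARGETS = ["0.15", "0.6", "0.9", "0.995"]


def fetch_thresholds(wiki: str, model: str = "damaging") -> list[dict]:
    """Fetch threshold statistics one request at a time to stay under the rate limit."""
    results = []
    for target in TARGETS:
        # Assumed query form; verify against the ORES model_info documentation.
        query = f'statistics.thresholds.true."maximum recall @ precision >= {target}"'
        resp = requests.get(f"{ORES}/{wiki}", params={"models": model, "model_info": query})
        resp.raise_for_status()
        results.append(resp.json())
        time.sleep(1)  # pause between requests instead of firing them in parallel
    return results
```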