Maniphest T205320

Updated ORES models can no longer satisfy configured threshold requirements
Open, Medium, Public

Assigned To

None

Authored By

	Catrope
	Sep 24 2018, 6:34 PM

Description

For example, the fiwiki goodfaith model is now really bad, and no longer has available thresholds that satisfy precision >= 0.15. This has caused some filters to disappear from Recentchanges completely, and others to become useless. The only reason we noticed is that Special:ORESModels throws notices when encountering this situation (see T205228).
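To make the failure mode concrete, here is a minimal sketch of choosing a threshold under a precision requirement. The function name, the tuple layout, and every number below are hypothetical and for illustration only; this is not the actual ORES lookup code.

```python
from typing import Optional, Sequence, Tuple

# Each row is (threshold, precision, recall). All values are made up.
ThresholdStat = Tuple[float, float, float]


def max_recall_at_precision(
    stats: Sequence[ThresholdStat], min_precision: float
) -> Optional[ThresholdStat]:
    """Return the row with the highest recall whose precision meets the
    requirement, or None when no threshold satisfies it."""
    candidates = [row for row in stats if row[1] >= min_precision]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[2])


# A model where some thresholds reach precision >= 0.15.
healthy = [(0.1, 0.10, 0.95), (0.5, 0.20, 0.80), (0.9, 0.40, 0.30)]
# A model where no threshold reaches precision >= 0.15.
degraded = [(0.1, 0.02, 0.95), (0.5, 0.05, 0.80), (0.9, 0.12, 0.30)]

print(max_recall_at_precision(healthy, 0.15))
print(max_recall_at_precision(degraded, 0.15))
```

In the second case nothing is returned, which corresponds to the situation described here: a filter configured against that requirement has no usable threshold.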

Based on the error log entries produced by T205228, the following models are affected at minimum:

fiwiki goodfaith (stats)
hewiki goodfaith (stats)
fawiki damaging (stats)
ruwiki goodfaith (stats)

Really, we should reevaluate the thresholds of all models; we have never done that since the initial configuration of each model.

Related Objects

Status	Assigned	Task
Open	None	T205320 Updated ORES models can no longer satisfy configured threshold requirements
Resolved	Ladsgroup	T215358 Create new editquality labeling campaign for ruwiki
Resolved	Zache	T215359 Finish editquality labeling campaign for fiwiki
Resolved	Halfak	T215363 Create new editquality labeling campaign for hewiki

Event Timeline

Catrope created this task. Sep 24 2018, 6:34 PM

Restricted Application added a project: Machine-Learning-Team. Sep 24 2018, 6:34 PM

Restricted Application added a subscriber: Aklapper.

Adding to @Catrope's point: we could add a step to the Makefile, after the model stats are written, that looks for obvious shortcomings in the built models and intervenes in some way. I'm not sure it should be a prompt, however, because we run this as an ad-hoc batch process and want it to complete without blocking on any one model. For reference, I think we currently crash the entire build on an error anywhere in the make pipeline, so the bar is set pretty low.
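A rough sketch of what such a non-blocking check could look like, assuming the stats are already loaded into a dictionary. The function name, the data layout, the model names, and the numbers are all hypothetical; the real stats format and Makefile wiring would differ.

```python
from typing import Dict, List, Tuple

# model name -> list of (threshold, precision, recall); values are made up.
ModelStats = Dict[str, List[Tuple[float, float, float]]]


def find_unsatisfiable(
    model_stats: ModelStats, min_precision_by_model: Dict[str, float]
) -> List[str]:
    """Collect one message per model whose best precision falls short.

    Nothing is raised and nothing prompts, so a batch run can check every
    model and report all problems at the end.
    """
    problems = []
    for name, rows in sorted(model_stats.items()):
        required = min_precision_by_model.get(name)
        if required is None:
            continue  # no requirement configured for this model
        best = max((precision for _, precision, _ in rows), default=0.0)
        if best < required:
            problems.append(
                f"{name}: best precision {best:.2f} < required {required:.2f}"
            )
    return problems


example_stats = {
    "examplewiki.goodfaith": [(0.5, 0.05, 0.80), (0.9, 0.12, 0.30)],
    "examplewiki.damaging": [(0.5, 0.20, 0.80), (0.9, 0.40, 0.30)],
}
example_requirements = {
    "examplewiki.goodfaith": 0.15,
    "examplewiki.damaging": 0.15,
}

for message in find_unsatisfiable(example_stats, example_requirements):
    print("WARNING:", message)
```

Returning a list instead of failing on the first problem keeps the build from blocking on any one model; whether the final step should exit non-zero is a separate decision.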

Just taking a look at this again. I can explain where the problem may have come from with fiwiki (using flaggedrevs as observations), but the others have surprised me. It could be that by re-tuning and re-training we can get a more reasonable split. It's really in that case that I'm seeing a *serious* problem.

Generally, it seems likely that we'll continue to sometimes be able to satisfy strict statistics and struggle at other times. This is due to non-deterministic effects in model training. In reality, the model will be a bit better than the statistics suggest. Our statistics will get more and more exact as we add new observations to training and testing. This is a big reason why we want to get Jade out. It will be a huge source of data beyond the limited Wikilabels campaigns we run now.
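As a rough illustration of why more observations make the statistics more exact, the sketch below computes a Wilson score interval for a measured precision at two sample sizes. The counts are invented, and the Wilson interval is a technique chosen here for illustration, not necessarily how the model statistics are actually computed.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Same observed precision (0.15), different numbers of test observations.
small = wilson_interval(15, 100)
large = wilson_interval(150, 1000)

print("n=100:  width", round(small[1] - small[0], 3))
print("n=1000: width", round(large[1] - large[0], 3))
```

With few observations the interval around 0.15 is wide, so a requirement like precision >= 0.15 can be met in one training run and missed in the next even when the underlying model is about equally good.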

That said, for some of these communities we're still working with data from 2015/2016, so running a new labeling campaign to get more data wouldn't be out of the question.

Halfak triaged this task as Medium priority. Feb 5 2019, 10:28 PM

Halfak moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.

awight unsubscribed. Mar 21 2019, 4:04 PM

Halfak closed subtask T215358: Create new editquality labeling campaign for ruwiki as Resolved. Apr 4 2019, 9:45 PM

Ladsgroup closed subtask T215359: Finish editquality labeling campaign for fiwiki as Resolved. Apr 8 2019, 6:31 PM

Ladsgroup closed subtask T215363: Create new editquality labeling campaign for hewiki as Resolved. Apr 17 2019, 6:27 PM

ACraze moved this task from Maintenance/cleanup to Backlog/ORES on the Machine-Learning-Team board. Jan 19 2021, 9:29 PM

Restricted Application added a subscriber: Huji. Jan 19 2021, 9:29 PM
