
Deploy ORES filters for jawiki
Open, Needs Triage · Public

Description

ORES Filters for Japanese Wikipedia are scheduled to be deployed Monday, June 17th. Please add filters to Special:RecentChanges.

Event Timeline

Halfak created this task. Jun 11 2019, 7:33 PM
Restricted Application added a project: artificial-intelligence. Jun 11 2019, 7:33 PM
Restricted Application added a subscriber: Aklapper.
kostajh moved this task from Inbox to Q2 2019-20 on the Growth-Team board. Jul 16 2019, 9:43 PM
kostajh added a subscriber: kostajh.

Tentatively scheduling for Q2; if you need it sooner please let us know.

SBisson added a subscriber: Catrope. Aug 9 2019, 7:56 PM

@Catrope have a look at the stats for those models... Have you seen anything like that before?

If we exclude the thresholds with precision or recall at the edge, there's almost nothing left. We could maybe come up with 2 levels (likelygood + maybebad) but even then, I'm not sure how they can be configured so they have a little overlap.

vagrant@vagrant:/vagrant/mediawiki$ mwscript extensions/ORES/maintenance/ConfigureThresholds.php -m damaging -t jawiki
Configuring damaging on jawiki

MIN	MAX	CONFIG	PREC	REC	FILTER
0	0.97	P 0.15	0.989	1	
0	0.97	P 0.45	0.989	1	
0	0.97	P 0.6	0.989	1	
0	0.97	P 0.75	0.989	1	
0	0.97	P 0.9	0.989	1	
0	0.97	P 0.98	0.989	1	
0	0.662	P 0.99	0.99	0.997
0	0.173	P 0.995	0.995	0.844	likelygood (default)
0	0.218	R 0.9	0.994	0.902	
0.433	1	P 0.15	0.155	0.181
0.893	1	P 0.45	0.502	0.007	
	1	P 0.6			likelybad (default)
	1	P 0.75			
	1	P 0.9			verylikelybad (default)
	1	P 0.98			
	1	P 0.99			
	1	P 0.995			
0.042	1	R 0.9	0.025	0.906	maybebad (default)

vagrant@vagrant:/vagrant/mediawiki$ mwscript extensions/ORES/maintenance/ConfigureThresholds.php -m goodfaith -t jawiki
Configuring goodfaith on jawiki

MIN	MAX	CONFIG	PREC	REC	FILTER
0	0.994	P 0.15	0.996	1	
0	0.994	P 0.45	0.996	1	
0	0.994	P 0.6	0.996	1	likelybad (default)
0	0.994	P 0.75	0.996	1	
0	0.994	P 0.9	0.996	1	
0	0.994	P 0.98	0.996	1	
0	0.994	P 0.99	0.996	1	
0	0.994	P 0.995	0.996	1	
0	0.987	R 0.9	0.998	0.906	maybebad (default)
0.995	1	P 0.15	1	0.157	
0.995	1	P 0.45	1	0.157	
0.995	1	P 0.6	1	0.157	
0.995	1	P 0.75	1	0.157	
0.995	1	P 0.9	1	0.157	
0.995	1	P 0.98	1	0.157	
0.995	1	P 0.99	1	0.157	
0.995	1	P 0.995	1	0.157	likelygood (default)
0.902	1	R 0.9	0.008	0.9

Have you seen anything like that before?

I have, and in the past it's been tracked down to extremely skewed inputs from the labeling campaign. I think the first version of the arwiki goodfaith model was built from a data set that had <1% of the edits labeled as bad faith.

The damaging model is pretty bad but borderline workable. We could set maybebad to P=0.15 instead of R=0.9, which the rules say we should do here anyway: of R=0.9 and P=0.15, take the one with the narrower score range. That range wouldn't overlap with the default choice for likelygood (overlap between likelygood and maybebad is allowed, but not required). The goodfaith model is useless, and we shouldn't enable it. In the past, I have enabled only the damaging model and sent the goodfaith model back to the ORES team for a redo.
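To make the range-selection rule concrete, here is a minimal Python sketch that applies it to the damaging numbers from the table above. The `ScoreRange` class and the `narrower` and `overlaps` helpers are hypothetical names made up for illustration; they are not part of the ORES extension or of ConfigureThresholds.php, and the tie-breaking choice is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRange:
    """A candidate filter range (hypothetical helper, not ORES code)."""
    label: str
    low: float
    high: float

    @property
    def width(self) -> float:
        return self.high - self.low


def narrower(a: ScoreRange, b: ScoreRange) -> ScoreRange:
    # Of two candidates, keep the one covering the smaller score range.
    # Ties go to the first argument (an arbitrary choice for this sketch).
    return a if a.width <= b.width else b


def overlaps(a: ScoreRange, b: ScoreRange) -> bool:
    # Two closed intervals overlap when each starts before the other ends.
    return a.low <= b.high and b.low <= a.high


# MIN/MAX values copied from the damaging table for jawiki.
maybebad_r = ScoreRange("R 0.9", 0.042, 1.0)
maybebad_p = ScoreRange("P 0.15", 0.433, 1.0)
likelygood = ScoreRange("P 0.995", 0.0, 0.173)

maybebad = narrower(maybebad_r, maybebad_p)
print(maybebad.label)                     # P 0.15
print(overlaps(likelygood, maybebad))     # False
print(overlaps(likelygood, maybebad_r))   # True
```

With these numbers, P=0.15 spans 0.433 to 1 and R=0.9 spans 0.042 to 1, so P=0.15 is the narrower choice. It starts above the likelygood maximum of 0.173, so the two filters do not overlap, whereas the default R=0.9 range would.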

In this case, we should definitely send the goodfaith model back to the ORES team, for them to try and make a better one. I personally think that while we're doing that, we might as well send the damaging model back too. It's not useless, and we could squeeze a little bit of utility out of it, but if they're already going to try to fix the goodfaith model I think they should take a look at the damaging model as well.

Interesting! We struggled to get good performance out of the reverted models for jawiki too. It seems like we're either missing something really important in the feature extractor or damage in jawiki is just very subtle. We'll need a local collaborator to make progress here. We have been blocked on that in the past.

See T230953: Why is jawiki's goodfaith model so bad? for our followup task.

SBisson removed SBisson as the assignee of this task. Aug 21 2019, 8:04 PM
SBisson moved this task from In Progress to Incoming on the Growth-Team (Current Sprint) board.
SBisson added a subscriber: SBisson.

Unassigning myself and moving back to incoming since this is not actionable by the Growth team for the moment.

I think we could call this "done" since the damaging filters -- while minimal -- are actually deployed. I think we'll create a followup task once we're ready with jawiki fixes -- which could take a long time.

MMiller_WMF added a subscriber: MMiller_WMF.

Moving off sprint board in favor of Newcomer Tasks V1.0 tasks.