
Deploy ORES filters for jawiki
Open, Needs Triage, Public

Description

ORES filters for Japanese Wikipedia (jawiki) are scheduled to be deployed on Monday, June 17th. Please add the filters to Special:RecentChanges.

Event Timeline

Restricted Application added a subscriber: Aklapper.
kostajh added a subscriber: kostajh.

Tentatively scheduling for Q2; if you need it sooner please let us know.

@Catrope, have a look at the stats for these models. Have you seen anything like this before?

If we exclude the thresholds whose precision or recall sits at the edges of the range (close to 0 or 1), there's almost nothing left. We could maybe come up with two levels (likelygood + maybebad), but even then I'm not sure how they could be configured so that they overlap only a little.

vagrant@vagrant:/vagrant/mediawiki$ mwscript extensions/ORES/maintenance/ConfigureThresholds.php -m damaging -t jawiki
Configuring damaging on jawiki

MIN	MAX	CONFIG	PREC	REC	FILTER
0	0.97	P 0.15	0.989	1	
0	0.97	P 0.45	0.989	1	
0	0.97	P 0.6	0.989	1	
0	0.97	P 0.75	0.989	1	
0	0.97	P 0.9	0.989	1	
0	0.97	P 0.98	0.989	1	
0	0.662	P 0.99	0.99	0.997
0	0.173	P 0.995	0.995	0.844	likelygood (default)
0	0.218	R 0.9	0.994	0.902	
0.433	1	P 0.15	0.155	0.181
0.893	1	P 0.45	0.502	0.007	
	1	P 0.6			likelybad (default)
	1	P 0.75			
	1	P 0.9			verylikelybad (default)
	1	P 0.98			
	1	P 0.99			
	1	P 0.995			
0.042	1	R 0.9	0.025	0.906	maybebad (default)

vagrant@vagrant:/vagrant/mediawiki$ mwscript extensions/ORES/maintenance/ConfigureThresholds.php -m goodfaith -t jawiki
Configuring goodfaith on jawiki

MIN	MAX	CONFIG	PREC	REC	FILTER
0	0.994	P 0.15	0.996	1	
0	0.994	P 0.45	0.996	1	
0	0.994	P 0.6	0.996	1	likelybad (default)
0	0.994	P 0.75	0.996	1	
0	0.994	P 0.9	0.996	1	
0	0.994	P 0.98	0.996	1	
0	0.994	P 0.99	0.996	1	
0	0.994	P 0.995	0.996	1	
0	0.987	R 0.9	0.998	0.906	maybebad (default)
0.995	1	P 0.15	1	0.157	
0.995	1	P 0.45	1	0.157	
0.995	1	P 0.6	1	0.157	
0.995	1	P 0.75	1	0.157	
0.995	1	P 0.9	1	0.157	
0.995	1	P 0.98	1	0.157	
0.995	1	P 0.99	1	0.157	
0.995	1	P 0.995	1	0.157	likelygood (default)
0.902	1	R 0.9	0.008	0.9
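The filtering step described above (dropping thresholds whose precision or recall sits at the edge) can be sketched in Python. This is a minimal illustration, not the logic in ConfigureThresholds.php: the `Row` type, the `EDGE` cutoff of 0.01, and the helper names `at_edge` and `usable` are hypothetical, and the rows are copied from the damaging output above (one representative per group of identical rows, rows with no computed threshold omitted).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff (an assumption for this sketch): a value within EDGE
# of 0 or 1 counts as "at the edge".
EDGE = 0.01


@dataclass
class Row:
    config: str       # threshold rule, e.g. "P 0.995" or "R 0.9"
    min_score: float  # lower bound of the score range
    max_score: float  # upper bound of the score range
    precision: float
    recall: float


def at_edge(value: float, edge: float = EDGE) -> bool:
    """True if value is within `edge` of 0 or 1."""
    return value <= edge or value >= 1 - edge


def usable(rows: List[Row]) -> List[Row]:
    """Keep only rows whose precision and recall are both away from the edges."""
    return [r for r in rows if not (at_edge(r.precision) or at_edge(r.recall))]


# jawiki damaging rows copied from the ConfigureThresholds output above.
DAMAGING = [
    Row("P 0.98", 0.0, 0.97, 0.989, 1.0),
    Row("P 0.99", 0.0, 0.662, 0.99, 0.997),
    Row("P 0.995", 0.0, 0.173, 0.995, 0.844),
    Row("R 0.9", 0.0, 0.218, 0.994, 0.902),
    Row("P 0.15", 0.433, 1.0, 0.155, 0.181),
    Row("P 0.45", 0.893, 1.0, 0.502, 0.007),
    Row("R 0.9", 0.042, 1.0, 0.025, 0.906),
]

if __name__ == "__main__":
    for r in usable(DAMAGING):
        print(r.config, r.min_score, r.max_score, r.precision, r.recall)
```

With this cutoff only the two maybebad candidates (P 0.15 at 0.433 to 1, and R 0.9 at 0.042 to 1) survive; every likelygood row is excluded because its precision or recall is near 1.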

> Have you seen anything like that before?

I have. In the past it has been traced back to extremely skewed input from the labeling campaign; I think the first version of the arwiki goodfaith model was built from a dataset in which fewer than 1% of the edits were labeled as bad faith.
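To see why a heavily skewed label set produces stats like these, consider a score range that selects every edit: its precision is simply that class's share of the labeled edits, and its recall is 1. A minimal Python sketch; the 1,000-edit dataset with 8 bad-faith labels (0.8%) is a hypothetical figure chosen to match the "fewer than 1%" above, not the actual arwiki or jawiki data.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled set: 1,000 edits, 8 (0.8%) labeled bad faith.
TOTAL = 1000
BAD = 8
GOOD = TOTAL - BAD

# A range that selects every edit, evaluated for the "good" class:
# all good edits are true positives, all bad edits are false positives.
p_good, r_good = precision_recall(tp=GOOD, fp=BAD, fn=0)

# The same "select everything" range, evaluated for the "bad" class.
p_bad, r_bad = precision_recall(tp=BAD, fp=GOOD, fn=0)

print(f"good class: precision={p_good:.3f} recall={r_good:.3f}")
print(f"bad class:  precision={p_bad:.3f} recall={r_bad:.3f}")
```

This prints a precision of 0.992 for the majority class and 0.008 for the minority class, both at recall 1. With labels this skewed, thresholds on the majority side look near-perfect while thresholds on the minority side look near-useless, which is one plausible reading of the saturated rows in the output above.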

The damaging model is pretty bad, but borderline workable. We could set maybebad to P=0.15 instead of R=0.9, which the rules say we should do here anyway: of R=0.9 and P=0.15, take the one with the narrower score range (here 0.433 to 1 rather than 0.042 to 1). That range wouldn't overlap with the default choice for likelygood (0 to 0.173); overlap between likelygood and maybebad is allowed, but not required. The goodfaith model is useless, and we shouldn't enable it. In similar cases in the past, I have enabled only the damaging model and sent the goodfaith model back to the ORES team for a redo.
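The maybebad rule above can be sketched in Python as a comparison of score-range widths followed by an overlap check against likelygood. This is an illustration under stated assumptions, not the ORES extension's own code: the `Range` alias and the helpers `width`, `pick_maybebad`, and `overlaps` are hypothetical, and the numbers are the jawiki damaging ranges from the ConfigureThresholds output above.

```python
from typing import Tuple

Range = Tuple[float, float]  # (min score, max score), inclusive


def width(r: Range) -> float:
    """Length of a score range."""
    return r[1] - r[0]


def pick_maybebad(recall_range: Range, precision_range: Range) -> Range:
    """Of the R=0.9 and P=0.15 candidates, take the narrower score range."""
    return min(recall_range, precision_range, key=width)


def overlaps(a: Range, b: Range) -> bool:
    """True if two inclusive score ranges share at least one score."""
    return a[0] <= b[1] and b[0] <= a[1]


# jawiki damaging ranges from the ConfigureThresholds output above.
R_090 = (0.042, 1.0)       # maybebad candidate: R 0.9
P_015 = (0.433, 1.0)       # maybebad candidate: P 0.15
LIKELYGOOD = (0.0, 0.173)  # likelygood default: P 0.995

maybebad = pick_maybebad(R_090, P_015)
print("maybebad:", maybebad)                                   # (0.433, 1.0)
print("overlaps likelygood:", overlaps(maybebad, LIKELYGOOD))  # False
```

The R 0.9 range (0.042 to 1) would overlap likelygood, while the narrower P 0.15 range does not, which matches the choice proposed above. If the two widths were equal, `min` would keep the first argument (the recall-based range).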

In this case, we should definitely send the goodfaith model back to the ORES team so they can try to build a better one. I personally think that while we're doing that, we might as well send the damaging model back too. It's not useless, and we could squeeze a little utility out of it, but if they're already going to try to fix the goodfaith model, I think they should take a look at the damaging model as well.

Interesting! We struggled to get good performance out of the reverted models for jawiki too. It seems like we're either missing something really important in the feature extractor, or damage in jawiki is just very subtle. We'll need a local collaborator to make progress here; we have been blocked on that in the past.

See T230953 ("Why is jawiki's goodfaith model so bad?") for our follow-up task.

SBisson moved this task from In Progress to Incoming on the Growth-Team (Current Sprint) board.
SBisson added a subscriber: SBisson.

Unassigning myself and moving back to incoming since this is not actionable by the Growth team for the moment.

I think we could call this "done", since the damaging filters, while minimal, are actually deployed. We'll create a follow-up task once the jawiki fixes are ready, which could take a long time.

MMiller_WMF added a subscriber: MMiller_WMF.

Moving off sprint board in favor of Newcomer Tasks V1.0 tasks.

Hey folks. I just went to do some work on this, and it looks like the "damaging" filters were never deployed, despite having been judged useful. This is blocking me from getting feedback on them from real users. Could we get them deployed?

@Halfak, thanks for bringing this back up, and I'm sorry it got stuck. We'll discuss it again as a team after the holidays.