
[Investigate] Wikidata revert model's precision and recall (filter rate)
Closed, ResolvedPublic

Description

Question: What proportion of human edits will need to be reviewed if we want 95% recall?

Methods:

  1. Gather a random sample of edits
  2. Label the reverted edits
  3. Explore the dataset of non-reverted damage

Event Timeline

Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description. (Show Details)
Halfak moved this task to Backlog on the Machine-Learning-Team (Active Tasks) board.
Halfak added a subscriber: Halfak.

Started some work here. This is based on a random sample of Wikidata edits:

https://etherpad.wikimedia.org/p/revscoring_wikidata_reverted_set

OK. If we draw the cutoff at 0.93, we'll flag 100 out of 10,000 edits (1%) for review, and that will account for (as far as we can tell) all of the vandalism!
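The calculation above can be sketched as follows. This is a hypothetical illustration with synthetic scores, not the actual revscoring pipeline: given (score, reverted) pairs from a labeled random sample, find the lowest cutoff that still catches the target fraction of reverted edits, then report what proportion of all edits would need review at that cutoff (the filter rate).

```python
import math

def cutoff_for_recall(scored_edits, target_recall=0.95):
    """Lowest score cutoff that catches `target_recall` of reverted edits.

    scored_edits: list of (model_score, was_reverted) pairs.
    """
    reverted_scores = sorted((s for s, r in scored_edits if r), reverse=True)
    if not reverted_scores:
        raise ValueError("no reverted edits in sample")
    # How many reverted edits must be flagged to hit the target recall.
    needed = math.ceil(target_recall * len(reverted_scores))
    # The cutoff is the score of the needed-th highest-scoring reverted edit.
    return reverted_scores[needed - 1]

def filter_rate(scored_edits, cutoff):
    """Proportion of all edits scoring at or above the cutoff."""
    flagged = sum(1 for s, _ in scored_edits if s >= cutoff)
    return flagged / len(scored_edits)

# Synthetic sample: 10 edits, 2 of them reverted with high scores.
sample = [(0.99, True), (0.95, True), (0.90, False), (0.40, False),
          (0.30, False), (0.20, False), (0.10, False), (0.05, False),
          (0.04, False), (0.01, False)]
c = cutoff_for_recall(sample, target_recall=0.95)
print(c, filter_rate(sample, c))  # → 0.95 0.2
```

On the real 10,000-edit sample, the same computation would yield the 0.93 cutoff and the 1% filter rate quoted above.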

Halfak set Security to None.
Halfak moved this task from Backlog to Done on the Machine-Learning-Team (Active Tasks) board.