
[Investigate] Wikidata revert model's precision and recall (filter rate)
Closed, Resolved · Public

Description

Question: What proportion of human edits will need to be reviewed if we want 95% recall?

Methods:

  1. Gather a random sample of edits
  2. Label each edit as reverted or not reverted
  3. Explore the dataset of non-reverted damage
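
As a rough illustration of the question above, here is a minimal sketch of how the review proportion for a target recall could be computed from a labeled sample. It assumes each edit has a model score (higher means more likely to be reverted) and a boolean reverted label; the function name and its inputs are hypothetical and are not taken from revscoring. Note that "reverted" is only a proxy for damage, which is why step 3 looks at non-reverted damage separately.

```python
import math


def review_fraction_for_recall(scores, reverted, target_recall=0.95):
    """Find the highest score cutoff that reaches the target recall.

    Hypothetical helper: `scores` are model scores (higher = more likely
    reverted) and `reverted` are the matching boolean labels.
    Returns (cutoff, fraction_of_edits_to_review, achieved_recall).
    """
    if len(scores) != len(reverted):
        raise ValueError("scores and reverted must have the same length")
    total_reverted = sum(1 for r in reverted if r)
    if total_reverted == 0:
        raise ValueError("sample contains no reverted edits")

    # Number of reverted edits we must catch (small epsilon guards
    # against floating-point noise before rounding up).
    needed = math.ceil(target_recall * total_reverted - 1e-9)

    # Walk the edits from highest to lowest score.
    ranked = sorted(zip(scores, reverted), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    caught = 0
    i = 0
    while i < n:
        cutoff = ranked[i][0]
        # Edits tied at the same score fall on the same side of the cutoff.
        while i < n and ranked[i][0] == cutoff:
            if ranked[i][1]:
                caught += 1
            i += 1
        if caught >= needed:
            return cutoff, i / n, caught / total_reverted
    # Unreachable for target_recall <= 1: the full sample has recall 1.0.
    raise ValueError("target_recall must be at most 1.0")
```

Edits scoring at or above the returned cutoff would be the ones sent for review; the second return value is the proportion the question asks about.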

Event Timeline

Halfak created this task. Dec 31 2015, 1:39 AM
Halfak updated the task description. (Show Details)
Halfak raised the priority of this task to Needs Triage.
Halfak moved this task to Backlog on the Scoring-platform-team (Current) board.
Halfak added a subscriber: Halfak.
Restricted Application added subscribers: StudiesWorld, Aklapper. Dec 31 2015, 1:39 AM

Started some work here, based on a random sample of Wikidata edits:

https://etherpad.wikimedia.org/p/revscoring_wikidata_reverted_set

OK. If we draw the cutoff at 0.93, we'll flag 100 of 10,000 edits (1%) for review, and that will account for (as far as we can tell) all of the vandalism!
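
To make the arithmetic behind these numbers explicit, here is a small sketch that turns the counts into a review proportion and a filter rate. It assumes "filter rate" means the share of edits that do not need review (one minus the review proportion); the function name is hypothetical.

```python
def review_and_filter_rate(flagged, total):
    """Return (review proportion, filter rate) for a given cutoff.

    Assumption: filter rate = 1 - (flagged / total), i.e. the share of
    edits that reviewers can skip.
    """
    if total <= 0 or not 0 <= flagged <= total:
        raise ValueError("need 0 <= flagged <= total and total > 0")
    review = flagged / total
    return review, 1.0 - review


# Counts from the comment above: 100 of 10,000 edits score above the cutoff.
review, filtered = review_and_filter_rate(100, 10000)
print(f"reviewed: {review:.1%}, filtered out: {filtered:.1%}")
# prints "reviewed: 1.0%, filtered out: 99.0%"
```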

Halfak assigned this task to Ladsgroup. Jan 1 2016, 6:18 PM
Halfak set Security to None.
Halfak moved this task from Backlog to Done on the Scoring-platform-team (Current) board.
Halfak closed this task as Resolved. Jan 21 2016, 3:43 PM