
Wikidata label quality model
Open, Low, Public

Description

It would be great to have a model that predicts the quality of an edit to a label/description on Wikidata.

Use-case:

  1. A user is using the mobile app to edit a label on Wikidata.
  2. The user sees a message: "This looks like a potential error or vandalism. Can you try again?"
  3. Upon second submission, the user sees: "Due to the nature of this edit, it will be reviewed before going live."

Event Timeline

This would require substantial new development. Not clear that this is a pressing issue, so it will be hard to prioritize research and modeling time. Not to say it isn't important, but an argument is necessary.

In the end, it might be better to just make our editquality model for Wikidata better. We can probably do some effective text processing on labels, descriptions, and other places where strings appear.

We'll want to check the performance of the current model at differentiating good/bad label edits by newcomers/anons (readers). I'm guessing that it doesn't work well yet, because we don't do any text processing of labels, so there's little interesting signal to pick up.
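To illustrate the kind of text processing meant above, here is a hypothetical sketch of cheap string features one could extract from a label edit (old vs. new value). The function and feature names are illustrative assumptions, not part of the existing editquality API:

```python
import re

def label_edit_features(old_label, new_label):
    """Return a dict of cheap string features for a label change.

    Hypothetical feature set; names and choices are illustrative only.
    """
    new_len = max(len(new_label), 1)  # avoid division by zero
    return {
        "old_len": len(old_label),
        "new_len": len(new_label),
        "len_delta": len(new_label) - len(old_label),
        # Shouting and digit-heavy labels are often low quality.
        "upper_ratio": sum(c.isupper() for c in new_label) / new_len,
        "digit_ratio": sum(c.isdigit() for c in new_label) / new_len,
        # Runs of 4+ identical characters ("loooool") suggest vandalism.
        "has_repeated_chars": bool(re.search(r"(.)\1{3,}", new_label)),
    }

features = label_edit_features("Douglas Adams", "DOUGLAS ADAMSSSSS")
```

Features like these could be fed to the existing editquality classifier alongside its current signals.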

Halfak triaged this task as Low priority. Dec 1 2016, 3:30 PM
  1. It is not an AI approach, but it is quite effective: similarity of sitelinks to the label is an easy way to catch bad labels. Example quarry: https://quarry.wmflabs.org/query/15753 . Turned into a feature (a similarity score between sitelink and label), it could be an important signal for an AI approach.
  2. Similarity may also be a useful feature when comparing labels of similar entities (people who share the same first or last name in one language are usually expected to share names in other languages, except for nicknames).
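The sitelink/label similarity idea in point 1 above can be sketched with the standard library's difflib; any threshold for "suspicious" would be an assumption that needs tuning against real edits:

```python
from difflib import SequenceMatcher

def sitelink_label_similarity(sitelink_title, label):
    """Return a similarity ratio in [0, 1] between a sitelink title and a label.

    Low values may flag a bad label. This is a sketch, not a tuned feature.
    """
    # Sitelink (page) titles use underscores where labels use spaces.
    normalized = sitelink_title.replace("_", " ")
    return SequenceMatcher(None, normalized.lower(), label.lower()).ratio()

good = sitelink_label_similarity("Douglas_Adams", "Douglas Adams")
bad = sitelink_label_similarity("Douglas_Adams", "poop")
```

Here `good` comes out at 1.0 while `bad` is far lower, matching the quarry's intuition that labels diverging strongly from their sitelinks deserve review.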