
Complete edit quality campaign for Hungarian Wikipedia
Closed, ResolvedPublic

Description

Stats: http://labels.wmflabs.org/stats/huwiki/
Contact: @Tgr

  • Announce the labeling campaign on huwiki
  • Status update #1
  • Status update #2

Event Timeline

Restricted Application added a subscriber: Aklapper.
Halfak triaged this task as Medium priority. Jun 15 2017, 2:21 PM
Halfak added a subscriber: grin.

@grin, this is the next step for Hungarian Wikipedia. Can you help us out with this?

This is now seeing steady progress (30% in the last couple weeks), thanks to @Misibacsi. Feedback from the local discussion:

  • it would be quicker if Wikilabels displayed the name of the user who made the edit (I imagine this is hidden intentionally, but mentioning it just in case)
  • a fairly large part of the edits to be labeled are uninteresting (bot edits, edits in project namespace, bot edits in project namespace), skipping those and using a smaller training set might be a good trade-off.

Bot edits should not be included in the dataset. Is it possible that some bots that are not flagged as bots are showing up?
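As an aside, the exclusion described above can be sketched as a simple filter. This is an illustrative example only, not the actual Wikilabels/editquality sampling code; the record fields and account names are hypothetical.

```python
# Hypothetical sketch: drop revisions whose author is in the set of
# accounts flagged as bots, so they never enter the labeling dataset.

def exclude_bot_edits(revisions, bot_users):
    """Return only the revisions not made by a flagged bot account."""
    return [rev for rev in revisions if rev["user"] not in bot_users]

# Hypothetical sample data.
revisions = [
    {"rev_id": 1001, "user": "ExampleEditor"},
    {"rev_id": 1002, "user": "ExampleBot"},
    {"rev_id": 1003, "user": "AnotherEditor"},
]
bot_users = {"ExampleBot"}

print([rev["rev_id"] for rev in exclude_bot_edits(revisions, bot_users)])
# → [1001, 1003]
```

In practice the bot flag would come from the wiki's user rights data rather than a hard-coded set.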

WRT showing the name, we've avoided that on purpose because there are measurable biases against anonymous editors. We've kind of felt it was best left as WONT_FIX. What do you think?

Bot edits should not be included in the dataset. Is it possible that some bots that are not flagged as bots are showing up?

No, they are proper bots. Example edit (task id 317779), user rights log.

WRT showing the name, we've avoided that on purpose because there are measurable biases against anonymous editors. We've kind of felt it was best left as WONT_FIX. What do you think?

Yeah, I guessed as much. I don't have a strong feeling either way: labelers will probably look up nontrivial edits anyway, because verifying an unsourced fact change takes a lot of effort, so it's easier to rely on the author's reputation or on the reviewers' reactions.

I've found the problem! huwiki is one of the datasets where we mixed edits that seem to "need review" with those that don't, so that we could check our assumption. See the line in our makefile here: https://github.com/wiki-ai/editquality/blob/master/Makefile#L2136

More recently, we've discarded this strategy as it seems clear that our "needs review" filters are working as intended.

Option 1: Continue as-is and get labels for *some* of the edits that do not "need review"
Option 2: Pull all unlabeled edits that do not "need review" and work with the remaining observations.

Given how close the campaign is to finishing, I think we should continue with option #1. But, I'd be OK with option #2 if someone felt strongly.
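For clarity, option #2 amounts to a filter over the remaining observations. The sketch below is purely illustrative (hypothetical field names, not the editquality Makefile logic): it keeps edits that are already labeled plus unlabeled edits that tripped the "needs review" filter, and drops the rest.

```python
# Hypothetical sketch of option #2: discard unlabeled observations that
# did not "need review"; keep everything already labeled and every
# unlabeled edit that still needs a label.

def apply_option_2(observations):
    return [
        obs for obs in observations
        if obs["labeled"] or obs["needs_review"]
    ]

# Hypothetical sample data.
observations = [
    {"rev_id": 1, "needs_review": True,  "labeled": False},  # kept: awaiting a label
    {"rev_id": 2, "needs_review": False, "labeled": True},   # kept: already labeled
    {"rev_id": 3, "needs_review": False, "labeled": False},  # dropped under option #2
]

print([obs["rev_id"] for obs in apply_option_2(observations)])
# → [1, 2]
```

Option #1 would simply skip this filtering step and label whatever the existing sample contains.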

26 revisions not found out of 26, twice in a row. I think it’s done, and the remaining revisions should be removed manually from the database (or replaced by accessible ones).

Confirmed that this is done! Thanks for your work. I'll get us moving on the next step.

Halfak claimed this task.