
Edit quality campaign for Latvian Wikipedia
Closed, Resolved (Public)


  • Confirm translations are ready
  • List of trusted user groups
  • Translate "Edit quality (20k sample)"
  • Run prelabeling script
  • Load revisions into

Event Timeline

Halfak added a subscriber: Ladsgroup.

@Papuass, it looks like we don't have translations for lv yet in the labeling interface, so users will just see English.

If you want to get things translated, we'll specifically need translations for the basic interface and the damaging_and_goodfaith form.

We'll also need a list of "trusted user groups". These user groups are only given to users who are highly trusted. We'll filter their edits out of the labeling system so we don't waste people's time reviewing good work. For English Wikipedia, we use

  • sysop
  • oversight
  • bot
  • rollbacker
  • checkuser
  • abusefilter
  • bureaucrat
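
To make the filtering idea concrete, here is a minimal sketch of how edits by trusted users could be dropped before labeling. It is not the actual prelabeling script: the `needs_review` helper, the `user_groups` field, and the sample edits are all hypothetical, and only the group names come from the list above.

```python
# Hypothetical sketch, not the real prelabeling script.
# Group names are the English Wikipedia list quoted above.
TRUSTED_GROUPS = frozenset({
    "sysop", "oversight", "bot", "rollbacker",
    "checkuser", "abusefilter", "bureaucrat",
})


def needs_review(user_groups, trusted_groups=TRUSTED_GROUPS):
    """Return True when the editor holds none of the trusted groups."""
    return trusted_groups.isdisjoint(user_groups)


# Made-up edits; the "user_groups" field name is an assumption.
edits = [
    {"rev_id": 1, "user_groups": ["sysop", "rollbacker"]},
    {"rev_id": 2, "user_groups": []},
    {"rev_id": 3, "user_groups": ["autoconfirmed"]},
]

# Only edits by users outside every trusted group go to human labelers.
to_label = [e["rev_id"] for e in edits if needs_review(e["user_groups"])]
print(to_label)
```

For another wiki, only the set of group names would need to change.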

One more thing. Can you provide a lv translation of "Edit quality (20k sample)"? We'll use this as the title of the edit quality labeling campaign.

Halfak triaged this task as Medium priority. Apr 13 2017, 2:46 PM
Halfak moved this task from Unsorted to Blocked on community input on the Machine-Learning-Team board.

I will work on translations.

Our trusted users are:

  • sysop
  • bureaucrat
  • bot
  • oversight (0 users)
  • checkuser (0 users)
  • patroller
  • autopatrolled (Should we add these? This group is manually assigned for proven users.)

Edit quality (20k sample) = Labojumu kvalitāte (20k paraugs)

Interface translation completed

We should wait about half a week (probably until Monday) for the translations to be added to the codebase. After that, I'll create the campaign and load the revisions.

(3.4)halfak@ores-compute-01:~/projects/editquality$ cat datasets/lvwiki.autolabeled_revisions.20k_2016.json | json2tsv autolabel.needs_review | sort | uniq -c 
  17337 False
   2654 True
(3.4)halfak@ores-compute-01:~/projects/editquality$ cat datasets/lvwiki.autolabeled_revisions.20k_2016.json | json2tsv reverted_for_damage | sort | uniq -c 
  19423 False
    568 True

Looks like we'll be labeling 2654 revisions, which is a smaller set than usual (usually ~5k), but I think seeing 568 that look like they were reverted for damage (best guess) is pretty good, so let's move forward. If we end up not having enough observations of damage from this campaign, we can always run another.
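
For anyone who wants to reproduce the counts without `json2tsv`, here is a rough Python equivalent of the `json2tsv <field> | sort | uniq -c` pipelines above. It assumes the dataset is one JSON object per line and that `autolabel.needs_review` is a dotted path into a nested object; both are guesses from the commands, and the sample records are invented. The last lines just redo the arithmetic on the reported counts: about 13% of the sample needs review and under 3% looks reverted for damage.

```python
import json
from collections import Counter


def get_path(record, dotted_path):
    """Follow a dotted path such as 'autolabel.needs_review' into a dict."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


def count_values(lines, dotted_path):
    """Tally the values found at dotted_path across JSON-lines input."""
    return Counter(
        get_path(json.loads(line), dotted_path)
        for line in lines
        if line.strip()
    )


# Invented records; the nested layout is an assumption.
sample = [
    '{"rev_id": 1, "autolabel": {"needs_review": true}, "reverted_for_damage": true}',
    '{"rev_id": 2, "autolabel": {"needs_review": false}, "reverted_for_damage": false}',
    '{"rev_id": 3, "autolabel": {"needs_review": false}, "reverted_for_damage": false}',
]
review_counts = count_values(sample, "autolabel.needs_review")
damage_counts = count_values(sample, "reverted_for_damage")
print(review_counts, damage_counts)

# Shares computed from the counts reported in the session above.
total = 17337 + 2654
review_share = 2654 / total
damage_share = 568 / total
print(total, round(review_share, 4), round(damage_share, 4))
```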

halfak@wikilabels-01:~/datasets$ sudo -u www-data /srv/wikilabels/venv/bin/wikilabels new_campaign lvwiki "Labojumu kvalitāte (20k paraugs)" damaging_and_goodfaith DiffToPrevious 1 50 --config /srv/wikilabels/config/config/
{'active': True, 'tasks_per_assignment': 50, 'created': datetime.datetime(2017, 4, 14, 16, 9, 37, 160265), 'name': 'Labojumu kvalitāte (20k paraugs)', 'wiki': 'lvwiki', 'view': 'DiffToPrevious', 'labels_per_task': 1, 'id': 56, 'form': 'damaging_and_goodfaith'}
halfak@wikilabels-01:~/datasets$ cat lvwiki.autolabeled_revisions.20k_2016.json | grep '"needs_review": true' | wc # | sudo -u www-data /srv/wikilabels/venv/bin/wikilabels task_inserts --config /srv/wikilabels/config/config/ 56
   2654   24591  300295
halfak@wikilabels-01:~/datasets$ cat lvwiki.autolabeled_revisions.20k_2016.json | grep '"needs_review": true' | sudo -u www-data /srv/wikilabels/venv/bin/wikilabels task_inserts --config /srv/wikilabels/config/config/ 56
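
The `grep '"needs_review": true'` filter relies on the exact spacing of the serialized JSON. A small sketch of a filter that parses each record instead is below, in case the file is ever written with different formatting. The function names and sample lines are hypothetical, and the code accepts `needs_review` either at the top level or nested under `autolabel`, since the thread does not show the record layout.

```python
import json


def flagged_for_review(record):
    """True if needs_review is set at the top level or under 'autolabel'."""
    if record.get("needs_review") is True:
        return True
    autolabel = record.get("autolabel")
    return isinstance(autolabel, dict) and autolabel.get("needs_review") is True


def select_for_review(lines):
    """Yield the original JSON lines whose record is flagged for review."""
    for line in lines:
        text = line.strip()
        if text and flagged_for_review(json.loads(text)):
            yield text


# Invented input; the third line has no space after the colon,
# so a grep for '"needs_review": true' would skip it.
sample_lines = [
    '{"rev_id": 101, "autolabel": {"needs_review": true}}',
    '{"rev_id": 102, "autolabel": {"needs_review": false}}',
    '{"rev_id": 103, "autolabel": {"needs_review":true}}',
]
selected_ids = [json.loads(t)["rev_id"] for t in select_for_review(sample_lines)]
print(selected_ids)
```

Because the generator yields each matching line unchanged, its output could be piped on to the same loading step as the grep version.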

The labeling interface is here:

Stats can be found here:

All looks good. I'll have translations deployed soon. For now, the UI seems to be falling back to English.