
Complete labeling campaign for euwiki
Closed, Resolved · Public

Description

Describe a quality scale (0-5)

How do "infoboxes" work?

Do you use a citation-needed template?

Event Timeline

Halfak created this task. May 19 2018, 6:26 PM
Restricted Application added a project: artificial-intelligence. May 19 2018, 6:26 PM
Restricted Application added a subscriber: Aklapper.

Most infoboxes are loaded automatically from Wikidata. The automatic ones follow a naming pattern like this one:
https://eu.wikipedia.org/wiki/Txantiloi:Biografia_infotaula_automatikoa

All the automatic templates can be found here: https://eu.wikipedia.org/wiki/Wikiproiektu:Txantiloien_automatizazioa
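If we want a feature for "uses an automatic infobox", something like the sketch below could work. It assumes every automatic infobox follows the naming pattern of the example above (a name ending in "infotaula automatikoa"), so templates listed on the Wikiproiektu page that don't follow that pattern would be missed. The regex and function name are hypothetical, not part of any existing feature library.

```python
import re

# Assumption: automatic infoboxes are named "<Something> infotaula automatikoa",
# e.g. "Biografia infotaula automatikoa". MediaWiki treats spaces and
# underscores in titles as equivalent, so both are accepted here, along with
# an optional "Txantiloi:" namespace prefix.
AUTO_INFOBOX_RE = re.compile(
    r"\{\{\s*(?:Txantiloi:)?([^{}|]*?)[ _]infotaula[ _]automatikoa\s*(?:\||\}\})",
    re.IGNORECASE,
)


def automatic_infoboxes(wikitext: str) -> list[str]:
    """Return the prefix (e.g. 'Biografia') of each automatic infobox transcluded."""
    return [m.group(1).strip() for m in AUTO_INFOBOX_RE.finditer(wikitext)]
```

For example, `automatic_infoboxes("{{Biografia infotaula automatikoa}}")` returns `["Biografia"]`. Aliases and redirects to these templates are not handled.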

The "citation needed" template is called "Erref behar".
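Counting uses of that template in an article's wikitext is straightforward. The sketch below is a minimal version that assumes the template is always called by its own name ("Erref behar", any first-letter case, space or underscore). Redirects or aliases to the template are not covered; that list would have to be pulled from the wiki.

```python
import re

# Matches {{Erref behar}}, {{erref_behar}}, {{Erref behar|...}}, optionally
# with the "Txantiloi:" namespace prefix. Longer names that merely start with
# "Erref behar" are rejected because the name must be followed by "|" or "}}".
ERREF_BEHAR_RE = re.compile(
    r"\{\{\s*(?:Txantiloi:)?erref[ _]behar\s*(?:\||\}\})",
    re.IGNORECASE,
)


def count_citation_needed(wikitext: str) -> int:
    """Number of "citation needed" (Erref behar) tags in the wikitext."""
    return len(ERREF_BEHAR_RE.findall(wikitext))
```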

It looks like most of the articles are between 1k and 4k long. We can get a pretty good subset by sampling in 50-page blocks: https://quarry.wmflabs.org/query/27100

We can generate a useful stratified sample with this query: https://quarry.wmflabs.org/query/27101
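The Quarry queries aren't reproduced here, but the idea behind a length-stratified sample can be sketched in Python: bucket pages by length and draw the same number from each bucket, so that rare very short and very long articles are still represented. The input format (title, length pairs), the bin edges, and the per-bin count are all assumptions for illustration.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def stratified_sample(pages, edges, per_stratum, seed=0):
    """Draw up to `per_stratum` titles from each page-length bin.

    pages: iterable of (title, length) pairs.
    edges: sorted bin boundaries; a page of length L goes into bin
           bisect_right(edges, L), so edges=[1000, 4000] gives the bins
           [0, 1000), [1000, 4000) and [4000, inf).
    Returns {bin_index: [titles]}; a fixed seed makes the sample reproducible.
    """
    strata = defaultdict(list)
    for title, length in pages:
        strata[bisect_right(edges, length)].append(title)

    rng = random.Random(seed)
    return {
        bin_index: rng.sample(titles, min(per_stratum, len(titles)))
        for bin_index, titles in sorted(strata.items())
    }
```

Sampling equally per bin deliberately over-represents the tails relative to the raw length distribution; that is the point when we want labelers to see the full range of quality levels.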

I had folks from the Basque Wikipedia double-check that the sample seemed to capture a wide range of quality levels. We're going to work with the wp10 scale and go from there. Translations of the scale are available.
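The wp10 scale has six classes, which lines up with the 0-5 quality scale asked for in the description. A minimal sketch of that mapping is below, assuming the labels are stored under the English class names Stub, Start, C, B, GA and FA (the euwiki labels may use the translated names instead). It also shows one way to collapse a predicted class distribution into a single 0-5 number: the expected value of the ordinal. This is an illustrative choice, not necessarily what the trained model reports.

```python
# Assumption: labels use the English wp10 class names, lowest to highest.
WP10_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]  # ordinals 0..5


def class_to_score(label: str) -> int:
    """Map a wp10 class label to its position on the 0-5 scale."""
    return WP10_CLASSES.index(label)


def weighted_quality(probabilities: dict[str, float]) -> float:
    """Expected 0-5 score under a predicted distribution over wp10 classes.

    The probabilities are renormalised, so they need not sum exactly to 1.
    """
    total = sum(probabilities.values())
    return sum(p * class_to_score(c) for c, p in probabilities.items()) / total
```

For example, a prediction of 25% C and 75% B gives 0.25 * 2 + 0.75 * 3 = 2.75.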

It's online, and labeling has begun! http://labels.wmflabs.org/ui/euwiki/

Labeling done!

One thing we noticed was that there were a large number of articles about municipalities in France (which makes sense, since we have lots of them), so we had many C-class articles with exactly the same structure. This may affect the machine learning.

Aha! Thanks for the note about that. I think you're right that we'll see the machine learning model focus on that structure. But time will tell and we can always adjust and retrain :)

Vvjjkkii renamed this task from "Complete labeling campaign for euwiki" to "pncaaaaaaa". Jul 1 2018, 1:09 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from "pncaaaaaaa" to "Complete labeling campaign for euwiki". Jul 1 2018, 5:48 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Ladsgroup closed this task as Resolved. Nov 1 2018, 7:59 PM
Ladsgroup claimed this task.
Restricted Application added a project: User-Ladsgroup. Nov 1 2018, 7:59 PM