
Complete labeling campaign for euwiki
Closed, Resolved · Public

Description

Describe a quality scale (0-5)

How do "infoboxes" work?

Do you use a citation-needed template?

Event Timeline

Restricted Application added a subscriber: Aklapper.

Most infoboxes are loaded automatically from Wikidata. The automatic ones follow a naming pattern like this:
https://eu.wikipedia.org/wiki/Txantiloi:Biografia_infotaula_automatikoa

All the automatic templates can be found here: https://eu.wikipedia.org/wiki/Wikiproiektu:Txantiloien_automatizazioa

The "citation-needed" template is called "Erref behar".
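For illustration, here is a minimal sketch of how the two answers above could be turned into per-article signals. It assumes the templates appear in wikitext as `{{Erref behar}}` (optionally with parameters) and that automatic infoboxes have names ending in "infotaula automatikoa", as in the example link. The function name `template_features` is hypothetical and not part of any existing tool.

```python
import re

# Assumption: templates are invoked as {{Name}} or {{Name|...}}.
# MediaWiki treats the first letter of a template name as case-insensitive
# and treats spaces and underscores as equivalent.
CITATION_NEEDED = re.compile(r"\{\{\s*[Ee]rref[ _]behar\s*(?:\||\}\})")

# Assumption: automatic infobox names end in "infotaula automatikoa",
# e.g. "Biografia infotaula automatikoa".
AUTO_INFOBOX = re.compile(
    r"\{\{\s*[^{}|]*infotaula[ _]automatikoa\s*(?:\||\}\})",
    re.IGNORECASE,
)


def template_features(wikitext: str) -> dict:
    """Count citation-needed templates and detect an automatic infobox."""
    return {
        "citation_needed": len(CITATION_NEEDED.findall(wikitext)),
        "has_auto_infobox": AUTO_INFOBOX.search(wikitext) is not None,
    }
```

Calling `template_features(text)` on an article's wikitext returns a small dict that could be inspected by hand while labeling; a real feature extractor would likely need to handle redirects to these templates as well.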

It looks like most of the articles are between 1 and 4k long. We can get a pretty good subset by sampling between 50-page blocks: https://quarry.wmflabs.org/query/27100

We can generate a useful stratified sample with this query: https://quarry.wmflabs.org/query/27101
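To make the idea concrete, here is a rough sketch of stratified sampling by page length. It is not a reproduction of the Quarry query above; the function name, the length bounds, and the per-stratum count are all assumptions chosen for illustration.

```python
import bisect
import random
from collections import defaultdict


def stratified_sample(pages, bounds, per_stratum, seed=0):
    """Sample up to `per_stratum` titles from each page-length stratum.

    pages:  iterable of (title, length) pairs
    bounds: ascending upper bounds; stratum i holds lengths below bounds[i],
            and a final overflow stratum holds everything else
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for title, length in pages:
        # bisect_right puts a length equal to a bound into the next stratum
        strata[bisect.bisect_right(bounds, length)].append(title)

    sample = {}
    for idx in sorted(strata):
        titles = strata[idx]
        # Take every page when a stratum is smaller than the requested count
        sample[idx] = rng.sample(titles, min(per_stratum, len(titles)))
    return sample
```

For example, `stratified_sample(pages, bounds=[1000, 4000, 8000], per_stratum=50)` would draw up to 50 titles from each of four length bands, so short and long articles are both represented even if most pages fall in one band.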

I had folks from the Basque Wikipedia double-check that the sample captures a wide range of quality levels. We're going to work with the wp10 scale and go from there. We have translations available.

One thing we noticed was a large number of articles about municipalities in France (it makes sense, we have lots of them), so we had lots of C-class articles with exactly the same structure. Maybe this will affect the machine learning.

Aha! Thanks for the note about that. I think you're right that we'll see the machine learning model focus on that structure. But time will tell and we can always adjust and retrain :)

CommunityTechBot renamed this task from pncaaaaaaa to Complete labeling campaign for euwiki. · Jul 1 2018, 5:48 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Ladsgroup claimed this task.