Describe a quality scale (0-5)
How do "infoboxes" work?
Do you use a citation-needed template?
Status | Assigned | Task
---|---|---
Open | None | T195604 Extend feature extractors for euwiki articlequality model
Resolved | awight | T171119 Train/test article quality model for euwiki
Resolved | Ladsgroup | T195106 Complete labeling campaign for euwiki
Most infoboxes are loaded automatically from Wikidata. The automatic ones follow a naming pattern like this:
https://eu.wikipedia.org/wiki/Txantiloi:Biografia_infotaula_automatikoa
All the automatic templates can be found here: https://eu.wikipedia.org/wiki/Wikiproiektu:Txantiloien_automatizazioa
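A feature extractor could check whether an article's wikitext transcludes one of these automatic infoboxes. The sketch below assumes that every automatic template shares the `infotaula automatikoa` suffix seen in the Biografia example; the regex and the function name `automatic_infoboxes` are hypothetical, not part of any existing extractor.

```python
import re

# Assumption: automatic infoboxes are named "<Topic> infotaula automatikoa",
# following the Biografia_infotaula_automatikoa example. MediaWiki treats
# "_" and " " as equivalent in titles, so both separators are accepted.
AUTO_INFOBOX_RE = re.compile(
    r"\{\{\s*(?:Txantiloi:)?\s*([^|{}]*?)[ _]infotaula[ _]automatikoa\s*(?:\||\}\})",
    re.IGNORECASE,
)

def automatic_infoboxes(wikitext: str) -> list[str]:
    """Return the topic part of every automatic infobox transcluded in wikitext."""
    return [m.group(1).strip() for m in AUTO_INFOBOX_RE.finditer(wikitext)]
```

For example, `automatic_infoboxes("{{Biografia infotaula automatikoa}}")` yields `["Biografia"]`. Templates that reach an automatic infobox through a redirect under a different name would not be caught by this pattern.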
The "citation needed" template is called "Erref behar".
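Counting uses of this template is a natural per-article feature. A minimal sketch, assuming the template is always invoked directly as `Erref behar` (any redirects or aliases to it are not handled); the function name `count_citation_needed` is hypothetical.

```python
import re

# Matches transclusions of the Basque citation-needed template, e.g.
# "{{Erref behar}}" or "{{erref behar|data=2018}}". MediaWiki ignores the
# case of the first letter and treats "_" like " " in template names.
ERREF_BEHAR_RE = re.compile(r"\{\{\s*[Ee]rref[ _]behar\s*(?:\||\}\})")

def count_citation_needed(wikitext: str) -> int:
    """Count how many times {{Erref behar}} is used in a page's wikitext."""
    return len(ERREF_BEHAR_RE.findall(wikitext))
```

Requiring `|` or `}}` right after the name keeps the pattern from matching longer, unrelated template names that merely start with "Erref behar".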
Most articles appear to be between 1 and 4k long. We can get a pretty good subset by sampling across 50-page blocks: https://quarry.wmflabs.org/query/27100
We can generate a useful stratified sample with this query: https://quarry.wmflabs.org/query/27101
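The Quarry queries are not reproduced here, but the general idea of a length-stratified sample can be sketched in Python: bucket pages by length and draw the same number from each bucket, so short stubs do not crowd out longer articles. The bin edges, the `(title, length)` input shape, and the function names are assumptions for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical strata, in bytes of wikitext; the boundaries actually used by
# the Quarry queries are not reproduced here.
BIN_EDGES = [0, 1_000, 2_000, 4_000, 8_000, 16_000]

def stratum(length: int) -> int:
    """Index of the length bin a page falls into (the last bin is open-ended)."""
    for i in range(len(BIN_EDGES) - 1, -1, -1):
        if length >= BIN_EDGES[i]:
            return i
    return 0

def stratified_sample(pages, per_stratum, seed=0):
    """Draw up to per_stratum (title, length) pairs from each length bin."""
    strata = defaultdict(list)
    for title, length in pages:
        strata[stratum(length)].append((title, length))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for key in sorted(strata):
        bucket = strata[key]
        sample.extend(rng.sample(bucket, min(per_stratum, len(bucket))))
    return sample
```

Bins with fewer pages than `per_stratum` contribute all of their pages, so the sample can be smaller than `per_stratum` times the number of bins.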
I had folks from the Basque Wikipedia double-check that the sample seemed to capture a wide range of quality levels. We're going to work with the wp10 scale and go from there; translations of it are available.
One thing they noted was that there are a large number of articles about municipalities in France (which makes sense, since we have lots of them), so we had many C-class articles with exactly the same structure. This may affect the machine learning model.
Aha! Thanks for the note about that. I think you're right that we'll see the machine learning model focus on that structure. But time will tell and we can always adjust and retrain :)
The featured articles category is: https://eu.wikipedia.org/wiki/Kategoria:Artikulu_nabarmenduak
The good articles category is: https://eu.wikipedia.org/wiki/Kategoria:Artikulu_onak
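These categories could be used to pre-label the top of the scale before manual labeling. The wp10 scale has six classes (Stub, Start, C, B, GA, FA), which lines up with the 0-5 scale asked about at the top; treating each class's position as its score is an assumption, as are the helper names below.

```python
# The six wp10 classes in ascending order; using the list index as a 0-5
# score is an assumption made for illustration.
WP10_SCALE = ["Stub", "Start", "C", "B", "GA", "FA"]

# Category titles taken from the links above (underscores shown as spaces).
CATEGORY_LABELS = {
    "Kategoria:Artikulu nabarmenduak": "FA",  # featured articles
    "Kategoria:Artikulu onak": "GA",          # good articles
}

def label_from_categories(categories):
    """Return the highest wp10 label implied by a page's categories, or None."""
    labels = [CATEGORY_LABELS[c] for c in categories if c in CATEGORY_LABELS]
    if not labels:
        return None
    return max(labels, key=WP10_SCALE.index)

def score(label):
    """Map a wp10 label to its 0-5 position on the scale."""
    return WP10_SCALE.index(label)
```

Pages in neither category return `None` and would still need a human label.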