
Train/test article quality model for euwiki
Closed, Resolved · Public

Description

How do Wikipedians label articles by their quality level?
We don't currently have a system, but we would use something similar to English Wikipedia's.

What levels are there and what processes do they follow when labeling articles for quality?
We only label featured articles, and we do so by vote.

How do infoboxes work? Are they used the same way as on English Wikipedia?
We mostly work with Wikidata to provide automated templates. Most articles that can have an infobox (biographies, places, films...) do have one, and those infoboxes draw on Wikidata.

Are there "citation needed" templates? How do they work?
We use {{Erref_behar}} and articles are added to categories, as in English Wikipedia.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Halfak triaged this task as Medium priority. Jul 20 2017, 2:44 PM
Halfak moved this task from Unsorted to Research & analysis on the Machine-Learning-Team board.

Hello! Any news on this? We will start soon with the

Thanks!

Halfak renamed this task from Build article quality model for euwiki to Train/test article quality model for euwiki. May 20 2018, 10:07 AM
Halfak added a subscriber: Halfak.

Working from the labeled data, I see:

44 B
119 C
 39 FA
 51 GA
 39 null
 49 Start
 57 Stub

So the "C" class dominates with 119 observations, while the other classes are relatively balanced at between 39 and 57 observations each.
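To make the imbalance concrete, here is a small Python sketch that takes the counts above as the full labeled set and computes each class's share plus "balanced" class weights, using the formula scikit-learn applies for `class_weight="balanced"` (n_samples / (n_classes * count)). Whether the articlequality model actually reweights its classes is not stated in this thread; `class_weights` is a hypothetical helper.

```python
# Label counts from the euwiki labeled data listed above.
COUNTS = {"B": 44, "C": 119, "FA": 39, "GA": 51, "null": 39, "Start": 49, "Stub": 57}


def class_weights(counts: dict) -> dict:
    """'Balanced' weights: n_samples / (n_classes * count) for each class."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    total = sum(COUNTS.values())
    weights = class_weights(COUNTS)
    # Largest class first; classes rarer than average get weights above 1.
    for label, n in sorted(COUNTS.items(), key=lambda kv: -kv[1]):
        print(f"{label:>5} {n:4d} {n / total:6.1%}  weight={weights[label]:.2f}")
```

With these weights, each class contributes the same total weight (weight × count) to training, so the C class no longer dominates the loss.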

@Theklan, do you use citation templates on the Basque Wikipedia? On enwiki, editors use {{cite}}, {{cite book}}, etc.

@Halfak we use only one reference template for everything: {{erreferentzia}}. We also use Citoid to generate citations automatically.

We also have some templates for problematic articles: {{wikitu}} for articles that are not wikified, {{erreferentzia falta}} for articles without references and {{zuzendu}} for articles that are not well written.
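These templates could be turned into simple numeric features. The Python sketch below counts reference and maintenance template calls in raw wikitext, including {{Erref_behar}}, the "citation needed" template from the task description. The regex, the grouping into two features, and the function names are assumptions for illustration, not the actual euwiki feature set in articlequality.

```python
import re
from collections import Counter

# Template names mentioned in this task; the grouping is hypothetical.
REFERENCE_TEMPLATES = {"erreferentzia"}
MAINTENANCE_TEMPLATES = {"wikitu", "erreferentzia falta", "zuzendu", "erref behar"}

# Captures the name part of a template call: "{{Name|..." or "{{Name}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def normalize(name: str) -> str:
    # MediaWiki treats underscores as spaces and ignores the case of the
    # first letter, so "Erref_behar" and "erref behar" name the same template.
    name = re.sub(r"[_\s]+", " ", name).strip()
    return name[:1].lower() + name[1:]


def template_features(wikitext: str) -> dict:
    """Count reference and maintenance template calls in one article."""
    names = Counter(normalize(m) for m in TEMPLATE_NAME.findall(wikitext))
    return {
        "references": sum(names[t] for t in REFERENCE_TEMPLATES),
        "maintenance": sum(names[t] for t in MAINTENANCE_TEMPLATES),
    }
```

Nested calls such as `{{erreferentzia|title={{lang|eu}}}}` are still counted, because the regex only needs the opening `{{Name|` of each call.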

Note that most null-labeled articles are also lists. Lists are sometimes not formatted correctly, but every article whose title ends with "zerrenda" (Basque for "list") is a list.
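A minimal Python sketch of the title rule described above. The function name is hypothetical and the example titles are illustrative; how (or whether) the model uses this flag is not specified in the thread.

```python
def is_list_article(title: str) -> bool:
    """Title heuristic from this thread: euwiki list titles end with "zerrenda"."""
    # Titles taken from URLs use underscores instead of spaces; normalize them,
    # then do a case-insensitive suffix check.
    return title.replace("_", " ").strip().lower().endswith("zerrenda")


if __name__ == "__main__":
    for t in ["Euskal_Herriko_mendien_zerrenda", "Donostia"]:
        print(t, is_list_article(t))
```

A flag like this could be used to set list pages aside before training, since they mostly carry the null label.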

Thank you for the notes. I'll get these incorporated into the feature extraction.

Hi @Halfak! Can you give me an update on this? Thanks!

Sorry for the delay. I'm snowed under with end-of-fiscal-year work at the WMF. We're looking to get this model deployed soon; it would already be deployed if it weren't for the bureaucratic work filling my plate.

I think a deployment early next week (June 11th or 12th) is likely.

Model was merged here: https://github.com/wiki-ai/articlequality/pull/67

Deployment is still on schedule for some time this week.

Change 439573 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/services/ores/deploy@master] Enable euwiki wp10 model

https://gerrit.wikimedia.org/r/439573

I just put together https://gerrit.wikimedia.org/r/#/c/mediawiki/services/ores/deploy/+/439656 and I'm not sure why it isn't showing up here, since it is tagged with this "bug" ID.

This covers @awight's change, addresses my note, and handles srwiki and bswiki too.

awight claimed this task.

Change 439573 abandoned by Awight:
Enable euwiki wp10 model

https://gerrit.wikimedia.org/r/439573

:)

Great!

So, now that we have this system, what would it take to have articles automatically scored on their talk pages? Maybe we could run a bot or something, but I'd like to know what the options are.

Thanks!

Also, is it possible to integrate these results into the Outreach Dashboard? I'm not seeing any results there when I choose the ORES button.

Thanks for the ping. @Theklan, I've filed an issue for it: https://github.com/WikiEducationFoundation/WikiEduDashboard/issues/1896

Integration should definitely be possible, but it may be a while before we can get to it.

Ping Global-Collaboration, I thought you might want to know that this exists, even if we don't have a way to integrate yet.

Thanks @Ragesoss. We are not in a hurry, but it would be great to have it for the next term, starting in September.

Ping Global-Collaboration, I thought you might want to know that this exists, even if we don't have a way to integrate yet.

Can you explain this, please?

Hi! My thought here was that we'll have the wp10 scores available in MediaWiki soon: T192268: Enable ores extension wp10 storage in English Wikipedia, and there are interesting features that could be written to take advantage of that data. The potential rewards increase as new wikis are supported by these models, of course.

OK, so my guess is that it isn't possible yet to bulk-extract this information and upload it to the talk pages with a bot... is it?

(sorry, I'm quite lost in this area)

Yes, it would be possible now that the model is available on ores.wikimedia.org. I don't have suggestions about whether that's a good idea or not, but technically it won't be difficult.
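As a rough illustration of that bulk extraction, the Python sketch below builds a batched request against the ORES v3 scores endpoint for the wp10 model and pulls each predicted class out of the JSON response. The response shape it expects (`{wiki: {"scores": {revid: {"wp10": {"score": {"prediction": ...}}}}}}`, with an `"error"` key for revisions that could not be scored) is an assumption based on the v3 format; the helper names and User-Agent string are hypothetical, and posting the results to talk pages is left out. ORES has since been superseded by Lift Wing, so this endpoint may no longer respond.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def wp10_url(revids, wiki="euwiki"):
    # Batch several revisions into one request; IDs are joined with "|".
    query = urlencode({"models": "wp10", "revids": "|".join(map(str, revids))})
    return f"{ORES_BASE}/{wiki}/?{query}"


def predictions(response, wiki="euwiki"):
    """Map revision ID -> predicted wp10 class, skipping unscored revisions."""
    out = {}
    for revid, models in response[wiki]["scores"].items():
        result = models["wp10"]
        if "error" in result:  # e.g. a deleted or missing revision
            continue
        out[int(revid)] = result["score"]["prediction"]
    return out


def fetch_predictions(revids, wiki="euwiki"):
    # Network call; replace the placeholder User-Agent with real contact info.
    req = Request(wp10_url(revids, wiki), headers={"User-Agent": "example-quality-bot/0.1"})
    with urlopen(req) as resp:
        return predictions(json.load(resp), wiki)
```

Parsing is kept separate from fetching so the extraction logic can be checked against a saved response without network access.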

The MediaWiki database integration is more specific to MediaWiki extensions, which can do things like filter RecentChanges by ORES thresholds. The distinction is a bit subtle, but one way to think about it is that most features supported directly by the Wikimedia Foundation will need the tighter integration, while for so-called third-party tools it shouldn't make any difference.