
Build article quality model for Galician Wikipedia
Closed, Resolved · Public


How do Wikipedians label articles by their quality level?
What levels are there and what processes do they follow when labeling articles for quality?
In general, Galician Wikipedia articles aren't labeled by quality. A few WikiProjects used labels in the past, but they are inactive now.

How do InfoBoxes work? Are they used like on English Wikipedia?
There are a lot of InfoBoxes in the Galician Wikipedia; they are at

Are there "citation needed" templates? How do they work?
The principal "citation needed" template in the Galician Wikipedia is {{cómpre referencia}}, but there are several others.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Theklan added a subscriber: Theklan.

A system similar to the one developed on euwiki (wp10) would be great for Galician. The gadget @Halfak is building there would also be interesting for the Galician Wikipedia.

Ahh yes. It looks like we'll need to sample and generate a labeled dataset like we did for euwiki.

I've repurposed the euwiki query for glwiki here:

We'll want to get this sample loaded into wiki labels.

@Elisardojm, for euwiki, we set up basic translations of English Wikipedia's quality scale. Do you think that would work for glwiki? See

@Halfak, at we have FA and GA articles and stubs, and I think that we could translate the other elements of the quality scale. Where can I do it?

I think the best place to do that would be to simply describe a labeling scale in a similar way to what you see at. @Theklan, do you have such a documentation page describing the scale on euwiki?

I can get the GA and FA articles for model-training with this query:

I'll set up a labeling campaign for the stratified sample:

Once those are labeled on the quality scale, then we can train a model and see how it performs using the gadgets I built for euwiki (and have now generalized to enwiki, fawiki, etc.)

We don't have a translation for the B and C grades, but we have rules for FA, good article, and stub quality. I think that simply using the same grading as English will work.

On another note, these gadgets work great! Let's see if we can make them a tool!

How would you say "Assess article quality" in Galician? I'll use that to name the "labeling campaign" in Wiki labels.

@Halfak, "Assess article quality" in Galician is "Avaliar a calidade do artigo".

I'm still waiting on an information page that looks like for Galician that we might link to. For now, I'll just link to the English version.

And now there's a hold on translations via Translate wiki. See

In the short term, let's focus on getting Galician translations in, and I'll worry about getting those translations merged into Wiki labels.

Looks great! I'd recommend removing the "A" class from since it's almost totally unused on other wikis. I think the only reason that English Wikipedia keeps it is for historic reasons.

I'll make some updates to the current campaign shortly.

OK, I updated the information link. Things are weird with translatewiki right now, so I'm going to see if I can get a special commit done for us.

May I ask a little question? What is the objective of the campaigns? To train the system?

Yes. It will teach ORES what quality looks like on the Galician Wikipedia. Then, hopefully, ORES will be able to make predictions that roughly match yours and your collaborators'.
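For context, once a model is trained and deployed, predictions are fetched from the ORES scoring API. Below is a minimal sketch of how a tool might build such a request URL; the `glwiki` context and `articlequality` model name are assumptions about how this wiki would eventually be configured, and the function name is mine:

```javascript
// Builds a request URL for the ORES v3 scoring API.
// "context" is the wiki (e.g. "glwiki"), "model" the scorer
// (e.g. "articlequality"), and "revid" the revision to score.
function oresScoreUrl(context, model, revid) {
  return 'https://ores.wikimedia.org/v3/scores/' + context +
    '?models=' + model + '&revids=' + revid;
}

// e.g. oresScoreUrl('glwiki', 'articlequality', 123456)
// → https://ores.wikimedia.org/v3/scores/glwiki?models=articlequality&revids=123456
```

The response contains, per revision, a predicted class and a probability for each class, as shown later in this thread.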

Then, when should we announce the campaigns to the Galician community?

I've just gotten word from @Nikerabbit that translations will be pushed again soon. Once they are, I'll do a quick deployment. Then we can announce the campaign and invite people to come and contribute assessments. I think the translations might come in within the next few hours.

I got most of the translations, but it looks like there was a key translation that I missed. To save time, we can do this manually. Please translate the following English messages. I've left notes next to them to explain how they are used in more detail.

  • "FA"
    • Explanation: An abbreviation for "Featured Article" in Wikipedia
  • "GA"
    • Explanation: An abbreviation for "Good Article" in Wikipedia
  • "B"
    • Explanation: Short for B-class article in Wikipedia
  • "C"
    • Explanation: Short for C-class article in Wikipedia
  • "Start"
    • Explanation: Short for Start-class article in Wikipedia
  • "Stub"
    • Explanation: Short for Stub-class article in Wikipedia
  • "Assessment"
    • Explanation: Used as a header for selecting an assessment class. Stub, C, GA, etc. are all potential "Assessments" that someone could choose while reviewing an article.
  • "See [[$1|WP:WikiProject assessment]] for detailed explanations of the assessment classes."
    • Explanation: This message will appear on the assessment form. It will provide a link to the page where possible assessments are described in more detail. I need you to translate the whole sentence including the link text.

@Elisardojm, I think this is the last thing we'll need to do before we announce the campaigns to the Galician community.

"FA" is "AC"
"GA" is "AB"
"B" is "B"
"C" is "C"
"Start" is "En Progreso"
"Stub" is "Bosquexo"
"Assessment" is "Avaliación"
"See [[$1|WP:WikiProject assessment]] for detailed explanations of the assessment classes." is "Consulte [[$1|Wikipedia:Avaliación de contido]] para explicacións detalladas das clases de avaliación."

@Elisardojm OK! I finally have the translations deployed. I think we're ready for an announcement.

According to, we need 37 more campaigns, don't we?

I don't know exactly what we will get when we finish those campaigns. The activation of the ORES system? I don't know how to explain that to the community. Could you explain a little?

When this campaign is complete, we'll be able to build an article quality prediction model in ORES for glwiki. Using this model, you'll be able to more easily track the progress you are making in glwiki. There are lots of opportunities to build on this prediction strategy. Check out for some of the things people have been working on.

More specifically, I'll want to get you set up with the user script that I made for @Theklan. He has been using that script to track progress on the core articles and to show a quality assessment at the top of the current articles. Maybe he could provide a better description of how he is currently making use of the predictions.

In the meantime, I'll work on the documentation for the user-script. Right now, it's pretty minimal and does not discuss all of the functionality:

FWIW, I'll be working on the ArticleQuality.js documentation today so you'll be able to translate and make use of that.

Hi, I have announced the campaign here:

I have noticed something that could be fixed: on that page, users' pages link to; could they link to instead?

Another question: where is the documentation for ArticleQuality.js? On translatewiki?

OK, a quick response regarding user page links: it's based on the CentralAuth configuration. The links go to whatever wiki is listed as a user's "home wiki". This is usually the wiki where they first registered their account. If you think it would be better to link to the local wiki instead, I think that shouldn't be too hard to manage.

Also! I have fixing up the docs for ArticleQuality.js on my todo list. I just fixed an annoying bug T207505: Fix ArticleQuality.js so that it doesn't violate PoolCounter constraints and I'll get the documentation fleshed out next. :)

Hi, one question. People told me that the campaign shows random articles, and the problem is that users can't assess FA or GA articles on subjects they don't know. Could they skip those articles until they find ones they can assess?

Actually, you don't have to evaluate the REAL content of the article, but its structural completeness: whether it has sections and references, whether the article is put together correctly in a technical way, whether it has images... but not whether it is really well written.

+1 to what @Theklan said. I would encourage you to consider the quality of the text when evaluating too. We're in the process of updating our models to detect evidence of poorly written prose, e.g. poor translations or a non-neutral point of view.

If you still feel uncomfortable making an assessment, you can "skip" the item to let it return to the pool. Someone will eventually need to pick it up.

But... if we don't evaluate the content, how do we differentiate between B and C articles? Only by evaluating their size?

Please do evaluate the content :) Eventually we'll make the model smart enough to catch some content issues as well.

OK! I'll advise the community, thanks!

Sorry for the delay. I have finally updated the documentation for ArticleQuality.js with new screenshots and a description of the three places that predictions appear after the gadget is enabled. See

Thanks @Halfak! Would it be possible to have this gadget as an opt-out, rather than something you have to call from your common.js page?

Hi, I have translated the documentation into Galician here: But I have a basic question: how can users install the gadget? :)

@Theklan and I just set up euwiki with a gadget entry in user preferences. I think that's probably the best way at this point. We'll need to have the labeling campaign finished before I can set up the gadget. It looks like it is 81% done right now, so there's not too much work left. See

Once that is done and we have a quality prediction model deployed, I'll be able to help you get the gadget configured for glwiki.

The labeling campaign is finished! :)

Finally getting a chance to work on this. Sorry for the delay.

Looks like we have 67 Stubs, 100 Starts, 131 C-class, 69 B-class, 7 GA, and 26 FA.

We have 171 FAs and 72 GAs in this query:

So ultimately, we have relatively even representation among the classes. I think this will work fine for training.

Hi Halfak, I have one last question: how can users install the gadget? :)

Yes! See. I've enabled it by adding importScript("User:EpochFail/ArticleQuality.js") to my "common.js" file. We can also set this up as a regular gadget; see for a discussion of how to enable that.

Halfak, at they are asking me what the range of the ORES score is: 0 to 6? 1 to 6? Or something else?

Oh, thanks! And... I need more help. O:) Galician users didn't want to activate the tool by default, and that means IPs can't see the template working at. Do you know how I could fix that? Could you help me with it, because I don't know how all of that works? I want to modify the template, or create another one, to bring the ORES scores to all users, independently of whether or not they have activated the tool in their preferences...

I believe that if you remove "|default" from the relevant line in MediaWiki:Gadgets-definition, it will disable the gadget by default.
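For reference, a gadget entry in MediaWiki:Gadgets-definition looks roughly like the lines below (the gadget name here is a hypothetical example):

```
With "default" (loaded for all users):
* ArticleQuality[ResourceLoader|default]|ArticleQuality.js

Without it (opt-in through Preferences):
* ArticleQuality[ResourceLoader]|ArticleQuality.js
```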

Yes, I have removed it; now only users who enable the tool can see it. The problem is that now, in, IPs can't see the result of the gadget. I'm looking for a method that works in the template for all users, but without using "default"...

But... IP users will not know that URL...

Halfak, I found strange values in two articles: this one has a 4.09 and is classified as a good article, but this one, with 4.29 points, is classified as a B-class article. Is that correct?

I see "ORES Valoración: AB (4.16)" for the first and "ORES Valoración: AC (5.23)" for the second. That seems right to me. Maybe there were changes made since you asked (sorry I've been away from keyboard for a few days).

Yes, now values are correct because the second article was modified.

I see. Can you tell me what version of the second article got the weird values?

Aha. It looks like this version gets B (4.26).

ORES gives us the following prediction:

"articlequality": {
  "score": {
    "prediction": "B",
    "probability": {
      "FA": 0.13589280444780466,
      "GA": 0.24031935595544715,
      "B": 0.4218837532081604,
      "C": 0.16318387441515875,
      "Start": 0.0317106976103722,
      "Stub": 0.007009514363056943
    }
  }
}
It looks like the most likely predicted class is "B", but there's enough uncertainty in the higher ranges that the prediction weight is centered between "B" and "GA".

I originally set up the weights as follows:

weights: {
  Stub: 1,
  Start: 2,
  C: 3,
  B: 4,
  GA: 5,
  FA: 6
}

So it looks like "4.26" is a bit above B class.
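To make the arithmetic concrete, here is a sketch of that weighted-sum calculation, using the weights and the probabilities from the ORES output quoted above (the function name is mine, not the gadget's):

```javascript
// Weights as configured for the gadget: Stub=1 ... FA=6.
const weights = { Stub: 1, Start: 2, C: 3, B: 4, GA: 5, FA: 6 };

// Probabilities from the ORES prediction quoted above.
const probability = {
  FA: 0.13589280444780466,
  GA: 0.24031935595544715,
  B: 0.4218837532081604,
  C: 0.16318387441515875,
  Start: 0.0317106976103722,
  Stub: 0.007009514363056943
};

// Weighted sum of class probabilities ("center of mass" of the prediction).
function weightedSum(prob) {
  return Object.keys(prob)
    .reduce((sum, cls) => sum + prob[cls] * weights[cls], 0);
}

console.log(weightedSum(probability).toFixed(2)); // "4.26"
```

Note that even though "B" (weight 4) is the single most likely class, the probability mass sitting in "GA" and "FA" pulls the sum up to 4.26.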

Whoops, edit conflict, but it looks like we're working from similar revisions.

I think we're running into the same situation that we did on Basque Wikipedia. Because the model has a high level of uncertainty, the "weighted sum" values are pulled towards the middle (3/4 in this case). This is just an artifact of the distribution of prediction probabilities. We can help ORES be more confident by giving it more training data. I've been looking into this problem for euwiki, so maybe we could do a new labeling campaign in Wiki labels for glwiki too. What do you think? I'd be looking to have you (and other glwiki-pedians) label 500-2000 new pages to give ORES more examples.

It sounds good to label new pages; we can do it whenever you let us know. But I don't understand why the first article, with score 4.09, is classified as a good article, while this one, which had 4.29, was classified as B-class. Shouldn't the first article have class B too?

When I score that revision, I get:

"probability": {
  "FA": 0.025956280154134086,
  "GA": 0.45915229584584183,
  "B": 0.16531438949192337,
  "C": 0.28981745085868,
  "Start": 0.05220736477378042,
  "Stub": 0.007552218875640331
}

You can see that the probability is bimodal! It looks like the model is a bit confused about whether this article belongs in "C" class or "GA" class, but "GA" class is most likely. What do you think the prediction *should* have been?

Where does the score 4.09 come from? The first article really is a GA... Then my question is: aren't the total scores that I see on articles the scores the tool uses to classify?

"Prediction" represents the quality class with the highest probability (in this case, "GA" at about 46%).
The number that you see represents the central mass of the probability distribution. In most cases these agree, but in some cases, there's a high level of uncertainty and that's what we're seeing here.
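The distinction can be sketched with the probabilities quoted above for the disputed revision: the argmax over classes gives the prediction, while the probability-weighted sum gives the displayed number (variable names here are mine):

```javascript
// Gadget weights: Stub=1 ... FA=6.
const weights = { Stub: 1, Start: 2, C: 3, B: 4, GA: 5, FA: 6 };

// Probabilities ORES reported for the disputed revision.
const probability = {
  FA: 0.025956280154134086,
  GA: 0.45915229584584183,
  B: 0.16531438949192337,
  C: 0.28981745085868,
  Start: 0.05220736477378042,
  Stub: 0.007552218875640331
};

// "Prediction" = class with the highest probability (argmax).
const prediction = Object.keys(probability)
  .reduce((a, b) => (probability[a] >= probability[b] ? a : b));

// Displayed number = probability-weighted center of mass.
const center = Object.keys(probability)
  .reduce((sum, cls) => sum + probability[cls] * weights[cls], 0);

console.log(prediction, center.toFixed(2)); // "GA 4.09"
```

With a bimodal distribution like this one, the second-largest mode ("C", weight 3) drags the center down to 4.09 even though "GA" (weight 5) wins the argmax, which is exactly the disagreement observed.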

We have a similar "problem" on the Basque Wikipedia, and it's because some articles are not rated consistently. So you can have an article with a C, then add a paragraph and it gets a B, then add another paragraph and it gets a Start. The overall result is good but, of course, data quality matters!