
Build article quality model for ptwikipedia
Closed, Resolved, Public

Description

How do Wikipedians label articles by their quality level?

There is a template used on article talk pages (https://pt.wikipedia.org/wiki/Predefini%C3%A7%C3%A3o:Marca_de_projeto) that allows defining the project topic, quality and importance. The importance level is project-specific, but quality is global. There is also an automatic quality classification (https://pt.wikipedia.org/wiki/M%C3%B3dulo:Avalia%C3%A7%C3%A3o) that can be overruled by editors.

What levels are there and what processes do they follow when labeling articles for quality?

Article quality follows the general quality scale https://pt.wikipedia.org/wiki/Predefini%C3%A7%C3%A3o:Escala_de_avalia%C3%A7%C3%A3o. It goes from 1 (poor quality) to 4 (OK quality). Quality 5 is reserved for "Artigos bons" (Good articles: https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Artigos_bons) and * (star) is reserved for "Artigos destacados" (Featured articles: https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Artigos_destacados). The same scale applies to list articles. The top two classes are decided through community discussions, while the bottom four are individual decisions made by editors (or by the automatic classifier mentioned above).
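As a minimal sketch of the split described above, the hypothetical helper below maps a quality class to the process that assigns it. It assumes featured articles ("*") may also appear as "6" in the template, as the label extraction later in this thread found; it is an illustration, not part of any existing tool.

```python
def assessment_process(label: str) -> str:
    """Return who assigns a ptwiki quality class (hypothetical helper)."""
    value = label.strip()
    # 5 = Artigos bons, 6 or "*" = Artigos destacados: decided by community discussion.
    if value in {"5", "6", "*"}:
        return "community"
    # 1-4: set by an individual editor or by the automatic classifier.
    if value in {"1", "2", "3", "4"}:
        return "individual"
    raise ValueError(f"unknown quality class: {label!r}")
```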

How do InfoBoxes work? Are they used like on English Wikipedia?

Infoboxes normally call the Info module (https://pt.wikipedia.org/wiki/M%C3%B3dulo:Info) or use the Info template (https://pt.wikipedia.org/wiki/Predefini%C3%A7%C3%A3o:Info).
There are some that use Wikidata properties (primarily through https://pt.wikipedia.org/wiki/M%C3%B3dulo:WikidataIB).
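A rough way to detect an infobox in article wikitext, for use as a model feature, is sketched below. It assumes ptwiki infobox templates are transcluded as `Info` or `Info/<topic>` (for example `Info/Biografia`); that naming convention is an assumption here, and infoboxes reaching Módulo:Info under other template names would be missed.

```python
import re

# Assumption: infoboxes are transcluded as {{Info}} or {{Info/<topic>}}, e.g. {{Info/Biografia}};
# the first letter of a template name is case-insensitive on MediaWiki.
INFOBOX_RE = re.compile(r"\{\{\s*[Ii]nfo(?:/[^|{}]+)?\s*(?=[|}])")


def has_infobox(wikitext: str) -> bool:
    """Heuristic infobox detector for article wikitext (a candidate model feature)."""
    return INFOBOX_RE.search(wikitext) is not None
```

The lookahead requires the name to end at `|` or `}`, so a template such as `{{Informação}}` is not mistaken for an infobox.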

Are there "citation needed" templates? How do they work?

Yes. You can mark a sentence as needing citations with https://pt.wikipedia.org/wiki/Predefini%C3%A7%C3%A3o:Carece_de_fontes, or a whole block with https://pt.wikipedia.org/wiki/Predefini%C3%A7%C3%A3o:Carece_de_fontes/bloco. You can mark a section or a whole article with https://pt.wikipedia.org/wiki/Predefini%C3%A7%C3%A3o:Carece_de_fontes. Several other types exist (see https://pt.wikipedia.org/wiki/Categoria:!Predefini%C3%A7%C3%B5es_sobre_fontes_em_falta).
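A minimal sketch of counting these tags as a model feature, using only the two template names given above. Redirects or alternative names for these templates are not handled, and the regex assumes the name is written with spaces rather than underscores.

```python
import re

# Template names from the description above; redirects and underscore spellings are not handled.
CITATION_NEEDED_RE = re.compile(r"\{\{\s*[Cc]arece de fontes(/bloco)?\s*(?=[|}])")


def citation_needed_counts(wikitext: str) -> dict:
    """Count {{Carece de fontes}} and {{Carece de fontes/bloco}} tags (hypothetical feature)."""
    counts = {"Carece de fontes": 0, "Carece de fontes/bloco": 0}
    for match in CITATION_NEEDED_RE.finditer(wikitext):
        key = "Carece de fontes/bloco" if match.group(1) else "Carece de fontes"
        counts[key] += 1
    return counts
```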

Event Timeline

GoEThe created this task. Mar 2 2020, 2:21 PM
Restricted Application added a project: artificial-intelligence. Mar 2 2020, 2:21 PM
Restricted Application added a subscriber: Aklapper.
Halfak added a subscriber: Halfak. Mar 2 2020, 4:16 PM

Looks like you can find the {{marca de projeto}} template on quite a few pages.

I can see the template used at https://pt.wikipedia.org/wiki/Discuss%C3%A3o:Luiz_In%C3%A1cio_Lula_da_Silva

{{Marca de projeto|3|Biografias|4|Políticos|4|Brasil|3|WP Offline|2|bot=4/20111127|rev=20170714}}

It looks like there's a general evaluation of "3" that happened on July 14th, 2017. But there are also 4 WikiProjects with different ratings ranging from 2 to 4.

@GoEThe, how do you think we should interpret this? Is the most recent WikiProject evaluation the best or should we be looking for the general evaluation exclusively?

Halfak added a comment. Mar 2 2020, 4:18 PM

Oh! I see that the first number is an *automatic evaluation*.

Chtnnh added a subscriber: Chtnnh. Mar 2 2020, 7:36 PM
GoEThe added a comment. Mar 3 2020, 1:55 PM

The first number is the general quality; the others are importance measures, unrelated to quality, that express the relevance of the topic for each project.
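Given that reading, the sketch below splits a flat {{Marca de projeto}} call into the general quality (the first positional value) and the project/importance pairs, skipping named parameters such as `bot=` and `rev=`. It is a simplified stdlib parser that assumes no nested templates or piped links inside the call; a wikitext parser such as mwparserfromhell is more robust.

```python
import re

# Matches a flat {{Marca de projeto|...}} call. Assumption: no nested templates
# or piped links inside the call; the template name's first letter may be lowercase.
MARCA_RE = re.compile(r"\{\{\s*[Mm]arca de projeto\s*\|([^{}]*)\}\}")


def parse_marca(wikitext: str):
    """Return (quality, {project: importance}) for the first template found, or None."""
    match = MARCA_RE.search(wikitext)
    if match is None:
        return None
    parts = [p.strip() for p in match.group(1).split("|")]
    # Named parameters such as bot=... and rev=... are metadata, not ratings.
    positional = [p for p in parts if "=" not in p]
    if not positional:
        return None
    quality = positional[0]
    # Remaining positional values alternate: project name, importance.
    importance = dict(zip(positional[1::2], positional[2::2]))
    return quality, importance
```

For the Lula da Silva talk page template quoted above, this returns `("3", {"Biografias": "4", "Políticos": "4", "Brasil": "3", "WP Offline": "2"})`.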

GoEThe added a comment. Mar 3 2020, 1:56 PM

> Oh! I see that the first number is an *automatic evaluation*.

Yes, some (many?) evaluations are automatic; perhaps you should exclude those.

Chtnnh claimed this task. Mar 5 2020, 7:45 PM

@Halfak @GoEThe I will be working on this task together with Halfak. Hope to finish it as soon as possible.

@Chtnnh thank you! Let me know if you need anything else.

@GoEThe could you describe what success would look like for this task? As I cannot understand the links you have mentioned (I don't speak pt xD), it would help immensely.

@Chtnnh Not sure if this will answer your question, but perhaps you can follow up on more specific questions if needed.

I would like to be able to classify the quality of articles on a scale of 1 to 6, with 1 being the lowest quality and 6 the highest, even if no editor has included the "Marca de projeto" template.

Also, the automatic classification system we have in place is naive and looks at only a few parameters: "How many references?", "How many characters does it have?", "How many wikilinks?", etc. You can see the full list at https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Avalia%C3%A7%C3%A3o_autom%C3%A1tica
So it would be better to have a more holistic model that uses other criteria, based on an AI model and actual classifications made by editors.

This could be useful for finding potential featured article candidates, prioritizing work on a WikiProject without having to spend too much time manually marking and classifying articles, etc.

Excellent! It sounds like we're prepared to do what you need. I think the most difficult part will be extracting the classifications done by editors.
How do we differentiate the automated assessments from the ones done by editors?

Hi. I just noticed that in the example you mentioned (https://pt.wikipedia.org/wiki/Discuss%C3%A3o:Luiz_In%C3%A1cio_Lula_da_Silva), the classification was not automatic, but rather by an editor. See for example https://pt.wikipedia.org/wiki/Discuss%C3%A3o:Albino_Forjaz_de_Sampaio where the classification is automatic.

The wikicode will be something like {{marca de projeto|?|project_name|...}} for an automatic classification and {{marca de projeto|[1-6]|project_name|...}} for an editor assessment.

So, disregard the "bot" parameter.
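The rule above can be written as a small decision on the first positional value. The helper name and the "invalid" bucket (for anything outside 1-6, such as a stray "0") are illustrative assumptions.

```python
VALID_CLASSES = {"1", "2", "3", "4", "5", "6"}


def label_source(first_param: str) -> str:
    """Classify the first positional value of {{Marca de projeto}} (hypothetical helper)."""
    value = first_param.strip()
    if value == "?":
        # Quality is computed by Módulo:Avaliação: not an editor label, so exclude it.
        return "automatic"
    if value in VALID_CLASSES:
        # A number set by an editor overrides the automatic classification.
        return "editor"
    # Anything else (e.g. "0" or a typo) is not a usable label.
    return "invalid"
```

Only values classified as "editor" would be kept as training labels; the `bot=` parameter is never consulted.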

@GoEThe thank you so much, this helps.

From what I understand, the current system looks for 7 parameters: Size (in bytes), Number of references, Number of internal links, Number of paragraphs/sections, Number of images, Maximum paragraphs and some predefined things that the article mustn't have.

The system can only assign classes 2, 3 or 4; every other article defaults to 1.

This seems to be a static classifier that is not very clever on its own; I see why you are seeking a new one.
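For comparison, rough versions of some of those measures (size, references, internal links, sections, images) can be computed directly from wikitext. This is not the logic of Módulo:Avaliação: the file and category namespace prefixes are assumptions, images are counted only as file links, and the regexes ignore edge cases such as links inside `<nowiki>` or generated by templates.

```python
import re

# Assumption: file and category links use these namespace prefixes (pt + English aliases).
FILE_PREFIXES = {"ficheiro", "arquivo", "imagem", "file", "image"}
CATEGORY_PREFIXES = {"categoria", "category"}
LINK_RE = re.compile(r"\[\[([^\[\]|]+)")           # target of each [[...]] link
REF_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)  # opening <ref> tags, incl. <ref name=x/>
HEADING_RE = re.compile(r"^(={2,6}).+?\1\s*$", re.MULTILINE)


def basic_features(wikitext: str) -> dict:
    """Rough size/reference/link/section/image counts for an article's wikitext."""
    targets = [m.group(1).strip() for m in LINK_RE.finditer(wikitext)]
    namespaces = [t.split(":", 1)[0].strip().lower() if ":" in t else "" for t in targets]
    images = sum(ns in FILE_PREFIXES for ns in namespaces)
    categories = sum(ns in CATEGORY_PREFIXES for ns in namespaces)
    return {
        "bytes": len(wikitext.encode("utf-8")),
        "references": len(REF_RE.findall(wikitext)),
        "internal_links": len(targets) - images - categories,
        "sections": len(HEADING_RE.findall(wikitext)),
        "images": images,
    }
```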

Here's what I am thinking of:

First, we build the extractor, which shouldn't take long.

Second, we can add a more comprehensive feature list for the model to parse through.

Third, we can choose the best model for the problem, build it and look at the results.

@Halfak @GoEThe how does this sound?

Sounds good to me.

Sounds right to me. We have extractors for pulling labeled data out of these types of templates. We should start there by implementing the extractor for ptwiki. Once we have that, we can check in and review what we get for labels.

When I ran this code, I got the following label counts:

$ cat ptwiki.labelings.20200301.json | json2tsv wp10 | sort | uniq -c
     51 0
  12543 1
  19743 2
  40841 3
   7597 4
     44 5
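For readers without json2tsv, a Python equivalent of that pipeline is sketched below. It assumes the labelings file holds one JSON object per line with the class under a `wp10` key, which is the field `json2tsv wp10` reads.

```python
import json
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str], field: str = "wp10") -> Counter:
    """Tally one JSON field across JSON-lines input, like `json2tsv wp10 | sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line).get(field)] += 1
    return counts
```

For example, `count_labels(open("ptwiki.labelings.20200301.json", encoding="utf-8"))` would return a `Counter` keyed by label.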

It looks like we aren't getting any "*" class articles. I wonder if we could find some examples of those articles/templates and then use them to debug our label extractor and see whether it is missing them.

I just grabbed https://pt.wikipedia.org/w/index.php?title=SS_Edmund_Fitzgerald from the home page and it looks like there's no relevant template on the talk page.

Aha! I just checked https://pt.wikipedia.org/w/index.php?title=Discuss%C3%A3o:Anarquismo and it has the template we're looking for.

I see {{marca de projeto|6|Anarquismo|4|WP Offline|4|rev=20151026}} on the talk page.

It looks like we should be looking for a "6" instead of a "*". Also, it looks like we want to grab the first number in the template. I thought the first number that appeared in the template was automatically generated. Is that right? Or is it only automatically generated when it is < 5?

No, it is not automatically generated. I believe only when there is a '?' is the classification automatic. Any number there overrides the automatic classification.

OK I think I got lost a bit earlier. Let me try checking my assumptions again:

When we see a template that looks like this {{marca de projeto|6|Anarquismo|4|WP Offline|4|rev=20151026}}

That means there were 3 manual assessments:

  • An official assessment (by someone???) of 6
  • An assessment by someone from WikiProject Anarquismo of 4
  • An assessment by someone from WikiProject Offline of 4

Are any of these assessments more believable than the others? What would this template look like if an assessment was automatically generated?

I re-ran the extractor with these new assumptions and got:

   51 "0"
12861 "1"
19931 "2"
40944 "3"
 7784 "4"
   46 "5"
  139 "6"

Somehow we're getting a very small number of "5"s. Does that seem right?

Chtnnh added a comment. Apr 1 2020, 5:40 AM

@Halfak Why is the sum of all articles in all classes so low? Doesn't the ptwiki have more than a million articles?

GoEThe added a comment. Apr 1 2020, 9:17 AM

> OK I think I got lost a bit earlier. Let me try checking my assumptions again:
>
> When we see a template that looks like this {{marca de projeto|6|Anarquismo|4|WP Offline|4|rev=20151026}}
>
> That means there were 3 manual assessments:
>
>   • An official assessment (by someone???) of 6
>   • An assessment by someone from WikiProject Anarquismo of 4
>   • An assessment by someone from WikiProject Offline of 4
>
> Are any of these assessments more believable than the others? What would this template look like if an assessment was automatically generated?

The only number related to quality is the first one; the project-specific numbers are equivalent to the "low", "medium" and "high" importance ratings on the English Wikipedia. So you can ignore them.

GoEThe added a comment. Apr 1 2020, 9:19 AM

> @Halfak Why is the sum of all articles in all classes so low? Doesn't the ptwiki have more than a million articles?

Not all articles are assessed or even have the template.

GoEThe added a comment. Apr 1 2020, 9:25 AM

> I re-ran the extractor with these new assumptions and got:
>
>    51 "0"
> 12861 "1"
> 19931 "2"
> 40944 "3"
>  7784 "4"
>    46 "5"
>   139 "6"
>
> Somehow we're getting a very small number of "5"s. Does that seem right?

There were "0"s? That's strange. Good articles ("5") were introduced later than featured articles, but there are now more of them than featured articles. Perhaps many of their templates haven't been updated.

Chtnnh added a comment. Apr 1 2020, 9:54 AM

> @Halfak Why is the sum of all articles in all classes so low? Doesn't the ptwiki have more than a million articles?

> Not all articles are assessed or even have the template.

Right, that makes sense.

> The only number related to quality is the first one; the project-specific numbers are equivalent to the "low", "medium" and "high" importance ratings on the English Wikipedia. So you can ignore them.

Thank you for the clarification.

> There were "0"s? That's strange. Good articles ("5") were introduced later than featured articles, but there are now more of them than featured articles. Perhaps many of their templates haven't been updated.

Perhaps @Halfak will need to tweak the assumptions based on the new information you have given.

He7d3r added a subscriber: He7d3r.

Thank you @He7d3r! That clarifies the template situation completely.

Halfak added a comment. Apr 1 2020, 7:13 PM

Here are some new results, looking only at the first number in the templates:

    19 "0"
147170 "1"
 38776 "2"
 11281 "3"
  6776 "4"
  1228 "5"
  1414 "6"

This is looking much more reasonable.

Chtnnh added a comment. Apr 3 2020, 6:55 PM

@GoEThe Apart from the Infobox and Citation needed templates, can we assume the other templates are the same as on enwiki? If they are different, we will need those templates as well to build the model.

Halfak added a comment. Apr 3 2020, 8:02 PM

I just took a pass and figured a bunch of things out by navigating around. I gave some notes to @Chtnnh.

Halfak added a comment. Apr 6 2020, 8:40 PM

https://github.com/wikimedia/articlequality/pull/115

The model looks good. We're getting decent fitness across the classes.

Halfak added a comment. Apr 7 2020, 5:37 PM

Here are some new counts that I got after applying @He7d3r's notes:

143901 1
 31866 2
  5006 3
  1724 4
  1223 5
  1411 6

It looks like we were maybe picking up some labels that we shouldn't have before. I'll try to figure out what was going on there.
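Diffing the two runs per class makes the change concrete. The counts below are copied from the two tables above (first-number run, then the run after He7d3r's notes). Computed this way, classes 3 and 4 lose about 56% and 75% of their labels while classes 5 and 6 barely change, which is consistent with importance values having been picked up as quality labels.

```python
# Label counts copied from this thread: the first-number run,
# and the run after applying He7d3r's notes.
BEFORE = {"0": 19, "1": 147170, "2": 38776, "3": 11281, "4": 6776, "5": 1228, "6": 1414}
AFTER = {"1": 143901, "2": 31866, "3": 5006, "4": 1724, "5": 1223, "6": 1411}


def count_deltas(before: dict, after: dict) -> dict:
    """Per-class change in label counts (negative = fewer labels after the fix)."""
    classes = sorted(set(before) | set(after))
    return {c: after.get(c, 0) - before.get(c, 0) for c in classes}


for label, delta in count_deltas(BEFORE, AFTER).items():
    # Relative change against the earlier run; every class has a non-zero "before" count.
    print(f"class {label}: {delta:+d} ({delta / BEFORE[label]:+.1%})")
```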

I forgot to take notes, but I did look into this. We used to match some templates incorrectly. Switching to mwparserfromhell seems to have helped. For example, this template would previously have matched as class "4":

{{Marca de projeto|?|Foobar|4}}

Now this matches "?", which we disregard. So I think we're set.
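The thread does not show the old matching code, so the regex below is only a hypothetical reconstruction of this failure mode: it grabs the first bare class digit anywhere in the call and returns "4". Taking the first positional parameter instead (what `template.get("1").value` gives in mwparserfromhell) yields "?". The splitter here is a stdlib stand-in that only handles flat templates.

```python
import re

TEMPLATE = "{{Marca de projeto|?|Foobar|4}}"

# Hypothetical buggy pattern: the first bare class digit anywhere in the call.
NAIVE_RE = re.compile(r"\|\s*([1-6])\s*(?=[|}])")


def naive_label(wikitext: str):
    match = NAIVE_RE.search(wikitext)
    return match.group(1) if match else None


def first_positional(wikitext: str):
    """First positional parameter of a flat {{...}} call (stdlib stand-in for a parser)."""
    body = wikitext.strip()[2:-2]                      # drop the outer {{ and }}
    parts = [p.strip() for p in body.split("|")[1:]]   # skip the template name
    positional = [p for p in parts if "=" not in p]    # named params don't count
    return positional[0] if positional else None


print(naive_label(TEMPLATE))       # 4
print(first_positional(TEMPLATE))  # ?
```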

I just did another update of our extractor and the feature set based on some more feedback from @He7d3r. I think this could be the final run. I should have an update tomorrow.

Forgot to ping here. This is ready for final review.

Chtnnh closed this task as Resolved. May 26 2020, 6:02 PM