Build article quality model for svwiki
Closed, Resolved · Public

Description

How do Wikipedians label articles by their quality level?
Most articles are not labeled at all; these correspond to Start and C class on English Wikipedia. Articles above that level are labeled with a template for their level.

What levels are there and what processes do they follow when labeling articles for quality?
Utmärkt artikel (like FA) with template https://sv.wikipedia.org/wiki/Mall:Utm%C3%A4rkt
Bra artikel (like GA) with template https://sv.wikipedia.org/wiki/Mall:Bra
Rekommenderad artikel (like B) with template https://sv.wikipedia.org/wiki/Mall:Rekommenderad
Stub with a template in category https://sv.wikipedia.org/wiki/Kategori:Alla_stubbmallar (which puts the article in the category https://sv.wikipedia.org/wiki/Kategori:Alla_stubbar )

There is a list of criteria for the top three levels here: https://sv.wikipedia.org/wiki/Wikipedia:Kriterier_f%C3%B6r_utvalda_artiklar
Anyone can label or unlabel articles as stubs or Rekommenderad following these criteria (except the article's main editor, who should not label their own work).
For the top two there is a peer review process.
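
The labeling scheme above could be turned into a label extractor along these lines. This is only a minimal sketch, not the extractor that was later written for this task: it assumes the templates appear in the article's own wikitext, the class keys ("utmarkt", "bra", ...) are hypothetical names chosen for illustration, and the list of stub templates has to be supplied by the caller because it is defined by the category linked above.

```python
import re

# Template names come from the description above; the label keys on the
# left are hypothetical identifiers chosen for this sketch.
QUALITY_TEMPLATES = [
    ("utmarkt", "Utmärkt"),
    ("bra", "Bra"),
    ("rekommenderad", "Rekommenderad"),
]


def template_pattern(name):
    """Compile a regex matching {{Name}} or {{Name|...}}.

    The first letter may be upper or lower case, since MediaWiki treats
    the first character of a template name as case-insensitive.
    """
    first = "[" + re.escape(name[0].upper()) + re.escape(name[0].lower()) + "]"
    return re.compile(r"\{\{\s*" + first + re.escape(name[1:]) + r"\s*(?:\||\}\})")


PATTERNS = [(label, template_pattern(name)) for label, name in QUALITY_TEMPLATES]


def quality_label(wikitext, stub_templates=()):
    """Return the highest quality label found in wikitext, or None.

    stub_templates is a caller-supplied list of stub template names
    (assumed to be taken from Kategori:Alla_stubbmallar).
    """
    for label, pattern in PATTERNS:
        if pattern.search(wikitext):
            return label
    for name in stub_templates:
        if template_pattern(name).search(wikitext):
            return "stub"
    return None  # unlabeled: most articles fall here
```

Articles for which this returns None are the unlabeled majority mentioned above, so they cannot be assigned a class from templates alone.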

It is worth noting that most of the stubs are bot-created (and of pretty decent quality) and are placed in a subcategory of https://sv.wikipedia.org/wiki/Kategori:Robotskapade_artiklar

How do InfoBoxes work? Are they used like on English Wikipedia?
Infoboxes are very similar to those on English Wikipedia, and a growing number of them use Wikidata-supported templates that need no parameters.

Are there "citation needed" templates? How do they work?
Yes, the main one is https://sv.wikipedia.org/wiki/Mall:Kb and it is applied to individual statements.
There are some more templates in this category: https://sv.wikipedia.org/wiki/Kategori:%C3%85tg%C3%A4rdsmallar_r%C3%B6rande_k%C3%A4llor_och_upphovsr%C3%A4tt covering corner cases (like "this paragraph needs a source") or more general problems (like "there are too few sources in this article as a whole").

All of these templates put the article in the category: https://sv.wikipedia.org/wiki/Kategori:Alla_artiklar_som_beh%C3%B6ver_k%C3%A4llor
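
As a rough illustration of how the "citation needed" markup could feed a quality signal, the sketch below counts uses of the Kb template in a page's wikitext. It is an assumption-laden example: only Mall:Kb is named above, so the other source-related templates in the category are not covered, and the function name is made up for this sketch.

```python
import re

# Matches {{Kb}}, {{kb|...}} and similar. Only Mall:Kb is named in the
# description; other source-related templates would need their own patterns.
KB_PATTERN = re.compile(r"\{\{\s*[Kk]b\s*(?:\||\}\})")


def count_kb(wikitext):
    """Count statement-level "citation needed" (Kb) templates in wikitext."""
    return len(KB_PATTERN.findall(wikitext))
```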

Event Timeline

For example, P3217 is used in Template SBL. That is the Dictionary of Swedish National Biography link ==> a high-trust source that articles should cite.

I have written about how Wikidata needs a quality dimension that tells how good a source is or what type of source it is, e.g. a primary source link.

To clarify, the most important difference in process is that we have nothing like the English Wikipedia WikiProject templates and thus not their importance/quality ratings. We have a GA/FA process reasonably similar to English Wikipedia, with the addition of the recommended articles (which, as said, can be added by anyone).

What's the next step for this? Is there any more information needed?

As part of WMSE-Development-Support-2019 (Automatic article quality assessment), we intend to make a gadget for showing feedback after an edit, so it would be good to get this going.

Any thoughts on gathering feedback from readers of an article, both logged-in and anonymous users?

Example:

  • 1-5 found what you were looking for
  • 1-5 quality of content
  • 1-5 source quality
  • 1-5 would you recommend Wikipedia to others

Listening to Wikimedia Research's "Why the World Reads Wikipedia", it seems that more and more people use Wikipedia just for quick fact checking or to read "media gossip", such as who "Kulturprofilen" is.

==> I interpret that as a big quality gap that Wikipedia needs to actively understand, finding out where readers think Wikipedia has a problem.

@Halfak, could you, or some other ORES-person, have a look at this?

Yes! Thank you for the ping. We're pretty backed up on modeling work. I think I'd like to have @hoo take a look at this after he's done with some other modeling work.

Halfak removed hoo as the assignee of this task. Feb 11 2019, 8:50 PM

Looks like hoo can't take this on in the short term. So I'll be putting it on my own backlog. Hopefully I'll be back soon with more information. Thanks for your patience.

I'm going to attempt doing this as a side project. Bear in mind that it's the first time I've worked with any of this, so it'll take a bit of time for me to figure it out.

@Harej yes, after running into various bugs and issues, as of last week I've finally managed to extract article labelings for svwiki. Next I will move on to actually training models over the labeling dataset.

BTW, I've been discussing this model on our talk page: Topic: What changes in probabilities are significant?

@Gilles, are you still working on this task?

I'm swamped with my main work on the Performance team, so this has been on the backburner, sorry. If someone else is keen to pick up work on it, I'm more than happy to point to what's been done so far.

This change that came out of my initial work has been in review for a while and includes the feature extractor: https://github.com/wikimedia/articlequality/pull/81

No worries. I can work from this. Do you have any datasets extracted that I could work from? Or maybe the extractor is just fast enough to run again.

You can find the data in /home/gilles/articlequality/datasets on stat1007

https://github.com/wikimedia/articlequality/pull/82

OK we have a model. Fitness isn't really that great, but it'll be interesting to see how it works in practice.

How long do you think before it's ready to use?

Sorry for the delay. I've been on vacation for almost a week. I'll be looking to get this deployed some time this week and then I'll set you up with a simple gadget that will help you explore the quality of the predictions. I'll ping back here with updates.

This is now deployed to our beta (testing) service. See http://ores-beta.wmflabs.org/v3/scores/svwiki/
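
For anyone who wants to try the beta endpoint from a script, here is a minimal sketch. It rests on assumptions that are not stated in this thread: that the model is exposed under the name "articlequality", that the endpoint accepts `models` and `revids` query parameters, and that the response nests scores under `svwiki.scores.<revid>.<model>.score`. The beta host may also no longer be reachable, so the parsing is kept separate from the network call.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Beta endpoint from the comment above.
ORES_BETA = "http://ores-beta.wmflabs.org/v3/scores/svwiki/"


def score_url(rev_ids, model="articlequality", base=ORES_BETA):
    """Build a scoring URL for several revision IDs.

    The model name and the query parameter names are assumptions.
    """
    query = urlencode({
        "models": model,
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return base + "?" + query


def extract_predictions(response, model="articlequality", wiki="svwiki"):
    """Map revision ID -> (predicted class, class probabilities).

    Assumes the nested response layout described in the lead-in;
    revisions without a score (e.g. errors) are skipped.
    """
    result = {}
    for rev_id, models in response[wiki]["scores"].items():
        score = models.get(model, {}).get("score")
        if score is not None:
            result[rev_id] = (score["prediction"], score["probability"])
    return result


def fetch_predictions(rev_ids, model="articlequality"):
    """Fetch and parse scores over the network."""
    with urlopen(score_url(rev_ids, model)) as resp:
        return extract_predictions(json.load(resp), model)
```

Calling `fetch_predictions` with a list of svwiki revision IDs would return the predicted class and the per-class probabilities for each revision, provided the assumptions above hold.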

I aim to get this deployed into production on Monday and then we can talk about getting that gadget set up.

Great, thanks!

Mentioned in SAL (#wikimedia-operations) [2019-05-13T20:04:02Z] <halfak@deploy1001> Started deploy [ores/deploy@c17a1a2]: T202202

Mentioned in SAL (#wikimedia-operations) [2019-05-13T20:20:17Z] <halfak@deploy1001> Started deploy [ores/deploy@c17a1a2]: T202202

Mentioned in SAL (#wikimedia-operations) [2019-05-13T20:24:32Z] <halfak@deploy1001> Finished deploy [ores/deploy@c17a1a2]: T202202 (duration: 04m 16s)

OK the model is deployed. I've also configured a simple gadget to allow you to see the predictions in svwiki. See https://sv.wikipedia.org/wiki/Anv%C3%A4ndare:EpochFail/common.js for how to enable it for your user account.