Maniphest T202202

Build article quality model for svwiki
Closed, ResolvedPublic

Description

How do Wikipedians label articles by their quality level?
Most articles are not labeled at all; these correspond to Start and C class on English Wikipedia. Articles above that level are labeled with a template for their level.

What levels are there and what processes do they follow when labeling articles for quality?
Utmärkt artikel (like FA) with template https://sv.wikipedia.org/wiki/Mall:Utm%C3%A4rkt
Bra artikel (like GA) with template https://sv.wikipedia.org/wiki/Mall:Bra
Rekommenderad artikel (like B) with template https://sv.wikipedia.org/wiki/Mall:Rekommenderad
Stub with a template in category https://sv.wikipedia.org/wiki/Kategori:Alla_stubbmallar (which puts the article in the category https://sv.wikipedia.org/wiki/Kategori:Alla_stubbar )

There is a list of criteria for the top three here: https://sv.wikipedia.org/wiki/Wikipedia:Kriterier_f%C3%B6r_utvalda_artiklar
Anyone can label or unlabel articles as stubs or Rekommenderad following these criteria (except the article's main editor, who cannot label their own article).
For the top two there is a peer review process.

Worth noting: most of the stubs are bot-created (and of pretty decent quality) and are placed in a subcategory of https://sv.wikipedia.org/wiki/Kategori:Robotskapade_artiklar
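
For illustration, the labeling scheme described above could be turned into a label extractor roughly as follows. This is only a sketch, not the extractor used in the articlequality repository: the mapping of template names to FA/GA/B-like classes follows the description above, the regular expression is a simplification of real wikitext parsing, and the stub template name in the usage example is hypothetical (real stub templates are listed in Kategori:Alla_stubbmallar).

```python
import re

# Template names come from the task description. Mapping them to
# English-Wikipedia-like class names is an assumption for illustration.
QUALITY_TEMPLATES = {
    "utmärkt": "FA",
    "bra": "GA",
    "rekommenderad": "B",
}

# Simplified matcher for {{Name}} or {{Name|params}}; it does not handle
# every wikitext edge case (comments, nowiki, deeply nested templates).
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")


def label_from_wikitext(wikitext, stub_templates=frozenset()):
    """Return a quality label for one revision, or None if unlabeled.

    stub_templates: lowercase names of stub templates (caller-supplied).
    """
    names = {m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(wikitext)}
    # Check the highest quality level first.
    for template, label in QUALITY_TEMPLATES.items():
        if template in names:
            return label
    if names & set(stub_templates):
        return "Stub"
    # Unlabeled articles correspond to Start/C per the description.
    return None


if __name__ == "__main__":
    print(label_from_wikitext("Text about a city. {{Bra}}"))
    # "exempelstub" is a made-up stub template name.
    print(label_from_wikitext("Short text. {{exempelstub}}", {"exempelstub"}))
```

Because unlabeled articles cover the Start to C range, a `None` result says only that no template was found, not that the article is of low quality.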

How do InfoBoxes work? Are they used like on English Wikipedia?
Infoboxes are very similar to English Wikipedia, and we are getting more and more that use Wikidata supported templates with no need for parameters.

Are there "citation needed" templates? How do they work?
Yes, the main one is https://sv.wikipedia.org/wiki/Mall:Kb and is used for each statement.
There are some more templates in this category: https://sv.wikipedia.org/wiki/Kategori:%C3%85tg%C3%A4rdsmallar_r%C3%B6rande_k%C3%A4llor_och_upphovsr%C3%A4tt covering corner cases (like "this paragraph needs a source") or more general problems (like "there are too few sources in this article as a whole").

All of these templates put the article in the category: https://sv.wikipedia.org/wiki/Kategori:Alla_artiklar_som_beh%C3%B6ver_k%C3%A4llor
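
As a rough sketch of how the per-statement template could feed a model feature, the snippet below counts uses of Mall:Kb in a revision's wikitext. It assumes the template is written as `{{kb}}` or `{{Kb|...}}` and ignores the other templates in the category above; the function name and the parameter value in the example are made up for illustration.

```python
import re

# Matches {{kb}} and {{Kb|anything}}, but not other templates whose
# names merely start with "kb". Simplified; not a full wikitext parser.
KB_RE = re.compile(r"\{\{\s*kb\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE)


def count_citation_needed(wikitext):
    """Count per-statement "citation needed" markers in wikitext."""
    return len(KB_RE.findall(wikitext))


if __name__ == "__main__":
    sample = "First claim.{{kb}} Second claim.{{Kb|2019-01}} Sourced claim."
    print(count_citation_needed(sample))
```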

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Halfak	T202202 Build article quality model for svwiki
		Resolved		ArielGlenn	T218316 writeuptopageid failing to split svwiki dump

Event Timeline

Ainali created this task.Aug 18 2018, 6:11 PM

Restricted Application added a project: artificial-intelligence. · View Herald TranscriptAug 18 2018, 6:11 PM

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Salgo60 subscribed.Aug 18 2018, 8:46 PM

For example, P3217 is used in Template SBL. That is the Dictionary of Swedish National Biography link, a high-trust source that articles should cite.

I have written about the need for Wikidata to add a quality dimension indicating how good a source is or what type of source it is, e.g. a primary source link.

To clarify, the most important difference in process is that we have nothing like the English Wikipedia WikiProject templates and thus not their importance/quality ratings. We have a GA/FA process reasonably similar to English Wikipedia, with the addition of the recommended articles (which, as said, can be added by anyone).

awight moved this task from Unsorted to New development on the Machine-Learning-Team board.Aug 22 2018, 4:23 PM

Alicia_Fagerving_WMSE mentioned this in T204319: [10 October] Present voting results.Sep 30 2018, 7:18 AM

Alicia_Fagerving_WMSE mentioned this in T204323: [10 Oct - 16 Oct] Evaluate voting results.Oct 1 2018, 7:40 AM

Sebastian_Berlin-WMSE subscribed.Oct 10 2018, 7:36 AM

What's the next step for this? Is there any more information needed?

As part of WMSE-Development-Support-2019 (Automatic article quality assessment), we intend to make a gadget for showing feedback after an edit, so it would be good to get this going.

Have you thought about gathering feedback from readers of an article, both logged-in and anonymous users?

Example:

1-5 found what you were looking for
1-5 quality of content
1-5 source quality
1-5 would you recommend Wikipedia to others

Judging from the Wikimedia Research presentation "Why the World Reads Wikipedia", more and more people use Wikipedia just for quick fact-checking or to read "media gossip", such as who "Kulturprofilen" is.

==> I interpret that as a big quality gap that Wikipedia needs to actively understand, finding out where readers think Wikipedia has a problem.

@Salgo60: It has been tried.

https://en.wikipedia.org/wiki/Wikipedia:Article_Feedback_Tool

Sebastian_Berlin-WMSE added a project: WMSE-Development-Support-2019 (Automatic article quality assessment).Jan 30 2019, 4:06 PM

@Halfak, could you, or some other ORES-person, have a look at this?

Yes! Thank you for the ping. We're pretty backed up on modeling work. I think I'd like to have @hoo take a look at this after he's done with some other modeling work.

Halfak assigned this task to hoo.Feb 6 2019, 4:38 PM

Halfak edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.

Halfak removed hoo as the assignee of this task.Feb 11 2019, 8:50 PM

Looks like hoo can't take this on in the short term. So I'll be putting it on my own backlog. Hopefully I'll be back soon with more information. Thanks for your patience.

Sebastian_Berlin-WMSE added a project: User-Sebastian_Berlin-WMSE.Feb 19 2019, 9:24 AM

Sebastian_Berlin-WMSE moved this task from Backlog to Watchin' on the User-Sebastian_Berlin-WMSE board.

I'm going to attempt doing this as a side project. Bear in mind that this is the first time I've worked with any of this, so it'll take a bit of time for me to figure it out.

• Gilles triaged this task as Medium priority.Feb 19 2019, 12:20 PM

Thanks a lot, @Gilles!

• Gilles added a subtask: T218316: writeuptopageid failing to split svwiki dump.Mar 14 2019, 4:19 PM

@Gilles Do you have updates on this project?

@Harej yes, after running into various bugs and issues, as of last week I've finally managed to extract article labelings for svwiki. Next I will move onto actually training models over the labelling dataset.

BTW, I've been discussing this model on our talk page: Topic: What changes in probabilities are significant?

@Gilles, are you still working on this task?

I'm swamped with my main work on the Performance team, so this has been on the backburner, sorry. If someone else is keen to pick up work on it, I'm more than happy to point to what's been done so far.

This change that came out of my initial work has been in review for a while and includes the feature extractor: https://github.com/wikimedia/articlequality/pull/81

No worries. I can work from this. Do you have any datasets extracted that I could work from? Or maybe the extractor is just fast enough to run again.

You can find the data in /home/gilles/articlequality/datasets on stat1007

Excellent! Thank you.

https://github.com/wikimedia/articlequality/pull/82

OK we have a model. Fitness isn't really that great, but it'll be interesting to see how it works in practice.

Halfak moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board.Apr 30 2019, 7:59 PM

• Gilles reassigned this task from • Gilles to Halfak.May 1 2019, 10:57 AM

• Gilles subscribed.

How long do you think before it's ready to use?

Sorry for the delay. I've been on vacation for almost a week. I'll be looking to get this deployed some time this week and then I'll set you up with a simple gadget that will help you explore the quality of the predictions. I'll ping back here with updates.

This is now deployed to our beta (testing) service. See http://ores-beta.wmflabs.org/v3/scores/svwiki/
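
A minimal sketch of how scores might be requested from this endpoint and read back, assuming the ORES v3 conventions of `models` and `revids` query parameters and a nested JSON response. The model name `articlequality`, the revision ID, the class names, and the probabilities in the sample payload are all made up for illustration; no request is actually sent here.

```python
from urllib.parse import urlencode

# Base URL from the comment above; whether it still responds is not assumed.
BETA_BASE = "http://ores-beta.wmflabs.org/v3/scores/svwiki/"


def score_url(rev_ids, model="articlequality", base=BETA_BASE):
    """Build a scoring URL for one or more revision IDs."""
    query = urlencode({
        "models": model,
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return f"{base}?{query}"


def extract_prediction(response, rev_id, model="articlequality", wiki="svwiki"):
    """Pull (prediction, probabilities) out of a v3-style response dict."""
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    return score["prediction"], score["probability"]


if __name__ == "__main__":
    print(score_url([12345]))
    # Hand-written payload in the assumed response shape, not real output.
    fake_response = {
        "svwiki": {
            "scores": {
                "12345": {
                    "articlequality": {
                        "score": {
                            "prediction": "classA",
                            "probability": {"classA": 0.7, "classB": 0.3},
                        }
                    }
                }
            }
        }
    }
    print(extract_prediction(fake_response, 12345))
```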

I aim to get this deployed into production on Monday and then we can talk about getting that gadget set up.

I don't know how else I could get attention: https://meta.wikimedia.org/wiki/WikiAI

In T202202#5173390, @Halfak wrote:

This is now deployed to our beta (testing) service. See http://ores-beta.wmflabs.org/v3/scores/svwiki/

Great, thanks!

Halfak moved this task from Review to Pending deployment on the Machine-Learning-Team (Active Tasks) board.May 13 2019, 4:15 PM

Mentioned in SAL (#wikimedia-operations) [2019-05-13T20:04:02Z] <halfak@deploy1001> Started deploy [ores/deploy@c17a1a2]: T202202

Mentioned in SAL (#wikimedia-operations) [2019-05-13T20:20:17Z] <halfak@deploy1001> Started deploy [ores/deploy@c17a1a2]: T202202

Mentioned in SAL (#wikimedia-operations) [2019-05-13T20:24:32Z] <halfak@deploy1001> Finished deploy [ores/deploy@c17a1a2]: T202202 (duration: 04m 16s)

Halfak moved this task from Pending deployment to Completed on the Machine-Learning-Team (Active Tasks) board.May 13 2019, 8:25 PM

OK the model is deployed. I've also configured a simple gadget to allow you to see the predictions in svwiki. See https://sv.wikipedia.org/wiki/Anv%C3%A4ndare:EpochFail/common.js for how to enable it for your user account.

Halfak closed this task as Resolved.Jun 18 2019, 1:39 PM

Jopparn moved this task from Watchin' to Done on the User-Sebastian_Berlin-WMSE board.Nov 18 2019, 10:53 AM

ArielGlenn closed subtask T218316: writeuptopageid failing to split svwiki dump as Resolved.Dec 21 2020, 5:10 PM
