Design item_quality form for Wikidata
Closed, ResolvedPublic

Assigned To

Authored By

	Ladsgroup
	Jan 20 2017, 1:38 PM

Related Objects

Status	Assigned	Task
Resolved	• johl	T127047 Collection of topics for HPI hackathon
Resolved	awight	T187836 [Epic] Audit of pending ORES GUI deployments
Resolved	Glorian_WD	T127470 Deploy item quality classification model for Wikidata
Resolved	Glorian_WD	T157498 Train/test item quality model for Wikidata
Resolved	Glorian_WD	T157495 Complete Wikidata item quality campaign
Resolved	Halfak	T157493 Deploy Wikidata item quality campaign
Resolved	Halfak	T161002 Late march wikilabels deployment
Resolved	Halfak	T159570 Deploy the pilot of Wikidata item quality campaign
Resolved	Halfak	T155828 Design item_quality form for Wikidata
Resolved	Glorian_WD	T157489 [Discuss] item quality in Wikidata

Event Timeline

Ladsgroup created this task.Jan 20 2017, 1:38 PM

Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptJan 20 2017, 1:38 PM

Ladsgroup edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.Jan 20 2017, 1:39 PM

Ladsgroup moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board.

https://github.com/wiki-ai/wikilabels-wmflabs-deploy/pull/29
Here's a snippet:

@Lydia_Pintscher @Lea_Lacroix_WMDE Does it look good?

Ladsgroup moved this task from Incoming to Blocked on others on the User-Ladsgroup board.Jan 20 2017, 1:42 PM

I wonder if we could provide some guidance for each quality level in the form, to help people rate consistently. I'm worried that one labeler's "B" will be another labeler's "C".

I was thinking of two reviewers per task, so it'll be less subjective. Overall, things should be more or less objective once we aggregate all the labels and build a classifier on top of them.

Aggregating two labels might be hard. Maybe we could ask for two labels of a set of items and run a follow-up campaign to get a 3rd label on the items where labelers disagree.
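The two-label-then-tiebreak idea above could be sketched roughly as follows. This is a minimal illustration only, assuming labels arrive as (item_id, labeler, grade) tuples; the function name, the tuple layout, and the sample data are hypothetical and are not part of wikilabels.

```python
from collections import defaultdict


def split_by_agreement(labels):
    """Split items into those whose labels agree and those needing a 3rd label.

    `labels` is an iterable of (item_id, labeler, grade) tuples.
    This layout is an assumption for illustration, not the wikilabels schema.
    """
    grades_by_item = defaultdict(list)
    for item_id, _labeler, grade in labels:
        grades_by_item[item_id].append(grade)

    agreed = {}        # item_id -> the grade all labelers gave
    needs_third = []   # item_ids where labelers disagree
    for item_id, grades in grades_by_item.items():
        if len(set(grades)) == 1:
            agreed[item_id] = grades[0]
        else:
            needs_third.append(item_id)
    return agreed, sorted(needs_third)


# Hypothetical sample data: two labelers, two items.
labels = [
    ("Q42", "labeler_1", "A"),
    ("Q42", "labeler_2", "A"),
    ("Q64", "labeler_1", "B"),
    ("Q64", "labeler_2", "C"),
]
agreed, needs_third = split_by_agreement(labels)
```

Items in `needs_third` would be the candidates for the follow-up campaign; items in `agreed` could be used as-is.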

Where are the labels defined? What criteria separate a "C" from a "B"?

@Halfak @Ladsgroup : I believe in order to rate items, people could follow the guideline in showcase items (https://www.wikidata.org/wiki/Wikidata:Showcase_items).

Maybe @Lydia_Pintscher can confirm this.

Glorian_WD added a comment.Jan 20 2017, 10:11 PM

This comment was removed by Glorian_WD.

Yes, people should in general follow the showcase item criteria. So A would be "meets all criteria" and E would be "meets none of the criteria"? In this case we should list the criteria and say that.

@Lydia_Pintscher, what do you think about the middle quality classes? Could we pick and choose criteria and make statements about what types of items belong at which level?

E.g.

E: Anything that doesn't meet the D criteria
D: A few useful statements and a description in at least one language
C: At least one non-trivial statement is referenced.
B: Aliases and description are translated into >= 5 languages
A: All Showcase criteria met

This is just an example. For English Wikipedia's 1.0 assessments they have descriptions that are a bit more subjective and make references to process and the level of coverage.

Stub: The article is either a very short article or a rough collection of information that will need much work to become a meaningful article. It is usually very short; but, if the material is irrelevant or incomprehensible, an article of any length falls into this category. Although Stub-class articles are the lowest class of the normal classes, they are adequate enough to be an accepted article, though they do have risks of being dropped from being an article all together.
Start: The article has a usable amount of good content but is weak in many areas. Quality of the prose may be distinctly unencyclopedic, and MoS compliance non-existent. The article should satisfy fundamental content policies, such as BLP. Frequently, the referencing is inadequate, although enough sources are usually provided to establish verifiability. No Start-Class article should be in any danger of being speedily deleted.
C: The article cites more than one reliable source and is better developed in style, structure, and quality than Start-Class, but it fails one or more of the criteria for B-Class. It may have some gaps or missing elements; need editing for clarity, balance, or flow; or contain policy violations, such as bias or original research. Articles on fictional topics are likely to be marked as C-Class if they are written from an in-universe perspective. It is most likely that C-Class articles have a reasonable encyclopedic style.
B: The article is suitably referenced, with inline citations. The article reasonably covers the topic, and does not contain obvious omissions or inaccuracies. The article has a defined structure. The article is reasonably well-written. The article contains supporting materials where appropriate. The article presents its content in an appropriately understandable way.
GA: Well written: the prose is clear and concise, and the spelling and grammar are correct. Verifiable and it contains no original research. It contains no copyright violations nor plagiarism. Broad in its coverage: it addresses the main aspects of the topic. Neutral: it represents viewpoints fairly and without editorial bias, giving due weight to each. Stable: it does not change significantly from day to day. Images are relevant to the topic, and have suitable captions.
FA: See https://en.wikipedia.org/wiki/Wikipedia:Featured_article_criteria

@Halfak : I thought we do not have to define the criteria for each quality grade because the classifier will automatically define those for us. Am I right?

@Glorian_Yapinus, we'll need to have consistency in labeling in order for the classifier to be able to learn the distinctions between the middle-classes. It's OK to be a little bit inconsistent and the classifier will make up for that, but we want to be as consistent as possible.

Note that the more consistent our labels are, the better our test data will be. That means we'll be able to get test statistics that actually reflect reality, and our prediction probabilities will have more consistent and predictable properties.

Glorian_Yapinus added a subscriber: Jan_Dittrich.Jan 27 2017, 10:07 AM

I talked to @Glorian_WD; I suggested taking a look at how grading rubrics are created, and specifying the extremes first and then working towards the middle grades.

@Halfak: Which kind of classifier would be used? (Particularly: Will/Can it create some continuous score or will it only put out one of the defined classes?)

@Jan_Dittrich : the classifiers will classify items into defined classes.

@Jan_Dittrich & @Glorian_Yapinus, the classifier will classify into defined classes, but it will also output probabilities that will let us linearize the scale.

I should say that I expect that a GradientBoosting or RandomForest model will likely fit this prediction problem well, but we might change to a different classifier strategy if we can push the fitness.
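One way to read "linearize the scale" is as a probability-weighted sum over the ordered classes. The sketch below assumes the classes E through A map to the integers 0 through 4; the function name and the example probabilities are hypothetical and are not actual model output.

```python
# Ordered quality classes, worst to best. The 0..4 mapping is an assumption.
CLASS_ORDER = ["E", "D", "C", "B", "A"]


def weighted_score(probabilities):
    """Collapse per-class probabilities into one continuous score in [0, 4].

    `probabilities` maps each class name to its predicted probability;
    classes missing from the mapping are treated as probability 0.
    """
    return sum(
        index * probabilities.get(name, 0.0)
        for index, name in enumerate(CLASS_ORDER)
    )


# Hypothetical prediction: mostly "B", with some mass on neighbouring classes.
example = {"E": 0.0, "D": 0.1, "C": 0.2, "B": 0.5, "A": 0.2}
score = weighted_score(example)  # roughly 2.8, i.e. a bit below "B" (3)
```

A score like this lets two items in the same predicted class be ranked against each other, which a single class label cannot do.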

@Halfak, please find the attached file which specifies the criteria for each grade. I made that by referring to the showcase item criteria.

Item Quality Criteria for Campaign.odt (16 KB)

After discussing this with Jan and Lydia, we think it is important to keep the criteria somewhat vague. In other words, we do not want to be too specific in defining the criteria, so that people can use their common sense in evaluating items.

What do you think about the attached criteria?

Looks good. Maybe it's time to move it to the Wiki and to host a discussion about it. Once there's some buy-in, I think we'll be good to go.

@Halfak, I have posted the criteria on Wikidata project chat page.

Just created https://www.wikidata.org/wiki/Wikidata:Item_quality and posted there.

Halfak created subtask T157489: [Discuss] item quality in Wikidata.Feb 7 2017, 8:43 PM

Glorian_WD added a parent task: T157493: Deploy Wikidata item quality campaign.Feb 7 2017, 9:11 PM

Glorian_WD removed a parent task: T127470: Deploy item quality classification model for Wikidata.Feb 7 2017, 9:17 PM

Halfak renamed this task from Build item_quality form to Design item_quality form for Wikidata.Feb 7 2017, 9:59 PM

Halfak removed a project: WMDE-Tech-Communication-Mentoring-And-Events.

Halfak edited projects, added Machine-Learning-Team; removed Machine-Learning-Team (Active Tasks).Feb 9 2017, 3:12 PM

Halfak moved this task from Unsorted to Ideas on the Machine-Learning-Team board.Feb 9 2017, 3:23 PM

Halfak added a parent task: T159570: Deploy the pilot of Wikidata item quality campaign.Mar 4 2017, 5:15 PM

Updated form to include summaries of each class: https://github.com/wiki-ai/wikilabels-wmflabs-deploy/pull/29
Added HTML snippet support for https://github.com/wiki-ai/wikilabels/pull/162

Oh yeah. Also, I updated assets in flask-oojsui: https://github.com/wiki-ai/flask-oojsui/commit/269bfab8fc5800174b11e402bbef5850e8aae1db

Halfak moved this task from Review to Completed on the Machine-Learning-Team (Active Tasks) board.Mar 11 2017, 5:04 PM

Ladsgroup removed a project: User-Ladsgroup.Mar 15 2017, 9:06 PM

Halfak closed this task as Resolved.Mar 16 2017, 9:21 PM

Halfak closed subtask T157489: [Discuss] item quality in Wikidata as Resolved.

	F5376335: Item Quality Criteria for Campaign.odt
	Jan 27 2017, 4:18 PM

	F5318532: pasted_file
	Jan 20 2017, 1:41 PM

	F5318529: pasted_file
	Jan 20 2017, 1:41 PM
