
Implement feature for detecting clumps of text that lack references
Closed, ResolvedPublic

Description

This task is about using artificial intelligence to check and/or improve the quality of content on Wikimedia wikis. Some basic knowledge of this area is helpful.

In https://phabricator.wikimedia.org/T170434 , it was pointed out that "Clumps of text without references is an important smell, even if the overall number of refs in the article is high."

We should be able to compute a useful statistic for this, such as the distance between references or the number of references per section of a wiki page.
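A minimal sketch of both statistics, assuming the raw wikitext is available as a string, that references are written as `<ref>` tags (paired or self-closing), and that sections start at `==`-style heading lines. The function names `reference_gaps` and `refs_per_section` are hypothetical and are not part of wikiclass.

```python
import re

# Assumption: references are <ref ...>...</ref> or self-closing <ref ... /> tags.
# The \b stops "<references />" (the reference list itself) from matching.
REF_RE = re.compile(r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>", re.IGNORECASE | re.DOTALL)
# Assumption: section headings are whole lines such as "== History ==".
HEADING_RE = re.compile(r"^=+[^=].*?=+[ \t]*$", re.MULTILINE)


def reference_gaps(wikitext):
    """Character lengths of the text before, between, and after references."""
    return [len(chunk) for chunk in REF_RE.split(wikitext)]


def refs_per_section(wikitext):
    """Number of references in the lead and in each following section."""
    sections = HEADING_RE.split(wikitext)
    return [len(REF_RE.findall(section)) for section in sections]


text = (
    "Lead text.<ref>A</ref>\n"
    "== History ==\nSome history.<ref name=b/> More.\n"
    "== Legacy ==\nNo refs here.\n"
)
print(refs_per_section(text))  # [1, 1, 0]
print(max(reference_gaps(text)))  # size of the largest gap, in characters
```

Measuring gaps in characters of raw markup is crude (templates and link syntax inflate the counts), but it needs nothing beyond the standard library.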

Implement a feature that captures signal from large chunks of uncited text (text in wiki pages that has no references), and check whether predictions improve.
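One way to turn this into model features, sketched under the same assumptions (raw wikitext, references as `<ref>` tags): the longest run of words between two references, and the number of paragraphs of at least `min_words` words that contain no reference. `uncited_chunk_features` and the `min_words` default of 100 are hypothetical choices, not taken from wikiclass.

```python
import re

# Assumption: references are <ref ...>...</ref> or self-closing <ref ... /> tags.
REF_RE = re.compile(r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>", re.IGNORECASE | re.DOTALL)
WORD_RE = re.compile(r"\w+")


def uncited_chunk_features(wikitext, min_words=100):
    """Hypothetical features describing large chunks of uncited text."""
    # Word counts of the text before, between, and after references.
    runs = [len(WORD_RE.findall(chunk)) for chunk in REF_RE.split(wikitext)]
    # Assumption: paragraphs are separated by blank lines in the wikitext.
    paragraphs = [p for p in re.split(r"\n\s*\n", wikitext) if p.strip()]
    long_uncited = sum(
        1
        for p in paragraphs
        if not REF_RE.search(p) and len(WORD_RE.findall(p)) >= min_words
    )
    return {
        "longest_uncited_words": max(runs),  # split() always yields >= 1 chunk
        "long_uncited_paragraphs": long_uncited,
    }


sample = "one two three<ref>x</ref> four five\n\nsix seven eight nine ten\n"
print(uncited_chunk_features(sample, min_words=5))
```

A longest-run value is sensitive to a single long unreferenced stretch even when the article has many references overall, which is the "smell" described in T170434.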

This is the list of features in wikiclass for English Wikipedia.
This task is done when at least one feature measuring large chunks of uncited text has been added, and the model has been rebuilt with the new statistics to show the resulting improvement in model accuracy.
Your pull request should be made against the wikiclass repository at https://github.com/wiki-ai/wikiclass

Event Timeline

Restricted Application added a subscriber: Aklapper.
awight triaged this task as Low priority. Sep 6 2017, 5:17 PM
awight added a project: good first task.
Aklapper added a subscriber: awight.

@awight, @Ladsgroup: Please provide information / links to more information for a complete newcomer how to fix this task.
Also, is a contributor expected to provide a pull request to https://github.com/wiki-ai/wikiclass ?

@Aklapper: This is the list of features in wikiclass for English Wikipedia. This task is done when at least one feature related to large chunks of uncited text has been added and the model has been rebuilt with the new stats to show the improvement in model accuracy. The PR should be made against the wikiclass repository, as you said.

Do you mean machine learning by saying artificial intelligence?

Yes

Are you sure you need ML for this kind of task? I mean, we just have to calculate the density of references and see whether it is lower than some threshold.
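For comparison, the threshold rule suggested here could look like the sketch below, again assuming raw wikitext with `<ref>` tags. The helper names and the cut-off of 5 references per 1,000 words are hypothetical; in this task the density value would instead be one more input to the existing quality model rather than a fixed rule.

```python
import re

# Assumption: references are <ref ...>...</ref> or self-closing <ref ... /> tags.
REF_RE = re.compile(r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>", re.IGNORECASE | re.DOTALL)
WORD_RE = re.compile(r"\w+")


def refs_per_1000_words(wikitext):
    """Reference density: references per 1,000 words of non-reference text."""
    n_refs = len(REF_RE.findall(wikitext))
    # Strip the references so their contents are not counted as article words.
    n_words = len(WORD_RE.findall(REF_RE.sub(" ", wikitext)))
    return 1000.0 * n_refs / n_words if n_words else 0.0


def is_under_referenced(wikitext, threshold=5.0):
    # threshold is a hypothetical cut-off, not a value from the task
    return refs_per_1000_words(wikitext) < threshold


article = "word " * 400 + "<ref>a</ref>"
print(refs_per_1000_words(article))  # 2.5
```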

We already have a machine learning model for predicting article quality. This work would increase the signal diversity of the feature set.

Oh, sorry, I didn't understand the task. Sorry for the dumb question.

I couldn't find the current accuracy of the models, where can I find it?

nikitavbv added a subscriber: nikitavbv.

This seems to be really interesting. I will work on it!