
[Epic] Article importance prediction model
Open, Low, Public

Description

Rate article importance. This can take many different forms, and any given rating of importance likely depends on the context in which it is asked (e.g., importance of a given article to a topic or a wiki or a region or to the diversity of coverage in a wiki at a given point in time). This work will help to identify these factors and build tools that support automatic ranking of articles according to their importance in a given context.

Background

Article importance is a nebulous but important component of prioritizing wiki work (see this very comprehensive lit review). Various applications of importance can be found across the wikis -- e.g., Vital Articles lists, WikiProject importance assessments, inclusion in offline wikis, Identifying Topics for Impact in Movement Strategy.

In previous work, article importance is most often represented via demand (as measured by pageviews), centrality (as measured by number of inlinks, PageRank, or other network measures), or language coverage (number of sitelinks). These factors have been shown to be strong indicators of importance per external assessments, but they tend to reinforce existing notions of importance. They do not necessarily help move us towards a more diverse and inclusive Wikipedia, given the large gaps seen in reader populations, existing content biases, and biases in the external sources that are available and in the types of people and ideas that history has centered.
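
To make the centrality measures concrete, here is a minimal sketch of inlink counting and PageRank by power iteration over a toy link graph. The page names, the `toy_links` graph, the damping factor of 0.85, and the function names are all assumptions for illustration; this is not the method used in any of the prior work mentioned above.

```python
def inlink_counts(links):
    """Count distinct pages linking to each page.

    `links` maps a page title to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    counts = {page: 0 for page in pages}
    for targets in links.values():
        for target in set(targets):  # count each source page once
            counts[target] += 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank via power iteration (illustrative, not optimized)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same "teleport" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page wiki: A, B, and D all link to C; C links back to A.
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3), "inlinks:", inlink_counts(toy_links)[page])
```

In this toy graph the page with the most inlinks also gets the highest PageRank, which illustrates the concern above: both measures reward pages that existing content already points to.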

Wiki things it could help with

Potential components of importance

  • Topical relevance -- either along a pre-defined taxonomy or more ad-hoc semantic relatedness of a given keyword / article to other Wikipedia articles such as via MoreLike. For instance, en:Waffle is top-importance for WikiProject Breakfast, but only high-importance for WikiProject Food and Drink.
  • Global relevance -- e.g., geographic (what countries are mentioned), language (sitelinks)
  • Reader demand -- e.g., pageviews, reader sessions, language switching
  • Centrality -- e.g., inlinks, PageRank
  • Diversity -- e.g., to what degree does an article match characteristics of existing articles vs. provide new content
  • ...
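
One way to picture how these components might combine into a context-dependent ranking is a weighted average of normalized signals, where the weights stand in for the context. The sketch below is only an assumption about how such a score could look: the `ArticleSignals` fields, the weight dictionaries, and all numeric values are made up for illustration and do not come from any existing model.

```python
from dataclasses import dataclass


@dataclass
class ArticleSignals:
    """Hypothetical per-article signals, each assumed pre-normalized to [0, 1]."""
    title: str
    topical_relevance: float
    global_relevance: float
    reader_demand: float
    centrality: float
    diversity: float


def importance_score(signals, weights):
    """Weighted average of the signals named in `weights` (the context)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * getattr(signals, name) for name, w in weights.items())
    return weighted / total


def rank_articles(articles, weights):
    """Return article titles ordered from most to least important."""
    ordered = sorted(articles, key=lambda a: importance_score(a, weights), reverse=True)
    return [a.title for a in ordered]


# Made-up articles: one popular and well linked, one filling a content gap.
articles = [
    ArticleSignals("Article A", 0.5, 0.5, 0.9, 0.5, 0.1),
    ArticleSignals("Article B", 0.5, 0.5, 0.2, 0.5, 0.9),
]

# Two hypothetical contexts expressed as weights over the components.
demand_context = {"reader_demand": 1.0, "centrality": 1.0}
diversity_context = {"diversity": 2.0, "topical_relevance": 1.0}

print(rank_articles(articles, demand_context))
print(rank_articles(articles, diversity_context))
```

The same two articles swap places depending on the weights, which is the point of treating importance as relative to a context rather than as a single global score.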

Past work

In the past, work has largely focused on modeling the article importance assessments produced by WikiProjects (akin to ORES articlequality models):

Other related projects:

Event Timeline

Halfak renamed this task from Article importance prediction model to [Epic] Article importance prediction model. (Jan 17 2017, 9:02 PM)
Halfak triaged this task as High priority.
Halfak lowered the priority of this task from High to Low. (Apr 10 2019, 5:40 PM)
Nettrom added a subscriber: Isaac.

I've updated the project page on meta so it marks the research project as completed, and links to the GitHub repository that contains the code I wrote during the project.

I've been in contact with @Isaac about future research in this area, so this ticket might be picked up as part of that. Removing myself as the assignee in the meantime, as I don't have the bandwidth to actively work on this project.

Now that we have made a bunch of progress on topic modeling, I wonder if we might use topic-spaces as a focus for importance.

Thanks @Nettrom for adding me to this -- I should have known to look for a task like this before :)

I'm not going to assign it to myself yet because I'm still waiting to see what direction the importance research takes, but if we continue to build some momentum in this space, I'll take it on.

Now that we have made a bunch of progress on topic modeling, I wonder if we might use topic-spaces as a focus for importance.

@Halfak could you elaborate? Are you thinking just generally that topical relevance is a part of importance so that the same article should be ranked as more or less important depending on an input topic space? Or something more specific?

Isaac edited projects, added Research; removed Research ideas.

I'm going to go ahead and claim this epic task as we're looking to begin work on article importance. I'm going to update the task description as well to make this a broader task for the work we're hoping to do around measuring article importance (as opposed to any specific question).