
[Epic] Article importance prediction model
Closed, Declined | Public


Rate article importance. This can take many different forms and any given rating of importance likely depends on the context under which it was asked (e.g., importance of a given article to a topic or a wiki or a region or to the diversity of coverage in a wiki at a given point in time). This work will help to identify these factors and build tools that support automatic ranking of articles according to their importance in a given context.


Article importance is a nebulous but important component of prioritizing wiki work (see this very comprehensive lit review). Various applications of importance can be found across the wikis -- e.g., Vital Articles lists, WikiProject importance assessments, inclusion in offline wikis, Identifying Topics for Impact in Movement Strategy.

In previous work, article importance appears to be most often represented via demand (as measured by pageviews), centrality (as measured by number of inlinks, PageRank, or other network measures), or language coverage (number of sitelinks). While these factors have been demonstrated to provide strong indicators of importance per external assessments, they tend to reinforce existing notions of importance. They do not necessarily help move us towards a more diverse and inclusive Wikipedia, given the large gaps seen in reader populations, existing content biases, and biases in the external sources that are available and in the types of people and ideas that history has centered.

Wiki things it could help with

Potential components of importance

  • Topical relevance -- either along a pre-defined taxonomy or more ad-hoc semantic relatedness of a given keyword / article to other Wikipedia articles such as via MoreLike. For instance, en:Waffle is top-importance for WikiProject Breakfast, but only high-importance for WikiProject Food and Drink.
  • Global relevance -- e.g., geographic (what countries are mentioned), language (sitelinks)
  • Reader demand -- e.g., pageviews, reader sessions, language switching
  • Centrality -- e.g., inlinks, PageRank
  • Diversity -- e.g., to what degree does an article match characteristics of existing articles vs. provide new content
  • ...
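
One way to read the components above is as separate signals that get combined with context-dependent weights. The following is a minimal sketch in Python, assuming each signal has already been normalized to [0, 1]. The `ArticleSignals` fields, the weights, and the example values are all hypothetical and are not drawn from any existing model or dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleSignals:
    """Hypothetical per-article signals, each assumed pre-normalized to [0, 1]."""
    title: str
    topical_relevance: float
    global_relevance: float
    reader_demand: float
    centrality: float
    diversity: float


def importance_score(signals, weights):
    """Weighted average of the signals named in `weights` (signal name -> weight)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(getattr(signals, name) * w for name, w in weights.items())
    return weighted / total_weight


def rank_articles(articles, weights):
    """Sort articles from highest to lowest score under the given weights."""
    return sorted(articles, key=lambda a: importance_score(a, weights), reverse=True)


if __name__ == "__main__":
    # Placeholder articles with made-up values, for illustration only.
    articles = [
        ArticleSignals("Article A", 1.0, 0.2, 0.2, 0.2, 0.8),
        ArticleSignals("Article B", 0.2, 0.8, 1.0, 1.0, 0.2),
    ]
    # Two hypothetical contexts: demand/centrality-driven vs. topic/diversity-driven.
    contexts = {
        "demand": {"reader_demand": 1.0, "centrality": 1.0},
        "topic": {"topical_relevance": 2.0, "diversity": 1.0},
    }
    for name, weights in contexts.items():
        print(name, [a.title for a in rank_articles(articles, weights)])
```

The point of the sketch is that the same article can rank differently depending on which weights a given context supplies, in the same way the Waffle example above gets a different rating from each WikiProject.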

Past work

In the past, work has largely focused on modeling the article importance assessments produced by WikiProjects (akin to ORES articlequality models):

Other related projects:

Event Timeline

Halfak renamed this task from Article importance prediction model to [Epic] Article importance prediction model. Jan 17 2017, 9:02 PM
Halfak triaged this task as High priority.
Halfak lowered the priority of this task from High to Low. Apr 10 2019, 5:40 PM
Nettrom added a subscriber: Isaac.

I've updated the project page on meta so it marks the research project as completed, and links to the GitHub repository that contains the code I wrote during the project.

I've been in contact with @Isaac about future research in this area, so this ticket might be picked up as part of that. Removing myself as the assignee in the meantime, as I don't have the bandwidth to actively work on this project.

Now that we have made a bunch of progress on topic modeling, I wonder if we might use topic-spaces as a focus for importance.

Thanks @Nettrom for adding me to this -- I should have known to look for a task like this before :)

I'm not going to assign it to myself yet because I'm still waiting to see what direction the importance research takes, but if we continue to build some momentum in this space, I'll take it on.

Now that we have made a bunch of progress on topic modeling, I wonder if we might use topic-spaces as a focus for importance.

@Halfak could you elaborate? Are you thinking just generally that topical relevance is a part of importance so that the same article should be ranked as more or less important depending on an input topic space? Or something more specific?

Isaac edited projects, added Research; removed Research ideas.

I'm going to go ahead and claim this epic task as we're looking to begin work on article importance. I'm going to update the task description as well to make this a broader task for the work we're hoping to do around measuring article importance (as opposed to any specific question).

Isaac removed Isaac as the assignee of this task. Sep 7 2022, 12:54 PM
Isaac claimed this task.

I'm going to set the status of this to Declined, but other folks should feel free to take it on if desired. After working in this space for a few years, I think my thoughts have evolved on the best approach to supporting Wikimedians in prioritizing content for improvement. This task was written with the thought that it would be valuable to derive "global" signals for the importance of an article that could be used to make decisions about priority. While there is likely still value in surfacing some of those signals -- especially when trying to establish high-level metrics -- I lean towards "importance" being far more contextual and now think that the best path forward is focusing on filters that help editors find content that is most relevant to their specific interests / background. For instance, instead of trying to help a campaign organizer rank 1000 potential articles on a worklist by priority to improve, provide filters so that participants can easily identify the content from that list that is relevant to their region, prior knowledge, interests, etc. That work continues through, e.g., a list-building prototype, knowledge gap facets, topic filters, and more.
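
To illustrate the difference between ranking and filtering, here is a minimal sketch in Python of filtering a worklist by region and topic. It is not the list-building prototype or any existing tool. The `WorklistItem` fields, the `filter_worklist` helper, and the example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorklistItem:
    """Hypothetical worklist entry with region and topic labels."""
    title: str
    regions: frozenset = field(default_factory=frozenset)
    topics: frozenset = field(default_factory=frozenset)


def filter_worklist(items, regions=None, topics=None):
    """Keep items that match every supplied filter; None means 'do not filter'.

    An item matches a filter if it shares at least one label with it.
    The original list order is preserved, so no ranking is imposed.
    """
    selected = []
    for item in items:
        if regions is not None and not item.regions & set(regions):
            continue
        if topics is not None and not item.topics & set(topics):
            continue
        selected.append(item)
    return selected


if __name__ == "__main__":
    # Placeholder entries with made-up labels, for illustration only.
    worklist = [
        WorklistItem("Article A", frozenset({"Region 1"}), frozenset({"Topic X"})),
        WorklistItem("Article B", frozenset({"Region 2"}), frozenset({"Topic X", "Topic Y"})),
    ]
    for item in filter_worklist(worklist, regions={"Region 2"}, topics={"Topic Y"}):
        print(item.title)
```

Each participant supplies their own filters, so the same worklist yields a different subset per person rather than one global ordering.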