
Train/test draft topic model (new article routing AI)
Closed, Resolved · Public

Description

Build an AI that categorizes pages into a basic set of categories.

We should be able to process an article draft and automatically suggest categories. This could be used for curating new pages (e.g. stubs and other new articles) and for routing new page creations toward subject-matter experts (WikiProjects) who would be more interested in reviewing them.

Given a version of an article, predict the likelihood that the article will eventually be tagged by a particular WikiProject within a mid-level category.

Output:

{
  "medicine": 0.951,
  "history": 0.342,
  "biology": 0.522,
  "chemistry": 0.233,
  ...
}
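One way such an output might be consumed, as a minimal sketch: keep the topics whose predicted likelihood clears a cutoff and list them from most to least likely. The function name `suggest_topics` and the 0.5 threshold are illustrative assumptions, not part of the proposal.

```python
def suggest_topics(scores, threshold=0.5):
    """Return (topic, score) pairs at or above threshold, highest score first."""
    kept = [(topic, score) for topic, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Scores copied from the example output above.
example_scores = {
    "medicine": 0.951,
    "history": 0.342,
    "biology": 0.522,
    "chemistry": 0.233,
}

suggestions = suggest_topics(example_scores)
print(suggestions)  # [('medicine', 0.951), ('biology', 0.522)]
```

A lower threshold would route a draft to more WikiProjects at the cost of more irrelevant suggestions; the right cutoff would have to be tuned against real reviewer feedback.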

We'd need to set up a multilabel classification model that can predict a set of output classes for each article. It looks like sklearn's RandomForestClassifier supports this by default.
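A minimal sketch of that setup, assuming scikit-learn is installed. The toy feature vectors, the label sets, and the helper name `predict_topics` are made up for illustration; a real model would need features extracted from article text, which this task does not specify.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data (assumption): each row is a hypothetical feature vector for one draft.
X = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
]
# Each draft may carry several labels at once (multilabel).
labels = [
    {"medicine", "biology"},
    {"medicine"},
    {"history"},
    {"history"},
    {"chemistry"},
    {"biology", "chemistry"},
]

# Turn the label sets into a 0/1 indicator matrix, one column per label.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(labels)

# With a 2-D indicator target, the forest fits one output per label.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, Y)


def predict_topics(features):
    """Return {label: estimated probability that the label applies}."""
    per_label = model.predict_proba([features])  # one array per label
    result = {}
    for label, classes, proba in zip(binarizer.classes_, model.classes_, per_label):
        # Pick the column of the positive class (1); 0.0 if it was never seen.
        positive = list(classes).index(1) if 1 in classes else None
        result[label] = float(proba[0][positive]) if positive is not None else 0.0
    return result


print(predict_topics([1.0, 0.0, 0.0]))
```

The returned dictionary has the same shape as the output proposed above: one independent likelihood per label, so the values need not sum to 1.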

This task is done when an initial set of tasks scoping the first steps of this project has been added to the backlog.

Event Timeline

Halfak created this task. Jan 12 2016, 12:11 AM
Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak moved this task to Active on the Machine Learning Platform (Current) board.
Halfak added a subscriber: Halfak.
Restricted Application added subscribers: StudiesWorld, Aklapper. Jan 12 2016, 12:11 AM
Halfak added a subscriber: JMinor. Jan 12 2016, 12:11 AM

Talked to @JMinor at the dev summit. He might be interested in helping us out.

It also occurs to me that this might fit well with a new article spam detector. :)

Halfak renamed this task from [Scope] A classifier that recommends categories for new pages to [Scope] A classifier that recommends categories for new articles. Jan 12 2016, 12:12 AM
Halfak set Security to None.
Halfak renamed this task from [Scope] A classifier that recommends categories for new articles to New article WikiProject category predictor. Aug 18 2016, 9:30 PM
Halfak triaged this task as Low priority.
Halfak updated the task description.
Halfak renamed this task from New article WikiProject category predictor to Article categorizer. Jan 26 2017, 8:17 PM
Halfak updated the task description.
Halfak added subscribers: JoeWalsh, Fjalapeno.
Halfak renamed this task from Article categorizer to New article categorizer. Aug 2 2017, 7:25 PM
Halfak updated the task description.
Halfak added a comment. Aug 2 2017, 7:34 PM

Based on the WikiProject Directory, I think we want to predict the mid-level categories.

Here's a hand-curated list of 46 mid-level hierarchical categorizations I threw together:

  • culture.arts.music
  • culture.arts.performing
  • culture.arts.plastic
  • culture.arts.visual
  • culture.broadcasting
  • culture.crafts_and_hobbies
  • culture.entertainment
  • culture.games_and_toys
  • culture.food_and_drink
  • culture.internet_culture
  • culture.linguistics
  • culture.biography
  • culture.media
  • culture.philosophy_and_religion
  • culture.sports
  • geography.bodies_of_water
  • geography.cities
  • geography.countries
  • geography.africa
  • geography.americas
  • geography.asia
  • geography.europe
  • geography.oceania
  • geography.landforms
  • geography.maps
  • geography.parks_conservation_areas_and_historical_sites
  • history_and_society.history_and_society
  • history_and_society.business_and_economics
  • history_and_society.education
  • history_and_society.military_and_warfare
  • history_and_society.politics_and_government
  • history_and_society.transportation
  • stem.science
  • stem.biology
  • stem.chemistry
  • stem.economics
  • stem.engineering
  • stem.geosciences
  • stem.medicine
  • stem.information_science
  • stem.mathematics
  • stem.meteorology
  • stem.physics
  • stem.space
  • stem.technology
  • stem.time
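Because the labels are dotted paths, the hierarchy can be recovered by splitting on the first dot. A minimal sketch, using a handful of the labels above; the helper name `group_by_top_level` is hypothetical.

```python
from collections import defaultdict


def group_by_top_level(labels):
    """Map each top-level category to the list of sub-paths beneath it."""
    groups = defaultdict(list)
    for label in labels:
        top, _, rest = label.partition(".")
        groups[top].append(rest)
    return dict(groups)


# A small subset of the 46 labels listed above.
sample_labels = [
    "culture.arts.music",
    "culture.sports",
    "geography.europe",
    "stem.medicine",
    "stem.physics",
]

print(group_by_top_level(sample_labels))
# {'culture': ['arts.music', 'sports'], 'geography': ['europe'], 'stem': ['medicine', 'physics']}
```

Grouping this way would make it possible to report coarser, top-level predictions alongside the mid-level ones if that turns out to be useful.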
Halfak renamed this task from New article categorizer to New article review routing AI. Aug 2 2017, 7:36 PM
Halfak updated the task description.
Halfak added a subscriber: Harej. Aug 2 2017, 7:58 PM
Nettrom added a subscriber: Nettrom. Aug 2 2017, 8:07 PM
Halfak updated the task description. Aug 2 2017, 9:14 PM
awight added a subscriber: awight. Aug 12 2017, 8:42 PM

@Harej pointed me to his Reports bot, which already suggests articles for project membership based on category tags. It might be nice if we could plug into that workflow.

I drew a picture of how I imagine routing will work once this model is ready.

https://commons.wikimedia.org/wiki/File:New_article_routing.with_ORES.svg

Sumit added a subscriber: Sumit. Aug 17 2017, 5:51 PM
Sumit added a comment. Aug 18 2017, 7:50 PM

Also from eranroz: a bot that uses a rule-based system to tag new articles with WikiProjects or lists. See https://en.wikipedia.org/wiki/User:AlexNewArtBot

Halfak renamed this task from New article review routing AI to Train/test draft topic model (new article routing AI). Sep 20 2017, 4:54 PM
Halfak moved this task from Active to Done on the Machine Learning Platform (Current) board.

This was done a while ago. We're working on deployment now.

Halfak closed this task as Resolved. Apr 16 2018, 2:26 PM
Halfak claimed this task.