Article similarity scorer
Open, Needs Triage, Public

Description

Build an AI that scores the relatedness of two articles.

Wiki things it helps with:

  • Wikidata items that may be cross-wiki
  • Article recommender, both reading and editing
  • Help with subject spaces
  • Behavioral analysis of editors -- do editors change the subject spaces they edit in over time?

Things that might help us get this AI built:

Event Timeline

Halfak created this task. Jan 20 2017, 9:16 PM
Restricted Application added a subscriber: Aklapper. Jan 20 2017, 9:16 PM
Halfak added a subscriber: Shilad. Jan 20 2017, 9:18 PM

@Shilad was working on a project that provided semantic relatedness measures as an API. Is that still happening?

Halfak updated the task description. Jan 23 2017, 4:50 PM
Shilad added a comment. Edited Jan 26 2017, 8:55 PM

I have spent quite a bit of time on this over the past few years, and I do have a service that I could make available as an endpoint. HOWEVER, from what I've seen in my projects, a much better approach is to combine the work of Ellery Wulczyn on navigation vectors (https://meta.wikimedia.org/wiki/Research:Wikipedia_Navigation_Vectors) with the "standard" content-based approaches from Wikipedia.

The navigation vectors can capture "subjective" relationships that are not encoded within the Wikipedia content itself, for example relationships between books, movies, etc. While Wikipedia's factual information (authors, actors, directors, etc.) doesn't capture these well, navigation patterns seem to.

I think that "shallow" approaches could simply combine the two types of vectors, and that deep learning approaches (which require access to the underlying session data) could do an even better job. I'm happy to help with these efforts.
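To make the "shallow" idea concrete, here is a rough sketch of one way the two kinds of vectors might be combined: normalize each vector, scale them by a weight, concatenate, and score two articles by cosine similarity. Everything here is an assumption for illustration, not an existing service or API: the function names, the `nav_weight` parameter, and the toy vectors are all hypothetical, and real content and navigation vectors would come from whatever models are actually used.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def combine(content_vec, nav_vec, nav_weight=0.5):
    """Concatenate a content vector and a navigation vector (hypothetical helper).

    Each part is normalized first so neither dominates because of its scale.
    Using square-root weights means the cosine of two combined vectors is a
    weighted average of the content cosine and the navigation cosine
    (as long as none of the input vectors is all zeros).
    """
    if not 0.0 <= nav_weight <= 1.0:
        raise ValueError("nav_weight must be between 0 and 1")
    content_scale = math.sqrt(1.0 - nav_weight)
    nav_scale = math.sqrt(nav_weight)
    return ([content_scale * x for x in l2_normalize(content_vec)]
            + [nav_scale * x for x in l2_normalize(nav_vec)])


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relatedness(article_a, article_b, nav_weight=0.5):
    """Score two articles, each given as a (content_vec, nav_vec) pair."""
    content_a, nav_a = article_a
    content_b, nav_b = article_b
    return cosine(combine(content_a, nav_a, nav_weight),
                  combine(content_b, nav_b, nav_weight))


# Toy data (made up): the two articles share no content signal but have
# identical navigation vectors, like a book and a movie readers visit together.
book = ([1.0, 0.0], [1.0, 0.0])
movie = ([0.0, 1.0], [1.0, 0.0])
score = relatedness(book, movie, nav_weight=0.5)
```

With these toy vectors the content-only score is 0 and the navigation-only score is 1, so the combined score lands between them according to `nav_weight`. A deep learning approach trained on session data would replace this fixed weighting with a learned one.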