Review "Topics for each Wikipedia Article across Languages" dataset
Closed, Resolved · Public

Description

Diego shared a dataset with topics for (almost) every article in all Wikipedias, based on the wiki-topics tool (by User:Isaac_(WMF)) and using the Draft_topic taxonomy (by User:Halfak_(WMF)).
This dataset contains the predicted topic(s) for each Wikipedia article with a Wikidata item across languages.

  • Schema for this dataset
| Column Name | Description |
| Qid | Wikidata item ID |
| topic | Topic based on the ORES draft topic model (https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic) |
| probability | Probability that the article belongs to the topic |
| page_id | page_id |
| page_title | page_title |
| wiki_db | wiki_db, for example enwiki for English Wikipedia |
  • Taxonomy used in the dataset. The topics are shown in green (for example, line 50 has the topic Food & Drink).
  • Wikidata coverage for Wikipedia articles (excluding redirect pages): 99.7% in English Wikipedia.
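
As a rough illustration of how the schema above could be consumed, the sketch below groups the predicted topics per article and keeps only those above a probability cutoff. It assumes the dataset is a tab-separated file with a header row using the column names from the schema. The file format, the `topics_by_article` helper, the 0.5 cutoff, and the sample rows are all hypothetical and not taken from the dataset itself.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows that follow the schema above; real values will differ.
SAMPLE_TSV = (
    "Qid\ttopic\tprobability\tpage_id\tpage_title\twiki_db\n"
    "Q0001\tTopic A\t0.91\t10\tExample_page\tenwiki\n"
    "Q0001\tTopic B\t0.34\t10\tExample_page\tenwiki\n"
    "Q0001\tTopic A\t0.88\t25\tPagina_de_ejemplo\teswiki\n"
)


def topics_by_article(handle, min_probability=0.5):
    """Map (wiki_db, page_id) to a list of (topic, probability), best first.

    Rows below min_probability (an assumed cutoff) are dropped.
    """
    result = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        probability = float(row["probability"])
        if probability >= min_probability:
            key = (row["wiki_db"], int(row["page_id"]))
            result[key].append((row["topic"], probability))
    # Sort each article's topics by descending probability.
    for topics in result.values():
        topics.sort(key=lambda pair: pair[1], reverse=True)
    return dict(result)


articles = topics_by_article(io.StringIO(SAMPLE_TSV))
for (wiki_db, page_id), topics in articles.items():
    print(wiki_db, page_id, topics)
```

Keying on (wiki_db, page_id) rather than Qid keeps language editions separate, since one Wikidata item can map to an article in each wiki.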

We want to review this dataset to see if it could be used to create a topic dimension in the content dataset for articles.

Event Timeline

cchen triaged this task as Medium priority. May 19 2020, 9:32 PM
cchen created this task.
cchen changed the task status from Duplicate to Resolved. Sep 21 2020, 4:49 PM

Updates will be in parent task.

The comparisons between this Wikidata-based model and other models are in Topic Modeling Efforts.