
Experimental API for exploring topic models
Closed, Resolved · Public

Description

Build out a user interface and documentation for interested parties to explore topic models. Details:

Dropped because of longer-term efforts going on in this space:

  • Determine a format for a model "report card" that reports both performance and fairness/ethics-related details -- e.g., in the style of model cards.

Event Timeline

Update:

  • Created a standardized template for hosting models on Cloud VPS that handles all the setup via a simple script, making it easy to extend to other models (already in use for the link-based and Wikidata-based models).
  • Created UI for easily comparing models: https://wiki-topic.toolforge.org/comparison
    • You can input a language + article title to compare results for a specific article, or input just the language (leaving the title blank) to have the UI choose a random article for you (a scripting sketch follows this list)
  • Current model performance report card is available, but I'd like to standardize it a bit more
  • Made an initial pass at comparing the Wikidata-based and link-based models, but this needs to be expanded to include ORES and made more accessible
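
For anyone who wants to script against the comparison tool rather than click through the UI, here is a minimal sketch. The query parameter names (`lang`, `title`) are assumptions inferred from the UI's two inputs; the tool does not document a stable API, so treat this as illustrative only:

```
import requests

# Hypothetical parameter names inferred from the UI's inputs; the tool
# does not document a stable API, so verify these before relying on them.
COMPARISON_URL = "https://wiki-topic.toolforge.org/comparison"

def fetch_comparison(lang, title=None):
    """Fetch the comparison page for a specific article, or for a
    random article if the title is omitted (mirroring the UI behavior)."""
    params = {"lang": lang}
    if title is not None:
        params["title"] = title
    resp = requests.get(COMPARISON_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.text  # HTML; parse out the per-model predictions as needed

html = fetch_comparison("en", "Lake Titicaca")
```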

Update: it looks likely that I'll be able to work with a contractor on the comprehensive comparison for the month of August, so I'm waiting to hear formally about that before proceeding.

Update: started work on the comparison of ORES with the link-based and Wikidata-based models. That work will be tracked under T259829, but I'll still post weekly updates here.

Weekly update: worked to identify metadata (e.g., page length) that can help explain when predictions vary between ORES and the Wikidata-based or link-based models. Early indications are that about half of the articles receive exactly the same predictions, and for most of the rest, ORES predicts additional topics. I'm looking to understand whether those additional topics appear in the groundtruth, and how lowering the prediction threshold below 0.5 on the link-based and Wikidata-based models affects these results.
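
These checks reduce to set arithmetic over each article's predicted labels. A minimal sketch, assuming each model's output is available as a topic-to-score dict (the names and structure are illustrative, not the actual pipeline):

```
def predicted(scores, threshold=0.5):
    """Topics whose model score clears the threshold."""
    return {topic for topic, score in scores.items() if score >= threshold}

def compare_to_ores(ores_scores, model_scores, groundtruth, threshold=0.5):
    """Split the topics ORES predicts beyond another model by whether
    the groundtruth labels support them."""
    ores = predicted(ores_scores)
    other = predicted(model_scores, threshold)
    extra = ores - other  # topics ORES predicts that the other model misses
    return {
        "identical": ores == other,
        "extra_in_groundtruth": extra & set(groundtruth),
        "extra_not_in_groundtruth": extra - set(groundtruth),
    }

# Lowering the threshold (e.g., compare_to_ores(..., threshold=0.4)) should
# shrink `extra` by letting the link/Wikidata models emit more topics.
```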

Weekly update:

  • continued progress on generating results and binning them by various metadata (page length, # outlinks, # sitelinks, # Wikidata statements); I should have a first set of results later today or by Monday (see the sketch below)
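
A sketch of the binning step, assuming the per-article results are collected in a pandas DataFrame; the column names below are illustrative, not the actual schema:

```
import pandas as pd

def agreement_by_bin(df, column, n_bins=10):
    """Bin articles by a metadata column (e.g., page length) and report
    the share of articles in each bin where the models agree."""
    bins = pd.qcut(df[column], q=n_bins, duplicates="drop")
    return df.groupby(bins, observed=True)["agreement"].mean()

# Assumed columns: a boolean 'agreement' flag plus the metadata fields, e.g.:
# for col in ["page_length", "num_outlinks", "num_sitelinks", "num_statements"]:
#     print(agreement_by_bin(df, col))
```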

Weekly update:

  • Per discussion with LZ, dropping the model cards component from this work given ongoing efforts in Product on this front.
  • Summary comparison of the ORES text-based, link-based, and Wikidata-based models is complete. A few statistics:
    • Coverage of all Wikipedia articles across all languages:
      • Text-based: 56.5%
      • Outlink-based: 99.2%
      • Wikidata-based: 97.5%
    • Comparison of the outlink-based and Wikidata-based models to the text-based model in some languages that ORES supports (ar, cs, en, vi):
      • In general, the outlink-based model has slightly higher precision and recall than the Wikidata-based model and slightly better alignment with ORES
      • ORES produces the same predictions as the Wikidata-based/link-based models for 50% of articles in these languages
      • For 40% of articles in these languages, the Wikidata-based/link-based models have either lower recall (missing some labels, with the rest identical) or higher recall (additional labels not predicted by ORES but present in the groundtruth)
      • The remaining 10% of articles have predictions from the outlink-based or Wikidata-based models that appear in neither the ORES predictions nor the groundtruth data. Further inspection suggests that most of these are perfectly reasonable, though there are obviously some errors. The most salient errors for the outlink-based model are incorrect predictions that biographies are about women. I would suggest sticking with Wikidata for that judgment (I also view incorrect predictions about the gender of biography subjects as much more problematic than other topic errors). A simplified sketch of this bucketing follows the list.
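
The 50/40/10 breakdown above corresponds to a simple bucketing of each article's label sets. A simplified sketch (each argument is the set of topic labels for one article; this is a reconstruction of the logic, not the actual analysis code):

```
def categorize_article(ores, other, groundtruth):
    """Bucket one article by how another model's labels relate to ORES's."""
    if other == ores:
        return "identical"      # ~50% of articles
    if other - ores - groundtruth:
        return "novel"          # ~10%: labels in neither ORES nor groundtruth
    if other < ores:
        return "lower_recall"   # missing some ORES labels, no extras
    return "higher_recall"      # extra labels that do appear in groundtruth
```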

Weekly update:

Isaac updated the task description.