
Experimental API for exploring topic models
Closed, Resolved · Public

Description

Build out a user interface and documentation for interested parties to explore topic models. Details:

Dropped because of longer-term efforts going on in this space:

  • Determine a format for a model "report card" that reports both performance and fairness/ethics-related details -- e.g., in the style of model cards.

Event Timeline

Update:

  • Created a standardized template for hosting models on Cloud VPS that handles all the setup via a simple script, making it easy to extend to other models (already in use for the link-based and Wikidata-based models).
  • Created UI for easily comparing models: https://wiki-topic.toolforge.org/comparison
    • You can input a language + article title to compare results for a specific article, or input just the language (leaving the title blank) to have the UI choose a random article for you (a scripting sketch follows this list)
  • Current model performance report card is available, but I'd like to standardize it a bit more
  • Made an initial pass at comparing the Wikidata-based and link-based models, but this needs to be expanded to include ORES and made more accessible
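
For anyone who wants to script against the comparison tool rather than click through the UI, here is a minimal sketch. The query parameter names (`lang`, `title`) are assumptions inferred from the UI's two inputs; the tool does not document a stable API, so treat this as illustrative only:

```
import requests

# Hypothetical parameter names inferred from the UI's inputs; the tool
# does not document a stable API, so verify these before relying on them.
COMPARISON_URL = "https://wiki-topic.toolforge.org/comparison"

def fetch_comparison(lang, title=None):
    """Fetch the comparison page for a specific article, or for a
    random article if the title is omitted (mirroring the UI behavior)."""
    params = {"lang": lang}
    if title is not None:
        params["title"] = title
    resp = requests.get(COMPARISON_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.text  # HTML; parse out the per-model predictions as needed

html = fetch_comparison("en", "Lake Titicaca")
```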

Update: it looks likely that I'll be able to work with a contractor on the comprehensive comparison for the month of August, so I'm waiting to hear formally about that before proceeding.

Update: started work on the comparison of ORES with the link-based and Wikidata-based models. That work will be tracked under T259829, but I'll still post weekly updates here.

Weekly update: worked to identify metadata (e.g., page length) that can help explain when predictions vary between ORES and the Wikidata-based or link-based models. Early indications are that about half of the articles receive exactly the same predictions, and for most of the rest, ORES predicts additional topics. I'm looking to understand whether those additional topics appear in the groundtruth, and how lowering the prediction threshold below 0.5 on the link-based and Wikidata-based models affects these results.
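
These checks reduce to set arithmetic over each article's predicted labels. A minimal sketch, assuming each model's output is available as a topic-to-score dict (the names and structure are illustrative, not the actual pipeline):

```
def predicted(scores, threshold=0.5):
    """Topics whose model score clears the threshold."""
    return {topic for topic, score in scores.items() if score >= threshold}

def compare_to_ores(ores_scores, model_scores, groundtruth, threshold=0.5):
    """Split the topics ORES predicts beyond another model by whether
    the groundtruth labels support them."""
    ores = predicted(ores_scores)
    other = predicted(model_scores, threshold)
    extra = ores - other  # topics ORES predicts that the other model misses
    return {
        "identical": ores == other,
        "extra_in_groundtruth": extra & set(groundtruth),
        "extra_not_in_groundtruth": extra - set(groundtruth),
    }

# Lowering the threshold (e.g., compare_to_ores(..., threshold=0.4)) should
# shrink `extra` by letting the link/Wikidata models emit more topics.
```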

Weekly update:

  • continued progress on generating results and binning them by various metadata (page length, # outlinks, # sitelinks, # Wikidata statements); I should have a first set of results later today or by Monday (see the sketch below)
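
A sketch of the binning step, assuming the per-article results are collected in a pandas DataFrame; the column names below are illustrative, not the actual schema:

```
import pandas as pd

def agreement_by_bin(df, column, n_bins=10):
    """Bin articles by a metadata column (e.g., page length) and report
    the share of articles in each bin where the models agree."""
    bins = pd.qcut(df[column], q=n_bins, duplicates="drop")
    return df.groupby(bins, observed=True)["agreement"].mean()

# Assumed columns: a boolean 'agreement' flag plus the metadata fields, e.g.:
# for col in ["page_length", "num_outlinks", "num_sitelinks", "num_statements"]:
#     print(agreement_by_bin(df, col))
```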

Weekly update:

  • Per discussion with LZ, dropping the model cards component from this work given ongoing efforts in Product on this front.
  • Summary comparison of the ORES text-based, link-based, and Wikidata-based models is complete. A few statistics:
    • Coverage of all Wikipedia articles across all languages:
      • Text-based: 56.5%
      • Outlink-based: 99.2%
      • Wikidata-based: 97.5%
    • Comparison of the outlink-based and Wikidata-based models to the text-based model in some languages that ORES supports (ar, cs, en, vi):
      • In general, the outlink-based model has slightly higher precision and recall than the Wikidata-based model and slightly better alignment with ORES
      • ORES produces the same predictions as the Wikidata-based/link-based models for 50% of articles in these languages
      • For 40% of articles in these languages, the Wikidata-based/link-based models have either lower recall (missing some labels, with the rest identical) or higher recall (additional labels not predicted by ORES but present in the groundtruth)
      • The remaining 10% of articles have predictions from the outlink-based or Wikidata-based models that appear in neither the ORES predictions nor the groundtruth data. Further inspection suggests that most of these are perfectly reasonable, though there are obviously some errors. The most salient errors for the outlink-based model are incorrect predictions that biographies are about women. I would suggest sticking with Wikidata for that judgment (I also view incorrect predictions about the gender of biography subjects as much more problematic than other topic errors). A simplified sketch of this bucketing follows the list.
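
The 50/40/10 breakdown above corresponds to a simple bucketing of each article's label sets. A simplified sketch (each argument is the set of topic labels for one article; this is a reconstruction of the logic, not the actual analysis code):

```
def categorize_article(ores, other, groundtruth):
    """Bucket one article by how another model's labels relate to ORES's."""
    if other == ores:
        return "identical"      # ~50% of articles
    if other - ores - groundtruth:
        return "novel"          # ~10%: labels in neither ORES nor groundtruth
    if other < ores:
        return "lower_recall"   # missing some ORES labels, no extras
    return "higher_recall"      # extra labels that do appear in groundtruth
```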

Weekly update:

Isaac updated the task description.