
Topic classification: comprehensive comparison of ORES (text-based) models, Wikidata-based, and link-based models
Closed, Resolved (Public)

Description

Compare ORES (text-based) models, Wikidata-based, and link-based topic classification models to understand how consistent the three models are in their predictions -- in particular, how much the link-based and Wikidata-based models vary from ORES.

Event Timeline

Over the last week, I have been working to understand how the various APIs for the ORES, Wikidata-based, and link-based predictions work. I have also put together a document with a possible structure for the table that will be created before the analysis.

As a first step in pulling everything together, I compared the link-based predictions to the ORES predictions and came up with this spreadsheet from 500 topics.

While there's no real analysis yet, a quick look at that spreadsheet suggests that the link-based predictions have high precision but noticeably lower recall when compared against ORES. Isaac and I are considering adding some metadata from the articles themselves to the table, to give more insight into why we get the results we do during analysis.
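For reference, here is a minimal sketch of how precision and recall could be computed for this kind of comparison. It assumes the ORES predictions are treated as the reference labels and the link-based predictions as the candidate; that choice, the function name `precision_recall`, and the article and topic names are all made up for illustration and are not taken from the spreadsheet.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    reference: Dict[str, Set[str]],
    candidate: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall of candidate topics vs. reference topics.

    Only articles present in both dictionaries are compared.
    """
    tp = fp = fn = 0
    for article in reference.keys() & candidate.keys():
        ref, cand = reference[article], candidate[article]
        tp += len(ref & cand)   # topics both models predict
        fp += len(cand - ref)   # topics only the candidate predicts
        fn += len(ref - cand)   # topics only the reference predicts
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical predictions (placeholder article titles and topic labels).
ores = {
    "Article A": {"Music", "Europe"},
    "Article B": {"Physics"},
    "Article C": {"Politics", "Asia"},
}
link_based = {
    "Article A": {"Music"},
    "Article B": {"Physics"},
    "Article C": {"Politics"},
}

precision, recall = precision_recall(ores, link_based)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data every link-based topic also appears in ORES (precision 1.0), but two ORES topics are missed (recall 0.6), which is the same pattern as "high precision, lower recall" described above.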

My next step is to add metadata to the table, such as article size and the number of languages in which an article is available.

I'm going to close out this task in a few days. The analysis was quite useful, and a public write-up of it will be here: https://meta.wikimedia.org/wiki/Research:Language-Agnostic_Topic_Classification/Model_comparison