
Follow-up cleanup to topic models
Open, Medium, Public

Event Timeline

Thanks for creating this task, @Halfak. I'll respond to T245368#5941808 here (also @Tgr and @Isaac, who were participating).

I see what you mean about women-related topics, and why they might be okay in "Biography (women)". You're saying we could instead retitle that topic like "Women biographies and organizations"? Is there an easy way to use Wikidata to check what percent of articles with high scores on that topic are actually biographies?

And another question -- I see that you made subtasks for disambiguation pages and for "Society". Do you also intend to address the other issues we listed, like the weaker topics, especially "Central Africa"?

> Thanks for creating this task, @Halfak.

Agreed!

> Is there an easy way to use Wikidata to check what percent of articles with high scores on that topic are actually biographies?

If you can grab a sample of even a few hundred articles with a high predicted probability for Biography (women), then it's very simple to write a script to check whether they are biographies of women per Wikidata. I'm happy to do the check part, though I would need help extracting the list of predicted Biography (women) articles.
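For illustration, here is a minimal sketch of what such a check could look like. It is not the script anyone on this task actually wrote. It assumes the sample is a list of English Wikipedia titles, that "biography of a woman" means the Wikidata item is an instance of (P31) human (Q5) with sex or gender (P21) female (Q6581072), and that entities come from the Wikidata `wbgetentities` API. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
HUMAN = "Q5"          # Wikidata item for "human"
FEMALE = "Q6581072"   # Wikidata item for "female"


def claim_values(entity, prop):
    """Return the set of item IDs that an entity states for a property."""
    values = set()
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        # Skip "novalue" / "somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") != "value":
            continue
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            values.add(value["id"])
    return values


def is_woman_biography(entity):
    """Assumed definition: instance of human AND sex or gender female."""
    return (HUMAN in claim_values(entity, "P31")
            and FEMALE in claim_values(entity, "P21"))


def fetch_entities(titles, site="enwiki"):
    """Fetch Wikidata entities for a batch of titles (at most 50 per call).

    Titles without a Wikidata item are dropped from the result.
    """
    query = urllib.parse.urlencode({
        "action": "wbgetentities",
        "sites": site,
        "titles": "|".join(titles),
        "props": "claims",
        "format": "json",
    })
    request = urllib.request.Request(
        WIKIDATA_API + "?" + query,
        headers={"User-Agent": "topic-check-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [e for e in data.get("entities", {}).values() if "missing" not in e]


def percent_women_biographies(entities):
    """Percentage of the given entities that are biographies of women."""
    entities = list(entities)
    if not entities:
        return 0.0
    hits = sum(1 for e in entities if is_woman_biography(e))
    return 100.0 * hits / len(entities)
```

With a sample in hand, one would call `fetch_entities` on batches of up to 50 titles and pass the combined results to `percent_women_biographies`. Note that articles with no Wikidata item are excluded from the denominator here, which may or may not be the desired behavior.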

MMiller_WMF added a subscriber: Etonkovidova.

@Halfak -- I wanted to ping you about the questions in my previous comment.

Also, I added a subtask in which @Etonkovidova details the performance of the "Central Africa" topic.

Halfak triaged this task as High priority. Mar 23 2020, 4:55 PM
Halfak moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.
Halfak lowered the priority of this task from High to Medium. May 4 2020, 5:14 PM

Moving this to "Medium" because it seems that getting more topic models is a higher priority than this.