
Investigation: How do we get WikiProject topics, as defined by LiftWing?
Open, Needs Triage, Public

Description

As a member of the Campaigns team, I want to know if we can get WikiProject topics (as defined by LiftWing), so that we can allow users to discover & search for WikiProjects by topic and so that we can have topical data that aligns with those in the Newcomer Homepage.

Background: For the Community List MVP, we would like to expand the Event List so that we can also feature WikiProjects. To make this expansion especially meaningful to users, we would like to display the topic of the WikiProjects, so users can easily determine if they are interested in either learning more about the WikiProject or joining the WikiProject. We want the topics to be the same as those in LiftWing, so that we can have the same topics presented to users in both Growth tools & Campaigns tools.

There are different ways that we could get the WikiProject topic, such as looking at the main subject(s) in Wikidata for WikiProjects, seeing what topics map to WikiProjects in LiftWing, or a combination of methods. The purpose of this investigation is to outline the possible options, so we can determine what next steps, if any, we take in displaying WikiProject topics in the Community List.

Note that the Community List MVP will be global, like the Event List. This means that we will display many different WikiProjects, not just the WikiProjects associated with the wiki on which the user is viewing the Community List. However, we would like to eventually input filters (such as wiki, topic, etc) so that users can more easily find what interests them, in particular.

Resources:

Acceptance Criteria:

  • Investigate options for how we can (or cannot) get the following data on WikiProjects:
    • WikiProject topic(s), as defined in LiftWing
      • Example: WikiProject Rihanna fits under Culture > Arts > Music; WikiProject France fits under Geography > Regions > Europe > Western Europe
      • Note that Wikidata uses 'main subject,' which is different than LiftWing topics, but there may be a way to translate from 'main subject' in Wikidata to 'article-topic' in LiftWing (see ORES topics & LiftWing article topic)
  • Share potential risks, concerns, or dependencies related to getting any of this data

Event Timeline

ifried updated the task description.
ifried edited subscribers, added: Isaac; removed: ldelench_wmf.
ifried renamed this task from How do we get WikiProject topics? to Investigation: How do we get WikiProject topics, as defined by LiftWing?. (Jul 24 2024, 10:28 PM)
ifried updated the task description.
ifried updated the task description.

Sharing some thoughts! For a given WikiProject, we have two potential pathways to link it to a LiftWing topic:

  • Use the main-subject property (P921) for WikiProjects, then connect its values to their corresponding LiftWing topics. The two challenges here are:
    • How many WikiProjects have a main-subject? This seems to be 50% (sparql), but the gap is at least easily solvable by the community, and for an MVP I assume it's more important to have accurate information than full coverage anyway.
    • How do we connect the main subject to a LiftWing topic? I came up with one hacky approach, which is to take the sitelinks for the main subject(s) associated with a WikiProject. For example, for WikiProject African Diaspora (Q15304953), the main subject is African Diaspora (Q385967), which has 23 sitelinks, though in practice I only use a random subset of 10 that is set to always include English. For each of those Wikipedia articles, I get the LiftWing topic predictions and average them together. Here's the final result: https://wiki-topic.toolforge.org/wikiproject-topic?qid=Q15304953 and you can test others (Rihanna; France). This seems to work well, and I think it could reasonably run as a regular batch job to update the Community List as new WikiProjects are created, main subjects are added, etc.
  • Use the worklist for a given WikiProject: get topic predictions for, e.g., 10 of those articles and do the same averaging. I think this would also work quite well, but it does require us to map the WikiProject item to a language edition that uses the PageAssessments extension (to get the worklist). Eventually I want us to get there, but for now I think this might just be messier, and the above seems to work alright, so I'd recommend going with that. That's especially so because I know you're using the labels etc. from Wikidata, so requiring that a WikiProject be tracked on Wikidata isn't extra overhead.
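
To make the sampling and averaging step concrete, here is a minimal Python sketch of how it could look. It is not the code behind the toolforge endpoint above. The function names, the `get_predictions` callback (a stand-in for whatever fetches LiftWing article-topic scores for one article), the 0.5 threshold, and the choice to count a topic missing from an article's predictions as 0 are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_sitelinks(sitelinks, k=10, always_include="enwiki", rng=None):
    """Pick up to k wikis from {wiki: title}, always keeping `always_include` if present."""
    rng = rng or random.Random()
    chosen = [always_include] if always_include in sitelinks else []
    rest = sorted(w for w in sitelinks if w != always_include)
    n_extra = min(max(k - len(chosen), 0), len(rest))
    chosen += rng.sample(rest, n_extra)
    return {wiki: sitelinks[wiki] for wiki in chosen}


def average_topic_scores(predictions):
    """Average per-topic scores across articles.

    Assumption: a topic absent from one article's predictions counts as 0 there.
    """
    predictions = list(predictions)
    if not predictions:
        return {}
    totals = defaultdict(float)
    for scores in predictions:
        for topic, score in scores.items():
            totals[topic] += score
    return {topic: total / len(predictions) for topic, total in totals.items()}


def wikiproject_topics(sitelinks, get_predictions, threshold=0.5, k=10, rng=None):
    """Return (topic, mean score) pairs at or above `threshold`, highest first.

    `get_predictions(wiki, title)` is a hypothetical callback returning a
    {topic: score} dict for one article; it is not a real LiftWing client.
    """
    sampled = sample_sitelinks(sitelinks, k=k, rng=rng)
    preds = [get_predictions(wiki, title) for wiki, title in sampled.items()]
    averaged = average_topic_scores(preds)
    kept = [(t, s) for t, s in averaged.items() if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sitelinks and scores, only to show the shape of the data.
    links = {"enwiki": "African diaspora", "frwiki": "Diaspora africaine"}
    fake = lambda wiki, title: {"History and Society": 0.75, "Geography": 0.25}
    print(wikiproject_topics(links, fake))
```

Passing the prediction fetcher in as a callback keeps the averaging logic testable offline, so the same functions would work for either pathway: sitelinks of the main subject, or a sample of articles from a WikiProject worklist.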

Thank you so much for taking the time to think through how we could get the topics of WikiProjects, @Isaac! I'm sure this will give the team a lot to work with as a baseline when we dig into this work soon. It's great that you were able to think of a way to connect Wikidata main subjects with LiftWing topics as well. Much appreciated, and please do feel free to add any more comments to this ticket if more ideas come up.