
Investigation: What WikiProject data can we get from Wikidata? [3 days]
Closed, Resolved, Public

Description

As a member of the Campaigns team, I want to know what structured data we can get on WikiProjects via Wikidata, and I want to know the general level of complexity and any risks associated with this work, so that we can determine what information we can easily present to users in order to build out a Community List MVP.

Background: For the Community List MVP, we would like to expand the Event List so that it can also feature WikiProjects. To do this, we first need to determine how to get data on WikiProjects. Some data on WikiProjects is essential for usability, so that users can easily determine what a given WikiProject is and whether they may be interested in joining it or learning more. Other data is less essential but nice to have. Meanwhile, we know that at least some data on WikiProjects is available in Wikidata, which has the benefit of being regularly updated and maintained by volunteers.

Overall, the purpose of this investigation is to determine what WikiProject data we can get from Wikidata, with a special focus on the WikiProject name, description, links, and wikis.

Notes on related tickets: We have separate investigations to explore whether we can display the data globally (T371292), how we can get topics as defined by LiftWing (T370951), and how we should display WikiProject information in the user's preferred language (T370952).

Resources:

Acceptance Criteria:

  • Investigate options for how we can (or cannot) get the following data on WikiProjects from Wikidata:
    • Highest priority data
      • WikiProject name(s)
      • WikiProject links
      • WikiProject wikis
    • Medium priority data (nice to have, and listed in order of priority)
      • WikiProject description
      • WikiProject logo/image
      • WikiProject creation/inception date
      • WikiProject founders
        • Note: Ideally, we would get the usernames and user pages of the founders (maybe on their home wiki or on the wiki of the event), but other options could include a Wikidata item associated with the founder, if any.
    • Note: Some of the info in Wikidata, such as the description, has translations (see example), but translation work will be handled in a separate task
  • Share potential risks, concerns, or dependencies related to getting any of this data
  • If this cannot be accomplished within the 3 day timeboxed period, share what has been accomplished and suggested next steps, which we can then discuss as a team in our planning of what to do next

Event Timeline

ifried renamed this task from "Investigation: What WikiProject data can we get from Wikidata?" to "Investigation: What WikiProject data can we get from Wikidata? [3 days]". (Jul 29 2024, 9:52 PM)
ifried updated the task description.

I've installed Wikibase locally following these instructions. Below are the preliminary results of this investigation. There's much more to learn on this, but I'm sharing now to let other people know where we're at.

WikiProject name(s)

I assume this refers to the entity label. It can be obtained, and we can display it in whatever language we want. If it doesn't exist in a given language, it follows the fallback chain like everything else. Note: I can't tell whether WD labels are the best option for displaying the name. In a lot of cases, the label matches the WikiProject page title, which includes the namespace etc.
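To make the fallback behavior concrete, here is a simplified sketch in plain PHP that does not use Wikibase at all. The labels array, the `resolveLabel()` helper and the hand-written chain `[ 'de-ch', 'de', 'en' ]` are all hypothetical; in the real snippet further down, the chain comes from `LanguageFallbackChainFactory::newFromLanguage()` and the lookup is done by `LanguageFallbackLabelDescriptionLookup`. The returned `lang` plays the role of `getActualLanguageCode()`.

```php
<?php
// Simplified sketch of label resolution along a language fallback chain.
// Hypothetical helper: the real Wikibase lookup builds the chain from the
// user's language; here the chain is passed in by hand.

/**
 * @param array<string,string> $labels Labels keyed by language code.
 * @param string[] $fallbackChain Language codes in order of preference.
 * @return array{text:string,lang:string}|null First label found, or null.
 */
function resolveLabel( array $labels, array $fallbackChain ): ?array {
	foreach ( $fallbackChain as $lang ) {
		if ( isset( $labels[$lang] ) ) {
			return [ 'text' => $labels[$lang], 'lang' => $lang ];
		}
	}
	// No language in the chain has a label.
	return null;
}

// Hypothetical item data: labels exist only in English and German.
$labels = [
	'en' => 'WikiProject Example',
	'de' => 'WikiProjekt Beispiel',
];

$result = resolveLabel( $labels, [ 'de-ch', 'de', 'en' ] );
echo $result === null ? "No label\n" : "{$result['text']} (lang={$result['lang']})\n";
// Prints: WikiProjekt Beispiel (lang=de)
```

A user whose chain has no matching language (e.g. only `fr`) gets `null`, which is the "label may not exist" case the UI would have to handle.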

WikiProject links
WikiProject wikis

Doable; for links, it's especially easy when we only need local links (but external links can also be obtained).
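A minimal sketch of how site links could map to the "wikis" and "links" fields, assuming the site links have already been read into an array of global site ID => page title (roughly what iterating `getSiteLinkList()` gives in the snippet below). The helper names, the sample data and the choice of `enwiki` as the local wiki are hypothetical. Turning a site ID into a URL for external links would need the wikis' site configuration and is not shown.

```php
<?php
// Sketch: derive the wikis a WikiProject exists on, and its local page,
// from an item's site links (global site ID => page title).
// Sample data and site IDs are hypothetical.

/**
 * @param array<string,string> $siteLinks Page titles keyed by global site ID.
 * @return string[] Sorted site IDs, one per wiki that has a page.
 */
function getWikis( array $siteLinks ): array {
	$wikis = array_keys( $siteLinks );
	sort( $wikis );
	return $wikis;
}

/**
 * @param array<string,string> $siteLinks
 * @param string $localSiteId Global ID of the wiki we are rendering on.
 * @return string|null Local page title, or null if this wiki has no page.
 */
function getLocalPage( array $siteLinks, string $localSiteId ): ?string {
	return $siteLinks[$localSiteId] ?? null;
}

$siteLinks = [
	'enwiki' => 'Wikipedia:WikiProject Example',
	'dewiki' => 'Wikipedia:WikiProjekt Beispiel',
];

echo implode( ', ', getWikis( $siteLinks ) ), "\n";         // dewiki, enwiki
echo getLocalPage( $siteLinks, 'enwiki' ) ?? '(none)', "\n"; // Wikipedia:WikiProject Example
echo getLocalPage( $siteLinks, 'frwiki' ) ?? '(none)', "\n"; // (none)
```

The local link is just a key lookup, which is why local links are the easy case.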

WikiProject description

Same as the label. Here too, the description may not exist, or may only be available in a fallback language.

WikiProject logo/image

Doable, but as already noted in previous meetings, the vast majority of WikiProject items have no logo. Larger WikiProjects may have multiple logos (like Women in Red), in which case it's unclear which one we would be showing.
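If we did want to show a single logo, one possible rule (an assumption, not something the team has decided) is to follow Wikibase statement ranks: use preferred-rank statements if there are any, otherwise normal-rank ones, and never deprecated ones. This is the same rule Wikibase DataModel applies in `StatementList::getBestStatements()`. The plain-PHP sketch below uses hypothetical arrays and rank constants instead of real statements, and breaks any remaining tie by taking the first statement, which is arbitrary.

```php
<?php
// Hypothetical rank constants for this sketch (Wikibase defines its own).
const RANK_DEPRECATED = 0;
const RANK_NORMAL = 1;
const RANK_PREFERRED = 2;

/**
 * Picks one logo from several logo statements.
 *
 * @param array<int,array{file:string,rank:int}> $logoStatements
 * @return string|null File name of the chosen logo, or null if none is usable.
 */
function pickLogo( array $logoStatements ): ?string {
	$best = [];
	// Start at normal rank so that deprecated statements never qualify.
	$bestRank = RANK_NORMAL;
	foreach ( $logoStatements as $statement ) {
		if ( $statement['rank'] > $bestRank ) {
			// A preferred statement replaces everything collected so far.
			$bestRank = $statement['rank'];
			$best = [ $statement ];
		} elseif ( $statement['rank'] === $bestRank ) {
			$best[] = $statement;
		}
	}
	// Tie-breaker (an assumption): take the first of the best statements.
	return $best[0]['file'] ?? null;
}

// Hypothetical data for a large WikiProject with several logos.
$logos = [
	[ 'file' => 'Logo A.svg', 'rank' => RANK_NORMAL ],
	[ 'file' => 'Logo B.svg', 'rank' => RANK_PREFERRED ],
	[ 'file' => 'Old logo.png', 'rank' => RANK_DEPRECATED ],
];

echo pickLogo( $logos ) ?? '(no logo)', "\n"; // Logo B.svg
```

This only helps when editors have actually marked one logo as preferred; with several normal-rank logos the choice is still arbitrary.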

WikiProject creation/inception date
WikiProject founders

Retrieving these should be similar to the logo, but I fail to see their usefulness. I don't know how to verify this in SPARQL (and could use some help with it), but I suspect that very few WikiProjects have founders or a creation date. None of the ones I checked manually have them, except, again, Women in Red (which, to reiterate, shouldn't be taken as an example of a typical WikiProject) and WikiProject COVID-19 (dates only). Even for cases such as Women in Red, I don't see why the average user on xyz-wiki would want to know who founded the WikiProject, since the founders are probably not active on xyz-wiki and may not even share a language with that user.


Here's a snippet that can be used locally to experiment with the above, for future reference. I'm no Wikibase expert, and I'm sure this could be improved/optimized in lots of different ways; we would also need to make sure that it's sufficiently fast etc. For a quick POC, you can add this code to any special page class.

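// Imports needed by the code below (at the top of the special page's file):
use Wikibase\Client\WikibaseClient;
use Wikibase\DataModel\Entity\Item;
use Wikibase\DataModel\Entity\NumericPropertyId;
use Wikibase\DataModel\Snak\PropertyValueSnak;
use Wikibase\Lib\Store\EntityRetrievingTermLookup;
use Wikibase\Lib\Store\LanguageFallbackLabelDescriptionLookup;

// The rest goes inside a method of the special page class, e.g. execute().
$out = $this->getOutput();

// Choose a WikiProject item (the ID depends on the local Wikibase install).
$entityStr = 'Q5';

$entityID = WikibaseClient::getEntityIdParser()->parse( $entityStr );
$entity = WikibaseClient::getEntityLookup()->getEntity( $entityID );
// getEntity() returns null for a missing entity; only items are handled here.
if ( !$entity instanceof Item ) {
	throw new \InvalidArgumentException( "$entityStr is not an existing item" );
}

// Lookup that resolves labels/descriptions along the user's language fallback chain.
$language = $this->getLanguage();
$languageFallbackChainFactory = WikibaseClient::getLanguageFallbackChainFactory();
$termFallbackChain = $languageFallbackChainFactory->newFromLanguage( $language );
$labelDescriptionLookup = new LanguageFallbackLabelDescriptionLookup(
	new EntityRetrievingTermLookup( WikibaseClient::getEntityLookup() ),
	$termFallbackChain
);

// Label (WikiProject name); null if no language in the chain has one.
$label = $labelDescriptionLookup->getLabel( $entityID );
if ( $label ) {
	$out->addHTML( '<p>Label: ' . htmlspecialchars( $label->getText() ) .
		' (lang=' . htmlspecialchars( $label->getActualLanguageCode() ) . ')</p>' );
} else {
	$out->addHTML( '<p>No label available</p>' );
}

// Site links: one per wiki that has a page for this WikiProject.
$out->addHTML( '<p>Site links:</p><ul>' );
foreach ( $entity->getSiteLinkList() as $siteLink ) {
	$out->addHTML( '<li>' .
		htmlspecialchars( $siteLink->getSiteId() . ' - ' . $siteLink->getPageName() ) . '</li>' );
}
$out->addHTML( '</ul>' );

// Description: same fallback behavior as the label.
$description = $labelDescriptionLookup->getDescription( $entityID );
if ( $description ) {
	$out->addHTML( '<p>Description: ' . htmlspecialchars( $description->getText() ) .
		' (lang=' . htmlspecialchars( $description->getActualLanguageCode() ) . ')</p>' );
} else {
	$out->addHTML( '<p>No description available</p>' );
}

$statements = $entity->getStatements();

// Replace with the actual image/logo property ID of your install.
$imagePropId = new NumericPropertyId( 'P3' );
$images = $statements->getByPropertyId( $imagePropId );
$out->addHTML( '<p>Images:</p><ul>' );
foreach ( $images as $image ) {
	$mainSnak = $image->getMainSnak();
	// "No value" / "unknown value" snaks carry no data value; skip them.
	if ( !$mainSnak instanceof PropertyValueSnak ) {
		continue;
	}
	// For a commonsMedia property, the value is the file name.
	$out->addHTML( '<li>' . htmlspecialchars( $mainSnak->getDataValue()->getValue() ) . '</li>' );
}
$out->addHTML( '</ul>' );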

For the 4449 WikiProjects that exist in Wikidata, the number of WikiProjects with each property populated is listed below:

logo: 225
inception: 306
founders: 270
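Expressed as a share of all WikiProject items, these counts are small. A quick calculation using only the numbers above (the helper name is illustrative):

```php
<?php
// Coverage of each property among WikiProject items.
// Counts come from the list above; 4449 is the total number of WikiProject items.
const TOTAL_WIKIPROJECTS = 4449;

$populated = [
	'logo' => 225,
	'inception' => 306,
	'founders' => 270,
];

/** Percentage of items with the property populated, rounded to one decimal. */
function coveragePercent( int $count, int $total ): float {
	return round( 100 * $count / $total, 1 );
}

foreach ( $populated as $property => $count ) {
	printf( "%s: %d of %d (%.1f%%)\n", $property, $count, TOTAL_WIKIPROJECTS,
		coveragePercent( $count, TOTAL_WIKIPROJECTS ) );
}
// logo: 225 of 4449 (5.1%)
// inception: 306 of 4449 (6.9%)
// founders: 270 of 4449 (6.1%)
```

So each of these properties is populated on only about 5 to 7% of WikiProject items, consistent with the suspicion that very few WikiProjects have this data.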

I was unable to find a suitable property for "Description", and the candidate properties I found had no entries.

@Daimona & @MHorsey-WMF: Hello! It looks like this investigation is complete, and we will be consulting WMDE as part of T372502. Shall we close this ticket?

SGTM!