
Collect statistics from Wikimedia Wikis about how many pages are using Wikidata data beyond sitelinks
Closed, Resolved (Public)

Description

We want to understand how many articles/pages of every Wikimedia wiki are using data from Wikidata beyond just being sitelinked to an Item.

That information is already available in the form of usage tracking on the individual wikis.

Example query:

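-- Count main-namespace pages that have at least one usage aspect other than sitelinks ('S')
SELECT COUNT(DISTINCT page.page_id)
FROM wbc_entity_usage
JOIN page ON wbc_entity_usage.eu_page_id = page.page_id
WHERE page.page_namespace = 0
AND wbc_entity_usage.eu_aspect != 'S';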

Acceptance criteria:

  • for now: 1-time data collection for every Wikimedia wiki
    • in the future: continuous data collection resulting in some visualization over time
  • to be defined: some form of summarizing basic insights from the data
    • absolute number and percentage of pages that use entities
    • for each individual wiki
    • per project (wikipedias, wiktionaries, wikivoyages, ...)
    • TBD: per language family? (enwiki, enwiktionary, enwikivoyage, ...)
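
The summaries listed above could be derived from per-wiki counts along the following lines. This is only a sketch: the sample numbers are made up, `split_dbname` and `summarize` are hypothetical helpers, and it assumes database names follow a `<language><project suffix>` pattern in which a plain `wiki` suffix means Wikipedia. Special wikis such as `commonswiki` do not fit that pattern and would need separate handling.

```python
from collections import defaultdict

# Hypothetical per-wiki counts: (pages using Wikidata beyond sitelinks, total pages).
# These numbers are invented for illustration only.
SAMPLE_COUNTS = {
    "enwiki": (40, 100),
    "dewiki": (30, 50),
    "enwiktionary": (5, 50),
    "enwikivoyage": (9, 10),
}

# Assumed database-name suffixes; not an exhaustive list of Wikimedia projects.
PROJECT_SUFFIXES = ["wiktionary", "wikivoyage", "wikiquote", "wikibooks",
                    "wikisource", "wikinews", "wikiversity", "wiki"]


def split_dbname(dbname):
    """Split e.g. 'enwiktionary' into ('en', 'wiktionary').

    A bare 'wiki' suffix is reported as 'wikipedia' (assumption).
    """
    for suffix in PROJECT_SUFFIXES:
        if dbname.endswith(suffix) and len(dbname) > len(suffix):
            lang = dbname[: -len(suffix)]
            project = "wikipedia" if suffix == "wiki" else suffix
            return lang, project
    raise ValueError(f"unrecognized database name: {dbname}")


def summarize(counts, key_index):
    """Aggregate counts by language (key_index=0) or by project (key_index=1)."""
    totals = defaultdict(lambda: [0, 0])
    for dbname, (using, total) in counts.items():
        key = split_dbname(dbname)[key_index]
        totals[key][0] += using
        totals[key][1] += total
    return {
        key: {
            "using": using,
            "total": total,
            "percent": round(100 * using / total, 2) if total else 0.0,
        }
        for key, (using, total) in totals.items()
    }


by_project = summarize(SAMPLE_COUNTS, 1)
by_language = summarize(SAMPLE_COUNTS, 0)

for name, stats in sorted(by_project.items()):
    print(name, stats)
```

Summing the raw counts before computing the percentage weights each wiki by its size. Averaging the per-wiki percentages would answer a different question.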

Open Questions:

  • The query above covers only the main namespace. How much effort would it take to do the same for all namespaces that might use data?
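
One possible way to cover all namespaces is to group by `page_namespace` instead of filtering on it. The sketch below runs such a query against a tiny in-memory SQLite database that stands in for a wiki's replica. The table and column names come from the example query, but the rows are invented and the query has not been checked against production data or its performance characteristics.

```python
import sqlite3

# In-memory stand-in for a wiki database; all rows are made-up toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER);
CREATE TABLE wbc_entity_usage (eu_page_id INTEGER, eu_aspect TEXT);

INSERT INTO page VALUES (1, 0), (2, 0), (3, 0), (4, 10), (5, 10);
INSERT INTO wbc_entity_usage VALUES
    (1, 'S'),                  -- sitelink only: should not be counted
    (2, 'S'), (2, 'L.en'),     -- sitelink plus a label usage
    (3, 'C.P31'), (3, 'T'),    -- two usages on one page, counted once
    (4, 'O');                  -- usage on a page outside the main namespace
""")

# The LEFT JOIN keeps pages without any usage rows, so the same query
# yields both the number of pages using data and the namespace total.
QUERY = """
SELECT p.page_namespace,
       COUNT(DISTINCT eu.eu_page_id) AS pages_using,
       COUNT(DISTINCT p.page_id)     AS pages_total
FROM page AS p
LEFT JOIN wbc_entity_usage AS eu
       ON eu.eu_page_id = p.page_id
      AND eu.eu_aspect <> 'S'
GROUP BY p.page_namespace
ORDER BY p.page_namespace
"""

rows = conn.execute(QUERY).fetchall()
for namespace, pages_using, pages_total in rows:
    percent = 100 * pages_using / pages_total
    print(f"namespace {namespace}: {pages_using}/{pages_total} pages ({percent:.1f}%)")
```

The `eu_aspect <> 'S'` condition sits in the join clause rather than in `WHERE`, so pages with only sitelink usage still count toward the namespace total.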

Event Timeline

Michael claimed this task.
Michael added a subscriber: Manuel.

The data in https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/WD_percentUsage/ seems to be more or less exactly what we were looking for in this task. Thank you for pointing us to it, @Manuel!

I'm resolving this for now. We can open up a new task when we have more specific needs.