[Analytics] [Request] Analyze queries to check frequency of topics (QIDs)
Open, Stalled, Needs Triage, Public

Description

Wikidata Analytics Request

This task was generated using the Wikidata Analytics request form. Please use the task templates linked on our project page to create tasks for the team. Thank you!

Purpose

Please provide as much context as possible as well as what the produced insights or services will be used for.

We'd like to understand the frequency of certain popular topics in SPARQL queries. This should help us identify the relative importance of the potential related data subsets.

Desired Outputs

The desired outputs of this task are listed as check boxes and confirmed as being finished below.

Downloadable CSV showing the following data, ideally collected over a couple of months:

  • the percentage of all SPARQL queries which include each of the following QIDs:
    • Q5 (humans)
    • Q483501 (artist)
    • Q33999 (actor)
    • Q482980 (author)
    • Q11424 (film)
    • Q571 (book)
    • Q838948 (work of art)
    • Q3305213 (painting)
    • Q43229 (organization)
    • Q106668099 (corporate body)
    • Q6881511 (enterprise)
    • Q6256 (country)
    • Q24249370 (non-human animal)
    • Q7889 (video game)
    • Q11173 (chemical compound)
    • Q2095 (food)
  • the percentage of all SPARQL queries which include each of the following properties:
    • P276 (location)
    • P625 (coordinate location)
  • the percentage of all SPARQL queries which use notability criteria:
    • Has sitelink: ?item wikibase:sitelinks ?sitelinks.
    • Has image: ?item wdt:P18 ?image.
    • Has website: ?item wdt:P856 ?website.
    • Has external identifiers
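
One possible way to compute these percentages, as a rough sketch only: it assumes the queries are already available as a list of raw SPARQL strings (the actual source table is not specified in this task), and it treats a query as "including" a QID or property when the identifier appears as a whole token. The names `QIDS`, `PIDS`, `NOTABILITY`, `percentages` and `to_csv` are hypothetical, the identifier lists are shortened, and the "has external identifiers" criterion is left out because the task gives no pattern for it.

```python
import csv
import io
import re

# Shortened, illustrative subset of the identifiers listed above.
QIDS = ["Q5", "Q483501", "Q11424"]
PIDS = ["P276", "P625"]

# Assumption: a notability criterion counts as used when its predicate
# appears in the query, regardless of the variable names around it.
NOTABILITY = {
    "has_sitelink": re.compile(r"wikibase:sitelinks\b"),
    "has_image": re.compile(r"\bwdt:P18\b"),
    "has_website": re.compile(r"\bwdt:P856\b"),
}


def percentages(queries):
    """Return one row per pattern with the count and share of matching queries."""
    total = len(queries)
    # \b keeps Q5 from matching inside Q571 or Q57.
    patterns = {i: re.compile(rf"\b{re.escape(i)}\b") for i in QIDS + PIDS}
    patterns.update(NOTABILITY)
    rows = []
    for label, pattern in patterns.items():
        hits = sum(1 for q in queries if pattern.search(q))
        pct = 100.0 * hits / total if total else 0.0
        rows.append({"pattern": label, "queries": hits, "percent": round(pct, 2)})
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["pattern", "queries", "percent"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample queries, for illustration only.
sample = [
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . ?item wdt:P18 ?image . }",
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q571 . }",
    "SELECT ?item ?coord WHERE { ?item wdt:P625 ?coord . }",
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q11424 ; wikibase:sitelinks ?n . }",
]
print(to_csv(percentages(sample)))
```

Plain token matching would also count identifiers that appear in comments or string literals, so a production analysis might prefer to parse the queries first.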

Nice to have:

  • For each QID or property above, how often does the query include other QIDs, divided into buckets (e.g. 1-5, 6-10, 11-20, etc.)
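
The bucketing could look roughly like the sketch below. It assumes "other QIDs" means distinct QIDs other than the target, and the bucket edges are illustrative, since the task only gives example ranges. `QID_RE`, `BUCKETS`, `bucket_label` and `other_qid_buckets` are hypothetical names.

```python
import re
from collections import Counter

# Matches any QID token such as Q5 or Q11424.
QID_RE = re.compile(r"\bQ[1-9][0-9]*\b")

# Assumed inclusive bucket edges; anything above the last edge goes to "21+".
BUCKETS = [(0, 0), (1, 5), (6, 10), (11, 20)]


def bucket_label(n):
    """Map a count of other QIDs to its bucket label."""
    for low, high in BUCKETS:
        if low <= n <= high:
            return str(low) if low == high else f"{low}-{high}"
    return f"{BUCKETS[-1][1] + 1}+"


def other_qid_buckets(queries, target):
    """Count queries mentioning `target` (a QID or property), grouped by how
    many distinct other QIDs each of those queries contains."""
    target_re = re.compile(rf"\b{re.escape(target)}\b")
    counts = Counter()
    for q in queries:
        if not target_re.search(q):
            continue
        others = set(QID_RE.findall(q)) - {target}
        counts[bucket_label(len(others))] += 1
    return counts
```

For example, `other_qid_buckets(queries, "Q5")` returns a `Counter` keyed by bucket label, which can be written out as additional CSV columns or rows.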

Deadline

Please make the time sensitivity of this request clear with a date that it should be completed by. If there is no specific date, then the task will be triaged based on its priority.

End of April (before product work on data dumps begins)


Information below this point is filled out by the task assignee.

Assignee Planning

Sub Tasks

A full breakdown of the steps to complete this task.

  • Subtask

Estimation

Estimate:
Actual:

Data

The tables that will be referenced in this task and the samples from them that will be used.

  • link_to_table
    • sample_size

Notes

Things that came up during the completion of this task, questions to be answered and follow up tasks.

  • Note

Event Timeline

Ifrahkhanyaree_WMDE changed the task status from Open to Stalled. Mar 16 2026, 8:18 AM

This work will be paused since the WMF has asked us not to work on anything related to data dumps.