
[Analytics] [Request] [WDQM] Reference Completeness of Wikidata's Data
Open, Needs Triage, Public

Description

WMDE Analytics Request

This task was generated using the WMDE Analytics request form. Please use the task templates linked on our project page to create tasks for the team. Thank you!

Why (Context & Decision)

What problem are you trying to solve, and what decision will this analysis inform? Briefly explain the organisational or strategic context, why this matters now, and what action will be taken based on the outcome.

As a part of T427512: [EPIC] [WDQM] Wikidata Quality Metric, we want to measure the completeness of references on Wikidata's data.

What (Scope & Output)

Describe the specific question(s), metrics, segments, or deliverables (e.g., dashboard, deep dive, experiment analysis), including any relevant definitions or constraints.

  • A measure of how often statements on Wikidata include citations for properties that require them
  • A monthly pipeline that uses the most up-to-date weekly wmf.wikidata_entity snapshot
  • A chart for this metric over time on Superset
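
One possible reading of the metric above, as a minimal Python sketch: the share of statements on citation-requiring properties that carry at least one reference. The exact definition is not fixed in this task, so treat this as an assumption. The record layout (`property`, `references`), the function name, and the sample property IDs are hypothetical and do not mirror the wmf.wikidata_entity schema; the set of properties that require citations is passed in, however it ends up being derived.

```python
from typing import Iterable, Mapping, Optional, Set


def reference_completeness(
    statements: Iterable[Mapping],
    citation_required: Set[str],
) -> Optional[float]:
    """Share of in-scope statements that have at least one reference.

    A statement is in scope when its property is in `citation_required`.
    Returns None when no statement is in scope, to avoid dividing by zero.
    """
    in_scope = [s for s in statements if s["property"] in citation_required]
    if not in_scope:
        return None
    referenced = sum(1 for s in in_scope if s["references"])
    return referenced / len(in_scope)


# Hypothetical sample data: property IDs and reference contents are
# placeholders, not a claim about which properties require citations.
statements = [
    {"property": "P569", "references": [{"P248": "Q36578"}]},
    {"property": "P569", "references": []},
    {"property": "P31", "references": []},  # not in scope, ignored
    {"property": "P19", "references": [{"P854": "https://example.org"}]},
]
citation_required = {"P569", "P19"}

# 2 of the 3 in-scope statements carry a reference.
print(reference_completeness(statements, citation_required))
```

In the actual pipeline the same ratio would presumably be computed in the metric query over the snapshot rather than in Python; the sketch only pins down the numerator and denominator.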

By When (Timing & Priority)

Provide a clear deadline, any key milestones (e.g. launch, leadership review), and note if timing is flexible or fixed.

05.06.2026


Information below this point is filled out by the analyst.

Sub Tasks

A breakdown of the steps to complete this task.

  • Write create table query
  • Write query to derive metric
  • Test table creation and metric query
  • Write DAG for metric computation
  • Test DAG for metric computation (with T427513)
  • Deploy DAG
  • Create chart on Superset (with T427513)
  • Finalize metric computation for all available weeks (we only have data for May 2026)
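
For the DAG and backfill steps above, a monthly run needs to pick the most recent weekly snapshot in each month. A minimal sketch of that selection, assuming snapshots are identified by `YYYY-MM-DD` strings (an assumption about the partition format) and using a hypothetical helper name and made-up dates:

```python
from datetime import date
from typing import Dict, Iterable


def latest_snapshot_per_month(snapshots: Iterable[str]) -> Dict[str, str]:
    """Map each 'YYYY-MM' month to its latest snapshot date string."""
    latest: Dict[str, date] = {}
    for snap in snapshots:
        day = date.fromisoformat(snap)
        month = f"{day.year:04d}-{day.month:02d}"
        # Keep the snapshot only if it is newer than the one already stored.
        if month not in latest or day > latest[month]:
            latest[month] = day
    return {month: latest[month].isoformat() for month in sorted(latest)}


# Hypothetical weekly snapshot dates, for illustration only.
weekly = ["2026-05-04", "2026-05-11", "2026-05-18", "2026-05-25", "2026-06-01"]
print(latest_snapshot_per_month(weekly))
```

A backfill over all available weeks would then run the metric query once per selected snapshot.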

Estimation

Estimate: 1d
Actual:

Data

The tables that will be referenced in this task and the sample sizes from them that will be used.

  • wmf.wikidata_entity
  • discovery.wikibase_rdf

Event Timeline

Superset:wd_quality_metrics is finalized, so the only thing we're still waiting on is the estimate of how many mandatory constraints are valid, which is tracked in T427513. For now I'm doing 1/2 this metric and 1/2 the metric in T427513. When I'm back, I'll run the process for June using the notebook in T427513.