
[Scraper] Document "Number of entities"
Open, Low, Public

Description

Ticket:

Current Situation:

  • This metric is tracked in Wikidata and Wikibase Suite.
  • In Wikibase Suite, the number of entities is tracked via SPARQL queries, so the metric can only be collected for instances whose SPARQL endpoint is configured and accessible.
  • The logic used to calculate this metric is not currently documented or aligned across teams.

Goal:

  • Confirm and document the current method used by Wikibase Suite to track the number of entities.
  • Clearly communicate to the PM how the metric is derived, including constraints (e.g. limited to SPARQL-enabled instances).
  • Provide enough detail so this logic can be shared across the Wikibase ecosystem for transparency and alignment.

Acceptance Criteria:

  • Confirm that the number of entities is currently tracked through SPARQL queries in the WBS metadata pipeline.
  • Document the exact SPARQL query or logic used to calculate this metric.
  • Note the scope limitation: the metric only applies to instances with accessible SPARQL endpoints.
  • Share the documentation with the PM in a way that is suitable for reuse across Cloud and Wikidata teams.
  • Flag any relevant assumptions or edge cases (e.g. entity types included or excluded).
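
As a starting point for the documentation, here is a minimal sketch of what a SPARQL-based entity count could look like. It is not the confirmed pipeline logic: the query text, the choice of `wikibase:statements` as the pattern that identifies an entity, and the helper names `build_count_url`, `parse_count` and `count_entities` are all assumptions made for illustration. Which entity types such a pattern includes or excludes is exactly the kind of edge case this ticket asks to flag.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Placeholder query (an assumption, NOT the pipeline's confirmed query).
# It counts subjects that carry a wikibase:statements triple; whether that
# matches the intended definition of "entity" still needs to be verified.
ENTITY_COUNT_QUERY = """
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT (COUNT(DISTINCT ?entity) AS ?count) WHERE {
  ?entity wikibase:statements ?statementCount .
}
"""


def build_count_url(endpoint: str, query: str = ENTITY_COUNT_QUERY) -> str:
    """Build a GET URL for a SPARQL endpoint (assumes no existing query string)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_count(payload: dict) -> int:
    """Read the ?count binding from a SPARQL 1.1 JSON results document."""
    bindings = payload["results"]["bindings"]
    return int(bindings[0]["count"]["value"])


def count_entities(endpoint: str, timeout: float = 30.0) -> Optional[int]:
    """Return the entity count, or None if the endpoint cannot be queried.

    None reflects the scope limitation: instances without an accessible
    SPARQL endpoint yield no value rather than a count of zero.
    """
    request = urllib.request.Request(
        build_count_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_count(json.load(response))
    except (OSError, ValueError, KeyError, IndexError):
        # Network errors, invalid JSON, or an unexpected result shape.
        return None
```

Returning `None` instead of `0` for unreachable endpoints keeps "no data" distinguishable from "empty instance" when the metric is aggregated.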

Notes:

  • This metric gives a partial but valuable view of content size within the ecosystem.
  • Transparency about scope and method will help build confidence in the metric even if it's not comprehensive.

Event Timeline

Leif_WMDE renamed this task from Track "Number of entities" to Suite Scraper - Document "Number of entities". Jun 10 2025, 8:35 AM
Leif_WMDE lowered the priority of this task from Medium to Low.
Leif_WMDE renamed this task from Suite Scraper - Document "Number of entities" to [Scraper] Document "Number of entities". Jul 11 2025, 12:24 PM