Track Wikidata data completeness
Notes from discussion with Lydia!


Done in

  • Number of entities by entity type
    • Can probably be done just counting pages in a namespace
    • Count properties with the SPARQL query below (all the rest are items)
    • SELECT (count(?cs) as ?count) WHERE { ?cs a wikibase:Property }
  • Number of redirects by entity type
    • As above, redirects are recorded in the page table and we can filter by namespace
    • In SPARQL, look for ?x owl:sameAs ?y to find redirects
  • Number of Item talk pages
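
A minimal sketch of the page-table approach for the three counts above, run against a toy in-memory SQLite table rather than the real database. The column names follow MediaWiki's page table; the namespace numbers (0 = Item, 1 = Item talk, 120 = Property) and the sample rows are assumptions for illustration.

```python
import sqlite3

# Toy stand-in for MediaWiki's `page` table; only the columns used here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, "
    "page_namespace INTEGER, page_is_redirect INTEGER)"
)
# Assumed namespaces: 0 = Item, 1 = Item talk, 120 = Property.
conn.executemany(
    "INSERT INTO page (page_namespace, page_is_redirect) VALUES (?, ?)",
    [(0, 0), (0, 0), (0, 1), (1, 0), (120, 0)],
)


def counts_by_namespace(conn):
    """Return {namespace: (non_redirect_pages, redirect_pages)}."""
    rows = conn.execute(
        "SELECT page_namespace, "
        "SUM(page_is_redirect = 0), SUM(page_is_redirect = 1) "
        "FROM page GROUP BY page_namespace"
    )
    return {ns: (pages, redirects) for ns, pages, redirects in rows}


namespace_counts = counts_by_namespace(conn)
```

One grouped query covers entities, redirects and talk pages at once, since they only differ by namespace and the redirect flag.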


Tracked in

  • Average blob size of items
    • Easy db query
  • Max blob size of items
    • Easy db query
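
A rough sketch of what the "easy db query" might look like, assuming page_len in the page table is an acceptable proxy for the item blob size (that is an assumption, not something settled in these notes). The table is a toy in-memory SQLite copy with made-up rows, and namespace 0 is assumed to hold items.

```python
import sqlite3

# Toy stand-in for the `page` table with the length column added.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, "
    "page_is_redirect INTEGER, page_len INTEGER)"
)
conn.executemany(
    "INSERT INTO page (page_namespace, page_is_redirect, page_len) "
    "VALUES (?, ?, ?)",
    [(0, 0, 100), (0, 0, 300), (0, 1, 50), (120, 0, 900)],
)

# Assumption: items live in namespace 0 and redirects should be excluded.
avg_len, max_len = conn.execute(
    "SELECT AVG(page_len), MAX(page_len) FROM page "
    "WHERE page_namespace = 0 AND page_is_redirect = 0"
).fetchone()
```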

Tracked in

  • Number of properties by datatype
    • Use the wb_property_info table
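
A small sketch of the wb_property_info approach, using a toy in-memory SQLite table. The pi_property_id and pi_type columns mirror the Wikibase table; the rows are made up.

```python
import sqlite3

# Toy stand-in for Wikibase's `wb_property_info` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_property_info (pi_property_id INTEGER PRIMARY KEY, "
    "pi_type TEXT)"
)
conn.executemany(
    "INSERT INTO wb_property_info VALUES (?, ?)",
    [(1, "wikibase-item"), (2, "string"), (3, "string")],
)

# One row per datatype with the number of properties that use it.
properties_by_datatype = conn.execute(
    "SELECT pi_type, COUNT(*) FROM wb_property_info "
    "GROUP BY pi_type ORDER BY pi_type"
).fetchall()
```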

Tracked in

  • Number of ranks by type?
    • Can be done in SPARQL
    • SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wikibase:rank wikibase:PreferredRank}
    • For normal rank we just assume: normal rank = statement count - preferred rank - deprecated rank
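
The normal-rank arithmetic above, written out as a tiny helper. The function name and the example numbers are hypothetical; the three inputs are expected to come from separate count queries.

```python
def normal_rank_count(total_statements: int, preferred: int, deprecated: int) -> int:
    """Derive the normal-rank count from the three counts queried directly."""
    normal = total_statements - preferred - deprecated
    if normal < 0:
        # The inputs came from inconsistent snapshots or a wrong query.
        raise ValueError("preferred + deprecated exceeds the total statement count")
    return normal


# Made-up numbers, for illustration only.
example_normal = normal_rank_count(total_statements=1000, preferred=40, deprecated=10)
```

The three counts should be taken as close together in time as possible, otherwise edits in between can skew the derived number.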

Tracked in

  • References to Wikipedia
    • SELECT (count(distinct(?s)) AS ?scount) WHERE {?s prov:wasDerivedFrom wdref:004ec6fbee857649acdbdbad4f97b2c8571df97b}

Tracked in

  • Number of labels, descriptions & aliases per lang
    • Can be done using the wb_terms table
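
A sketch of the wb_terms grouping, on a toy in-memory SQLite table. The column names follow the Wikibase wb_terms table and the term_type values are assumed to be "label", "description" and "alias"; the rows are placeholders.

```python
import sqlite3

# Toy stand-in for Wikibase's `wb_terms` table; only a few columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_terms (term_entity_id INTEGER, term_entity_type TEXT, "
    "term_language TEXT, term_type TEXT, term_text TEXT)"
)
conn.executemany(
    "INSERT INTO wb_terms VALUES (?, ?, ?, ?, ?)",
    [
        (1, "item", "en", "label", "foo"),
        (1, "item", "en", "description", "a foo"),
        (1, "item", "en", "alias", "fu"),
        (1, "item", "en", "alias", "phoo"),
        (1, "item", "de", "label", "Foo"),
    ],
)

# One row per (language, term type) with the number of terms.
terms_per_language = conn.execute(
    "SELECT term_language, term_type, COUNT(*) FROM wb_terms "
    "GROUP BY term_language, term_type "
    "ORDER BY term_language, term_type"
).fetchall()
```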

Tracked in

  • Number of sitelinks per site
    • Can be done using wb_items_per_site
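
A sketch of the per-site count, on a toy in-memory SQLite copy of wb_items_per_site. The ips_* column names follow the Wikibase table; the rows are made up.

```python
import sqlite3

# Toy stand-in for Wikibase's `wb_items_per_site` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_items_per_site (ips_row_id INTEGER PRIMARY KEY, "
    "ips_item_id INTEGER, ips_site_id TEXT, ips_site_page TEXT)"
)
conn.executemany(
    "INSERT INTO wb_items_per_site (ips_item_id, ips_site_id, ips_site_page) "
    "VALUES (?, ?, ?)",
    [(1, "enwiki", "A"), (1, "dewiki", "A"), (2, "enwiki", "B"), (3, "enwiki", "C")],
)

# One row per site with the number of sitelinks pointing at it.
sitelinks_per_site = conn.execute(
    "SELECT ips_site_id, COUNT(*) FROM wb_items_per_site "
    "GROUP BY ips_site_id ORDER BY ips_site_id"
).fetchall()
```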

Tracked in

  • Count of items grouped by number of labels, descriptions & aliases per item
    • Should be possible through the wb_terms table

Tracked in

  • Count of items grouped by number of sitelinks per item
    • Should be possible through the wb_items_per_site table
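
A sketch of the "count grouped by" idea as a two-level GROUP BY: first count sitelinks per item, then count items per sitelink total. It runs on a toy in-memory SQLite copy of wb_items_per_site with made-up rows; the same shape should carry over to wb_terms.

```python
import sqlite3

# Toy stand-in for Wikibase's `wb_items_per_site` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_items_per_site (ips_row_id INTEGER PRIMARY KEY, "
    "ips_item_id INTEGER, ips_site_id TEXT, ips_site_page TEXT)"
)
conn.executemany(
    "INSERT INTO wb_items_per_site (ips_item_id, ips_site_id, ips_site_page) "
    "VALUES (?, ?, ?)",
    [(1, "enwiki", "A"), (1, "dewiki", "A"), (2, "enwiki", "B"), (3, "enwiki", "C")],
)

# Inner query: sitelinks per item. Outer query: items per sitelink count.
items_by_sitelink_count = conn.execute(
    "SELECT sitelinks, COUNT(*) FROM ("
    "  SELECT ips_item_id, COUNT(*) AS sitelinks"
    "  FROM wb_items_per_site GROUP BY ips_item_id"
    ") AS per_item "
    "GROUP BY sitelinks ORDER BY sitelinks"
).fetchall()
```

Items with no sitelinks have no rows in this table, so the zero bucket would have to be derived separately, for example from the total item count.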

Tracked in

  • Count of items grouped by number of statements per item


TODO / No tasks for this yet

  • Number of statements by data type
    • Could probably run a SPARQL query for each property?
  • Number of uses of NoValue and SomeValue in mainsnaks?
  • Number of qualifiers
    • ??????????????????????????????????????????
  • Referenced statements
    • ??????????????????????????????????????????
  • Count of statements grouped by number of qualifiers per statement?
  • Average number of edits per page?

