
Track Wikidata data completeness
Closed, Resolved, Public

Description

Notes from discussion with Lydia!

~~

Done in https://github.com/wikimedia/analytics-limn-wikidata-data/blob/2e7da506b69792f79d72c384790db308fbd9cd47/graphite/site_stats/pages_by_namespace.php

  • Number of entities by entity type
    • Can probably be done just counting pages in a namespace
    • Count properties with the SPARQL query below (all other entities are items)
    • SELECT (count(?cs) as ?count) WHERE { ?cs a wikibase:Property }
  • Number of redirects by entity type
    • As above: redirects are recorded in the page table, so they can be filtered by namespace
    • In SPARQL, look for ?x owl:sameAs ?y triples to find redirects
  • Number of Item talk pages
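The page-table approach above can be sketched as a single grouped query. This is a minimal illustration against an in-memory SQLite stand-in for MediaWiki's `page` table. It assumes only the `page_namespace` and `page_is_redirect` columns, and it assumes Wikidata's namespace numbers (0 for items, 1 for item talk, 120 for properties); the sample rows in the hypothetical `demo_connection()` helper are invented.

```python
import sqlite3

# Assumed Wikidata namespace numbers: items in the main namespace,
# item talk pages in namespace 1, properties in namespace 120.
NAMESPACE_NAMES = {0: "item", 1: "item_talk", 120: "property"}


def pages_by_namespace(conn: sqlite3.Connection) -> dict:
    """Count non-redirect pages and redirects per namespace in a page-like table."""
    rows = conn.execute(
        """
        SELECT page_namespace,
               SUM(CASE WHEN page_is_redirect = 0 THEN 1 ELSE 0 END),
               SUM(CASE WHEN page_is_redirect = 1 THEN 1 ELSE 0 END)
        FROM page
        GROUP BY page_namespace
        """
    ).fetchall()
    return {
        NAMESPACE_NAMES.get(ns, f"ns{ns}"): {"pages": pages, "redirects": redirects}
        for ns, pages, redirects in rows
    }


def demo_connection() -> sqlite3.Connection:
    """Hypothetical miniature page table holding only the columns used above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_namespace INTEGER, page_is_redirect INTEGER)")
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(0, 0), (0, 0), (0, 1), (120, 0), (1, 0)],
    )
    return conn


if __name__ == "__main__":
    for name, counts in pages_by_namespace(demo_connection()).items():
        print(name, counts)
```

One query covers entity counts, redirect counts and item talk pages at once, since all three are just different cells of the namespace/redirect grouping.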

~~

Tracked in https://phabricator.wikimedia.org/T119602

  • Average blob size of items
    • Easy db query
  • Max blob size of items
    • Easy db query
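The "easy db query" for blob sizes could look like the sketch below. It assumes an item's blob size is approximated by `page_len` (the byte length of the page's current revision) and that item redirects should be left out; the `page` table is again a hypothetical in-memory SQLite stand-in with invented rows.

```python
import sqlite3

ITEM_NAMESPACE = 0  # assumption: Wikidata items live in the main namespace


def item_blob_sizes(conn: sqlite3.Connection) -> tuple:
    """Return (average, maximum) page_len over non-redirect item pages."""
    avg_len, max_len = conn.execute(
        "SELECT AVG(page_len), MAX(page_len) FROM page "
        "WHERE page_namespace = ? AND page_is_redirect = 0",
        (ITEM_NAMESPACE,),
    ).fetchone()
    return avg_len, max_len


def demo_connection() -> sqlite3.Connection:
    """Hypothetical page table: two items, one item redirect, one property."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_namespace INTEGER, page_is_redirect INTEGER, "
        "page_len INTEGER)"
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(0, 0, 1000), (0, 0, 3000), (0, 1, 50), (120, 0, 9000)],
    )
    return conn


if __name__ == "__main__":
    avg_len, max_len = item_blob_sizes(demo_connection())
    print(f"average: {avg_len:.0f} bytes, max: {max_len} bytes")
```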

Tracked in https://phabricator.wikimedia.org/T119603

  • Number of properties by datatype
    • Use the wb_property_info table
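Counting properties by datatype is a single GROUP BY over `wb_property_info`. The sketch assumes the table has `pi_property_id` and `pi_type` columns and mirrors them in an in-memory SQLite table with made-up rows.

```python
import sqlite3


def properties_by_datatype(conn: sqlite3.Connection) -> dict:
    """Map each property datatype to the number of properties using it."""
    rows = conn.execute(
        "SELECT pi_type, COUNT(*) FROM wb_property_info GROUP BY pi_type"
    )
    return dict(rows)


def demo_connection() -> sqlite3.Connection:
    """Hypothetical wb_property_info rows: (property id, datatype)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wb_property_info (pi_property_id INTEGER, pi_type TEXT)")
    conn.executemany(
        "INSERT INTO wb_property_info VALUES (?, ?)",
        [(1, "wikibase-item"), (2, "string"), (3, "wikibase-item"), (4, "time")],
    )
    return conn


if __name__ == "__main__":
    print(properties_by_datatype(demo_connection()))
```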

Tracked in https://phabricator.wikimedia.org/T119606

  • Number of statements by rank?
    • Can be done in SPARQL
    • SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wikibase:rank wikibase:PreferredRank}
    • For normal rank, derive it instead: normal rank = statement count - preferred rank - deprecated rank
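The rank bookkeeping above amounts to one query template plus a subtraction. This sketch reuses the SPARQL from the note for the preferred and deprecated ranks (`wikibase:DeprecatedRank` is assumed to be the RDF name of that rank) and derives the normal-rank count. The helper names and sample numbers are hypothetical, and the total statement count has to come from a separate source.

```python
# Template for the per-rank statement count; {rank} is the local name of a
# wikibase rank, e.g. "PreferredRank" or "DeprecatedRank".
RANK_QUERY = (
    "SELECT (count(distinct(?s)) AS ?scount) "
    "WHERE {{?s wikibase:rank wikibase:{rank}}}"
)


def rank_query(rank: str) -> str:
    """Build the SPARQL count query for one statement rank."""
    return RANK_QUERY.format(rank=rank)


def normal_rank_count(statement_count: int, preferred: int, deprecated: int) -> int:
    """Derive the normal-rank count as total - preferred - deprecated."""
    normal = statement_count - preferred - deprecated
    if normal < 0:
        # Inconsistent inputs, e.g. counts taken at different times.
        raise ValueError("preferred + deprecated exceeds the total statement count")
    return normal


if __name__ == "__main__":
    print(rank_query("PreferredRank"))
    print(rank_query("DeprecatedRank"))
    print(normal_rank_count(statement_count=1000, preferred=40, deprecated=10))
```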

Tracked in https://phabricator.wikimedia.org/T119607

  • References to Wikipedia
    • SELECT (count(distinct(?s)) AS ?scount) WHERE {?s prov:wasDerivedFrom wdref:004ec6fbee857649acdbdbad4f97b2c8571df97b}

Tracked in https://phabricator.wikimedia.org/T119608

  • Number of labels, descriptions & aliases per lang
    • Can be done using the wb_terms table
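Per-language term counts are a two-column GROUP BY on `wb_terms`. The sketch assumes the columns `term_entity_id`, `term_entity_type`, `term_language` and `term_type` (with values `label`, `description` and `alias`), and runs against an in-memory SQLite copy with invented rows.

```python
import sqlite3
from collections import defaultdict


def terms_per_language(conn: sqlite3.Connection, entity_type: str = "item") -> dict:
    """Return {language: {term_type: count}} for one entity type."""
    counts = defaultdict(dict)
    rows = conn.execute(
        "SELECT term_language, term_type, COUNT(*) FROM wb_terms "
        "WHERE term_entity_type = ? GROUP BY term_language, term_type",
        (entity_type,),
    )
    for language, term_type, n in rows:
        counts[language][term_type] = n
    return dict(counts)


def demo_connection() -> sqlite3.Connection:
    """Hypothetical wb_terms rows: (entity id, entity type, language, term type)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wb_terms (term_entity_id INTEGER, term_entity_type TEXT, "
        "term_language TEXT, term_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO wb_terms VALUES (?, ?, ?, ?)",
        [
            (1, "item", "en", "label"),
            (1, "item", "en", "description"),
            (1, "item", "en", "alias"),
            (1, "item", "en", "alias"),
            (2, "item", "en", "label"),
            (2, "item", "de", "label"),
            (1, "property", "en", "label"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(terms_per_language(demo_connection()))
```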

Tracked in https://phabricator.wikimedia.org/T119609

  • Number of sitelinks per site
    • Can be done using wb_items_per_site
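Sitelinks per site reduce to grouping `wb_items_per_site` by site. The sketch assumes the columns `ips_item_id`, `ips_site_id` and `ips_site_page` and uses a hypothetical in-memory SQLite table with invented rows.

```python
import sqlite3


def sitelinks_per_site(conn: sqlite3.Connection) -> dict:
    """Map each site id (e.g. 'enwiki') to its number of sitelinks."""
    return dict(
        conn.execute(
            "SELECT ips_site_id, COUNT(*) FROM wb_items_per_site GROUP BY ips_site_id"
        )
    )


def demo_connection() -> sqlite3.Connection:
    """Hypothetical wb_items_per_site rows: (item id, site id, page title)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wb_items_per_site (ips_item_id INTEGER, ips_site_id TEXT, "
        "ips_site_page TEXT)"
    )
    conn.executemany(
        "INSERT INTO wb_items_per_site VALUES (?, ?, ?)",
        [(1, "enwiki", "Berlin"), (1, "dewiki", "Berlin"), (2, "enwiki", "Paris")],
    )
    return conn


if __name__ == "__main__":
    print(sitelinks_per_site(demo_connection()))
```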

Tracked in https://phabricator.wikimedia.org/T119610

  • Number of items grouped by count of labels, descriptions & aliases per item
    • Should be possible through the wb_terms table
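A "count grouped by count" needs two aggregation levels: first count terms per item, then count items per resulting number. The sketch below assumes the same `wb_terms` columns as a plain per-language count would use (`term_entity_id`, `term_entity_type`, `term_type`), counts terms across all languages, and runs on invented rows in SQLite. Items with no terms of a type have no rows for it, so the zero bucket is not produced.

```python
import sqlite3
from collections import defaultdict


def items_by_term_count(conn: sqlite3.Connection) -> dict:
    """Return {term_type: {terms_per_item: number_of_items}} for items."""
    histogram = defaultdict(dict)
    rows = conn.execute(
        """
        SELECT term_type, n, COUNT(*)
        FROM (
            -- inner level: how many terms of each type every item has
            SELECT term_entity_id, term_type, COUNT(*) AS n
            FROM wb_terms
            WHERE term_entity_type = 'item'
            GROUP BY term_entity_id, term_type
        ) AS per_item
        GROUP BY term_type, n
        """
    )
    for term_type, n, items in rows:
        histogram[term_type][n] = items
    return dict(histogram)


def demo_connection() -> sqlite3.Connection:
    """Hypothetical wb_terms rows: (entity id, entity type, language, term type)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wb_terms (term_entity_id INTEGER, term_entity_type TEXT, "
        "term_language TEXT, term_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO wb_terms VALUES (?, ?, ?, ?)",
        [
            (1, "item", "en", "label"),
            (1, "item", "en", "description"),
            (1, "item", "en", "alias"),
            (1, "item", "en", "alias"),
            (2, "item", "en", "label"),
            (2, "item", "de", "label"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(items_by_term_count(demo_connection()))
```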

Tracked in https://phabricator.wikimedia.org/T119611

  • Number of items grouped by count of sitelinks per item
    • Should be possible through the wb_items_per_site table
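The same two-level grouping works for sitelinks. In this sketch (same assumed `wb_items_per_site` columns, invented SQLite rows), items without sitelinks are absent from the table, so an optional total item count, for example from the page table, is used to fill the zero bucket. The `total_items` parameter is a hypothetical addition for illustration.

```python
import sqlite3
from typing import Optional


def items_by_sitelink_count(
    conn: sqlite3.Connection, total_items: Optional[int] = None
) -> dict:
    """Return {sitelinks_per_item: number_of_items}, optionally with a 0 bucket."""
    histogram = dict(
        conn.execute(
            """
            SELECT n, COUNT(*)
            FROM (
                SELECT ips_item_id, COUNT(*) AS n
                FROM wb_items_per_site
                GROUP BY ips_item_id
            ) AS per_item
            GROUP BY n
            """
        )
    )
    if total_items is not None:
        # Items with no sitelinks have no rows, so derive them from the total.
        histogram[0] = total_items - sum(histogram.values())
    return histogram


def demo_connection() -> sqlite3.Connection:
    """Hypothetical rows: item 1 has three sitelinks, items 2 and 3 have one each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wb_items_per_site (ips_item_id INTEGER, ips_site_id TEXT, "
        "ips_site_page TEXT)"
    )
    conn.executemany(
        "INSERT INTO wb_items_per_site VALUES (?, ?, ?)",
        [
            (1, "enwiki", "Berlin"),
            (1, "dewiki", "Berlin"),
            (1, "frwiki", "Berlin"),
            (2, "enwiki", "Paris"),
            (3, "dewiki", "Hamburg"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(items_by_sitelink_count(demo_connection(), total_items=5))
```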

Tracked in https://phabricator.wikimedia.org/T119621

  • Number of items grouped by count of statements per item

~~

TODO / No tasks for this yet

  • Number of statements by data type
    • Could probably run a SPARQL query for each property
  • Number of uses of NoValue and SomeValue in main snaks?
  • Number of qualifiers
    • (no approach identified yet)
  • Referenced statements
    • (no approach identified yet)
  • Number of statements grouped by count of qualifiers per statement?
  • Average number of edits per page?


Event Timeline

Addshore raised the priority of this task to Needs Triage.
Addshore updated the task description.
Addshore subscribed.
Addshore set Security to None.
Addshore triaged this task as Medium priority. Nov 23 2015, 7:33 PM