Collect statistics about `instance of` Items with/without sitelinks
Open, Needs Triage, Public

Description

To better understand which Wikidata Items are connected to articles/pages on other Wikimedia wikis, we would like an overview of Items with and without sitelinks, grouped by their ontological properties, especially P31 "instance of".

Acceptance criteria:

  • for Items with sitelinks we have a list of the 100 most common values for P31 "instance of"
  • for Items without sitelinks we have a list of the 100 most common values for P31 "instance of"
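
One possible way to produce both lists is sketched below. It assumes the Items have already been extracted (for example from a dump) into simple records; the `ItemRecord` type, its field names, and the `top_instance_of` helper are hypothetical and only illustrate the tallying step.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ItemRecord(NamedTuple):
    """Hypothetical flattened view of one Wikidata Item (not an official schema)."""
    qid: str
    instance_of: Tuple[str, ...]  # QIDs of the Item's P31 values
    sitelink_count: int           # number of sitelinks on the Item


def top_instance_of(
    records: Iterable[ItemRecord], n: int = 100
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most common P31 values for Items with and without sitelinks."""
    with_links: Counter = Counter()
    without_links: Counter = Counter()
    for record in records:
        target = with_links if record.sitelink_count > 0 else without_links
        # set() so an Item that lists the same P31 value twice is counted once.
        target.update(set(record.instance_of))
    return {
        "with_sitelinks": with_links.most_common(n),
        "without_sitelinks": without_links.most_common(n),
    }
```

An Item with several P31 values contributes to each of them, so the counts in one list can add up to more than the number of Items. Whether that is the desired behaviour is a choice for whoever implements this.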

Open Question:

  • is it feasible and desirable to walk up the ontological tree here?
    • On the one hand, many buildings are probably tagged with something more specific than Q41176 "building". For example, https://www.wikidata.org/wiki/Q105700993 has "instance of 'School building'". For the purpose of this task, it would still be useful to know how many of the Items with sitelinks are buildings.
      • There are 264,777 direct instances of "building" on Wikidata (source query), but 2,786,448 Items when instances of subclasses are included (source query)
    • On the other hand, further up the chain of instance of/subclass of there are classes such as "artificial physical object", which would likely be less informative here.
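
The trade-off above can be sketched as a small query builder. This is not a reproduction of the linked source queries; it is an assumption about their general shape, using P279 "subclass of" and the `wd:`/`wdt:` prefixes that the Wikidata Query Service predefines. The helper names and the `max_depth` parameter are hypothetical. A depth limit is one way to walk up the tree without reaching very generic classes.

```python
import re
from typing import Optional


def instance_path(max_depth: Optional[int]) -> str:
    """Build a SPARQL property path from an Item to a class.

    max_depth=0    -> direct P31 only
    max_depth=k    -> P31 followed by at most k "subclass of" (P279) steps
    max_depth=None -> P31 followed by any number of P279 steps
    """
    if max_depth is None:
        return "wdt:P31/wdt:P279*"
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0 or None")
    # Each "/wdt:P279?" is an optional single step up the subclass chain.
    return "wdt:P31" + "/wdt:P279?" * max_depth


def build_count_query(class_qid: str, max_depth: Optional[int] = 0) -> str:
    """Return a SPARQL query counting Items that are instances of class_qid."""
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a valid QID: {class_qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        f"?item {instance_path(max_depth)} wd:{class_qid} . }}"
    )
```

For example, `build_count_query("Q41176")` counts direct instances of "building", while `build_count_query("Q41176", max_depth=None)` includes instances of all subclasses. Unbounded paths over a large class may time out on the public query service, so a dump-based approach might be needed for the full statistics.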