To better understand which Wikidata Items are connected to articles/pages on other Wikimedia wikis, we would like an overview of Items with and without sitelinks, grouped by their ontological properties, especially P31 "instance of".
Acceptance criteria:
- for Items with sitelinks we have a list of the 100 most common values for P31 "instance of"
- for Items without sitelinks we have a list of the 100 most common values for P31 "instance of"
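One possible way to produce both lists is sketched below. It assumes the Items are available as parsed entity records in the Wikibase JSON layout (a `claims` map keyed by property ID and a `sitelinks` map), for example read from a JSON dump. The function names `p31_values` and `top_p31_by_sitelinks` are hypothetical. The sketch counts every P31 statement regardless of rank, and Items without any P31 statement do not appear in either list.

```python
from collections import Counter
from typing import Iterable, Iterator


def p31_values(entity: dict) -> Iterator[str]:
    """Yield the Q-ids of an entity's P31 statements.

    Statements whose main snak has no datavalue (novalue/somevalue) are skipped.
    """
    for statement in entity.get("claims", {}).get("P31", []):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if datavalue is not None:
            yield datavalue["value"]["id"]


def top_p31_by_sitelinks(entities: Iterable[dict], n: int = 100):
    """Return (top n P31 values with sitelinks, top n P31 values without)."""
    with_links: Counter = Counter()
    without_links: Counter = Counter()
    for entity in entities:
        # An absent or empty sitelinks entry counts as "no sitelinks".
        target = with_links if entity.get("sitelinks") else without_links
        # set() so that an Item is counted once per distinct P31 value.
        target.update(set(p31_values(entity)))
    return with_links.most_common(n), without_links.most_common(n)
```

Calling `top_p31_by_sitelinks(entities, n=100)` returns two lists of `(Q-id, count)` pairs, one per acceptance criterion.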
Open Question:
- Is it feasible and desirable to walk up the ontological tree here?
  - On the one hand, many buildings are probably tagged with something more specific than Q41176 "building". For example, https://www.wikidata.org/wiki/Q105700993 has instance of "School building". For the purpose of this task, it would still be useful to know how many of the Items with sitelinks are buildings.
    - There are 264,777 direct instances of "building" on Wikidata (source query), but 2,786,448 Items when instances of subclasses are included (source query).
  - On the other hand, further up the chain of instance of/subclass of there are classes such as "artificial physical object", which would likely be less informative here.
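To make the trade-off in the open question concrete, here is a minimal sketch of a bounded roll-up. It assumes the P279 "subclass of" edges are already loaded into a plain dictionary, and it credits each Item to its direct P31 classes plus their ancestors up to `max_depth` steps, minus a hand-picked set of classes considered too generic. All names (`ancestors`, `rolled_up_counts`, `max_depth`, `too_generic`) are hypothetical, and both the depth limit and the exclusion set would need to be chosen by whoever runs the analysis.

```python
from collections import Counter, deque
from typing import Iterable, Mapping, Sequence, Set


def ancestors(cls: str, subclass_of: Mapping[str, Sequence[str]], max_depth: int) -> Set[str]:
    """Return cls plus every class reachable in at most max_depth P279 steps."""
    seen = {cls}
    queue = deque([(cls, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not climb further from this class
        for parent in subclass_of.get(current, ()):
            if parent not in seen:  # also protects against cycles in the graph
                seen.add(parent)
                queue.append((parent, depth + 1))
    return seen


def rolled_up_counts(
    item_classes: Iterable[Sequence[str]],
    subclass_of: Mapping[str, Sequence[str]],
    max_depth: int = 3,
    too_generic: Set[str] = frozenset(),
) -> Counter:
    """Count Items per class, including superclasses within max_depth."""
    counts: Counter = Counter()
    for classes in item_classes:
        reached: Set[str] = set()
        for cls in classes:
            reached |= ancestors(cls, subclass_of, max_depth)
        # Each Item contributes at most once per class; overly generic classes are dropped.
        counts.update(reached - set(too_generic))
    return counts
```

With `max_depth=0` this reduces to counting direct P31 values only. Raising the limit moves Items tagged with a subclass such as "school building" into the count for Q41176 "building" as well.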