To understand the distribution of quality in Wikidata, we need to improve upon the existing ORES quality datasets by enriching them with more information than re-use statistics and the number of items per quality class. In particular, we need to understand how the ORES quality scores are distributed across the content in Wikidata. To establish such a distribution, we will join data on set relations and mereological relations from the Wikidata JSON dump to the ORES quality prediction scores that we already produce and use in our analytics.
The goal is to provide a set of actionable insights that can be shared with the community on which classes are critical in terms of item quality and where improvements are necessary. We also hope to derive a more strategic insight into the possible future evolution of item quality in Wikidata, given its current state, which we want to establish in this ticket.
Based on the Wikidata ORES Quality Report in Wikidata Analytics:
- update the ORES quality prediction scores,
- update the Wikidata ORES Quality Report in Wikidata Analytics,
- get all P31, P279, and P361 classes of the items for which we have ORES prediction scores,
- establish the quality score distributions per class.
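The last two steps could be sketched roughly as follows. This is a minimal illustration, not the pipeline we actually run: the function names are hypothetical, the scores are assumed to be available as a simple mapping from item ID to predicted quality class, and the sample items and labels are toy data. It relies only on the documented shape of entities in the Wikidata JSON dump (one entity per line, statements under `claims`, targets under `mainsnak.datavalue.value.id`).

```python
import json
from collections import Counter, defaultdict

# Properties of interest: instance of, subclass of, part of.
RELATION_PROPERTIES = ("P31", "P279", "P361")


def iter_dump_entities(lines):
    """Yield entities from dump-style lines (a JSON array, one entity per line)."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def extract_classes(entity):
    """Return the set of (property, target item ID) pairs for one entity."""
    pairs = set()
    claims = entity.get("claims", {})
    for prop in RELATION_PROPERTIES:
        for statement in claims.get(prop, []):
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # "novalue" and "somevalue" snaks carry no target
            target = snak.get("datavalue", {}).get("value", {}).get("id")
            if target:
                pairs.add((prop, target))
    return pairs


def quality_distribution_per_class(entities, scores):
    """Count predicted quality classes per (property, class) pair.

    `scores` is assumed to map item IDs to a predicted quality label;
    items without a score are ignored.
    """
    dist = defaultdict(Counter)
    for entity in entities:
        label = scores.get(entity.get("id"))
        if label is None:
            continue
        for pair in extract_classes(entity):
            dist[pair][label] += 1
    return dist


def _toy_item(qid, **relations):
    """Build a minimal dump-shaped entity (toy data, for illustration only)."""
    claims = {
        prop: [
            {"mainsnak": {"snaktype": "value", "datavalue": {"value": {"id": t}}}}
            for t in targets
        ]
        for prop, targets in relations.items()
    }
    return {"id": qid, "claims": claims}


# Toy input: three made-up items and made-up quality labels.
sample_lines = [
    "[",
    json.dumps(_toy_item("Q100", P31=["Q5"])) + ",",
    json.dumps(_toy_item("Q101", P31=["Q5"], P361=["Q900"])) + ",",
    json.dumps(_toy_item("Q102", P279=["Q5"])),
    "]",
]
sample_scores = {"Q100": "A", "Q101": "C", "Q102": "E"}

dist = quality_distribution_per_class(iter_dump_entities(sample_lines), sample_scores)
for (prop, target), counts in sorted(dist.items()):
    print(prop, target, dict(counts))
```

Keying the counts by the (property, class) pair keeps the P31, P279, and P361 distributions separate, so they can be compared or aggregated later as needed.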
Acceptance criteria:
We have the first inputs for our next sensemaking session (where we will decide on the next steps):
- first exploratory data analysis
- including a shareable dataset
- ideas about possible next steps (towards a better understanding of the current distribution of the ORES quality scores across Wikidata’s classes)
Next iteration: