To understand how quality is distributed in Wikidata, we need to extend the existing ORES quality datasets beyond re-use statistics and the number of items per quality class. In particular, we need to know how the ORES quality scores are distributed across the content in Wikidata. To establish that distribution, we will join data on set relations and mereological relations (P31, P279, and P361) from the Wikidata JSON dump to the ORES quality prediction scores that we already produce and use in our analytics.
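
As a rough illustration of the extraction step, the sketch below pulls the P31, P279, and P361 targets out of one entity record from the Wikidata JSON dump. It assumes the standard dump layout (one entity per line inside a JSON array, with statements under `claims` and the target under `mainsnak.datavalue.value`). The helper names `parse_dump_line` and `class_relations` are hypothetical and not part of any existing pipeline.

```python
import json

# Properties named in this ticket: instance of, subclass of, part of.
CLASS_PROPERTIES = ("P31", "P279", "P361")


def parse_dump_line(line):
    """Parse one line of the JSON dump; return None for the array brackets."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None
    return json.loads(line)


def class_relations(entity):
    """Yield (item_id, property, target_id) for each class-like statement."""
    item_id = entity["id"]
    claims = entity.get("claims", {})
    for prop in CLASS_PROPERTIES:
        for statement in claims.get(prop, []):
            snak = statement.get("mainsnak", {})
            # Skip "somevalue" / "novalue" snaks, which carry no target item.
            if snak.get("snaktype") != "value":
                continue
            value = snak.get("datavalue", {}).get("value", {})
            target = value.get("id")
            if target is None and "numeric-id" in value:
                # Fallback for records that only carry the numeric id.
                target = "Q{}".format(value["numeric-id"])
            if target:
                yield (item_id, prop, target)
```

The resulting triples can then be joined to the ORES prediction scores on the item identifier.
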
The goal is to provide a set of actionable insights that can be shared with the community: which classes are critical in terms of item quality, and where improvements are necessary. We also hope to derive a more strategic insight into the possible future evolution of item quality in Wikidata, given the current state that we want to establish in this ticket.
Based on the [[ https://wikidata-analytics.wmcloud.org/app_direct/WD_docs/Wikidata%20Quality%20Report.nb.html | Wikidata ORES Quality Report ]] in [[ https://wikidata-analytics.wmcloud.org/app/WikidataAnalytics | Wikidata Analytics ]]:
- update the ORES quality prediction scores,
- update the Wikidata ORES Quality Report in Wikidata Analytics,
- get all P31, P279, and P361 classes of the items for which we have ORES prediction scores,
- establish the quality score distributions per class,
- perform class clustering to establish a broader structure of the quality distribution in Wikidata,
- generate first inputs for a joint sensemaking session:
-- shareable dataset of quality score distributions per class
-- ideas for visualization/simplification (incl. first drafts where feasible, e.g. perform class clustering to establish a broader structure of the quality distribution in Wikidata, analyze and visualize the quality distribution).
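
The per-class distribution step could be sketched as follows. This is only an illustration under stated assumptions: ORES predictions are given as a mapping from item ID to a quality class label (assumed here to be A through E), and class membership is given as `(item_id, property, class_id)` triples derived from P31, P279, and P361. The function name `quality_distribution_per_class` is hypothetical.

```python
from collections import Counter, defaultdict

# Assumed ORES item quality labels, from best to worst.
QUALITY_CLASSES = ("A", "B", "C", "D", "E")


def quality_distribution_per_class(scores, relations):
    """Return {class_id: {quality_label: share}} for items that have a score.

    scores:    dict mapping item ID to predicted quality label
    relations: iterable of (item_id, property, class_id) triples
    """
    counts = defaultdict(Counter)
    seen = set()
    for item_id, _prop, class_id in relations:
        # Count each item once per class, even if several properties link them.
        if (item_id, class_id) in seen:
            continue
        seen.add((item_id, class_id))
        label = scores.get(item_id)
        if label is None:
            continue  # no ORES prediction for this item
        counts[class_id][label] += 1

    distribution = {}
    for class_id, counter in counts.items():
        total = sum(counter.values())
        distribution[class_id] = {q: counter[q] / total for q in QUALITY_CLASSES}
    return distribution
```

The output is a plain dictionary, so it can be written out as a shareable dataset or used as the feature matrix for class clustering (one row per class, one column per quality label).
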