Overview
Revisit what an item quality model would look like for Wikidata. Use cases / motivations:
- Structured data gap metrics for Wikipedia articles
Open Questions
- What is the scope? Just items with Wikipedia sitelinks? All items? Some other subset, perhaps based on instance-of values?
- What are the drawbacks of the existing ORES model (see more below) that we can try to address?
- How might we get labeled data to help evaluate any resulting models?
- How can we estimate what properties are missing for an item (completeness)? Based on similar instance-ofs? Based on info in corresponding Wikipedia articles? Using Schemas? Something else?
- What are the other use-cases associated with this model?
- What features might we use beyond counts of properties/values/references? Embeddings?
- Can we / should we generate weights for different properties? That is, are certain statements "more important" than others to the quality of an item?
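One way to make the completeness and property-weight questions concrete is to treat the share of same-class items that use a property as both the "expected" signal and the weight. The following is only a minimal sketch under assumptions, not the ORES approach or a proposed design: the item dictionaries, the function names (expected_properties, completeness), and the min_share threshold are all hypothetical, and the toy items are made up for illustration.

```python
from collections import Counter, defaultdict


def expected_properties(items, min_share=0.5):
    """For each instance-of class, return {property: share of items using it}.

    Only properties used by at least `min_share` of the class are kept.
    `items` is an assumed toy format: dicts with "instance_of" and "properties".
    """
    class_sizes = Counter()
    prop_counts = defaultdict(Counter)
    for item in items:
        for cls in item["instance_of"]:
            class_sizes[cls] += 1
            # Count each property once per item, however many values it has.
            prop_counts[cls].update(set(item["properties"]))
    expected = {}
    for cls, counts in prop_counts.items():
        shares = {p: n / class_sizes[cls] for p, n in counts.items()}
        expected[cls] = {p: s for p, s in shares.items() if s >= min_share}
    return expected


def completeness(item, expected):
    """Return (score, missing) for one item.

    The score is the weighted share of expected properties that are present,
    with each property weighted by how common it is in the item's classes.
    Returns (None, []) when nothing is known about the item's classes.
    """
    weights = {}
    for cls in item["instance_of"]:
        for prop, share in expected.get(cls, {}).items():
            weights[prop] = max(weights.get(prop, 0.0), share)
    total = sum(weights.values())
    if total == 0:
        return None, []
    present = set(item["properties"])
    have = sum(w for p, w in weights.items() if p in present)
    missing = sorted((p for p in weights if p not in present),
                     key=lambda p: -weights[p])
    return have / total, missing


# Toy data (hypothetical): four items that are instances of Q5 (human).
toy_items = [
    {"id": "A", "instance_of": ["Q5"], "properties": ["P21", "P569", "P106"]},
    {"id": "B", "instance_of": ["Q5"], "properties": ["P21", "P569"]},
    {"id": "C", "instance_of": ["Q5"], "properties": ["P21"]},
    {"id": "D", "instance_of": ["Q5"], "properties": ["P21", "P569", "P106", "P18"]},
]

expected = expected_properties(toy_items)
score, missing = completeness(toy_items[2], expected)
print(score, missing)
```

Frequency-based weights are only one possible answer to the weighting question. They reward conformity with existing items, so they would inherit whatever gaps the current data already has; labeled data or Schemas could be used to check or replace them.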
Resources
- Past ORES approach (https://www.wikidata.org/wiki/Wikidata:Item_quality) and its features, which relate to quantity and other aspects of quality.