Wikidata statistics: create a tool or WQS function to evaluate completeness of items of a given type or group of items
Open, Needs Triage · Public


To evaluate the completeness of items, there are currently several lists at Wikidata:

Measuring could be done:

  • 1. compared to properties present on other items in the same result. @Magnus's tool "Related Properties" used to work well for that. See samples b. and c. above.
  • 2. compared to a theoretical list included in the query. See samples a., d., and e. above.
  • 3. compared to a theoretical list defined in "properties for this type" (P1963). Given that the property isn't used much, this might correspond to #1 or #2.
  • 4. compared to the percentage of uses in a larger set (e.g. items for politicians compared to items for people in general) or a comparable set (e.g. US senators compared to US presidents). Complete = the same percentage as in the comparable group.
  • 5. (maybe) compared to some completeness marker
  • 6. etc. [please add more]

Ideally this would go beyond an empty / non-empty cell in a table, but not be based on a one-by-one manual evaluation of the completeness of items. T145531 and T127475 cover other aspects.
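To make approaches #2 and #4 concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the items are given as an in-memory mapping from item ID to the set of property IDs present, the item IDs are placeholders, and the function names (`coverage_against_list`, `usage_share`, `relative_gap`) are hypothetical, not part of any existing tool or of WQS.

```python
from typing import Dict, Iterable, Set

# Assumed input shape: item ID -> set of property IDs present on that item.
Items = Dict[str, Set[str]]


def coverage_against_list(items: Items, expected: Iterable[str]) -> Dict[str, float]:
    """Approach #2: share of the expected properties present on each item."""
    wanted = set(expected)
    if not wanted:
        # Nothing is expected, so every item counts as complete.
        return {item: 1.0 for item in items}
    return {item: len(props & wanted) / len(wanted) for item, props in items.items()}


def usage_share(items: Items) -> Dict[str, float]:
    """Fraction of the items in a group that carry each property."""
    total = len(items)
    if total == 0:
        return {}
    counts: Dict[str, int] = {}
    for props in items.values():
        for prop in props:
            counts[prop] = counts.get(prop, 0) + 1
    return {prop: count / total for prop, count in counts.items()}


def relative_gap(group: Items, reference: Items) -> Dict[str, float]:
    """Approach #4: reference share minus group share, per property.

    A positive value means the group uses the property less often than
    the reference group does; zero means "complete" in the sense of #4.
    """
    group_share = usage_share(group)
    reference_share = usage_share(reference)
    return {prop: share - group_share.get(prop, 0.0)
            for prop, share in reference_share.items()}


# Placeholder item IDs; P569 (date of birth) and P19 (place of birth).
senators = {"item_a": {"P569", "P19"}, "item_b": {"P569"}}
presidents = {"item_c": {"P569", "P19"}, "item_d": {"P569", "P19"}}

print(coverage_against_list(senators, ["P569", "P19"]))
print(relative_gap(senators, presidents))
```

In this sample data, the second senator item has only half of the expected properties, and the senator group lags the president group on P19 by 0.5. A list taken from P1963 (approach #3) could be passed as `expected` in the same way.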

Esc3300 created this task. Nov 6 2016, 7:03 AM
Restricted Application added a project: Discovery. Nov 6 2016, 7:03 AM
Restricted Application added a subscriber: Aklapper.
Esc3300 updated the task description. Nov 6 2016, 7:09 AM
Esc3300 updated the task description.

I think they try to do it manually.

Esc3300 updated the task description. Nov 6 2016, 12:44 PM

I think we can close this with Recoin and handle enhancements in separate tickets?

Maybe. To assess, can we see equivalent queries for the samples above?

Ls1g added a subscriber: Ls1g. Edited Jun 15 2017, 1:49 PM

Would it be of interest to have a session at WikidataCon in October to discuss observations, ideas and requirements such as those mentioned in this ticket and in ticket T150938?

As far as I can see, there is no one-size-fits-all solution for completeness, but it could be interesting to discuss the various aspects (absolute completeness of subject-property pairs as in COOL-WD, relative completeness in comparison with other entities as in Recoin, more constrained relative completeness as in examples d. and e. above), and to collect ideas on how this could be moved forward.

Let me know what you think about this. If there is interest, I would create a proposal.

@Esc3300: Recoin can't really handle these cases yet, as it simply looks at the frequency of properties, but we have an adaptation under development that allows specifying exactly which properties should be compared.
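As a rough illustration of the frequency-based idea and of restricting it to a specified set of properties, here is a small Python sketch. It is not Recoin's actual code; the function name `missing_by_frequency`, the `only` parameter, and the in-memory input shape (peer item ID mapped to its set of property IDs) are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def missing_by_frequency(
    target: Set[str],
    peers: Dict[str, Set[str]],
    only: Optional[Set[str]] = None,
) -> List[Tuple[str, float]]:
    """List properties missing from `target`, most frequent among peers first.

    `only` (hypothetical parameter) restricts the comparison to an explicit
    set of properties; None means every property seen on a peer is considered.
    """
    total = len(peers)
    if total == 0:
        return []
    # Count on how many peer items each property appears.
    counts = Counter(prop for props in peers.values() for prop in props)
    candidates = [
        (prop, count / total)
        for prop, count in counts.items()
        if prop not in target and (only is None or prop in only)
    ]
    # Sort by descending frequency, then by property ID for stable output.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


# Placeholder peer items; P569 (date of birth), P19 (place of birth),
# P106 (occupation).
peers = {
    "item_a": {"P569", "P19", "P106"},
    "item_b": {"P569", "P106"},
    "item_c": {"P569"},
}

print(missing_by_frequency({"P569"}, peers))
print(missing_by_frequency({"P569"}, peers, only={"P19"}))
```

Without `only`, the ranking depends purely on how often peers use each property; with it, rarely used but explicitly requested properties are still reported.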

I think solution #3 could work out. One would just need to specify which properties are mandatory.