
analyze constraint violations
Open, Needs Triage, Public

Description

Constraints are an integral part of keeping Wikidata's data quality high. We need to understand them better in order to see how they can be improved further.

@abian is looking into how constraints are defined and used right now on Wikidata: https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Abi%C3%A1n/Study_on_Wikidata_property_constraints

As part of this task we want to better understand constraint violations, that is, the cases where data does not conform to the constraint definition. Among other things, we want to learn more about:

  • How many violations are there? How does that number develop over time?
  • Are violations clustered around certain datatypes or properties?
  • Do violations get fixed? By hand or through mass edits?
  • What percentage of violations are false alarms (e.g. exceptions or a bad constraint definition), and what percentage really should be fixed (e.g. a legitimate error in the data)?

Notes