
An "improved autofix" needed to replace non-standard statements in Wikidata, in order to keep the data model coherent
Open, Needs Triage, Public

Description

Problem:
Wikidata has difficulty maintaining a coherent data model, that is, ensuring that the same datum is always entered with the same combination of property and value (and qualifiers, where applicable). First the community has to choose a data model for a given datum; then it should be able to enforce that model through constraints (which teach users the chosen data model and help spot non-compliant statements) and autofixes (which periodically replace non-compliant statements with compliant ones). A more general treatment is at https://www.wikidata.org/wiki/Wikidata:Events/Data_Quality_Days_2022/Modeling_data .

However, the present system of constraints plus autofixes (https://www.wikidata.org/wiki/Template:Autofix) has many issues and limitations (for details see https://www.wikidata.org/wiki/Wikidata_talk:Events/Data_Quality_Days_2022/Modeling_data#The_need_for_an_improved_autofix ), among them: constraints and autofixes often duplicate the same information; autofixes cannot be queried and are not very user-friendly; wrong autofixes are difficult to undo; autofixes can trigger bot wars; and autofixes are significantly limited in the replacements they can perform, so at present they help keep the data model coherent only to a limited extent.

Thus, Wikidata probably needs an "improved autofix" that overcomes both the issues and the limitations of the present https://www.wikidata.org/wiki/Template:Autofix.

Example:

  1. working case - the community chose to always use P2868 "subject has role" for "martyr" (and its subclasses): a constraint was added to P106 "occupation" and a corresponding autofix was added to the talk page of P106
  2. non-working case - all items that are instances of "voice type" (e.g. https://www.wikidata.org/wiki/Q27911) should evidently be values of P412 "voice type" and never of P106 "occupation"; however, to autofix this, every possible voice type would have to be added manually as a separate autofix, which is highly impractical (it should be possible to autofix item X plus all its recursive subclasses)

More examples of non-working cases are at https://www.wikidata.org/wiki/Wikidata:Events/Data_Quality_Days_2022/Modeling_data#Examples
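
To make the request in the non-working case concrete, here is a minimal sketch of what "autofix item X plus all its recursive subclasses" could mean. It is only an illustration under stated assumptions, not a proposal for the actual implementation: the data is an in-memory toy, every Q-id in it is a hypothetical placeholder, and the helper names `recursive_subclasses` and `propose_fixes` are invented for this example. Only P106, P412 and the "subclass of" relation come from Wikidata itself.

```python
from collections import deque

# Toy "subclass of" edges: child -> list of parents.
# All Q-ids below are hypothetical placeholders, not real Wikidata items.
SUBCLASS_OF = {
    "Q_MALE_VOICE": ["Q_VOICE"],
    "Q_TENOR": ["Q_MALE_VOICE"],
    "Q_LYRIC_TENOR": ["Q_TENOR"],
    "Q_SINGER": ["Q_PROFESSION"],
}

# Toy statements as (item, property, value) triples.
STATEMENTS = [
    ("Q_PERSON_1", "P106", "Q_LYRIC_TENOR"),  # non-compliant
    ("Q_PERSON_1", "P106", "Q_SINGER"),       # compliant, not a voice type
    ("Q_PERSON_2", "P412", "Q_TENOR"),        # already compliant
]


def recursive_subclasses(root, subclass_of):
    """Return root plus every item that is a direct or indirect subclass of it."""
    # Invert the edges so we can walk from a class down to its subclasses.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guards against cycles in the hierarchy
                seen.add(child)
                queue.append(child)
    return seen


def propose_fixes(statements, from_prop, to_prop, root, subclass_of):
    """List (item, from_prop, to_prop, value) moves; nothing is edited here."""
    targets = recursive_subclasses(root, subclass_of)
    return [
        (item, from_prop, to_prop, value)
        for item, prop, value in statements
        if prop == from_prop and value in targets
    ]


fixes = propose_fixes(STATEMENTS, "P106", "P412", "Q_VOICE", SUBCLASS_OF)
for item, old, new, value in fixes:
    print(f"{item}: move {value} from {old} to {new}")
```

With a single rule rooted at the placeholder `Q_VOICE`, the sketch flags the `Q_LYRIC_TENOR` statement without that item ever being listed by hand. Returning proposed moves instead of editing directly is a deliberate choice in this sketch, since a reviewable list would also make wrong autofixes easier to spot before they are applied.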

Acceptance criteria:

Open questions:

  • how exactly will autofix data (especially the most complex cases) be stored in properties?
  • who exactly will operate the bot periodically applying autofixes to items?

Suggestion:
See https://www.wikidata.org/wiki/Wikidata_talk:Events/Data_Quality_Days_2022/Modeling_data for more details.

See also
T167700: Add button to automatically fix certain constraint violations is vaguely related, although primarily concerned with inverse statements (which is a pretty different subject)