[Task] Contemporary constraint check
Closed, Resolved · Public

Description

Implement the contemporary constraint in Wikidata.

It would be great if the set of "properties of start" and the set of "properties of end" could be modified over time.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Example of a query that detects Constraint:Contemporary violations:

SELECT ?subject ?value
WHERE {
  ?subject wdt:P${THEPROPERTY} ?value .
  OPTIONAL { ?subject p:P569/psv:P569 [ wikibase:timeValue ?subject_birth ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P569/psv:P569 [ wikibase:timeValue ?value_birth ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P571/psv:P571 [ wikibase:timeValue ?subject_inception ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P571/psv:P571 [ wikibase:timeValue ?value_inception ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P580/psv:P580 [ wikibase:timeValue ?subject_start ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P580/psv:P580 [ wikibase:timeValue ?value_start ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P570/psv:P570 [ wikibase:timeValue ?subject_death ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P570/psv:P570 [ wikibase:timeValue ?value_death ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P576/psv:P576 [ wikibase:timeValue ?subject_dissolution ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P576/psv:P576 [ wikibase:timeValue ?value_dissolution ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P582/psv:P582 [ wikibase:timeValue ?subject_end ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P582/psv:P582 [ wikibase:timeValue ?value_end ; wikibase:timePrecision "11"^^xsd:integer ] . }
  FILTER (
    (
      !BOUND(?subject_birth) ||
      (
        (
          !BOUND(?value_death) ||
          ?subject_birth>?value_death
        ) &&
        (
          !BOUND(?value_dissolution) ||
          ?subject_birth>?value_dissolution
        ) &&
        (
          !BOUND(?value_end) ||
          ?subject_birth>?value_end
        )
      )
    ) &&
    (
      !BOUND(?value_birth) ||
      (
        (
          !BOUND(?subject_death) || 
          ?value_birth>?subject_death
        ) &&
        (
          !BOUND(?subject_dissolution) || 
          ?value_birth>?subject_dissolution
        ) &&
        (
          !BOUND(?subject_end) || 
          ?value_birth>?subject_end
        )
      )
    ) && 
    (
      !BOUND(?subject_inception) ||
      (
        (
          !BOUND(?value_death) || 
          ?subject_inception>?value_death
        ) &&
        (
          !BOUND(?value_dissolution) || 
          ?subject_inception>?value_dissolution
        ) &&
        (
          !BOUND(?value_end) || 
          ?subject_inception>?value_end
        )
      )
    ) && 
    (
      !BOUND(?value_inception) || 
      (
        (
          !BOUND(?subject_death) || 
          ?value_inception>?subject_death
        ) &&
        (
          !BOUND(?subject_dissolution) || 
          ?value_inception>?subject_dissolution
        ) &&
        (
          !BOUND(?subject_end) || 
          ?value_inception>?subject_end
        )
      )
    ) && 
    (
      !BOUND(?subject_start) || 
      (
        (
          !BOUND(?value_death) || 
          ?subject_start>?value_death
        ) &&
        (
          !BOUND(?value_dissolution) || 
          ?subject_start>?value_dissolution
        ) &&
        (
          !BOUND(?value_end) || 
          ?subject_start>?value_end
        )
      )
    ) && 
    (
      !BOUND(?value_start) || 
      (
        (
          !BOUND(?subject_death) || 
          ?value_start>?subject_death
        ) &&
        (
          !BOUND(?subject_dissolution) || 
          ?value_start>?subject_dissolution
        ) &&
        (
          !BOUND(?subject_end) || 
          ?value_start>?subject_end
        )
      )
    ) && 
    ( 
      ( 
        BOUND(?subject_birth) && 
        ( 
          BOUND(?value_death) || 
          BOUND(?value_dissolution) || 
          BOUND(?value_end) 
        )
      ) || 
      ( 
        BOUND(?value_birth) && 
        ( 
          BOUND(?subject_death) || 
          BOUND(?subject_dissolution) ||
          BOUND(?subject_end) 
        )
      ) || 
      ( 
        BOUND(?subject_inception) && 
        ( 
          BOUND(?value_death) || 
          BOUND(?value_dissolution) || 
          BOUND(?value_end) 
        )
      ) || 
      ( 
        BOUND(?value_inception) &&
        ( 
          BOUND(?subject_death) ||
          BOUND(?subject_dissolution) ||
          BOUND(?subject_end) 
        )
      ) ||
      ( 
        BOUND(?subject_start) &&
        ( 
          BOUND(?value_death) ||
          BOUND(?value_dissolution) ||
          BOUND(?value_end) 
        )
      ) ||
      ( 
        BOUND(?value_start) &&
        ( 
          BOUND(?subject_death) ||
          BOUND(?subject_dissolution) ||
          BOUND(?subject_end) 
        )
      )
    )
  ) 
} LIMIT 500
thiemowmde triaged this task as Medium priority. Sep 5 2016, 2:58 PM
thiemowmde added a project: patch-welcome.
thiemowmde added a subscriber: Lydia_Pintscher.

This task was added to patch-welcome. Could I help with it in some way?

abian updated the task description.

Most constraints are fully parameterized: their algorithms have no hard-coded properties. The only exception is P31/P279 in the Type/Value type constraints, and some users think that P31/P279 should be parameters too. The second issue is the OR aggregation, which makes the algorithm somewhat nondeterministic.

So it would be better to move properties like P569, P570 and P571 into parameters of the constraint and to remove the OR aggregation. This would make the algorithm robust and simple to implement and understand. @abian, could you create such properties and modify the algorithm description above?

Thanks for the feedback, @Ivan_A_Krestinin! The boolean expression is only an example of formalization with six properties but, indeed, these properties should be modifiable.

I think it would be a better option to have two sets of properties, start and end, and to work only with the minimum value of the defined start statements and the maximum value of the defined end statements (which is equivalent in logical terms to the description). Having these two sets centralized in https://www.wikidata.org/wiki/Q25796498 (defining them independently, or defining only the parent properties, P580 and P582) seems to me simpler and less error-prone than specifying several applicable properties (for start of item, for end of item, for start of value and for end of value) for every single property with this constraint.
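The minimum-start / maximum-end idea can be sketched roughly as follows. This is only an illustration of the proposal, not the code of WikibaseQualityConstraints: the function names are hypothetical, dates are simplified to Python `date` objects (so precision and BCE dates are ignored), and treating a gap in either direction as a violation is an assumption.

```python
from datetime import date
from typing import Iterable, Optional


def earliest(dates: Iterable[date]) -> Optional[date]:
    """Minimum of the defined start dates, or None if there are none."""
    return min(dates, default=None)


def latest(dates: Iterable[date]) -> Optional[date]:
    """Maximum of the defined end dates, or None if there are none."""
    return max(dates, default=None)


def violates_contemporary(
    subject_starts: Iterable[date],
    subject_ends: Iterable[date],
    value_starts: Iterable[date],
    value_ends: Iterable[date],
) -> bool:
    """Return True if the known dates show the two entities never coexisted.

    Hypothetical helper: missing dates never produce a violation.
    """
    subject_start, subject_end = earliest(subject_starts), latest(subject_ends)
    value_start, value_end = earliest(value_starts), latest(value_ends)

    # The subject started only after the value had already ended.
    if subject_start is not None and value_end is not None and subject_start > value_end:
        return True
    # The value started only after the subject had already ended.
    if value_start is not None and subject_end is not None and value_start > subject_end:
        return True
    return False


# Example: a person born in 1900 linked to an entity that ended in 1780.
example_violation = violates_contemporary(
    [date(1900, 1, 1)], [date(1980, 1, 1)],
    [date(1700, 1, 1)], [date(1780, 1, 1)],
)
```

With this shape, only two comparisons are needed per statement, however many start and end properties are taken into account.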

What do you think about this option?

All other constraints store their settings as qualifiers on the property page. I think it is good practice to make all constraints as similar as possible. This makes the whole constraint system easier to implement and understand.

This is actually a similar case to the one of P31/P279, and not so similar to the rest of the constraints: we don't want to define information about the properties having the constraint, but about which property in Wikidata represents something.

We have two trivial parent properties, P580 and P582, whose URIs are, presumably, stable and won't change, as if they were P31 and P279. Knowing P580 and P582, and knowing that P1647 is the equivalent of P279 for properties, we can also load all the corresponding subproperties (two sets that can change, and will generally grow). This keeps the ontology correct, avoids duplicating a lot of information (repeating the same qualifiers between properties, repeating what the subproperties of P580 and P582 are, etc.), avoids inconsistencies, makes the constraint easier to maintain and saves the community some effort by not having to update anything related to the constraint when a subproperty of P580 or P582 is created, defined or deprecated.
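As a rough sketch of how the two sets could be derived from the parent properties, the snippet below walks "subproperty of" (P1647) links transitively. The input mapping is toy data assumed for illustration (it has not been checked against Wikidata, and `P_EXAMPLE` is a made-up id); `collect_subproperties` is a hypothetical helper, not an existing Wikibase function.

```python
from collections import deque
from typing import Dict, Set

# Toy data, assumed for illustration only: each key is a property and
# each value is the set of its parents according to P1647.
TOY_SUBPROPERTY_OF: Dict[str, Set[str]] = {
    "P569": {"P580"},
    "P571": {"P580"},
    "P570": {"P582"},
    "P576": {"P582"},
    "P_EXAMPLE": {"P571"},  # made-up id showing an indirect subproperty
}


def collect_subproperties(root: str, subproperty_of: Dict[str, Set[str]]) -> Set[str]:
    """Return root plus every property that reaches it through P1647 links."""
    # Invert the mapping so we can walk from a parent down to its children.
    children: Dict[str, Set[str]] = {}
    for prop, parents in subproperty_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(prop)

    found = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in found:  # also guards against cycles in the data
                found.add(child)
                queue.append(child)
    return found


start_properties = collect_subproperties("P580", TOY_SUBPROPERTY_OF)
end_properties = collect_subproperties("P582", TOY_SUBPROPERTY_OF)
```

Because the sets are computed from the hierarchy, a newly created subproperty would be picked up without touching any constraint definition.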

(Aside: that first query should probably include “work period (start)”. It seems to be missing a P279 statement.)

I’m not convinced that this is the best approach. Do we really need all the properties for every constraint? I feel like it might make more sense to define the properties to be used on the constraint, and to limit them to a smaller set: for instance, perhaps a contemporary constraint check for “spouse” should only take into account “date of birth” and “date of death”, not “service entry” or “date of official closure”.
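A minimal sketch of this alternative, with the date properties chosen per constrained property and a shared default as fallback. The default sets are the six properties from the example query; the per-property table and the function name are hypothetical, and nothing here reflects how constraint parameters are actually stored.

```python
from typing import Dict, FrozenSet, Tuple

PropertySets = Tuple[FrozenSet[str], FrozenSet[str]]

# Defaults: the start and end properties used in the example query.
DEFAULT_SETS: PropertySets = (
    frozenset({"P569", "P571", "P580"}),  # start properties
    frozenset({"P570", "P576", "P582"}),  # end properties
)

# Hypothetical per-property overrides: spouse (P26) only looks at
# date of birth (P569) and date of death (P570).
PER_PROPERTY_SETS: Dict[str, PropertySets] = {
    "P26": (frozenset({"P569"}), frozenset({"P570"})),
}


def date_properties_for(constrained_property: str) -> PropertySets:
    """Return (start properties, end properties) to check for a property."""
    return PER_PROPERTY_SETS.get(constrained_property, DEFAULT_SETS)


spouse_starts, spouse_ends = date_properties_for("P26")
citizenship_starts, citizenship_ends = date_properties_for("P27")
```

The trade-off is the one discussed in this thread: narrower sets per property, at the price of one more thing to maintain on every property that uses the constraint.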

> (Aside: that first query should probably include “work period (start)”. It seems to be missing a P279 statement.)

Uh-huh! Added. Thanks!

> I’m not convinced that this is the best approach. Do we really need all the properties for every constraint? I feel like it might make more sense to define the properties to be used on the constraint, and to limit them to a smaller set: for instance, perhaps a contemporary constraint check for “spouse” should only take into account “date of birth” and “date of death”, not “service entry” or “date of official closure”.

Indeed, but, on the one hand, this constraint is applied to many different properties, even P31 and P279. On the other hand, I see no undesirable effects in considering more subproperties than necessary if the difference in computational cost is negligible (no idea for now, but it shouldn't be huge). By defining which subproperties should be considered for the constraint for each property, we are also introducing redundancies and potential inconsistencies with other constraints, mainly with the conflicts-with constraint, apart from all the problems mentioned above.

An algorithm that gives us the same outputs as another in almost the same time, using nearly the same amount of memory, and with fewer inputs is a better algorithm. If the difference in computational cost turns out not to be relevant here (no idea yet whether it will be) and, in addition, the algorithm with more inputs (qualifiers) brings all the problems discussed above, it is clear to me which option is better.

But consensus should decide, not my point of view... 😇

Okay, those are some good points :) I’m still skeptical about one thing:

> An algorithm that gives us the same outputs as another in almost the same time

Do we actually have an efficient way to find subproperties of a property without using the query service? Because finding the subproperties via SPARQL seems to take about 80 ms from within the Eqiad cluster (tested from a Cloud VPS server), which isn’t that cheap IMHO. Is there something within Wikibase to find statements with a particular value? (Special:WhatLinksHere, I suppose…)

> Do we actually have an efficient way to find subproperties of a property without using the query service?

No idea. But couldn't we retrieve those subproperties asynchronously, from time to time? They aren't expected to change very often.
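One way to picture "from time to time" is a small cache that reloads the subproperty set only once it is older than some maximum age. This is a sketch under assumptions: `SubpropertyCache` and its `loader` callback are hypothetical names, and the refresh here happens lazily on access rather than in a separate asynchronous job.

```python
import time
from typing import Callable, FrozenSet, Iterable, Optional


class SubpropertyCache:
    """Keep a set of property ids and reload it only when it is stale."""

    def __init__(
        self,
        loader: Callable[[], Iterable[str]],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader            # e.g. a function running the slow lookup
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the cache can be tested
        self._value: FrozenSet[str] = frozenset()
        self._loaded_at: Optional[float] = None

    def get(self) -> FrozenSet[str]:
        """Return the cached set, reloading it first if it is too old."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._value = frozenset(self._loader())
            self._loaded_at = now
        return self._value


# Usage with a stand-in loader and a fake clock (both hypothetical).
load_calls = []
fake_time = [0.0]


def fake_loader() -> Iterable[str]:
    load_calls.append(1)
    return {"P569", "P571", "P580"}


cache = SubpropertyCache(fake_loader, max_age_seconds=3600, clock=lambda: fake_time[0])
first = cache.get()
second = cache.get()      # served from the cache, no reload
fake_time[0] = 3600.0
third = cache.get()       # stale by now, so the loader runs again
```

With a maximum age of an hour or a day, the cost of the lookup (such as the roughly 80 ms query mentioned above) would be paid rarely instead of on every constraint check.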

abian renamed this task from [Task] Contemporary Constraint check to [Task] Contemporary constraint check. Feb 2 2018, 11:04 PM
abian removed a project: patch-welcome.
abian updated the task description.

Update: I have finished formalizing this constraint as a part of my final degree project, and I will start implementing it at some point in the coming weeks so that this can be finished around May. I will keep you informed. :-)

abian changed the task status from Open to Stalled. Jul 2 2018, 1:25 PM
This comment was removed by abian.
abian changed the task status from Stalled to Open. Sep 6 2018, 2:51 PM

Change 458522 had a related patch set uploaded (by Abián; owner: Abián):
[mediawiki/extensions/WikibaseQualityConstraints@master] Implement contemporary constraint check

https://gerrit.wikimedia.org/r/458522

Change 458522 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Implement contemporary constraint check

https://gerrit.wikimedia.org/r/458522

Is there a place already where I can see it working?

I see WikibaseQualityConstraints is already updated on https://test.wikidata.org/, but maybe the config variables of the contemporary constraint should still be adapted. Otherwise, https://wikidata-constraints.wmflabs.org/ should be updated before defining these config variables.

Now deployed on Wikidata. You can find a violation for country of citizenship (P27) on http://www.wikidata.org/entity/Q628365... until someone fixes it.

Wikidata:Database_reports/Constraint_violations/P27 reports the following and dies:

ERROR: Unknown constraint type: Q25796498.

Right, that's a bug in KrBot2. Only its operator has access to the source code, so we sadly can't contribute to it.

> Is there a place already where I can see it working?

I updated wikidata-constraints as well, but it’s probably easiest to test it on Wikidata with @abian’s example item.

Query listing contemporary constraint check errors: http://tinyurl.com/y8pjk5ux

SELECT DISTINCT ?item ?itemLabel WHERE {
	?statement wikibase:hasViolationForConstraint wds:P27-ad2c85a5-49ba-6748-4302-77b44a268ed5 .
	?item ?p ?statement .
	FILTER( ?item NOT IN ( wd:Q4115189, wd:Q13406268, wd:Q15397819 ) ) .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}