[Task] Contemporary Constraint check
Open, Normal, Public

Description

Implement a check on Wikidata for Contemporary constraint.

Proposed assertions (if true, there isn't a constraint violation; if false, there is a constraint violation):

  • Text: if [item A] has this property (Pn) linked to [item B], then [item A] and [item B] have to coincide or coexist at some point in history according to properties P569 (date of birth), P570 (date of death), P571 (inception), P576 (dissolved or abolished), P580 (start time) and P582 (end time).
  • Formally:
(
    (
        A.P580 ≤ B.P582 OR
        A.P580 ≤ B.P570 OR
        A.P580 ≤ B.P576 OR
        A.P569 ≤ B.P582 OR
        A.P569 ≤ B.P570 OR
        A.P569 ≤ B.P576 OR
        A.P571 ≤ B.P582 OR
        A.P571 ≤ B.P570 OR
        A.P571 ≤ B.P576
    ) OR (
        NOT EXISTS A.P580 AND
        NOT EXISTS A.P569 AND
        NOT EXISTS A.P571
    ) OR (
        NOT EXISTS B.P582 AND
        NOT EXISTS B.P570 AND
        NOT EXISTS B.P576
    )
) AND (
    (
        B.P580 ≤ A.P582 OR
        B.P580 ≤ A.P570 OR
        B.P580 ≤ A.P576 OR
        B.P569 ≤ A.P582 OR
        B.P569 ≤ A.P570 OR
        B.P569 ≤ A.P576 OR
        B.P571 ≤ A.P582 OR
        B.P571 ≤ A.P570 OR
        B.P571 ≤ A.P576
    ) OR (
        NOT EXISTS B.P580 AND
        NOT EXISTS B.P569 AND
        NOT EXISTS B.P571
    ) OR (
        NOT EXISTS A.P582 AND
        NOT EXISTS A.P570 AND
        NOT EXISTS A.P576
    )
)

It would be great if the set of "start properties" and the set of "end properties" could be modified over time.
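
The assertion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the planned implementation: the names `START_PROPS`, `END_PROPS`, `is_contemporary` and the dict-based item representation are hypothetical, dates are simplified to `datetime.date` (no precision or BCE handling), and each property may carry several values even though the formal expression treats one value per property.

```python
from datetime import date
from typing import List, Mapping, Sequence

# Assumed defaults, copied from the task description; meant to be replaceable.
START_PROPS = ("P569", "P571", "P580")
END_PROPS = ("P570", "P576", "P582")

# Hypothetical item representation: property ID -> known date values.
Dates = Mapping[str, Sequence[date]]


def _values(item: Dates, props: Sequence[str]) -> List[date]:
    """Collect every known date the item has for the given properties."""
    return [d for p in props for d in item.get(p, ())]


def _starts_before_end(x: Dates, y: Dates,
                       start_props: Sequence[str],
                       end_props: Sequence[str]) -> bool:
    """One half of the assertion: some start of x <= some end of y,
    or x has no known start, or y has no known end."""
    starts = _values(x, start_props)
    ends = _values(y, end_props)
    if not starts or not ends:
        return True
    return any(s <= e for s in starts for e in ends)


def is_contemporary(a: Dates, b: Dates,
                    start_props: Sequence[str] = START_PROPS,
                    end_props: Sequence[str] = END_PROPS) -> bool:
    """True means no violation; both halves of the assertion must hold."""
    return (_starts_before_end(a, b, start_props, end_props)
            and _starts_before_end(b, a, start_props, end_props))
```

Passing `start_props` and `end_props` as arguments is one way to keep the two sets modifiable without touching the check itself.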

abian created this task. Aug 2 2016, 10:21 AM
abian updated the task description. Aug 21 2016, 11:05 AM
abian added a comment. Sep 2 2016, 11:33 PM

Example of a query that detects Constraint:Contemporary violations:

SELECT ?subject ?value
WHERE {
  ?subject wdt:P${THEPROPERTY} ?value .
  OPTIONAL { ?subject p:P569/psv:P569 [ wikibase:timeValue ?subject_birth ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P569/psv:P569 [ wikibase:timeValue ?value_birth ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P571/psv:P571 [ wikibase:timeValue ?subject_inception ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P571/psv:P571 [ wikibase:timeValue ?value_inception ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P580/psv:P580 [ wikibase:timeValue ?subject_start ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P580/psv:P580 [ wikibase:timeValue ?value_start ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P570/psv:P570 [ wikibase:timeValue ?subject_death ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P570/psv:P570 [ wikibase:timeValue ?value_death ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P576/psv:P576 [ wikibase:timeValue ?subject_dissolution ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P576/psv:P576 [ wikibase:timeValue ?value_dissolution ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?subject p:P582/psv:P582 [ wikibase:timeValue ?subject_end ; wikibase:timePrecision "11"^^xsd:integer ] . }
  OPTIONAL { ?value p:P582/psv:P582 [ wikibase:timeValue ?value_end ; wikibase:timePrecision "11"^^xsd:integer ] . }
  FILTER (
    # A violation exists if, in either direction, at least one start/end pair is
    # known and every known start of one item is later than every known end of the other.
    (
      # ?subject starts only after ?value has ended
      ( BOUND(?subject_birth) || BOUND(?subject_inception) || BOUND(?subject_start) ) &&
      ( BOUND(?value_death) || BOUND(?value_dissolution) || BOUND(?value_end) ) &&
      ( !BOUND(?subject_birth) || !BOUND(?value_death) || ?subject_birth > ?value_death ) &&
      ( !BOUND(?subject_birth) || !BOUND(?value_dissolution) || ?subject_birth > ?value_dissolution ) &&
      ( !BOUND(?subject_birth) || !BOUND(?value_end) || ?subject_birth > ?value_end ) &&
      ( !BOUND(?subject_inception) || !BOUND(?value_death) || ?subject_inception > ?value_death ) &&
      ( !BOUND(?subject_inception) || !BOUND(?value_dissolution) || ?subject_inception > ?value_dissolution ) &&
      ( !BOUND(?subject_inception) || !BOUND(?value_end) || ?subject_inception > ?value_end ) &&
      ( !BOUND(?subject_start) || !BOUND(?value_death) || ?subject_start > ?value_death ) &&
      ( !BOUND(?subject_start) || !BOUND(?value_dissolution) || ?subject_start > ?value_dissolution ) &&
      ( !BOUND(?subject_start) || !BOUND(?value_end) || ?subject_start > ?value_end )
    ) ||
    (
      # ?value starts only after ?subject has ended
      ( BOUND(?value_birth) || BOUND(?value_inception) || BOUND(?value_start) ) &&
      ( BOUND(?subject_death) || BOUND(?subject_dissolution) || BOUND(?subject_end) ) &&
      ( !BOUND(?value_birth) || !BOUND(?subject_death) || ?value_birth > ?subject_death ) &&
      ( !BOUND(?value_birth) || !BOUND(?subject_dissolution) || ?value_birth > ?subject_dissolution ) &&
      ( !BOUND(?value_birth) || !BOUND(?subject_end) || ?value_birth > ?subject_end ) &&
      ( !BOUND(?value_inception) || !BOUND(?subject_death) || ?value_inception > ?subject_death ) &&
      ( !BOUND(?value_inception) || !BOUND(?subject_dissolution) || ?value_inception > ?subject_dissolution ) &&
      ( !BOUND(?value_inception) || !BOUND(?subject_end) || ?value_inception > ?subject_end ) &&
      ( !BOUND(?value_start) || !BOUND(?subject_death) || ?value_start > ?subject_death ) &&
      ( !BOUND(?value_start) || !BOUND(?subject_dissolution) || ?value_start > ?subject_dissolution ) &&
      ( !BOUND(?value_start) || !BOUND(?subject_end) || ?value_start > ?subject_end )
    )
  )
} LIMIT 500
thiemowmde triaged this task as Normal priority.
thiemowmde added a subscriber: Lydia_Pintscher.

This task was added to Need-volunteer. Could I help with it in some way?

abian claimed this task. Aug 20 2017, 6:59 PM
abian updated the task description.

Most constraints are fully parametrized. Their algorithms have no hard-coded properties. The only exception is P31/P279 in the Type/Value type constraints, and some users think that P31/P279 should be parameters too. The second issue is the OR aggregation, which makes the algorithm slightly non-deterministic.

So it would be better to move properties like P569, P570 and P571 into parameters of the constraint and to remove the OR aggregation. This would make the algorithm robust and simple to implement and understand. @abian, could you create such properties and update the algorithm description above?

abian added a comment. Edited Aug 22 2017, 9:03 PM

> Most constraints are fully parametrized. Their algorithms have no hard-coded properties. The only exception is P31/P279 in the Type/Value type constraints, and some users think that P31/P279 should be parameters too. The second issue is the OR aggregation, which makes the algorithm slightly non-deterministic.
>
> So it would be better to move properties like P569, P570 and P571 into parameters of the constraint and to remove the OR aggregation. This would make the algorithm robust and simple to implement and understand. @abian, could you create such properties and update the algorithm description above?

Thanks for the feedback, @Ivan_A_Krestinin! The boolean expression is only an example of formalization with six properties but, indeed, these properties should be modifiable.

I think it would be a better option to have two sets of properties, start and end, and to work only with the minimum value of the defined start statements and the maximum value of the defined end statements (which is equivalent in logical terms to the description). Having these two sets centralized in https://www.wikidata.org/wiki/Q25796498 (defining them independently, or defining only the parent properties, P580 and P582) seems to me simpler and less error-prone than specifying several applicable properties (for start of item, for end of item, for start of value and for end of value) for every single property with this constraint.

What do you think about this option?
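
The reduction to "minimum start, maximum end" could look roughly like the sketch below. It rests on the observation that some start is `<=` some end exactly when the earliest start is `<=` the latest end. The function names are hypothetical and dates are simplified to `datetime.date`.

```python
from datetime import date
from typing import Iterable, Optional


def earliest(starts: Iterable[date]) -> Optional[date]:
    """Minimum of the defined start values, or None if there are none."""
    return min(starts, default=None)


def latest(ends: Iterable[date]) -> Optional[date]:
    """Maximum of the defined end values, or None if there are none."""
    return max(ends, default=None)


def overlaps(a_starts: Iterable[date], a_ends: Iterable[date],
             b_starts: Iterable[date], b_ends: Iterable[date]) -> bool:
    """True when A and B can have coexisted; missing data never causes a violation."""
    a_start, a_end = earliest(a_starts), latest(a_ends)
    b_start, b_end = earliest(b_starts), latest(b_ends)
    a_starts_in_time = a_start is None or b_end is None or a_start <= b_end
    b_starts_in_time = b_start is None or a_end is None or b_start <= a_end
    return a_starts_in_time and b_starts_in_time
```

With this shape, the check only ever compares two dates per direction, however many start and end properties are configured.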

All other constraints store their settings as qualifiers on the property page. I think it is good practice to make all constraints as similar as possible. This makes the full constraint system easier to implement and understand.

abian added a comment. Edited Aug 24 2017, 3:59 PM

> All other constraints store their settings as qualifiers on the property page. I think it is good practice to make all constraints as similar as possible. This makes the full constraint system easier to implement and understand.

This is actually similar to the P31/P279 case, and not so similar to the rest of the constraints: what we want to define is not information about the properties that have the constraint, but which property in Wikidata represents a given concept.

We have two trivial parent properties, P580 and P582, whose URIs are presumably stable and won't change, just like P31 and P279. Knowing P580 and P582, and knowing that P1647 is the equivalent of P279 for properties, we can also load all the corresponding subproperties (two sets that can change, and that generally grow). This keeps the ontology correct, avoids duplicating a lot of information (repeating the same qualifiers across properties, repeating which properties are subproperties of P580 and P582, etc.), avoids inconsistencies, makes the constraint easier to maintain, and saves the community some effort, since nothing related to the constraint has to be updated when a subproperty of P580 or P582 is created, defined or deprecated.
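
As a rough sketch of "loading all the corresponding subproperties", the transitive closure over P1647 can be computed with a breadth-first walk. The function name and the `parents` mapping are hypothetical stand-ins for whatever source supplies the P1647 statements, and the property IDs other than P580 and P582 in the usage example are placeholders, not real Wikidata statements.

```python
from collections import deque
from typing import Dict, Iterable, Set


def subproperties(root: str, parents: Dict[str, Iterable[str]]) -> Set[str]:
    """Return root plus every property reachable from it by following
    P1647 (subproperty of) statements backwards.

    `parents` maps a property ID to the values of its P1647 statements.
    """
    # Invert the mapping so we can walk from a parent to its children.
    children: Dict[str, Set[str]] = {}
    for prop, supers in parents.items():
        for sup in supers:
            children.setdefault(sup, set()).add(prop)

    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


# Placeholder data for illustration only.
example_parents = {"PA": ["P580"], "PB": ["PA"], "PC": ["P582"]}
start_props = subproperties("P580", example_parents)
end_props = subproperties("P582", example_parents)
```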

(Aside: that first query should probably include “work period (start)”. It seems to be missing a P279 statement.)

I’m not convinced that this is the best approach. Do we really need all the properties for every constraint? I feel like it might make more sense to define the properties to be used on the constraint, and to limit them to a smaller set: for instance, perhaps a contemporary constraint check for “spouse” should only take into account “date of birth” and “date of death”, not “service entry” or “date of official closure”.
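
To make the alternative concrete, here is a minimal sketch of per-property sets with a shared fallback. Everything in it is hypothetical: the names, the dict standing in for qualifiers on the property page, and the choice of defaults (taken from the task description). P26 is “spouse”.

```python
from typing import Dict, Tuple

# (start properties, end properties)
PropertySets = Tuple[Tuple[str, ...], Tuple[str, ...]]

# Assumed fallback: the six properties from the task description.
DEFAULT_SETS: PropertySets = (("P569", "P571", "P580"), ("P570", "P576", "P582"))

# Hypothetical per-property parameters; in practice these might be
# qualifiers on the constraint statement of each property.
OVERRIDES: Dict[str, PropertySets] = {
    "P26": (("P569",), ("P570",)),  # spouse: only date of birth / date of death
}


def property_sets_for(constrained_property: str) -> PropertySets:
    """Pick the start/end sets to use when checking the given property."""
    return OVERRIDES.get(constrained_property, DEFAULT_SETS)
```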

abian added a comment. Edited Aug 24 2017, 5:17 PM

> (Aside: that first query should probably include “work period (start)”. It seems to be missing a P279 statement.)

Uh-huh! Added. Thanks!

> I’m not convinced that this is the best approach. Do we really need all the properties for every constraint? I feel like it might make more sense to define the properties to be used on the constraint, and to limit them to a smaller set: for instance, perhaps a contemporary constraint check for “spouse” should only take into account “date of birth” and “date of death”, not “service entry” or “date of official closure”.

Indeed, but, on the one hand, this constraint is applied to many different properties, even P31 and P279. On the other hand, I see no undesirable effects in considering more subproperties than necessary if the difference in computational cost is negligible (no idea for now, but it shouldn't be huge). By defining which subproperties should be considered for the constraint on each property, we would also introduce redundancies and potential inconsistencies with other constraints, mainly with the conflicts-with constraint, apart from all the problems mentioned above.

An algorithm that gives us the same outputs as another in almost the same time, using nearly the same amount of memory, and with fewer inputs is a better algorithm. If the difference in computational cost turns out not to be relevant here (no idea yet whether it will be) and, in addition, the algorithm with more inputs (qualifiers) brings all the problems considered above, it is clear to me which option is best.

But consensus should decide, not my point of view... 😇

Okay, those are some good points :) I’m still skeptical about one thing:

> An algorithm that gives us the same outputs as another in almost the same time

Do we actually have an efficient way to find subproperties of a property without using the query service? Because finding the subproperties via SPARQL seems to take about 80 ms from within the Eqiad cluster (tested from a Cloud VPS server), which isn’t that cheap IMHO. Is there something within Wikibase to find statements with a particular value? (Special:WhatLinksHere, I suppose…)

abian added a comment. Aug 25 2017, 5:34 PM

> Do we actually have an efficient way to find subproperties of a property without using the query service?

No idea. But couldn't we retrieve those subproperties asynchronously, from time to time? They aren't expected to change very often.
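
One simple way to approximate "from time to time" is a time-based cache, sketched below. Note that this is a lazy refresh on access rather than a true background job, so it is a stand-in for the asynchronous idea, not the same thing. The class name, the `fetch` callback (which would wrap the actual subproperty lookup) and the injectable `clock` are all assumptions made for illustration.

```python
import time
from typing import Callable, FrozenSet, Optional


class CachedSubproperties:
    """Keeps the last fetched set of subproperties and refetches it
    only once it is older than max_age_seconds."""

    def __init__(self,
                 fetch: Callable[[], FrozenSet[str]],
                 max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # hypothetical: e.g. a wrapper around a query
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the cache can be tested
        self._value: Optional[FrozenSet[str]] = None
        self._fetched_at = 0.0

    def get(self) -> FrozenSet[str]:
        now = self._clock()
        expired = now - self._fetched_at >= self._max_age
        if self._value is None or expired:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Since the sets are not expected to change often, a generous `max_age_seconds` would keep the cost of the lookup off the path of almost every constraint check.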