Ignore deprecated statements when checking “type” and “value type” constraints
Closed, Resolved | Public | 8 Estimated Story Points

Description

Problem:
The type and value type constraints partially treat deprecated statements like any other statement. We want to change this. If a Property requires a certain type on an Item, and a matching “instance of” or “subclass of” statement exists but is deprecated, a constraint violation should be raised.

Example:
https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Archive/2021/03#Constraint_not_working

BDD:
GIVEN a Property with a "type" constraint definition
AND an Item with an "instance of" or "subclass of" statement for that Property satisfying the constraint
AND the value is deprecated for that statement
AND no other value that satisfies the constraint exists
THEN a constraint violation is triggered

GIVEN a Property with a "value type" constraint definition
AND a statement covered by that constraint
AND the Item referenced in the statement's value has an "instance of" or "subclass of" statement satisfying the constraint
AND the value is deprecated for that statement
AND no other value that satisfies the constraint exists
THEN a constraint violation is triggered
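
The two scenarios above can be sketched roughly as follows. This is only an illustration in Python, not the extension's PHP code: the `Statement` class, the function names and the item IDs Q100/Q200 are all made up for the example, and the sketch deliberately skips walking up the "subclass of" hierarchy.

```python
from dataclasses import dataclass

INSTANCE_OF = "P31"   # "instance of"
SUBCLASS_OF = "P279"  # "subclass of"


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified stand-in for a Wikibase statement."""
    property_id: str
    value: str            # item ID the statement points to
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def satisfies_type(statements, allowed_classes, relation=INSTANCE_OF):
    """True if a non-deprecated `relation` statement points to an allowed class."""
    return any(
        s.property_id == relation
        and s.rank != "deprecated"
        and s.value in allowed_classes
        for s in statements
    )


def value_satisfies_type(value_item_id, items, allowed_classes, relation=INSTANCE_OF):
    """'Value type' variant: apply the same check to the item used as the value."""
    return satisfies_type(items.get(value_item_id, []), allowed_classes, relation)


# Hypothetical data: Q100 has only a deprecated "instance of" Q5 statement.
items = {
    "Q100": [Statement(INSTANCE_OF, "Q5", rank="deprecated")],
    "Q200": [Statement(INSTANCE_OF, "Q5")],
}
print(satisfies_type(items["Q100"], {"Q5"}))        # False, so a violation is raised
print(value_satisfies_type("Q200", items, {"Q5"}))  # True, so no violation
```

A `False` result corresponds to the THEN step of each scenario: the only matching statement is deprecated, so the check reports a violation.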

Acceptance criteria:

  • "type" and "value type" constraint checks ignore deprecated values when being checked

Original report:
If a property requires a certain type on an item, and a matching “instance of” or “subclass of” statement exists but is deprecated… should that be reported as a constraint violation?
(Note: the current implementation ignores rank when checking the type in PHP but then uses wdt: triples in the SPARQL fallback, so one of the two definitely needs to be updated.)

Event Timeline

Lydia_Pintscher renamed this task from Ignore deprecated statements when checking “type” and “value type” constraints? to Ignore deprecated statements when checking “type” and “value type” constraints. Apr 27 2021, 7:51 AM
Lydia_Pintscher updated the task description.

Change 685826 had a related patch set uploaded (by Bereket teshome; author: Bereket teshome):

[mediawiki/extensions/WikibaseQualityConstraints@master] Ignore deprecated statements when checking 'type' and 'value type' constraints

https://gerrit.wikimedia.org/r/685826

What if there are normal-rank and preferred-rank “subclass of” statements on the same item? The query service implementation would only use the preferred-rank statements – should the PHP version do the same?

Change 685826 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Ignore deprecated statements when checking 'type' and 'value type' constraints

https://gerrit.wikimedia.org/r/685826

Hmmm, I'd say a normal-rank statement should still count for the purpose of the constraint check. So I guess we should adapt both to ignore only deprecated statements?

I don’t think that’s possible in SPARQL…

I feared you'd say this :D
In this case I guess we should use all statements for the fallback SPARQL option. Only taking into account best ranked statements will likely lead to too many false positives.

But I worry that replacing wdt:P279 with (p:P279/ps:P279) will also degrade performance significantly.

If best-rank statements don’t give the right result, shouldn’t that be an incentive for users to fix the subclass tree? After all, most other queries will presumably also use wdt:P279. (In the 2018-02-26 SPARQL logs, I count 542 queries using p:P279 against 251338 queries using wdt:P279 in the “organic” set, or 1485 p:P279 queries against 10112191 wdt:P279 queries in the “all” set.)
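
For readers comparing the two patterns mentioned here, a rough sketch of how they differ in a query. On the query service, `wdt:` triples exist only for best-rank statements, while `p:`/`ps:` reach statements of every rank, including deprecated ones. The helper name `type_check_query` and the ASK shape are assumptions for illustration only (in Python, with the query as a string); this is not the query the extension actually sends.

```python
def type_check_query(entity_id: str, class_id: str, truthy: bool = True) -> str:
    """Build an illustrative ASK query for an 'instance of' / 'subclass of' check."""
    if truthy:
        # Best-rank ("truthy") statements only.
        path = "wdt:P31/wdt:P279*"
    else:
        # Full statement nodes: statements of all ranks, deprecated included.
        path = "p:P31/ps:P31/(p:P279/ps:P279)*"
    return f"ASK {{ wd:{entity_id} {path} wd:{class_id} . }}"


print(type_check_query("Q42", "Q5"))
print(type_check_query("Q42", "Q5", truthy=False))
```

The second form goes through an extra statement node for every hop of the subclass chain, which is where the performance worry above comes from.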

*nod*
Yeah that's a good point. Ok then let's do it as you say.
Is this currently as it is in the code or does that still need changing?

As far as I can tell, we currently use best-rank P279 statements in the SPARQL part of type checking, but non-deprecated-rank P279 statements in the PHP part. I think we should adjust the PHP code so it also only uses best-rank statements.
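
To make the difference concrete, here is a small sketch of the two selection rules, assuming the usual Wikibase meaning of "best rank": preferred-rank statements if there are any, otherwise normal-rank ones, and never deprecated ones. It is written in Python purely for illustration; the function names, the dict layout and the item IDs are made up, and it is not the TypeCheckerHelper code.

```python
def non_deprecated(statements):
    """Selection described above for the PHP part: everything except deprecated."""
    return [s for s in statements if s["rank"] != "deprecated"]


def best_rank(statements):
    """Best-rank selection: preferred if any exist, otherwise normal."""
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


# Hypothetical "subclass of" statements on one item.
subclass_of = [
    {"value": "Q1", "rank": "preferred"},
    {"value": "Q2", "rank": "normal"},
    {"value": "Q3", "rank": "deprecated"},
]

print([s["value"] for s in non_deprecated(subclass_of)])  # ['Q1', 'Q2']
print([s["value"] for s in best_rank(subclass_of)])       # ['Q1']
```

With both a preferred-rank and a normal-rank statement present, only the best-rank rule drops Q2, which matches what a `wdt:P279` query would see.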

Change 702092 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Use best-rank statements in TypeCheckerHelper

https://gerrit.wikimedia.org/r/702092

Change 702092 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Use best-rank statements in TypeCheckerHelper

https://gerrit.wikimedia.org/r/702092