Page MenuHomePhabricator

Value type constraints show errors even when they shouldn't
Closed, ResolvedPublic

Description

I realized there can be a problem in the constraint reports that happens for example with all the entities with P449. You can see that the constraint report for Q13407659 detects a warning for the property P449 saying that Q2006738 isn't an instance or a subclass of Q15265344, but it is. You just have to follow the tree and go through it to see that it's an instance of Q1616075, a subclass of 15943455 and finally a subclass of Q15265344. I understand there has to be a limit in the depth of analysis to prevent performance problems, but if the problem is that the limit is reached I don't think the specified depth could be enough for many valid cases.

Event Timeline

Agabi10 created this task.May 26 2017, 11:33 AM
Restricted Application added a project: Wikidata. · View Herald TranscriptMay 26 2017, 11:33 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
abian added a subscriber: abian.May 28 2017, 7:38 PM

This issue does not seem to be related to a depth limit, but to the fact that Q1616075 had Q15943455 as its second value for subclass of (P279), not as its first one. I have just swapped both values and, now, there are no violations.

This behavior is actually a bug, and must be fixed.

thiemowmde triaged this task as Low priority.May 30 2017, 10:01 AM
thiemowmde added a project: patch-welcome.

We replaced the depth limit with a limit on total entities visited a few weeks ago (0408a71186e3); the ticket for it, T164948, isn’t public yet, but the test commit 6f24ded76ea4 has a longer explanation for the issue.

We noticed pretty soon that the new limit was too low and increased it in 7ac876abcd96; that change should be deployed soon, which will hopefully resolve this issue.

We’re also working on an alternative type checker that uses SPARQL, which will have no depth limit (it will be subject to the WDQS timeout, but that should rarely be an issue).

abian added a comment.EditedMay 30 2017, 12:15 PM

We replaced the depth limit with a limit on total entities visited a few weeks ago (0408a71186e3); the ticket for it, T164948, isn’t public yet, but the test commit 6f24ded76ea4 has a longer explanation for the issue.
We noticed pretty soon that the new limit was too low and increased it in 7ac876abcd96; that change should be deployed soon, which will hopefully resolve this issue.

Would it be convenient to save a list of explored nodes so that they can't be expanded more than once (if they are in the list, they are skipped)?

I think the new limit has been deployed (I see type checking spikes of 4 s and 7½ s in the last 12 h on Grafana), can you try it out? (Perhaps unswap the classes if you don’t have any other item where the test previously failed?)

@abian that would be possible, but I don’t want to add too many optimizations like that because eventually we would just be building a shittier, slower BlazeGraph :) the SPARQL change has already been merged, so for now I’d rather focus on integrating that into the regular “type” and “value type” checkers so that false negatives like these should no longer occur.

@Lucas_Werkmeister_WMDE I think it is working now, but I think there may be a problem still in how the value type constraint is checked (or with my understanding of what a type is). It still displays an error when the entity instead of being an instance of a given entity is a subclass of a given entity, anyway, in this case I don't know if the problem is with the constraint itself or with my understanding, but for me if a cars are a subclass of vehicles they are still vehicles (as well as all the instances of the car class).

One example of this behavior is Q1616075, which from my understanding it shouldn't be triggering an error for the Type constraint due to being a subclass of a subclass of organization.

Anyway if it's just a problem of my understanding of what a type is feel free to close the task as resolved.

@Agabi10 I think that’s working as intended – “instance of” and “subclass of” are different things and, in general, shouldn’t be used interchangeably. If some people think that “instance of” should be used and add a constraint to reflect that, and other people think that “subclass of” should be used and add statements with that property, then those people should get in touch and have a discussion about the discrepancy. If the constraint violations make such disagreements more obvious, and lead to more such discussions, that’s a good thing in my opinion :)

In the case of Q1616075, the constraint violation is certainly technically correct, and looking at P​452 – “industry of company or organization”, “Wikidata property for items about organizations” – I would say that it’s also appropriate: the property seems intended only for individual organizations, not classes of them.

Change 358055 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Inject use SparqlHelper in TypeCheckerHelper

https://gerrit.wikimedia.org/r/358055

OK @Lucas_Werkmeister_WMDE, so if that's intended I think it works as expected. Thanks.

Change 358055 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Inject use SparqlHelper in TypeCheckerHelper

https://gerrit.wikimedia.org/r/358055

Lucas_Werkmeister_WMDE closed this task as Resolved.Jun 12 2017, 10:18 AM
Lucas_Werkmeister_WMDE claimed this task.

Closing now, feel free to open another ticket if something goes wrong.

@Lucas_Werkmeister_WMDE It's still triggering the error for the author property of Q6012487 and probably many others. I suppose it's the same problem, so I don't know if it would be more appropriate reopening this bug or creating a new one.