Allow restricting constraints to certain entity types
Closed, Resolved · Public · 8 Estimated Story Points

Description

As an editor, I want to restrict the entity type to which a constraint applies in order to avoid false constraint violations.

Problem:
It is currently not possible to restrict the entity type (Item, Property, Lexeme, Form, Sense) to which a constraint should apply. This would be useful because a Property is sometimes used in different contexts in the different entity types.

Example:
https://www.wikidata.org/w/index.php?title=Property:P443&oldid=1316228429#P443$8116725c-4f59-8b08-31bc-99eeb5dd52df should be checked on Items but not on Lexemes

BDD
GIVEN a constraint definition
WHEN it includes a restriction on the entity type it applies to
THEN no constraint violations are triggered on the excluded entity types

Format:

  • a separate configuration variable for the “constraint scope (entity)” property, but using the same default as the existing “scope” parameter (P4680)
  • allowed values are the same as for the “allowed entity types” constraint

Acceptance criteria:

  • no constraint violations are triggered on entity types that are excluded by the constraint definition
  • warnings for invalid “scope” parameters include the newly allowed values (item, property, etc.) in the warning message (“X is not a valid value, must be one of…”)
  • the maintenance script to import constraint entities doesn’t break when two of the variables default to the same property
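
The second acceptance criterion can be illustrated with a minimal Python sketch. It is not the extension's actual code (the extension is written in PHP). The function name, the plain-text labels, and the exact message wording are assumptions for illustration; the real parameter values are Wikidata items, and the real list of entity types is whatever the “allowed entity types” constraint accepts.

```python
# Hypothetical labels standing in for the item IDs used on Wikidata.
CONTEXT_SCOPES = ("main value", "qualifier", "reference")
# Entity types named in the task description (assumed list for this sketch).
ENTITY_SCOPES = ("item", "property", "lexeme", "form", "sense")


def invalid_scope_warnings(values):
    """Return one warning per value that is neither a context nor an entity type."""
    valid = CONTEXT_SCOPES + ENTITY_SCOPES
    return [
        f"{value} is not a valid value, must be one of: {', '.join(valid)}"
        for value in values
        if value not in valid
    ]


if __name__ == "__main__":
    for warning in invalid_scope_warnings(["item", "main value", "statement"]):
        print(warning)
```

The point of the sketch is only that the list of valid values in the message is built from both groups, so the newly allowed entity types show up in the warning without extra work.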

Open questions:

  • Should we exclude certain entity types or include them? That is, should we have an allow list or a deny list?
    • -> allow list (listing the types where the constraint should be checked), this matches how constraint scope works right now

Event Timeline

I don’t think this belongs under T213803: [Tracking] Request for new constraint types – it’s not really a new constraint type, but rather a new kind of constraint metadata (closest to constraint scope, and in fact we could reuse that property) that could apply to any constraint type.

Lydia_Pintscher added a subscriber: abian.

Would love to hear people's thoughts on the open questions in the task description.

Should we exclude certain entity types or include them? That is, should we have an allow list or a deny list?

I would prefer listing the types where the constraint should be checked, which also matches how constraint scope works right now.

Should we reuse the scope parameter?

That would make sense to me since they both cover where to run the checks, but it might also make it harder to model things in a useful way.

For example: What happens if someone adds multiple values? e.g. "constraint scope: as main value, as reference, on item, on lexeme" - I think the expected interpretation would be (main value || reference) && (item || lexeme) and not a simple "and" or "or" of all the values. Or if you want to say something like "as main value on lexemes or as qualifier on items"?

Should we reuse the scope parameter?

That would make sense to me since they both cover where to run the checks, but it might also make it harder to model things in a useful way.

For example: What happens if someone adds multiple values? e.g. "constraint scope: as main value, as reference, on item, on lexeme" - I think the expected interpretation would be (main value || reference) && (item || lexeme) and not a simple "and" or "or" of all the values.

That would be my interpretation as well.

Or if you want to say something like "as main value on lexemes or as qualifier on items"?

If this is really needed, it could still be modeled by making two constraint statements, with the same constraint type and other qualifiers, but different scope. (But I would expect this to be rare.)
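
The interpretation discussed above, (main value || reference) && (item || lexeme), could be sketched roughly as follows in Python. This is an illustration, not the extension's implementation: the function names and plain-text labels are hypothetical, and treating an empty group as "no restriction" is an assumption of this sketch.

```python
# Hypothetical labels; on Wikidata these would be item IDs.
CONTEXT_VALUES = {"main value", "qualifier", "reference"}
ENTITY_VALUES = {"item", "property", "lexeme", "form", "sense"}


def applies(scope_values, context, entity_type):
    """OR within each group of scope values, AND between the two groups."""
    contexts = set(scope_values) & CONTEXT_VALUES
    entities = set(scope_values) & ENTITY_VALUES
    # Assumption: a group with no values places no restriction.
    context_ok = not contexts or context in contexts
    entity_ok = not entities or entity_type in entities
    return context_ok and entity_ok


def any_applies(constraint_statements, context, entity_type):
    """Several constraint statements with different scopes are checked independently."""
    return any(applies(s, context, entity_type) for s in constraint_statements)


if __name__ == "__main__":
    scope = ["main value", "reference", "item", "lexeme"]
    print(applies(scope, "reference", "lexeme"))
    print(applies(scope, "qualifier", "item"))
```

Under the same assumptions, "as main value on lexemes or as qualifier on items" is modeled as two statements, for example `any_applies([["main value", "lexeme"], ["qualifier", "item"]], "qualifier", "item")`.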

I understand the motivation (thanks to the fact that Nikki's tasks are much more interesting and better described than mine), :-) but I believe we should strive to contain the complexity of the constraint system and, if possible, reduce the current complexity, which is quite high. The need that the example represents might not be recurrent and, for that case, I think we could have two Properties, one for Items and one for Forms, to better adjust the constraints (not only the one we're commenting on) and statements of each of them to their cases of use. I wouldn't find it a big problem if there were some similar statements on two different Properties, as they aren't expected to change frequently and they'll be used in different namespaces (so both shouldn't appear together or be read by the same software agents that might know about one Property but not about the other). Please don't hate me for this (hate me for something else…).

I understand the motivation (thanks to the fact that Nikki's tasks are much more interesting and better described than mine), :-)

I can't take the credit for the task description, that's Lydia's work. :)

but I believe we should strive to contain the complexity of the constraint system and, if possible, reduce the current complexity, which is quite high. The need that the example represents might not be recurrent and, for that case, I think we could have two Properties, one for Items and one for Forms, to better adjust the constraints (not only the one we're commenting on) and statements of each of them to their cases of use. I wouldn't find it a big problem if there were some similar statements on two different Properties, as they aren't expected to change frequently and they'll be used in different namespaces (so both shouldn't appear together or be read by the same software agents that might know about one Property but not about the other). Please don't hate me for this (hate me for something else…).

I don't think it would be a good idea to split this property. It serves exactly the same purpose in both places - to link to a file containing the pronunciation of a word - and having multiple almost identical properties makes it harder for people to use the right one in the right place. It already took me a long time to stop accidentally using the "audio" property instead of "pronunciation audio".

The property constraints themselves don't seem that complex to me. The main problem I have is that the way we model/describe them is quite abstract and technical and I can never remember exactly which properties/values I need to use. Most people should never need to touch property constraints though.

I can't take the credit for the task description, that's Lydia's work. :)

What a shame, what a disappointment…

I don't think it would be a good idea to split this property. It serves exactly the same purpose in both places - to link to a file containing the pronunciation of a word - and having multiple almost identical properties makes it harder for people to use the right one in the right place. It already took me a long time to stop accidentally using the "audio" property instead of "pronunciation audio".

The property constraints themselves don't seem that complex to me. The main problem I have is that the way we model/describe them is quite abstract and technical and I can never remember exactly which properties/values I need to use. Most people should never need to touch property constraints though.

Actually, only 27% of those active Wikidata users who decided to answer the survey on Property constraints (and who knew what Property constraints were) said that the system was "relatively easy" to use. Even considering only these users, by "complexity" I also mean the number of decisions the system requires us to consider.

If the constraint system consisted only of adding or not adding a constraint per Property without qualifiers (only one constraint type available), there would only be one dichotomous decision to make for each Property, users would be aware of the two possible options and could spend time considering both to make the best decision. The number of decisions would be the number of Properties (~8260). This isn't a small number to begin with but can be addressed by all of us.

If we had 25 constraint types without values or qualifiers (that is, the decision remains simply whether or not to add the constraint), there would be 8260*25=206500 decisions to be made. Here efforts are concentrated on a minority of Properties and the constraint types that are remembered, so the error of omission is introduced, it's not known whether certain constraints are missing because they aren't applicable or because they haven't yet been considered, some of the cases considered when there was only one constraint type are no longer considered and each constraint type receives several (up to ~25) times less attention on average. From this supersimplified scenario onwards, each qualifier that is possible to include and each value that needs to be specified causes a new combinatorial explosion in the number of decisions ("complexity"), increases the number of omissions and forces high-impact decisions to receive less attention, as the features that are recalled or well specified and the features that have the greatest impact in each case don't necessarily coincide.

To cover a single case or a small set of them, the number of possibilities this qualifier would introduce, which could be specified for any constraint type, would be disproportionate, including inconsistencies such as specifying a set of entity types that violate those of the allowed entity types constraint, even specifying the entity types for an allowed entity types constraint. Perhaps to solve the problem you indicate with the different Properties ("pronunciation audio for this Item" or "pronunciation audio for this Form", for example) we only need allowed entity types constraints to be taken into account in the web interface (Properties that are not applicable to an entity type shouldn't be suggested for that entity type). Also according to https://www.wikidata.org/wiki/Wikidata:2020_report_on_Property_constraints#allowed_entity_types:

This constraint type is the third with the highest proportion of mandatory constraints (48%), only after the Commons link and Property scope constraint types. Consistently, it has no constraints with the suggestion level and no exceptions. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.

To reduce complexity for users (number of decisions they have to make), it would also be nice to address T244050 and remove from our sight those constraint types that don't make sense considering the Property type.

All this talk is just my opinion, but I wanted to explain what I meant by "complexity", because I recognize that it was very ambiguous. Don't stop implementing something just because it doesn't look promising to me if the arguments don't convince you…

Next example: Pinyin transliteration should have a property scope of qualifier on items and lexemes and main property on forms. Since we can't do that, it now says the scope is main property or qualifier everywhere.

And the next one: transliteration has some constraints for what type of items it can be used on. Those constraints apply to usage on items and do not make sense applied to lexemes. Since I can't restrict those constraints to items, the only way I could resolve the constraint violations they were creating was to remove the constraints entirely.

And another: reading pattern of Han character should be qualifier on items and main value on forms, not main value or qualifier everywhere.

sponsor had an "item requires statement" constraint which should only apply to items, not media files.

I think it's an error to use Wikidata constraints for Commons as such. Constraints are defined for Wikidata, not any possible other Wikibase using Wikidata.

Same problem for "allowed entity types constraint (Q52004125)" : "Wikibase MediaInfo (Q59712033)" which is reported on Commons.

Should we reuse the scope parameter?

I guess one solution to this would be that we implement this parameter with a separate configuration variable, for a second property ID, but then use the same property ID as for the existing scope parameter (P4680) as the default, so that on Wikidata, the same property is used for both. Then, if we want to have two different properties after all, we’d just have to change the configuration to use a different property ID, without changing the code.
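
A rough Python sketch of that configuration idea, under stated assumptions: the variable names below are hypothetical (not the extension's real setting names), `P9999` is a placeholder ID, and the de-duplication step is only one possible way to keep an import script from processing the same property twice.

```python
# Hypothetical setting names; both default to the existing scope property.
DEFAULT_CONFIG = {
    "ConstraintScopePropertyId": "P4680",
    "ConstraintEntityScopePropertyId": "P4680",
}


def load_config(overrides=None):
    """Merge site-specific overrides over the defaults."""
    return {**DEFAULT_CONFIG, **(overrides or {})}


def property_ids_to_import(config):
    """Unique property IDs in first-seen order, so a shared default is imported once."""
    return list(dict.fromkeys(config.values()))


if __name__ == "__main__":
    # Wikidata-like setup: one property serves both parameters.
    print(property_ids_to_import(load_config()))
    # A wiki that wants a separate property only changes configuration.
    print(property_ids_to_import(
        load_config({"ConstraintEntityScopePropertyId": "P9999"})
    ))
```

With this shape, switching to two distinct properties later is a configuration change only, which is the property of the design described in the comment.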

Thank you, Lucas, let's make it so!

Addshore set the point value for this task to 8. (Aug 4 2021, 10:27 AM)

Change 713888 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Introduce ConstraintChecker::getSupportedEntityTypes()

https://gerrit.wikimedia.org/r/713888

Change 714072 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::parseItemIdsParameter() helper

https://gerrit.wikimedia.org/r/714072

Change 714073 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::mapItemId() helper

https://gerrit.wikimedia.org/r/714073

Change 714074 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Extract ConstraintParameterParser mappings into methods

https://gerrit.wikimedia.org/r/714074

Change 714075 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Add constraint scope for entity types

https://gerrit.wikimedia.org/r/714075

Change 714076 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Avoid parsing the same parameters twice

https://gerrit.wikimedia.org/r/714076

Another one to fix once the constraint is available: The property scope constraint for pronunciation variety should be split into two, with "qualifier" for item and form, and "main value" for mediainfo.

Change 715002 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Unify terminology around allowed/valid types

https://gerrit.wikimedia.org/r/715002

Another to fix: The property scope constraint for ALA-LC romanisation should be split, with "main value" for forms and "qualifier" for items.

Change 713888 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Introduce ConstraintChecker::getSupportedEntityTypes()

https://gerrit.wikimedia.org/r/713888

Change 714072 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::parseItemIdsParameter() helper

https://gerrit.wikimedia.org/r/714072

Change 714073 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::mapItemId() helper

https://gerrit.wikimedia.org/r/714073

Change 714074 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Extract ConstraintParameterParser mappings into methods

https://gerrit.wikimedia.org/r/714074

Change 714075 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Add constraint scope for entity types

https://gerrit.wikimedia.org/r/714075

Change 715002 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Unify terminology around allowed/valid types

https://gerrit.wikimedia.org/r/715002

Michael removed a project: Patch-For-Review.

The required functionality is now implemented. The remaining open patch will be better attached to T290142.

Change 714076 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/WikibaseQualityConstraints@master] Avoid parsing the same parameters twice

Reason:

https://gerrit.wikimedia.org/r/714076