Allow restricting constraints to certain entity types
Closed, Resolved · Public · 8 Estimated Story Points

Description

As an editor, I want to restrict the entity type to which a constraint applies in order to avoid false constraint violations.

Problem:
It is currently not possible to restrict the entity type (Item, Property, Lexeme, Form, Sense) to which a constraint should apply. This would be useful because the same Property is sometimes used in different contexts on different entity types.

Example:
https://www.wikidata.org/w/index.php?title=Property:P443&oldid=1316228429#P443$8116725c-4f59-8b08-31bc-99eeb5dd52df should be checked on Items but not on Lexemes

BDD
GIVEN a constraint definition
WHEN it includes a restriction on the entity type it applies to
THEN no constraint violations are triggered on the excluded entity types
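
A minimal sketch of this behaviour, in Python for illustration only (the extension itself is not written in Python). `ConstraintDefinition`, its `entity_types` field and `should_check` are hypothetical names, and treating a missing restriction as "check on every entity type" is an assumption:

```python
from dataclasses import dataclass

# Entity types named in the task description.
ALL_ENTITY_TYPES = frozenset({"item", "property", "lexeme", "form", "sense"})


@dataclass(frozen=True)
class ConstraintDefinition:
    constraint_type: str
    # Hypothetical field: entity types on which the constraint is checked.
    # Assumption: without an explicit restriction, all types are checked.
    entity_types: frozenset = ALL_ENTITY_TYPES


def should_check(constraint: ConstraintDefinition, entity_type: str) -> bool:
    """Return True if the constraint should be evaluated on this entity type."""
    return entity_type in constraint.entity_types


# A constraint restricted to Items is skipped on Lexemes,
# so it cannot produce a violation there.
items_only = ConstraintDefinition("format", frozenset({"item"}))
print(should_check(items_only, "item"))
print(should_check(items_only, "lexeme"))
```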

Format:

a separate configuration variable for the “constraint scope (entity)” property, but using the same default as the existing “scope” parameter (P4680)
allowed values are the same as for the “allowed entity types” constraint
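
As a rough illustration of the "separate variable, same default" idea, sketched in Python: the two configuration keys below are hypothetical names, not the extension's actual settings, and the override ID is a placeholder. Only P4680 comes from the task description.

```python
# Hypothetical configuration keys; both default to the same property (P4680).
DEFAULT_CONFIG = {
    "constraint_scope_property_id": "P4680",
    "constraint_entity_scope_property_id": "P4680",
}


def load_config(overrides=None):
    """Merge site-specific overrides over the defaults."""
    config = dict(DEFAULT_CONFIG)
    config.update(overrides or {})
    return config


# With the defaults, one property serves both parameters.
shared = load_config()

# A wiki that wants two distinct properties only changes configuration.
# "P99999999" is a placeholder ID, not a real property.
split = load_config({"constraint_entity_scope_property_id": "P99999999"})
```

Keeping the two variables separate means a later split into two properties needs a configuration change only, not a code change.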

Acceptance criteria:

no constraint violations are triggered on entity types that are excluded by the constraint definition
warnings for invalid “scope” parameters include the newly allowed values (item, property, etc.) in the warning message (“X is not a valid value, must be one of…”)
the maintenance script to import constraint entities doesn’t break when two of the variables default to the same property
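
The second and third criteria could be sketched as follows, in Python for illustration. The function names, the short value labels and the configuration keys are assumptions, and the real message wording and import script may differ:

```python
# Short labels standing in for the real scope values (assumption).
CONTEXT_TYPES = ["main value", "qualifier", "reference"]
ENTITY_TYPES = ["item", "property", "lexeme", "form", "sense"]


def invalid_scope_warning(value):
    """Return a warning for an invalid scope value, or None if it is valid."""
    valid = CONTEXT_TYPES + ENTITY_TYPES
    if value in valid:
        return None
    return f"{value} is not a valid value, must be one of: {', '.join(valid)}"


def property_ids_to_import(config):
    """List each configured property ID once, keeping first-seen order.

    Deduplicating keeps an import from handling the same property twice
    when two variables default to the same ID.
    """
    return list(dict.fromkeys(config.values()))


# Hypothetical configuration keys; "P1" is a placeholder ID.
config = {
    "constraint_scope_property_id": "P4680",
    "constraint_entity_scope_property_id": "P4680",
    "some_other_property_id": "P1",
}
```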

Open questions:

Should we exclude certain entity types or include them? That is, should we have an allow list or a deny list?
-> allow list (listing the types where the constraint should be checked), this matches how constraint scope works right now

Details

Subject	Repo	Branch	Lines +/-
Avoid parsing the same parameters twice	mediawiki/extensions/WikibaseQualityConstraints	master	+7 -6
Unify terminology around allowed/valid types	mediawiki/extensions/WikibaseQualityConstraints	master	+8 -8
Extract ConstraintParameterParser mappings into methods	mediawiki/extensions/WikibaseQualityConstraints	master	+32 -45
Add constraint scope for entity types	mediawiki/extensions/WikibaseQualityConstraints	master	+401 -96
Add ConstraintParameterParser::mapItemId() helper	mediawiki/extensions/WikibaseQualityConstraints	master	+55 -83
Add ConstraintParameterParser::parseItemIdsParameter() helper	mediawiki/extensions/WikibaseQualityConstraints	master	+52 -39
Introduce ConstraintChecker::getSupportedEntityTypes()	mediawiki/extensions/WikibaseQualityConstraints	master	+237 -21


Related Objects

Mentioned In: T290142: Small Wikibase Quality Constraints code cleanup and follow-ups
T289262: Constraint checks crash if explicit constraint scope differs from supported context types
Mentioned Here: T290142: Small Wikibase Quality Constraints code cleanup and follow-ups
T244050: Avoid suggesting constraint types incompatible with the Property type
T213803: [Tracking] Request for new constraint types

Event Timeline

Nikki created this task. Dec 9 2020, 12:27 AM

Restricted Application added a subscriber: Aklapper. Dec 9 2020, 12:27 AM

Lucas_Werkmeister_WMDE added a project: Wikibase-Quality-Constraints. Dec 9 2020, 9:11 AM

Lydia_Pintscher added a parent task: T213803: [Tracking] Request for new constraint types. Dec 9 2020, 10:05 AM

I don’t think this belongs under T213803: [Tracking] Request for new constraint types – it’s not really a new constraint type, but rather a new kind of constraint metadata (closest to constraint scope, and in fact we could reuse that property) that could apply to any constraint type.

Ah good point.

Would love to hear people's thoughts on the open questions in the task description.

Should we exclude certain entity types or include them? That is, should we have an allow list or a deny list?

I would prefer listing the types where the constraint should be checked, which also matches how constraint scope works right now.

Should we reuse the scope parameter?

That would make sense to me since they both cover where to run the checks, but it might also make it harder to model things in a useful way.

For example: What happens if someone adds multiple values? e.g. "constraint scope: as main value, as reference, on item, on lexeme" - I think the expected interpretation would be (main value || reference) && (item || lexeme) and not a simple "and" or "or" of all the values. Or if you want to say something like "as main value on lexemes or as qualifier on items"?

In T269724#6678791, @Nikki wrote:

Should we reuse the scope parameter?

That would make sense to me since they both cover where to run the checks, but it might also make it harder to model things in a useful way.

For example: What happens if someone adds multiple values? e.g. "constraint scope: as main value, as reference, on item, on lexeme" - I think the expected interpretation would be (main value || reference) && (item || lexeme) and not a simple "and" or "or" of all the values.

That would be my interpretation as well.
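
One way to read this interpretation, sketched in Python: scope values are split into a context group and an entity-type group, each group is OR-ed internally, and the two groups are AND-ed. The function name and the short labels are hypothetical, and treating an empty group as "no restriction" is an assumption:

```python
CONTEXT_TYPES = {"main value", "qualifier", "reference"}
ENTITY_TYPES = {"item", "property", "lexeme", "form", "sense"}


def scope_applies(scope_values, context_type, entity_type):
    """(any listed context matches) AND (any listed entity type matches)."""
    contexts = set(scope_values) & CONTEXT_TYPES
    entities = set(scope_values) & ENTITY_TYPES
    # Assumption: a group with no values places no restriction.
    context_ok = not contexts or context_type in contexts
    entity_ok = not entities or entity_type in entities
    return context_ok and entity_ok


# "constraint scope: as main value, as reference, on item, on lexeme"
scope = {"main value", "reference", "item", "lexeme"}
print(scope_applies(scope, "reference", "lexeme"))  # both groups match
print(scope_applies(scope, "qualifier", "item"))    # context group fails
print(scope_applies(scope, "main value", "form"))   # entity group fails
```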

Or if you want to say something like "as main value on lexemes or as qualifier on items"?

If this is really needed, it could still be modeled by making two constraint statements, with the same constraint type and other qualifiers, but different scope. (But I would expect this to be rare.)
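
A sketch of that two-statement modelling, in Python with hypothetical names: each constraint statement carries its own scope, and the constraint is checked wherever at least one statement's scope matches.

```python
def statement_applies(statement, context_type, entity_type):
    """A single statement applies if both of its scope groups match."""
    return (context_type in statement["contexts"]
            and entity_type in statement["entities"])


def constraint_applies(statements, context_type, entity_type):
    """The constraint is checked if any of its statements applies."""
    return any(statement_applies(s, context_type, entity_type)
               for s in statements)


# "as main value on lexemes or as qualifier on items", modelled as two
# statements with the same constraint type but different scope.
statements = [
    {"contexts": {"main value"}, "entities": {"lexeme"}},
    {"contexts": {"qualifier"}, "entities": {"item"}},
]
```

A single statement listing all four values could not express this, since under the grouped interpretation it would also match "main value on items".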

I understand the motivation (thanks to the fact that Nikki's tasks are much more interesting and better described than mine), :-) but I believe we should strive to contain the complexity of the constraint system and, if possible, reduce the current complexity, which is quite high. The need that the example represents might not be recurrent and, for that case, I think we could have two Properties, one for Items and one for Forms, to better adjust the constraints (not only the one we're commenting on) and statements of each of them to their cases of use. I wouldn't find it a big problem if there were some similar statements on two different Properties, as they aren't expected to change frequently and they'll be used in different namespaces (so both shouldn't appear together or be read by the same software agents that might know about one Property but not about the other). Please don't hate me for this (hate me for something else…).

In T269724#6684661, @abian wrote:

I understand the motivation (thanks to the fact that Nikki's tasks are much more interesting and better described than mine), :-)

I can't take the credit for the task description, that's Lydia's work. :)

but I believe we should strive to contain the complexity of the constraint system and, if possible, reduce the current complexity, which is quite high. The need that the example represents might not be recurrent and, for that case, I think we could have two Properties, one for Items and one for Forms, to better adjust the constraints (not only the one we're commenting on) and statements of each of them to their cases of use. I wouldn't find it a big problem if there were some similar statements on two different Properties, as they aren't expected to change frequently and they'll be used in different namespaces (so both shouldn't appear together or be read by the same software agents that might know about one Property but not about the other). Please don't hate me for this (hate me for something else…).

I don't think it would be a good idea to split this property. It serves exactly the same purpose in both places - to link to a file containing the pronunciation of a word - and multiple almost identical properties make it harder for people to use the right one in the right place. It already took me a long time to stop accidentally using the "audio" property instead of "pronunciation audio".

The property constraints themselves don't seem that complex to me. The main problem I have is that the way we model/describe them is quite abstract and technical and I can never remember exactly which properties/values I need to use. Most people should never need to touch property constraints though.

In T269724#6711904, @Nikki wrote:

I can't take the credit for the task description, that's Lydia's work. :)

What a shame, what a disappointment…

I don't think it would be a good idea to split this property. It serves exactly the same purpose in both places - to link to a file containing the pronunciation of a word - and multiple almost identical properties make it harder for people to use the right one in the right place. It already took me a long time to stop accidentally using the "audio" property instead of "pronunciation audio".

The property constraints themselves don't seem that complex to me. The main problem I have is that the way we model/describe them is quite abstract and technical and I can never remember exactly which properties/values I need to use. Most people should never need to touch property constraints though.

Actually, only 27% of those active Wikidata users who decided to answer the survey on Property constraints (and who knew what Property constraints were) said that the system was "relatively easy" to use. Even considering only these users, by "complexity" I also mean the number of decisions the system requires us to consider.

If the constraint system consisted only of adding or not adding a constraint per Property without qualifiers (only one constraint type available), there would only be one dichotomous decision to make for each Property, users would be aware of the two possible options and could spend time considering both to make the best decision. The number of decisions would be the number of Properties (~8260). This isn't a small number to begin with but can be addressed by all of us.

If we had 25 constraint types without values or qualifiers (that is, the decision remains simply whether or not to add the constraint), there would be 8260*25=206500 decisions to be made. Here efforts are concentrated on a minority of Properties and the constraint types that are remembered, so the error of omission is introduced, it's not known whether certain constraints are missing because they aren't applicable or because they haven't yet been considered, some of the cases considered when there was only one constraint type are no longer considered and each constraint type receives several (up to ~25) times less attention on average. From this supersimplified scenario onwards, each qualifier that is possible to include and each value that needs to be specified causes a new combinatorial explosion in the number of decisions ("complexity"), increases the number of omissions and forces high-impact decisions to receive less attention, as the features that are recalled or well specified and the features that have the greatest impact in each case don't necessarily coincide.

To cover a single case or a small set of them, the number of possibilities this qualifier would introduce, which could be specified for any constraint type, would be disproportionate, including inconsistencies such as specifying a set of entity types that violate those of the allowed entity types constraint, even specifying the entity types for an allowed entity types constraint. Perhaps to solve the problem you indicate with the different Properties ("pronunciation audio for this Item" or "pronunciation audio for this Form", for example) we only need allowed entity types constraints to be taken into account in the web interface (Properties that are not applicable to an entity type shouldn't be suggested for that entity type). Also according to https://www.wikidata.org/wiki/Wikidata:2020_report_on_Property_constraints#allowed_entity_types:

This constraint type is the third with the highest proportion of mandatory constraints (48%), only after the Commons link and Property scope constraint types. Consistently, it has no constraints with the suggestion level and no exceptions. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.

To reduce complexity for users (number of decisions they have to make), it would also be nice to address T244050 and remove from our sight those constraint types that don't make sense considering the Property type.

All this talk is just my opinion, but I wanted to explain what I meant by "complexity", because I recognize that it was very ambiguous. Don't stop implementing something just because it doesn't look promising to me if the arguments don't convince you…

Nikki updated the task description. Dec 28 2020, 11:07 AM

Next example: Pinyin transliteration should have a property scope of qualifier on items and lexemes and main property on forms. Since we can't do that, it now says the scope is main property or qualifier everywhere.

And the next one: transliteration has some constraints for what type of items it can be used on. Those constraints apply to usage on items and do not make sense applied to lexemes. Since I can't restrict those constraints to items, the only way I could resolve the constraint violations they were creating was to remove the constraints entirely.

And another: reading pattern of Han character should be qualifier on items and main value on forms, not main value or qualifier everywhere.

Esc3300 added a subscriber: MisterSynergy. Jun 15 2021, 5:39 AM

Esc3300 moved this task from incoming to incoming (new or improved ways to define constraints) on the Wikibase-Quality-Constraints board. Jun 15 2021, 12:54 PM

sponsor had an "item requires statement" constraint which should only apply to items, not media files.

I think it's an error to use Wikidata constraints for Commons as such. Constraints are defined for Wikidata, not any possible other Wikibase using Wikidata.

Lydia_Pintscher added a project: Wikidata-Campsite. Jul 4 2021, 11:12 AM

Lydia_Pintscher moved this task from Incoming to Needs Wikidata PM Work on the Wikidata-Campsite board.

Eihel subscribed. Jul 5 2021, 10:04 AM

Same problem for "allowed entity types constraint (Q52004125)" : "Wikibase MediaInfo (Q59712033)" which is reported on Commons.

Manuel moved this task from Needs Wikidata PM Work to Unconnected Stories on the Wikidata-Campsite board. Jul 13 2021, 8:06 AM

Manuel moved this task from Unconnected Stories to Needs Wikidata PM Work on the Wikidata-Campsite board. Jul 13 2021, 5:57 PM

Manuel updated the task description. Jul 27 2021, 8:36 AM

Should we reuse the scope parameter?

I guess one solution to this would be that we implement this parameter with a separate configuration variable, for a second property ID, but then use the same property ID as for the existing scope parameter (P4680) as the default, so that on Wikidata, the same property is used for both. Then, if we want to have two different properties after all, we’d just have to change the configuration to use a different property ID, without changing the code.

Thank you, Lucas, let's make it so!

Lucas_Werkmeister_WMDE updated the task description. Jul 27 2021, 8:42 AM

Lucas_Werkmeister_WMDE updated the task description.

Manuel updated the task description. Jul 28 2021, 8:53 AM

Manuel moved this task from Needs Wikidata PM Work to Unconnected Stories on the Wikidata-Campsite board.

Addshore set the point value for this task to 8. Aug 4 2021, 10:27 AM

Addshore moved this task from Unconnected Stories to Wikidata-Campsite-Iteration-∞ (On Hold) on the Wikidata-Campsite board. Aug 4 2021, 10:43 AM

Addshore edited projects, added Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)); removed Wikidata-Campsite.

Michael claimed this task. Aug 9 2021, 4:36 PM

Michael moved this task from To Do (prioritised from top to bottom) to Doing on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.

Restricted Application added a project: User-Michael. Aug 9 2021, 4:36 PM

Michael removed Michael as the assignee of this task. Aug 19 2021, 9:55 AM

Michael moved this task from Doing to To Do (prioritised from top to bottom) on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.

Michael subscribed.

Lucas_Werkmeister_WMDE claimed this task. Aug 19 2021, 10:21 AM

Lucas_Werkmeister_WMDE moved this task from To Do (prioritised from top to bottom) to Doing on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.

Lucas_Werkmeister_WMDE mentioned this in T289262: Constraint checks crash if explicit constraint scope differs from supported context types. Aug 19 2021, 1:52 PM

Change 713888 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Introduce ConstraintChecker::getSupportedEntityTypes()

https://gerrit.wikimedia.org/r/713888

gerritbot added a project: Patch-For-Review. Aug 19 2021, 3:19 PM

Change 714072 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::parseItemIdsParameter() helper

https://gerrit.wikimedia.org/r/714072

Change 714073 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::mapItemId() helper

https://gerrit.wikimedia.org/r/714073

Change 714074 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Extract ConstraintParameterParser mappings into methods

https://gerrit.wikimedia.org/r/714074

Change 714075 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Add constraint scope for entity types

https://gerrit.wikimedia.org/r/714075

Change 714076 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Avoid parsing the same parameters twice

https://gerrit.wikimedia.org/r/714076

Lucas_Werkmeister_WMDE moved this task from Doing to Peer Review on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board. Aug 23 2021, 10:44 AM

Another one to fix once the constraint is available: The property scope constraint for pronunciation variety should be split into two, with "qualifier" for item and form, and "main value" for mediainfo.

Change 715002 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseQualityConstraints@master] Unify terminology around allowed/valid types

https://gerrit.wikimedia.org/r/715002

Another to fix: The property scope constraint for ALA-LC romanisation should be split, with "main value" for forms and "qualifier" for items.

Michael mentioned this in T290142: Small Wikibase Quality Constraints code cleanup and follow-ups. Sep 1 2021, 11:07 AM

Change 713888 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Introduce ConstraintChecker::getSupportedEntityTypes()

https://gerrit.wikimedia.org/r/713888

Change 714072 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::parseItemIdsParameter() helper

https://gerrit.wikimedia.org/r/714072

Change 714073 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Add ConstraintParameterParser::mapItemId() helper

https://gerrit.wikimedia.org/r/714073

Change 714074 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Extract ConstraintParameterParser mappings into methods

https://gerrit.wikimedia.org/r/714074

Change 714075 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Add constraint scope for entity types

https://gerrit.wikimedia.org/r/714075

Change 715002 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Unify terminology around allowed/valid types

https://gerrit.wikimedia.org/r/715002

ReleaseTaggerBot added a project: MW-1.37-notes (1.37.0-wmf.23; 2021-09-13). Sep 1 2021, 12:00 PM

The required functionality is now implemented. The remaining open patch will be better attached to T290142.

Addshore reassigned this task from Lucas_Werkmeister_WMDE to Manuel. Sep 21 2021, 2:04 PM

\o/

Change 714076 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/WikibaseQualityConstraints@master] Avoid parsing the same parameters twice

Reason:

https://gerrit.wikimedia.org/r/714076

Maintenance_bot moved this task from incoming to in progress on the Wikidata board. Nov 8 2021, 12:15 PM

Lucas_Werkmeister_WMDE merged a task: T301533: Add "constraint applies to entity type" constraint qualifier. Feb 11 2022, 9:05 AM

Lucas_Werkmeister_WMDE added a subscriber: Lectrician1.

Lectrician1 added a parent task: T213803: [Tracking] Request for new constraint types. Jul 12 2022, 3:48 AM

Lectrician1 removed a parent task: T213803: [Tracking] Request for new constraint types. Jul 12 2022, 3:52 AM
