
New constraint type to ensure that Items have a Label in a specific language
Closed, Resolved | Public | 5 Estimated Story Points

Description

Context:
Constraints System

User story:
As an editor, I want to define that all Items using a certain Property should have a Label in a specific language. This helps to ensure that all the mandatory Labels are added.

Problem:
We currently don't have a constraint type that ensures certain Items always have a Label in a specific language. Editors work around this with a complex constraint: Help:Property_constraints_portal/Label_language

Example:

  • RKDartists ID (P650) "Identifier for artists in the database of the Netherlands Institute of Art History". Items using this Property should have a Dutch label.

BDD
GIVEN a Constraint definition that requires a Label in language X for Property Y
WHEN an Item uses Property Y
AND does not have a Label in Language X
THEN a constraint violation is triggered for the statement using Property Y

Notes:

  • We also want this constraint to work on Property pages. We probably want to ignore it on MediaInfo. We can't apply it on Lexemes, Forms, and Senses because they don't have Labels.
  • If several languages are provided in the constraint then the constraint is satisfied if at least one of the languages has a label added.
  • We are using "Wikimedia Language Code (P424)" as the qualifier.
  • We are using a new item (“label in language constraint”) as the constraint type, not reusing the complex constraint label language item.
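The BDD scenario and the notes above can be sketched roughly as follows. This is only an illustrative sketch, not the code of the WikibaseQualityConstraints extension: the Entity class, the CHECKED_TYPES set, the result strings and the function name are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Assumption: only Items and Properties are checked. MediaInfo is ignored,
# and Lexemes, Forms and Senses have no Labels at all.
CHECKED_TYPES = {"item", "property"}


@dataclass
class Entity:
    """Hypothetical, simplified stand-in for a Wikibase entity."""
    entity_type: str
    labels: dict = field(default_factory=dict)  # language code -> label text


def check_label_in_language(entity, required_languages):
    """Return 'compliance', 'violation' or 'not-in-scope' (made-up result names)."""
    if not required_languages:
        raise ValueError("the constraint needs at least one language code")
    if entity.entity_type not in CHECKED_TYPES:
        return "not-in-scope"
    # OR semantics: a non-empty label in any one listed language is enough.
    if any(entity.labels.get(code) for code in required_languages):
        return "compliance"
    return "violation"
```

For example, an Item that only has an English label would violate a constraint asking for "nl", but would satisfy a constraint listing both "nl" and "en".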

Acceptance criteria:

  • constraint violations are triggered when an Item uses a Property with a Label language constraint but does not have a label in the specified language (see BDD including notes)
  • the new constraint type is documented at Help:Property constraints portal (new subpage)

Original report:

Label in language: An item using a certain property should at least have a label in this language or these languages. Example defined on Property talk:P650 (https://www.wikidata.org/wiki/Property_talk:P650).

(part of https://www.wikidata.org/wiki/Help_talk:Property_constraints_portal#Improvements_for_2018 )

Currently implemented as "complex constraint", see Help:Property_constraints_portal/Label_language

Event Timeline

CommunityTechBot renamed this task from plcaaaaaaa to Label in language constraint type. (Jul 1 2018, 4:54 PM)
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.

My devil's advocate questions:

  • I assume that the main purpose of all constraint types is to inform editors of mistakes or suggestions that would otherwise not be obvious. I also assume that all Items should have both labels and descriptions in virtually all supported languages. Why should users be told that the lack of a label in one language is a mistake and shouldn't be told the same about a label in another language that is more relevant (e.g. more demanded by Wikipedia) or more familiar to the user?
  • Would it be expected that users who didn't previously fill in labels would switch to filling them in as a result of these violations? Or would these violations be ignored and contribute to all constraint types being ignored more?
  • Might labels that were not suggested by these constraints be more neglected?
  • I assume that the main purpose of all constraint types is to inform editors of mistakes or suggestions that would otherwise not be obvious. I also assume that all Items should have both labels and descriptions in virtually all supported languages.

I think both assumptions are just yours.

  1. Why did you make them?
  2. Is there anything in Help:Property_constraints_portal/Label_language that led you to assume this?
  3. Isn't the contributor already shown that labels in languages relevant to them are missing?

My devil's advocate questions:
<snip>

Not sure about what your intentions are with these questions. I'll just assume good faith, thus ignorance. We have a lot of art-related properties we use for sourcing and linking, for example https://www.wikidata.org/wiki/Property:P5499 .
The source provides a valid label in a language (in this case Dutch) so it should always be set. If it's not set it usually means the item needs attention, exactly what we have constraint violations for. In the example we have 6 items out of 8700 that need attention: https://w.wiki/3Qbp . Just like with every other constraint, it should be used with care, but that's a meta discussion that should happen here (see https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette ).

Not sure about what your intentions are with these questions. I'll just assume good faith

Hey. Of course there's no bad intention; I don't know why I should have bad intentions towards you, you've never harmed me or my family (because you haven't... have you? 😅).

My devil's advocate questions:

  • I assume that the main purpose of all constraint types is to inform editors of mistakes or suggestions that would otherwise not be obvious. I also assume that all Items should have both labels and descriptions in virtually all supported languages. Why should users be told that the lack of a label in one language is a mistake and shouldn't be told the same about a label in another language that is more relevant (e.g. more demanded by Wikipedia) or more familiar to the user?
  • Would it be expected that users who didn't previously fill in labels would switch to filling them in as a result of these violations? Or would these violations be ignored and contribute to all constraint types being ignored more?
  • Might labels that were not suggested by these constraints be more neglected?

I usually leave these kinds of questions on Phabricator to assess in advance problems that might (or might not) arise after a possible implementation. The questions aren't ironic: I really don't know the answers, but I do think they're relevant. There are a couple of dozen constraint types considered or proposed for implementation, so the first question asks why this constraint type would be particularly helpful. The second question relates to habituation to warnings, which is already occurring on Wikidata: we know of both indirect figures and individual users who routinely ignore these warnings, and new warnings could make the problem worse or better. The third question asks whether the overall effect of obeying the warnings would be positive for Wikidata (re)users or not. Of course I don't have these answers, nor any special interest in this constraint type not being implemented... until you harm me or my family, in which case I won't hesitate to add a dislike token to this very task.

Concerns about someone's family and the constraint system in general are hardly relevant to this.

We need some input: How would the constraint statement look in your opinion? Should we use "Wikimedia Language Code (P424)" as the qualifier? A new Property?

I would like to have "If several languages are provided in the constraint then the constraint is satisfied if at least one of the languages has a label added" changed to "If several languages are provided in the constraint then the constraint is satisfied if all of the listed languages have a label added" (so not OR, but AND)

We need some input: How would the constraint statement look in your opinion? Should we use "Wikimedia Language Code (P424)" as the qualifier? A new Property?

On https://www.wikidata.org/wiki/Property:P650 we use "language of work or name (P407)" set to a language, "Dutch (Q7411)". The language has the "Wikimedia language code (P424)" set to "nl". I would use the same property P407 for this constraint.

I would like to have "If several languages are provided in the constraint then the constraint is satisfied if at least one of the languages has a label added" changed to "If several languages are provided in the constraint then the constraint is satisfied if all of the listed languages have a label added" (so not OR, but AND)

You can always achieve that effect by having several independent constraints. On the other hand, if we go for AND, there’s no way to emulate OR. So going for OR is more flexible.
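The flexibility argument can be illustrated with a small sketch. It assumes a made-up representation in which each constraint is simply a list of language codes; it is not how the extension stores constraints.

```python
def satisfied(labels, languages):
    """OR semantics: one constraint passes if any listed language has a label."""
    return any(code in labels for code in languages)


def violated_constraints(labels, constraints):
    """Each constraint is checked independently; return those that fail."""
    return [langs for langs in constraints if not satisfied(labels, langs)]


labels = {"nl": "Rembrandt van Rijn", "en": "Rembrandt"}

# One constraint listing two languages behaves as "nl OR de".
or_result = violated_constraints(labels, [["nl", "de"]])

# Two independent single-language constraints behave as "nl AND de".
and_result = violated_constraints(labels, [["nl"], ["de"]])
```

Here or_result is empty because the Dutch label is enough, while and_result contains the failing ["de"] constraint. The reverse does not work: if a single constraint meant AND, no combination of constraints could express "nl OR de".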

We need some input: How would the constraint statement look in your opinion? Should we use "Wikimedia Language Code (P424)" as the qualifier? A new Property?

On https://www.wikidata.org/wiki/Property:P650 we use "language of work or name (P407)" set to a language, "Dutch (Q7411)". The language has the "Wikimedia language code (P424)" set to "nl". I would use the same property P407 for this constraint.

Hmmm that would make it quite a bit more complex and introduce the corner cases where the Item for the language doesn't have a statement for Wikimedia language code or when there are several. "language of work or name" also doesn't seem like a great Property for this to me tbh.
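To make those corner cases concrete, here is a rough sketch of the extra lookup step that a P407-based parameter would need. The dictionary shape and the function name are hypothetical; the point is only that the indirection adds two failure modes that a direct language-code qualifier avoids.

```python
def resolve_language_code(language_item):
    """Map a language Item to exactly one Wikimedia language code.

    `language_item` is a hypothetical dict of property ID -> list of values.
    """
    codes = language_item.get("P424", [])
    if not codes:
        raise LookupError("language Item has no Wikimedia language code")
    if len(codes) > 1:
        raise LookupError("language Item has several Wikimedia language codes")
    return codes[0]


# The straightforward case from the comment above: Dutch (Q7411) -> "nl".
dutch = {"P424": ["nl"]}
code = resolve_language_code(dutch)
```

With the language code given directly as the constraint parameter, neither error branch can occur.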

More thoughts?

Manuel renamed this task from Label in language constraint type to New constraint type to ensure that Items have a Label in a specific language. (Aug 3 2021, 1:20 PM)
Manuel updated the task description.
Addshore set the point value for this task to 5. (Aug 11 2021, 10:38 AM)

This will have one slightly unusual consequence. Currently, we re-check constraints after a statement is saved, to update the indicators that are shown, in case a statement edit resolved another constraint violation (e.g. the statement of “item requires statement” was added). This new constraint type will, I believe, be the first one that may be resolved by non-statement edits to the same entity. Should we add some code to also re-check constraints after a label edit?

Change 713671 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/WikibaseQualityConstraints@master] [WIP] Introduce LabelInLanguageChecker

https://gerrit.wikimedia.org/r/713671

This new constraint type will, I believe, be the first one that may be resolved by non-statement edits to the same entity.

Not true, T200689: Add “language required by this lexeme” constraint type already exists.

Change 713671 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Introduce LabelInLanguageChecker

https://gerrit.wikimedia.org/r/713671

It's not deployed yet. It'll be deployed next week (no train this week)

It's not deployed yet. It'll be deployed next week (no train this week)

Did we go for the AND or OR implementation?

OR, as stated in the notes:

  • If several languages are provided in the constraint then the constraint is satisfied if at least one of the languages has a label added.

The documentation still seems to be missing here.