Maniphest T320889

Use the same list of languages for monolingual text and lexemes
Open, Needs Triage, Public

Assigned To

None

Authored By

	Nikki
	Oct 16 2022, 7:48 AM

Description

As an editor, I want to use the same language codes when moving between "monolingual text statements" and "lexemes" in order to have a more predictable, smoother workflow.

As a data reuser, I want the same language codes to be used on "monolingual text statements" and "lexemes" in order to have a more consistent representation of data.

Problem:
Currently, "monolingual text statements" and "lexemes" use different lists of languages, resulting in different language codes being used for the same language.

The different language codes can result in an inconsistent representation of data, and make work difficult for users who move between "monolingual text statements" and "lexemes".

This can also cause confusion and frustration for editors when they can enter data for one but not the other because the same language code is not supported in both places.

Example:
Because any text in a "monolingual text statement" could also be described by lexemes, the same language codes should be used for both "monolingual text statements" and "lexemes".
Merging the lists for "monolingual text statements" and "lexemes" so that they use the same language codes could make for a better user experience for both editors and reusers.

Acceptance criteria:

The lists for "monolingual text statements" and "lexemes" are merged so that they use the same language codes

Notes
List of Lists of Languages

Original ticket

Currently, monolingual text statements and lexemes have separate lists of additional languages.

There are multiple monolingual text properties designed for use on lexemes. Therefore all lexeme language codes should be usable for monolingual text statements.

By definition, monolingual text statements include text. If we can represent something as a monolingual text statement, it contains content which could have lexemes. Therefore all monolingual text language codes should be usable for lexemes.

Advantages of combining the two lists:

More consistent data representation - right now we have to use one language code in some situations and another in others.
More predictable for users - users don't expect language codes to sometimes work and sometimes not work.
Easier to maintain - there would be fewer lists of languages to update.

Potential issues:

What about special language codes which aren't for a particular language? Monolingual text allows und, mis, mul and zxx. We already have mis for lexemes but what about the others?

This would be one way to solve T320887

Details

	Subject	Repo	Branch	Lines +/-
	Support all monolingual text languages for Lexemes	mediawiki/extensions/WikibaseLexeme	master	+81 -152


Related Objects

Status	Assigned	Task
Resolved	guergana.tzatchkova	T259340 Labels in languages from $wmgExtraLanguageNames cannot be used on client wikis
Resolved	guergana.tzatchkova	T260118 Move content of $wgExtraLanguageNames on Wikidata to default Terms languages
Stalled	Lucas_Werkmeister_WMDE	T263441 Clean up $wgExtraLanguageNames production config
Open	None	T273627 Remove wmgExtraLanguageNames from Wikimedia production
Open	None	T320889 Use the same list of languages for monolingual text and lexemes
Open	None	T320887 Language codes that are explicitly not allowed for monolingual text should also not be allowed for lexemes

Event Timeline

Nikki created this task. Oct 16 2022, 7:48 AM

I think there's an argument for allowing und, particularly for etymology. Sometimes a word is given but it's not clear which language is actually intended.

I don't think mul or zxx make sense on lexemes. Anything that isn't specific to a language is more conceptual and that sort of stuff belongs on items. However, the advantages of merging the two lists are big enough that I think it should be done even if it means allowing those two for lexemes.

Nikki updated the task description. Oct 16 2022, 7:57 AM

+1, I strongly support this – having looked at most of the new language codes that have been added through the years, I have yet to come across a language that makes sense for one but not the other (with the possible exception of the special ones Nikki mentions, but that should be easy enough to solve – just make WikibaseLexeme.mediawiki-services.php's $additionalLanguages equal WikibaseContentLanguages.php's getDefaultMonolingualTextLanguages() and then unset the special ones).
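The approach suggested in this comment (derive the lexeme list from the monolingual text list, then remove the special codes) can be sketched roughly as follows. This is only an illustration in Python, not the WikibaseLexeme code: the sample language set and the helper name `lexeme_languages` are assumptions, and only the four special codes come from the task description.

```python
# Hypothetical sample data; the real lists live in Wikibase and WikibaseLexeme.
MONOLINGUAL_TEXT_LANGUAGES = {"en", "de", "obt", "xbm", "und", "mis", "mul", "zxx"}

# Special codes named in the task that do not identify one particular language.
SPECIAL_CODES = {"und", "mis", "mul", "zxx"}


def lexeme_languages(monolingual, excluded_special=frozenset()):
    """Derive the lexeme language list from the monolingual text list.

    excluded_special names the special codes to drop for lexemes.
    Anything that is not a known special code is rejected, so regular
    languages cannot be removed by accident.
    """
    unknown = set(excluded_special) - SPECIAL_CODES
    if unknown:
        raise ValueError(f"not a special code: {sorted(unknown)}")
    return set(monolingual) - set(excluded_special)


# One option discussed in this thread: keep und and mis, drop mul and zxx.
print(sorted(lexeme_languages(MONOLINGUAL_TEXT_LANGUAGES, {"mul", "zxx"})))
```

Passing an empty exclusion set gives the fully merged list, which is the outcome the acceptance criteria describe.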

I think all additional language lists should be killed in favor of language-data.

mrephabricator awarded a token. Oct 16 2022, 4:32 PM

In T320889#8319167, @Nikki wrote:

I think there's an argument for allowing und, particularly for etymology. Sometimes a word is given but it's not clear which language is actually intended.

I don't think mul or zxx make sense on lexemes. Anything that isn't specific to a language is more conceptual and that sort of stuff belongs on items. However, the advantages of merging the two lists are big enough that I think it should be done even if it means allowing those two for lexemes.

For mul, see also https://en.wiktionary.org/wiki/Wiktionary:About_Translingual. Note that we have items for each Unicode character, but the following would still be useful:

Abbreviations and codes, especially those with multiple meanings
Symbols and punctuation with multiple meanings

Lectrician1 subscribed. Oct 17 2022, 12:12 AM

Mahir256 mentioned this in T319125: Add monolingual and lexeme language codes xbm (Middle Breton), obt (Old Breton). Oct 27 2022, 12:47 PM

Manuel added a project: Wikidata-Campsite. Nov 17 2022, 3:36 PM

Manuel moved this task from Incoming to Needs Wikidata PM Work on the Wikidata-Campsite board.

Lydia_Pintscher added a project: Wikidata Dev Team. Dec 31 2022, 1:38 PM

Lydia_Pintscher moved this task from Incoming to Product Backlog on the Wikidata Dev Team board.

Arian_Bozorg updated the task description. Feb 10 2023, 2:40 PM

Arian_Bozorg updated the task description. Feb 10 2023, 3:25 PM

mxn subscribed. Feb 10 2023, 4:13 PM

mrephabricator subscribed. Feb 10 2023, 6:28 PM

Nikki mentioned this in T148887: Add monolingual language code nn-hognorsk for Høgnorsk. Mar 29 2023, 1:57 PM

Winston_Sung moved this task from Backlog to Wikidata (lexemes + monolingual text) on the Language codes board. Apr 19 2023, 5:21 PM

Nikki mentioned this in T341409: [TECH] Use LanguageNameUtils::ALL for monolingual text and lexemes. Jul 8 2023, 12:52 PM

Frostly moved this task from Needs Wikidata PM Work to Remove on the Wikidata-Campsite board. Oct 14 2023, 9:21 PM

Frostly removed a project: Wikidata-Campsite.

Aklapper added a project: Wikidata-Campsite. Oct 16 2023, 8:57 AM

Aklapper moved this task from Remove to Needs Wikidata PM Work on the Wikidata-Campsite board.

Lectrician1 unsubscribed. Oct 16 2023, 2:20 PM

hoo merged a task: T273625: Make lexeme language codes inherit from Wikibase default terms languages, not MediaWiki content languages. Nov 8 2023, 9:28 PM

hoo mentioned this in T273627: Remove wmgExtraLanguageNames from Wikimedia production.

hoo added a parent task: T273627: Remove wmgExtraLanguageNames from Wikimedia production.

hoo added subscribers: Lucas_Werkmeister_WMDE, Manuel, Mohammed_Sadat_WMDE.

Change 974656 had a related patch set uploaded (by Hoo man; author: Hoo man):

[mediawiki/extensions/WikibaseLexeme@master] Support all monolingual text languages for Lexemes

https://gerrit.wikimedia.org/r/974656

gerritbot added a project: Patch-For-Review. Nov 15 2023, 6:00 PM

Change 974656 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Support all monolingual text languages for Lexemes

https://gerrit.wikimedia.org/r/974656

ReleaseTaggerBot added a project: MW-1.42-notes (1.42.0-wmf.9; 2023-12-12). Dec 11 2023, 4:01 PM

Maintenance_bot removed a project: Patch-For-Review. Dec 11 2023, 4:11 PM

As part of T341409 this has been (mostly) done. WikibaseLexeme, for backwards compatibility, still supports the following language codes which we don't support for monolingual text values:

bat-smg
be-x-old
de-formal
es-formal
fiu-vro
hu-formal
nl-informal
roa-rup
simple
zh-classical
zh-min-nan
zh-yue

Bugreporter added a subtask: T320887: Language codes that are explicitly not allowed for monolingual text should also not be allowed for lexemes. Fri, May 10, 2:59 AM
