[Story] Show all available languages in monolingual text value's suggester
Closed, Resolved | Public | 8 Estimated Story Points

Assigned To

Authored By

	adrianheine
	Jan 26 2016, 10:05 AM

Description

As an editor I want to enter values for Properties with datatype monolingual text in any available language in order to record complete data.

Problem:
The language suggester for monolingual text does not show some accepted languages in its dropdown despite it being possible to save statements with these values. This is confusing for users.

Example:
You can store statements for monolingual text values with language code cho but it is not shown in the dropdown when entering the language code.

Screenshots/mockups:

Screenshot_20171204_172854.png (252×805 px, 26 KB)

BDD
GIVEN a special language code
WHEN entering a monolingual text value
AND entering the special language code in the language field
THEN it is recognized
AND shows up in the suggester

Acceptance criteria:

all accepted language codes show up in the dropdown for monolingual text values
at least the language code is displayed, if possible the autonym (the language name in the language itself) as well or ideally the translated language name + code
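
The display rule in the second criterion can be sketched as follows. This is only an illustration, not the Wikibase implementation: the function names and the `autonyms` / `translated` dictionaries are hypothetical, and the "name (code)" format is an assumption.

```python
def suggestion_label(code, autonyms=None, translated=None):
    """Return the dropdown text for one language code.

    Preference order (assumed from the acceptance criteria):
    translated name + code, then autonym + code, then the bare code.
    """
    autonyms = autonyms or {}
    translated = translated or {}
    name = translated.get(code) or autonyms.get(code)
    return f"{name} ({code})" if name else code


def suggest(typed, codes, autonyms=None, translated=None):
    """Return labels for every accepted code matching the typed text."""
    typed = typed.strip().lower()
    if not typed:
        return []
    matches = []
    for code in sorted(codes):
        label = suggestion_label(code, autonyms, translated)
        # Match on the code prefix or anywhere in the displayed label.
        if code.lower().startswith(typed) or typed in label.lower():
            matches.append(label)
    return matches
```

With this rule a code such as cho is still offered even when no name is known for it, which is the minimum the first criterion asks for.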

Details

Subject	Repo	Branch	Lines +/-
Remove outdated comment in getDefaultMonolingualTextLanguages()	mediawiki/extensions/Wikibase	master	+0 -5
Move dynamic source file callback out of resource.php	mediawiki/extensions/WikibaseLexeme	master	+9 -9
InvalidLanguageIndicator: inject valid languages	mediawiki/extensions/WikibaseLexeme	master	+39 -138
Show all available languages in Gloss lang suggester	mediawiki/extensions/WikibaseLexeme	master	+25 -17
Show all available languages in monolingual text lang suggester	mediawiki/extensions/Wikibase	master	+62 -36
LanguageSelector: make language names optional	data-values/value-view	master	+14 -8
LanguageSelector.tests: refactor for readability	data-values/value-view	master	+29 -23
[WIP] Expose additional monolingual languages to LanguageSelector	mediawiki/extensions/Wikibase	master	+20 -5


Related Objects

Status	Assigned	Task
Resolved	guergana.tzatchkova	T259340 Labels in languages from $wmgExtraLanguageNames cannot be used on client wikis
Resolved	guergana.tzatchkova	T260118 Move content of $wgExtraLanguageNames on Wikidata to default Terms languages
Stalled	Lucas_Werkmeister_WMDE	T263441 Clean up $wgExtraLanguageNames production config
Open	None	T273627 Remove wmgExtraLanguageNames from Wikimedia production
Open	None	T124286 [Epic] Wikidata language support
Resolved	Manuel	T275781 Show the language name in monolingual text value's suggester
Resolved	Jakob_WMDE	T124758 [Story] Show all available languages in monolingual text value's suggester
Duplicate	None	T147839 "mul" is not listed in the suggestions for monolingual text

Event Timeline

There are a very large number of changes, so older changes are hidden.

I can't think of any place where we would show only language codes. Usually it is one of these:

autonyms only
- E.g. Universal Language Selector, Interlanguage list, translatable pages
language code + translated language name with fallback to autonym
- E.g. action=info, Special:PageLanguage
language code + autonym
- E.g. Advanced search beta feature, Special:Preferences

Also, falling back to English is not foolproof, as the English name might also not be available until it is added.

Zache subscribed.Jun 6 2018, 5:30 PM

• Pablo-WMDE mentioned this in T194771: Add "mis" language code to the list of language code options on Special:NewLexeme.Jul 5 2018, 8:30 AM

Not sure if anyone is still tracking this, but I ran into this today, and it doesn't seem to work at all with any Native American languages.

hoo mentioned this in T198202: Add API to check language code validity / get all valid language codes.Aug 8 2018, 10:05 AM

Michael subscribed.Nov 9 2018, 10:58 AM

Tarrow subscribed.Dec 4 2018, 9:12 AM

Mvolz subscribed.Feb 27 2019, 2:03 PM

@Amire80 @Sascha Do any of you have experience with adding languages/language names in CLDR? Is that a complex or long process?

I reported a few CLDR issues, and some of them were resolved, but I can't say I'm exceptionally good at getting them to resolve my issues or at adding new languages. I think that @Nemo_bis may be more experienced in this particular area, however.

thiemowmde unsubscribed.Mar 13 2019, 3:27 PM

WMDE-leszek subscribed.Mar 13 2019, 3:36 PM

In T124758#5021110, @Lea_Lacroix_WMDE wrote:

@Amire80 @Sascha Do any of you have experience with adding languages/language names in CLDR? Is that a complex or long process?

I am not either of those people, but my comment at T151269#2822033 seems relevant here. CLDR have already rejected some of our requests because they don't want to add lots of language names. There's a suggestion at T168799 to create our own extension instead.

The easiest way to add a new language to CLDR is to prepare ‘seed’ files in XML format:

See https://www.unicode.org/repos/cldr/trunk/seed/main/arn.xml for a minimal example.
See http://cldr.unicode.org/index/cldr-spec/minimaldata for a description of what minimal content is expected for a new language.
See https://www.unicode.org/repos/cldr/trunk/seed/main/ for the current files in seed stage.

When reading data from CLDR, consider injecting the English names from the IANA language subtag registry as a fallback when a language is missing from CLDR. That would immediately give at least an English name to every language in existence (provided it has an ISO/IETF language code). Another good data source for enriching CLDR might be Wikidata, via property P305 (IETF language tag).
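
A minimal sketch of that fallback, assuming the registry text has already been downloaded and that the CLDR names are available as a plain dictionary. The function names are hypothetical; the record layout (records separated by `%%`, with `Type`, `Subtag` and `Description` fields) is that of the IANA language subtag registry.

```python
def parse_iana_language_names(registry_text):
    """Map each language subtag to its first Description field."""
    names = {}
    record = {}
    # A trailing "%%" flushes the last record.
    for line in registry_text.splitlines() + ["%%"]:
        if line.strip() == "%%":
            if (record.get("Type") == "language"
                    and "Subtag" in record
                    and "Description" in record):
                names[record["Subtag"]] = record["Description"]
            record = {}
        elif line and not line[0].isspace():
            key, sep, value = line.partition(":")
            if sep:
                # setdefault keeps the first Description when several exist.
                record.setdefault(key.strip(), value.strip())
        # Folded continuation lines (leading whitespace) are skipped, so a
        # wrapped description is cut off after its first line in this sketch.
    return names


def english_name(code, cldr_names, iana_names):
    """Prefer the CLDR name; fall back to the IANA description, else None."""
    return cldr_names.get(code) or iana_names.get(code)
```

Only primary language subtags are covered here; composite tags such as es-419 would need their parts looked up separately.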

Disclaimer: I volunteer at Unicode CLDR and am the maintainer for some minor parts of its codebase. So in my personal experience, the process has been super smooth.. :-)

@Nikki, can you send me your CLDR tickets that got rejected? I’d like to understand the reason, it sounds surprising.

Thank you all for your feedback!
@Sascha Your experience with CLDR could definitely be useful for Wikidata, since we're struggling with displaying names of languages that are not entered yet in CLDR.

For example, Numidian (nxm) has been added as an available language for monolingual text in Wikidata, but when I try to use it, it does not appear in the suggestion list. This is confusing for users, who may think that the language is not available.

(quick way to test it: go to the sandbox, add a new statement with the property title, then enter a test value and finally type "nxm" or "num" in the language field that appears: Numidian is not suggested. However, if you type nxm and save the statement, it's correctly saved and "Numidian" is displayed)

The code "nxm" seems to be unavailable in CLDR https://www.unicode.org/repos/cldr/trunk/seed/main/nxm.xml

I'd like to run an experiment with this example: try adding this language to CLDR, and see if this action solves our problem on Wikidata. Would anyone be willing to try submitting data about Numidian to CLDR? :)

Sure, but it will take a while until the next official release of CLDR so you'd have to read the CLDR data from the development branch ("trunk"). I do wonder, though, if you could read the IANA registry in addition to CLDR and use IANA as fallback for the English names when CLDR has no data yet. Then, you would immediately get an English name for every language with an ISO 639 or IETF BCP 47 code, so you'd add support for a couple thousand languages at once.

Can you point me to the source repository where you are currently reading CLDR?

In T124758#5023285, @Sascha wrote:

@Nikki, can you send me your CLDR tickets that got rejected? I’d like to understand the reason, it sounds surprising.

The ones I'm aware of are the ones I mentioned in T151269#2822033 and the comment directly after it.

Oh, all you need from CLDR is an English label? Nothing else? In that case, this Wikidata query might be helpful:

SELECT ?code ?itemLabel
WHERE  {
  ?item wdt:P305 ?code
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

It shouldn't be difficult to write a script that fetches the current CLDR data file and patches it with labels from Wikidata. (Also in other languages than English). Even easier might be to change the source code of the tool that you're currently using to read CLDR; is that tool publicly available?
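
The patching step of such a script could look roughly like this. It is a sketch under assumptions: the input is the query result in the standard SPARQL JSON results format, the CLDR names are a plain dictionary, and both function names are hypothetical. Fetching the data is left out.

```python
import re


def labels_from_sparql_json(result):
    """Extract {language code: label} from a SPARQL JSON result
    that binds ?code and ?itemLabel, as the query above does."""
    labels = {}
    for row in result["results"]["bindings"]:
        code = row["code"]["value"]
        label = row.get("itemLabel", {}).get("value")
        # The label service returns the item ID (e.g. "Q123") when the item
        # has no label in the requested language; skip those.
        if label and not re.fullmatch(r"Q\d+", label):
            labels.setdefault(code, label)
    return labels


def patch_names(cldr_names, wikidata_labels):
    """Fill gaps in the CLDR names; existing CLDR entries are kept."""
    patched = dict(wikidata_labels)
    patched.update(cldr_names)
    return patched
```

Letting CLDR win on conflicts is a design choice of this sketch; the reverse order would work the same way if Wikidata labels were preferred.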

NoInkling subscribed.Mar 21 2019, 11:42 AM

Nikki mentioned this in T217430: Add support for all Saami languages to Wikidata.Mar 25 2019, 2:43 PM

In T124758#4126411, thiemowmde wrote:

A temporary workaround could be to add the additional language codes to the list of suggested languages. These entries would only show the code, but no language name. That's obviously far from perfect, but much better than nothing. Have a look at the JavaScript class wikibase.WikibaseContentLanguages. It currently simply returns the UniversalLanguageSelector's language list, but excludes a few that are also excluded in the backend (see WikibaseRepo::getMonolingualTextLanguages). This means there is already some duplication going on in the backend and frontend! This duplication could either be expanded, or resolved by introducing a MediaWiki-ResourceLoader module that returns the list of languages allowed in monolingual values.

A list of monolingual language codes is now available, though not as a ResourceLoader module, but via the action API, as meta=wbcontentlanguages. And as far as I can tell, we actually have English names for all those languages:

$ curl -G -s \
    -d action=query \
    -d meta=wbcontentlanguages \
    -d wbclcontext=monolingualtext \
    -d wbclprop='code|name' \
    -d format=json \
    -d formatversion=2 \
    https://www.wikidata.org/w/api.php | \
  jq -c '.query.wbcontentlanguages | .[] | select(.name == null)' | \
  wc -l
0
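
The same check can be done on the decoded JSON response in a script, listing the affected codes instead of counting them. This is a sketch: `codes_without_name` is a hypothetical helper, and because the exact shape of the `wbcontentlanguages` value is assumed rather than verified here, it accepts both an object keyed by code and a list (jq's `.[]` iterates over either).

```python
def codes_without_name(response):
    """Return the sorted codes whose entry has no name (missing or null)."""
    languages = response["query"]["wbcontentlanguages"]
    if isinstance(languages, dict):
        entries = languages.values()
    else:
        entries = languages
    return sorted(entry["code"] for entry in entries if entry.get("name") is None)
```

An empty list corresponds to the 0 printed by the command above.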

It looks like @Raymond periodically adds them to our CLDR MediaWiki extension, as an addition to the upstream CLDR data (example change). I don’t know why they’re not displayed on the monolingual statements themselves (example statement, currently shows “sjn” instead of “Sindarin”), but we seem to have them in some form or other. (I guess this also answers @Sascha’s last question?)

In T124758#5177460, @Lucas_Werkmeister_WMDE wrote:

It looks like @Raymond periodically adds them to our CLDR MediaWiki extension, as an addition to the upstream CLDR data (example change). I don’t know why they’re not displayed on the monolingual statements themselves (example statement, currently shows “sjn” instead of “Sindarin”), but we seem to have them in some form or other. (I guess this also answers @Sascha’s last question?)

Yes, I monitor the addition of new languages and add them to CLDR as soon as possible. In your example I see the word "Sindarin". But not while typing the language name or language code into the input field. I am not sure if this is a regression/new bug.

Btw: My (more or less) complete test item is https://test.wikidata.org/wiki/Q149653

In your example I see the word "Sindarin".

Ah – after I purged the English page, I see “Sindarin” as well, so that’s actually working, it was just cached from before your addition. I assume you were looking at the page in German, and the German version wasn’t cached yet.

But not while typing the language name or language code into the input field.

Yes, that’s what this bug is about :) we currently don’t have those extra language codes and names client-side.

Zache mentioned this in T223524: WMHack19: Add Saami + Romani languages to Wikidata.May 17 2019, 1:30 PM

Bugreporter mentioned this in T233653: Add monolingual language code ota.Sep 24 2019, 7:04 PM

Bugreporter merged a task: T240386: Canadian French doesn't show up in the list when adding a value for a monolingual property.Dec 11 2019, 6:47 AM

Bugreporter added subscribers: Mbch331, VIGNERON, Bouzinac.

Lea_Lacroix_WMDE added a project: UX-Debt.Dec 12 2019, 9:34 AM

Change 425785 abandoned by Thiemo Kreuz (WMDE):
[mediawiki/extensions/Wikibase@master] [WIP] Expose additional monolingual languages to LanguageSelector

Reason:

https://gerrit.wikimedia.org/r/425785

Lucas_Werkmeister_WMDE mentioned this in T264295: Reinstate $wgExtraLanguageCodes in production.Oct 1 2020, 10:17 AM

Lucas_Werkmeister_WMDE mentioned this in T264296: [M] WikibaseMediaInfo does not support editing monolingual text in languages not supported by MediaWiki.Oct 1 2020, 10:30 AM

There are even a bunch of languages we can add labels for which don't show up in the list, despite not being explicitly excluded, e.g. aa, cho, dag, es-419, ho, hz, ng, rn, shi-latn, uz-cyrl, uz-latn...

matej_suchanek removed a project: Patch-For-Review.Jan 16 2021, 9:12 AM

matej_suchanek removed subscribers: • iecetcwcpggwqpgciazwvzpfjpwomjxn, • Jonas.

Lucas_Werkmeister_WMDE mentioned this in T273627: Remove wmgExtraLanguageNames from Wikimedia production.Feb 2 2021, 3:06 PM

Lucas_Werkmeister_WMDE added a parent task: T273627: Remove wmgExtraLanguageNames from Wikimedia production.

Lydia_Pintscher added a project: Wikidata-Campsite.Feb 5 2021, 10:06 AM

Lydia_Pintscher updated the task description. (Show Details)

Lydia_Pintscher moved this task from Incoming to Unconnected Stories on the Wikidata-Campsite board.

• amy_rc subscribed.Feb 5 2021, 10:31 AM

noarave updated the task description. (Show Details)Feb 10 2021, 12:54 PM

darthmon_wmde updated the task description. (Show Details)Feb 10 2021, 12:54 PM

darthmon_wmde set the point value for this task to 8.Feb 10 2021, 12:57 PM

darthmon_wmde moved this task from Unconnected Stories to Wikidata-Campsite-Iteration-∞ (On Hold) on the Wikidata-Campsite board.

darthmon_wmde edited projects, added Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)); removed Wikidata-Campsite.

Task Inspection note:
The list of languages without a translated language name is here https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/711874256d9d00d5cd3ed1e2d3e82391aaac735c/lib/includes/WikibaseContentLanguages.php#84

In order to send the language codes to JS we could maybe use the resource loader packageFiles mechanism: https://www.mediawiki.org/wiki/ResourceLoader/Package_files#Generated_content

Since the language dropdown for lexeme senses uses the same list, that has the same problem. The codes I mentioned in #6753045 don't show up, nor do any lexeme-specific languages like ctg, fro, nrf-je, az-cyrl.

Screenshot from @Masssly:

Dropdown list in Gloss language codes for Senses does not show Dagbanli, but accepts the "dag" code anyway. It works is not a problem when adding Forms. (2×2 px, 214 KB)

Nikki mentioned this in T272242: Language code "dag" for Dagbani does not work for lexemes.Feb 16 2021, 9:28 AM

Jakob_WMDE claimed this task.Feb 18 2021, 9:00 AM

Jakob_WMDE moved this task from To Do (prioritised from top to bottom) to Doing on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.

Change 665145 had a related patch set uploaded (by Jakob; owner: Jakob):
[data-values/value-view@master] LanguageSelector.tests: refactor for readability

https://gerrit.wikimedia.org/r/665145

Change 665146 had a related patch set uploaded (by Jakob; owner: Jakob):
[data-values/value-view@master] LanguageSelector: make language names optional, but not languages

https://gerrit.wikimedia.org/r/665146

Change 665309 had a related patch set uploaded (by Jakob; owner: Jakob):
[mediawiki/extensions/Wikibase@master] Show all available languages in monolingual text lang suggester

https://gerrit.wikimedia.org/r/665309

Change 665315 had a related patch set uploaded (by Jakob; owner: Jakob):
[mediawiki/extensions/WikibaseLexeme@master] Show all available languages in Gloss lang suggester

https://gerrit.wikimedia.org/r/665315

Jakob_WMDE moved this task from Doing to Peer Review on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.Feb 19 2021, 1:09 PM

Change 665145 merged by jenkins-bot:
[data-values/value-view@master] LanguageSelector.tests: refactor for readability

https://gerrit.wikimedia.org/r/665145

Change 665146 merged by jenkins-bot:
[data-values/value-view@master] LanguageSelector: make language names optional

https://gerrit.wikimedia.org/r/665146

noarave moved this task from Peer Review to Doing on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.Feb 22 2021, 10:17 AM

Change 666000 had a related patch set uploaded (by Jakob; owner: Jakob):
[mediawiki/extensions/WikibaseLexeme@master] InvalidLanguageIndicator: inject valid languages

https://gerrit.wikimedia.org/r/666000

Change 665315 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Show all available languages in Gloss lang suggester

https://gerrit.wikimedia.org/r/665315

ReleaseTaggerBot added a project: MW-1.36-notes (1.36.0-wmf.32; 2021-02-23).Feb 22 2021, 11:00 AM

Change 665309 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Show all available languages in monolingual text lang suggester

https://gerrit.wikimedia.org/r/665309

Change 666000 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] InvalidLanguageIndicator: inject valid languages

https://gerrit.wikimedia.org/r/666000

Tobi_WMDE_SW unsubscribed.Feb 22 2021, 1:12 PM

Change 666132 had a related patch set uploaded (by Jakob; owner: Jakob):
[mediawiki/extensions/WikibaseLexeme@master] Move dynamic source file callback out of resource.php

https://gerrit.wikimedia.org/r/666132

Jakob_WMDE moved this task from Doing to Peer Review on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.Feb 22 2021, 2:37 PM

Change 666132 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Move dynamic source file callback out of resource.php

https://gerrit.wikimedia.org/r/666132

Jakob_WMDE moved this task from Peer Review to Test (Verification) on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.Feb 22 2021, 3:27 PM

Amy and I tested it on test. Our observation:

It works fine for dag and a few other codes we tested \o/
There are some issues with other codes that we found when testing with ctg. It is not showing up in the selector. The publish link however turns blue, indicating that it'd be accepted. When clicking publish it is then rejected. See the screenshots below.

Lydia_Pintscher moved this task from Test (Verification) to To Do (prioritised from top to bottom) on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.Feb 25 2021, 8:42 AM

@Lydia_Pintscher Could it be that ctg isn't a monolingual text language? I found it in the list of additional lexeme term languages but not in the list of monolingual text languages.

Update: Yes, according to T271589 ctg was only added as a lexeme term language, so this works as designed. On a side note, one of the patches here also made all available lexeme term languages pop up in their respective language selectors, where you'll now also find ctg:

Screenshot from 2021-02-25 09-53-12.png (170×254 px, 13 KB)

In T124758#6859831, @Jakob_WMDE wrote:

@Lydia_Pintscher Could it be that ctg isn't a monolingual text language? I found it in the list of additional lexeme term languages but not in the list of monolingual text languages.

Update: Yes, according to T271589 ctg was only added as a lexeme term language, so this works as designed. On a side note, one of the patches here also made all available lexeme term languages pop up in their respective language selectors, where you'll now also find ctg:

Yeah I think it's fine and expected that it isn't accepted. However then the publish link should not turn from gray to blue, right?

In T124758#6859871, @Lydia_Pintscher wrote:

Yeah I think it's fine and expected that it isn't accepted. However then the publish link should not turn from gray to blue, right?

The publish link always turns blue for any input on the language selector. This was done previously to allow languages that aren't part of the dropdown to be entered, so this language selector works exactly the same way as it did before, just with a more complete list of languages. I agree that it makes a lot less sense now that all allowed languages are actually in the dropdown.

Aha! :D ok makes sense. Thanks!
Then let's close this 🎆

daniel unsubscribed.Feb 25 2021, 9:26 AM

If this ticket is going to be closed, which ticket covers showing the language names in the dropdown?

And why are the language names missing for the ones I listed in #6753045 anyway? Those are all ones which are available for labels (see the language selector on https://test.wikidata.org/wiki/Special:NewItem) and I don't know why they weren't showing up in the list in the first place.

@amy_rc ^ Can you create a new ticket for Nikki's comment?

I have created the ticket for that in T275781

• amy_rc added a parent task: T275781: Show the language name in monolingual text value's suggester.Feb 25 2021, 3:07 PM

Change 668713 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Remove outdated comment in getDefaultMonolingualTextLanguages()

https://gerrit.wikimedia.org/r/668713

Change 668713 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Remove outdated comment in getDefaultMonolingualTextLanguages()

https://gerrit.wikimedia.org/r/668713

ReleaseTaggerBot edited projects, added MW-1.36-notes (1.36.0-wmf.34; 2021-03-09); removed MW-1.36-notes (1.36.0-wmf.32; 2021-02-23).Mar 5 2021, 5:00 PM

	F34121977: Screenshot from 2021-02-25 09-53-12.png
	Feb 25 2021, 8:55 AM

	F34105838: Dropdown list in Gloss language codes for Senses does not show Dagbanli, but accepts the "dag" code anyway. It works is not a problem when adding Forms.
	Feb 16 2021, 9:26 AM

	F11177787: Screenshot_20171204_172808.png
	Dec 4 2017, 4:31 PM

	F11177785: Screenshot_20171204_172854.png
	Dec 4 2017, 4:31 PM
