**User story:**
As an editor, I want to avoid repeating an identical label in hundreds of languages when it is the same across languages. This will reduce redundancy and the amount of content that needs to be maintained.
**Problem:**
We have many labels that are in principle identical across different languages (see examples section). This has bad consequences:
* editors have to create and maintain redundant content (copying the same thing to most/all languages could create massive amounts of edits and is a huge waste of resources)
* need of storing redundant information that burdens our systems (e.g. the Query Service)
* users tend to fill in empty label fields, especially when a description in the language is present
* empty label fields may result in suboptimal string additions
* fall-back is generally ill understood
**Examples:**
* Names
** persons (https://www.wikidata.org/wiki/Special:Search/haswbstatement:P31=Q5, as of now 9.2M) have in most cases the same label and the same aliases repeated in different languages, e.g. https://www.wikidata.org/wiki/Q42 . Labels generally differ by script (Latin script and all others)
** given names and family names (https://w.wiki/3zWT, which counts Q202444 and Q101352 including subclasses, as of now 590k): in all cases the same label is repeated in different same-script languages, e.g. https://www.wikidata.org/wiki/Q21448867 . This is to avoid translations being added (e.g. "John"@en and "Giovanni"@it shouldn't be on the same item).
** astronomical objects (11M), e.g. the [[https://www.wikidata.org/wiki/Q74758893|galaxy "SDSS J151017.36+160605.3"]] - has "SDSS J151017.36+160605.3" as the label 411 times
** taxa (https://www.wikidata.org/wiki/Special:Search/haswbstatement:P31=Q16521, as of now 3.1M), e.g. the [[https://www.wikidata.org/wiki/Q39898268|species "Neotrogla curvata"]] - has "Neotrogla curvata" as the label 411 times. Latinized names should generally be available as a fallback.
* Unicode characters
** [[https://www.wikidata.org/wiki/Q87526860|Unicode character "♣"]] - has "♣" as the label and "U+2663" as an alias 446 times
* Codes and abbreviations
** [[https://www.wikidata.org/wiki/Q191118|metric ton]] - should have "t" as alias in Latin script languages, "т" as alias for Cyrillic languages
** [[https://www.wikidata.org/wiki/Q39|Switzerland]] - has "CH" as an alias 403 times
** [[https://www.wikidata.org/wiki/Q623|carbon]] - has "C" as an alias 187 times
** the [[https://www.wikidata.org/wiki/Q28006|disambiguation page "C"]] - has "C" as the label 104 times
** the [[https://www.wikidata.org/wiki/Q104248887|Danish men's national road cycling team 2021]] - has "DEN 2021" as an alias 411 times
* Scientific articles
** (https://www.wikidata.org/wiki/Special:Search/haswbstatement:P31=Q13442814, as of now 42M): in many cases the same label is repeated in different languages (e.g. https://www.wikidata.org/wiki/Q27860672). Generally the original title is available (or a translation to English). Original non-English titles are frequently missing.
** in some cases, there could be articles with parallel titles in different languages (e.g. https://www.wikidata.org/wiki/Q59238742): one title for @en, one for @it.
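
The "has X as the label N times" figures above come down to counting how many language codes share one label value. A minimal sketch of that count, assuming the labels of an item are available as a plain language-code-to-string mapping (the function name and the sample data are hypothetical, not taken from a real item):

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def most_repeated_label(labels: Dict[str, str]) -> Tuple[Optional[str], int]:
    """Return the most frequent label value and how many language codes share it."""
    if not labels:
        return None, 0
    value, count = Counter(labels.values()).most_common(1)[0]
    return value, count


# Hypothetical, hand-written labels used only for illustration.
sample_labels = {
    "en": "Example name",
    "de": "Example name",
    "it": "Example name",
    "ru": "Пример",
}

value, count = most_repeated_label(sample_labels)
print(f"{value!r} is repeated in {count} language codes")
```

Under the proposal below, a value repeated like this would be stored once under a "mul" code instead of once per language.
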
**Solution:**
* 1st step: Adding the following new language codes and having other languages fall back to them
** mul
*** as a fallback for mul-<script> (mul-<script> -> mul -> en)
** mul-<script>
*** e.g. "mul-latn", "mul-cyrl", "mul-hans", "mul-hant"
* 2nd step: Community creates guidelines and help pages on how to use these.
** e.g. what if one Latin-script language prefers one form (e.g. "Philip L. Brown") and another Latin-script language prefers another form (e.g. "Philip Larry Brown" or "Philip Brown")?
* 3rd step:
** We would start triggering Constraint Violations if someone wanted to add the "same" label value in a different language. (different script)
** We would start triggering Constraint Violations if someone wanted to re-add a label value in some language. (same script)
* 4th step: Some point in the future (probably when we can use the new termbox for this) we might implement a more intuitive UI for this.
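
The fallback from the 1st step and the check from the 3rd step can be sketched as follows. This is only an illustration under stated assumptions: the language-to-script table, the function names, and the rule that a label is "redundant" when it equals the value inherited from a "mul" code are all hypothetical, and the real fallback chain is still an open question.

```python
from typing import Dict, List, Optional

# Hypothetical language -> script table (assumption for this sketch only).
LANGUAGE_SCRIPT: Dict[str, str] = {
    "en": "latn", "it": "latn", "de": "latn",
    "ru": "cyrl", "uk": "cyrl",
}


def fallback_chain(lang: str) -> List[str]:
    """Build lang -> mul-<script> -> mul -> en, as proposed in the 1st step."""
    chain = [lang]
    script = LANGUAGE_SCRIPT.get(lang)
    if script is not None:
        chain.append(f"mul-{script}")
    chain.append("mul")
    if "en" not in chain:
        chain.append("en")
    return chain


def resolve_label(labels: Dict[str, str], lang: str) -> Optional[str]:
    """Return the first label found along the fallback chain, or None."""
    for code in fallback_chain(lang):
        if code in labels:
            return labels[code]
    return None


def is_redundant(labels: Dict[str, str], lang: str, value: str) -> bool:
    """True if `value` equals what `lang` would already inherit from a mul code."""
    for code in fallback_chain(lang)[1:]:
        if code.startswith("mul") and code in labels:
            return labels[code] == value
    return False


labels = {"mul-latn": "Douglas Adams", "ru": "Дуглас Адамс"}
print(resolve_label(labels, "it"))                  # inherited from mul-latn
print(is_redundant(labels, "it", "Douglas Adams"))  # would trigger the constraint
```

A check like `is_redundant` is one possible reading of the constraint in the 3rd step; the community guidelines from the 2nd step would decide which repeated values are actually acceptable.
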
**Open questions:**
* Could this solution somehow backfire?
* What are all the mul-<script> codes that we should start with? mul-latn seems to be the most frequent.
* What exactly should the fallback chain be for these mul codes?
* Can items still be found when no label is present in the language?
* Search results are currently (also) ranked by the number of labels, how to ensure ranking still works?
* Should the "mul-latn" label be displayed in a grayed out form when a description is present?
* How will this work in Lua infoboxes? Currently users copy en labels to ca/cs/da/es/nb even when the fallback works.
* How to prevent now-empty label fields from being filled with inappropriate labels (loss of data quality)?
**Original report:**
This task is to add support for a "mul" language code for labels and aliases. For any benefits of this code to be properly reaped, all language codes should ultimately fall back to "mul"—which I believe would be achieved by adding it as a fallback for the "en" code.
(If it is more desirable, codes for "mul-latn", "mul-cyrl", etc. could be created, in which case e.g. only those codes using the Latin script would fall back to "mul-latn".)
Possibly related tasks: T258242 T256003 T43807