**Important:**
* Please deploy this only after the no-deploy weeks!
**Main components:**
* Wikidata language codes
**User story:**
As a Wikidata editor,
I want to avoid repeating identical labels in hundreds of languages if the label is the same across languages,
in order to reduce the amount of redundant content that needs to be maintained on Wikidata.
**Problem:**
We have many labels that are in principle identical across different languages (see examples section). This has some bad consequences:
* editors have to create and maintain redundant content (copying the same thing to most/all languages creates massive amounts of edits and is a huge waste of resources)
* redundant information has to be stored, which burdens our systems (e.g. the Query Service)
**Solution:**
Introduce a new language code that all languages fall back to. This will be particularly helpful for Unicode characters, scientific articles, and codes, as well as for names in Latin script (as we do not have an elaborate fallback system for that script yet). We will test whether this solution (only one new language code) is good enough, or whether we need more specific language codes after all to model a useful fallback chain.
//This task//
* Add "mul" as a new monolingual language code.
* Have other languages fall back to it (Translatewiki fallback chain > "mul" > "en")
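To illustrate the intended effect, here is a minimal sketch of a label lookup along such a chain. The function name `resolve_label`, the sample item and the exact chain for "de-at" are assumptions made for illustration; this is not how Wikibase implements term fallback.

```python
from typing import Dict, List, Optional, Tuple


def resolve_label(labels: Dict[str, str], chain: List[str]) -> Optional[Tuple[str, str]]:
    """Return (language_code, label) for the first code in the chain that has a label."""
    for code in chain:
        if code in labels:
            return code, labels[code]
    return None


# Hypothetical item that stores its label only once, under "mul".
labels = {"mul": "Neotrogla curvata"}

# Assumed chain for a "de-at" reader: Translatewiki chain, then "mul", then "en".
chain_de_at = ["de-at", "de", "mul", "en"]

print(resolve_label(labels, chain_de_at))  # ('mul', 'Neotrogla curvata')
```

A language-specific label still wins if one exists, because "mul" is only consulted after the Translatewiki chain has been exhausted.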
//Community takes over//
* Community creates guidelines and help pages on how to use the new code, e.g.
** What if one Latin-script language prefers one form (e.g. "Philip L. Brown") and another Latin-script language prefers another form (e.g. "Philip Larry Brown" or "Philip Brown")?
** In what cases should the Latin-language label be used for "mul" instead of the native label (while still making sure that re-users can identify the native label via property)?
** etc.
* Community gives feedback after some months about how the new code and guidelines work.
* Based on the feedback we might iterate on the approach if necessary.
//Ideas for the future//
* start to show a warning if someone wants to add the mul-label in a different language
* include the experience in a possible future solution for multilingual descriptions ([Abstract Descriptions](https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-29))
* re-evaluate if the final fallback to “en” is still appropriate
**Examples:**
This will be useful in many different places:
//Names//
* persons (https://www.wikidata.org/wiki/Special:Search/haswbstatement:P31=Q5, as of now 9.2M) have in most cases the same label and the same aliases repeated in different languages, e.g. https://www.wikidata.org/wiki/Q42 .
* given names and family names (https://w.wiki/3zWT, which counts Q202444 and Q101352 including subclasses, as of now 590k): in all cases, the same labels are repeated in different same-script languages, e.g. https://www.wikidata.org/wiki/Q21448867.
* astronomical objects (11M), the [[https://www.wikidata.org/wiki/Q74758893|galaxy "SDSS J151017.36+160605.3"]] - has "SDSS J151017.36+160605.3" as the label 411 times.
* taxa (https://www.wikidata.org/wiki/Special:Search/haswbstatement:P31=Q16521, as of now 3.1M) the [[https://www.wikidata.org/wiki/Q39898268|species "Neotrogla curvata"]] - has "Neotrogla curvata" as the label 411 times.
//Unicode characters//
* [[https://www.wikidata.org/wiki/Q87526860|Unicode character "♣"]] - has "♣" as the label and "U+2663" as an alias 446 times
//Codes//
* [[https://www.wikidata.org/wiki/Q39|Switzerland]] - has "CH" as an alias 403 times
* [[https://www.wikidata.org/wiki/Q623|carbon]] - has "C" as an alias 187 times
* the [[https://www.wikidata.org/wiki/Q28006|disambiguation page "C"]] - has "C" as the label 104 times
* the [[https://www.wikidata.org/wiki/Q104248887|Danish men's national road cycling team 2021]] - has "DEN 2021" as an alias 411 times
//Scientific articles//
* scientific articles (https://www.wikidata.org/wiki/Special:Search/haswbstatement:P31=Q13442814, as of now 42M): in many cases the same label is repeated in different languages (e.g. https://www.wikidata.org/wiki/Q27860672).
** in some cases, there could be articles with parallel titles in different languages (e.g. https://www.wikidata.org/wiki/Q59238742).
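The redundancy in these examples can be made concrete with a small sketch that moves the most common label value to "mul" and keeps only the labels that differ. The function name `collapse_to_mul` and the sample data are hypothetical; whether this is the right rule in a given case is exactly what the community guidelines would need to settle.

```python
from collections import Counter
from typing import Dict


def collapse_to_mul(labels: Dict[str, str]) -> Dict[str, str]:
    """Move the most common label value to "mul" and keep only differing labels."""
    if not labels:
        return {}
    shared, _count = Counter(labels.values()).most_common(1)[0]
    collapsed = {"mul": shared}
    for code, value in labels.items():
        if value != shared:
            collapsed[code] = value
    return collapsed


# Hypothetical sample data, not real Wikidata content.
labels = {"en": "♣", "de": "♣", "fr": "♣", "xx-example": "club"}

print(collapse_to_mul(labels))  # {'mul': '♣', 'xx-example': 'club'}
```

In this toy case four stored labels shrink to two, and every language without its own label would pick up "♣" through the fallback to "mul".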
**Translatewiki fallback chain:**
* https://translatewiki.net/wiki/Translatewiki.net_languages#Fallback_language_(MediaWiki)
//Examples://
**ami** > zh-tw, zh-hant, zh-hans
zh-tw > zh-hant, zh-hans
zh-hant > zh-hans
zh-hans > []
**de-at** > de
de > []
**en-gb** > en
en > []
* At some point in the future (probably when we can use the new termbox for this), we might implement a more intuitive UI for this.
**Open questions:**
* Could this solution somehow backfire?
**Hard-coded fallback chain:**
//old//
* Translatewiki fallback chain > "en"
//new//
* Translatewiki fallback chain > **"mul"** > "en"
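As a rough sketch of the difference between the old and the new chain, the following builds both from the Translatewiki examples listed in this task. The function `fallback_chain`, its `with_mul` flag and the de-duplication rule (a code already in the chain is not appended again) are assumptions for illustration, not the actual Wikibase behaviour.

```python
from typing import Dict, List

# Translatewiki fallbacks copied from the examples in this task.
TRANSLATEWIKI_FALLBACKS: Dict[str, List[str]] = {
    "ami": ["zh-tw", "zh-hant", "zh-hans"],
    "zh-tw": ["zh-hant", "zh-hans"],
    "zh-hant": ["zh-hans"],
    "zh-hans": [],
    "de-at": ["de"],
    "de": [],
    "en-gb": ["en"],
    "en": [],
}


def fallback_chain(code: str, with_mul: bool = True) -> List[str]:
    """Build: own code, Translatewiki fallbacks, then the hard-coded tail."""
    chain = [code] + TRANSLATEWIKI_FALLBACKS.get(code, [])
    tail = ["mul", "en"] if with_mul else ["en"]
    for extra in tail:
        if extra not in chain:  # assumed rule: never list a code twice
            chain.append(extra)
    return chain


print(fallback_chain("ami", with_mul=False))  # ['ami', 'zh-tw', 'zh-hant', 'zh-hans', 'en']
print(fallback_chain("ami"))                  # ['ami', 'zh-tw', 'zh-hant', 'zh-hans', 'mul', 'en']
print(fallback_chain("de-at"))                # ['de-at', 'de', 'mul', 'en']
```

Under the assumed de-duplication rule, a code such as "en-gb" ends in "en", "mul", because "en" is already part of its Translatewiki chain.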
**Acceptance criteria:**
[] New monolingual language code “mul” added to Wikidata
[] Hardcoded fallback chain in Wikidata changed to reflect new fallback chain (Translatewiki fallback chain > **"mul"** > "en")
**Community communication:**
* The interested Community needs to be aware of the new code and of the necessity to create guidelines and help pages on how to use it.
* We need to be available for the Community when they create guidelines and to collect feedback.
**Original:**
This task is to add support for a "mul" language code for labels and aliases. For any benefits of this code to be properly reaped, all language codes should ultimately fall back to "mul"—which I believe would be achieved by adding it as a fallback for the "en" code.
(If it is more desirable, codes for "mul-latn", "mul-cyrl", etc. could be created, in which case e.g. only those codes using the Latin script would fall back to "mul-latn".)
Possibly related tasks: T258242 T256003 T43807