    • Task
    CldrCurrency (in file CldrCurrencies.php, ugh) doesn't seem to be used anywhere... Added in {ff17395c28583b94c00ba9df3f36be66dcc7a21b} Is it still needed? Is it used by FR?
    • Task
    Seen when updating from current version 38 to version 39... `nb` and `no` are causing problems.
    From @Nikerabbit on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/cldr/+/692597:
    > This method seems to be rewriting CLDR codes to MediaWiki's expected values, so it looks correct. But maybe there is also `nb` in cldr? That would complicate things considerably.
    • Task
    If you go to [[ https://mni.wikipedia.org/wiki/%EA%AF%91%EA%AF%88%EA%AF%9F%EA%AF%85%EA%AF%95:RecentChanges | Special:RecentChanges on the Meetei (Manipuri) Wikipedia ]], you'll see all the numbers in timestamps and added/removed bytes in the Meetei Mayek script: ꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹. This works as expected.

    If I go to [[ https://translatewiki.net/wiki/Special:RecentChanges?uselang=mni | Special:RecentChanges with Meetei (mni) UI ]], I see timestamps in the Meetei Mayek script (꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹), but bytes added/removed in the Bengali script (০১২৩৪৫৬৭৮৯). This is not as expected: everything is supposed to be in Meetei Mayek.

    Something similar also happens on Special:Contributions, in search results, and on the [[ https://translatewiki.net/?uselang=mni | Twn Main Page ]]. On the latter, in particular, the statistics in the large top boxes appear in Bengali numerals, and the percentages in the small boxes at the bottom appear in Meetei numerals.

    Another thing that may be related: in translatewiki's sidebar, two lines appear in a mix of the Meetei Mayek and Bengali scripts:
    * মৈতৈলোন্ ꯒꯤ ꯊꯣꯡꯒꯥꯜ
    * মৈতৈলোন্ ꯗꯥ ꯍꯟꯗꯣꯛꯄ

    "মৈতৈলোন্" is "Meiteilon", the name of the Meetei language in the Bengali script. The rest of the line is in the Meetei script. Perhaps the language name is taken from CLDR, and CLDR gives data in the Bengali script by default? The Bengali script is indeed used for the Manipuri language, but all the content in translatewiki and the Meetei Wikipedia is in the Meetei script.

    These regular expressions may be useful in debugging:
    * Bengali digits: [০-৯]
    * Meetei digits: [꯰-꯹]

    Tagging #mediawiki-extensions-cldr because I suspect it's related. Please remove it if it's not related.
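The two debugging regular expressions above could be applied along the following lines. This is only a sketch: the function name `digit_scripts` is made up for illustration, and the character ranges are written as Unicode escapes (U+09E6 to U+09EF for Bengali digits, U+ABF0 to U+ABF9 for Meetei Mayek digits) so that they survive copy and paste.

```python
import re

# Same character classes as the regular expressions quoted in the task,
# written as escapes: [০-৯] and [꯰-꯹].
BENGALI_DIGITS = re.compile(r"[\u09e6-\u09ef]")
MEETEI_DIGITS = re.compile(r"[\uabf0-\uabf9]")


def digit_scripts(text):
    """Return the set of digit scripts ('bengali', 'meetei') found in text.

    Hypothetical helper, not part of MediaWiki or the cldr extension.
    """
    found = set()
    if BENGALI_DIGITS.search(text):
        found.add("bengali")
    if MEETEI_DIGITS.search(text):
        found.add("meetei")
    return found
```

Running this over the rendered text of a page (for example the byte counts on Special:RecentChanges) would show which strings still come out with Bengali digits.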
    • Task
    Currently information about languages is scattered in many places (https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Representation_of_languages) and the CLDR extension only makes it worse (cf. {T190129}). Also, this extension provides no user-facing features by itself.

    Proposed:
    * We create a PHP library directly derived from CLDR data
    * Local overrides and names of languages not in CLDR may be moved to MediaWiki core, language-data, or the new library created above

    If the above results in one new library, it is essentially a #Librarization of the CLDR extension.
    • Task
    `CldrCurrency` (in `CldrCurrencies.php`) was added by @K4-713 back in 2013 in {ff17395c28583b94c00ba9df3f36be66dcc7a21b}. Unless I'm being blind... I can't see where the PHP class is used at all. Does FR use it for something? If not, can we delete it?
    • Task
    The English names for several languages include an LRM mark because they just replicate the native name (which is also incorrect in the first three cases), or they just render the native name (autonym):
    * [es-formal] = "español (formal)‎" (no LRM needed even in native Spanish!); must be "Spanish (formal)" in English
    * [hu-formal] = "magyar (formal)‎" (no LRM needed even in native Hungarian!); must be "Hungarian (formal)" in English
    * [nl-informal] = "Nederlands (informeel)‎" (no LRM needed even in native Dutch!); must be "Dutch (informal)" in English
    * [gsw] = "Alemannisch"; should be "Alemannic" in English
    * [sty] = "себертатар"; must be "Northern Tatar" in English
    * [vo] = "Volapük"; should //probably// be "Volapuk" in English (without the combining diaeresis)
    * [vro] = [fiu-vro] = "Võro"; should //probably// be "Voro" in English (without the combining tilde)

    The following test page also HTML-encodes the spaces to make sure they are not duplicated in the middle (but this is not dramatic and is not signaled as an error): https://commons.wikimedia.org/wiki/Module_talk:Multilingual_description/sort/testcases

    As a general rule, the English names of all languages should be plain ASCII only (of course, this does not apply to other translations or native names)...
    This is also checked on the same test page (where you can see the red cells in the last column) using the following basic regular expression: /^[A-Z][ '()%-/0-9A-Za-z]*['()%-/0-9A-Za-z]$/

    The reason is that the English names of languages are used in contexts where only ASCII is expected. Spaces, parentheses, hyphens, single quotes, and slashes are still possible; applications are generally aware of whether these ASCII punctuation marks or spaces have to be replaced. Decimal digits may occur in the names of some variants, such as a year for an orthographic reform, but they generally don't cause problems.

    Yellow cells on the test page just signal cases where the autonym and the English name are identical. This is not necessarily an error, but it may indicate a missing translation, either in English or in the native name. Some of these cases are OK, like "Esperanto", whose autonym is correctly capitalized for that language.

    Note that the LRM/RLM marks should not be used at all in any language:
    * For the few languages that display two native names in different scripts (when we don't specify the script variant), the solution is to write the LTR name first, then the "/", then the RTL name.
    * For correctly formatting lists of languages (showing their autonyms), the solution is to use Bidi isolation (the "bdi" element in HTML, or the equivalent "unicode-bidi: isolate" in CSS) for each item in the multilingual list. LRM/RLM are deprecated (they are not isolates, but deprecated overrides). See T252568. Isolates are the recommendation in the second version of the UBA (published many years ago), which was made to replace and deprecate all overrides (including the "bdo" HTML element, and the RLM/LRM controls that were the only solution in the first version of the UBA and in HTML4).
    ----
    In all cases, any //trailing// RLM or LRM without any known character after it is wrong. Their use should be limited to very specific characters whose weak or strong directionality, or whose mirroring, one wants to change, within a text of known language/script (for example, to change the strong directionality of Latin letters or digits in an Arabic text).

    Such use of Bidi overrides is very exceptional and only needed inside very specific names (like some brands/trademarks using these characters as if they were normal Arabic letters), or for uncommon notations of numbers when an Arabic text wants to present these numbers with a strong RTL direction instead of their default LTR direction, opposed to the normal direction of reading. Note that even Arabic and Persian digits are LTR, as they are written starting from the most significant digit on the left, with the other digits following in backward reading order.

    A specific context allows borrowing Hebrew letters in Latin texts and treating them as if they were Latin letters with strong LTR direction: LRM is then useful before that Hebrew letter only. (It is found in some Latin names borrowing a Hebrew Aleph, but it is not needed for maths, where there is an Aleph mathematical symbol which is already LTR.)

    The other case for using Bidi overrides is historic texts in which a script was not written in its current modern direction (e.g. boustrophedon, or old Greek and Coptic written RTL). For such cases, "bdo" is still the best solution to embed a full line, and there is still no real need for RLM/LRM on a single character except to force its mirroring (e.g. an arrow).
    -----
    Request for patch of LocalNamesEn.php per comment T256649#7160228 below:
    * [es-formal] = https://www.wikidata.org/wiki/Q64427343 - "Spanish (formal)"
    * [hu-formal] = https://www.wikidata.org/wiki/Q64427347 - "Hungarian (formal)"
    * [nl-informal] = https://www.wikidata.org/wiki/Q64427356 - "Dutch (informal)"
    * [sty] = https://www.wikidata.org/wiki/Q4418344 - "Siberian Tatar"
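The two checks described in this task (plain-ASCII English names, and no trailing LRM/RLM) could be scripted roughly as follows. This is a sketch only: `check_english_name` is a hypothetical helper, and the pattern is the regular expression quoted in the task, copied as-is (note that `%-/` inside the brackets is a character range, so it also admits `&`, `*`, `+`, `,` and `.`).

```python
import re

# Regular expression quoted in the task, unchanged.
ASCII_NAME = re.compile(r"^[A-Z][ '()%-/0-9A-Za-z]*['()%-/0-9A-Za-z]$")

# U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK.
BIDI_MARKS = ("\u200e", "\u200f")


def check_english_name(name):
    """Return a list of problems found in an English language name.

    Hypothetical helper for illustration; not part of the cldr extension.
    """
    problems = []
    if name.endswith(BIDI_MARKS):
        problems.append("trailing LRM/RLM")
    if not ASCII_NAME.match(name):
        problems.append("not plain ASCII")
    return problems
```

For instance, `check_english_name("Spanish (formal)")` reports no problems, while the current `es-formal` value (lowercase autonym plus a trailing LRM) fails both checks.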
    • Task
    Steps to Reproduce:
    # Install MediaWiki 1.34, Wikibase 1.34 and ULS 1.34 in line with https://www.mediawiki.org/wiki/Extension:UniversalLanguageSelector#Installation
    # Install CLDR extension 1.34 following the instructions on the same page
    # Run the `php UniversalLanguageSelector/data/LanguageNameIndexer.php` command

    Actual Results:
    The terminal prints:
      <domain>/w/extensions $ php UniversalLanguageSelector/data/LanguageNameIndexer.php
      Bucket stats:
      - 740 buckets
      - smallest has 1 entries
      - largest has 2874 entries
      - median size is 13 entries
      - average size is 69.875675675676 entries
    When I check whether the langnames.ser file was generated in the ULS/data/ folder, as the instructions on the page above say it should be, the file is absent. Cross-language searching is not possible on my wiki-repo.

    Expected Results:
    * The langnames.ser file gets generated in the ULS/data/ folder
    * Cross-language searching becomes possible on my wiki-repo
    • Task
    ISO 639 language code "mul" is reserved to mean "multiple languages" (see https://en.wikipedia.org/wiki/ISO_639-3#Special codes). So the Russian translation should say "несколько языков" (literally "several / multiple languages") instead of the current "языки разных семей" ("languages of various / multiple [language] families").

    The proposed translation is aligned with:
    * TranslateWiki (https://translatewiki.net/w/i.php?title=MediaWiki:Centralnotice-multiple-languages/ru ; see also https://translatewiki.net/w/i.php?title=MediaWiki:Centralnotice-multiple-countries/ru )
    * Wikidata ( https://www.wikidata.org/wiki/Q20923490?uselang=ru )

    ----
    Update June 21, 2021:
    * Given the potential for misunderstandings at Wikidata, this should be corrected locally as a priority. Please add to LocalNamesRu.php:
    * mul - "несколько языков"
    • Task
    The files are here: https://phabricator.wikimedia.org/diffusion/ECLD/browse/master/LocalNames/ Currently we have to modify the repository manually, like {T162406}
    • Task
    In Wikidata on items, the left column of the description area (where labels, descriptions and aliases are added) shows the language. With my settings for Dutch, most of the languages there are in Dutch, but some are not and are shown in English instead. (Expand the [[ https://www.wikidata.org/wiki/Q56427997?uselang=nl | example ]] and click All entered languages/Alle ingevoerde talen.)

    Two examples are:
    * pt-br Brazilian Portuguese -> Braziliaans-Portugees //(submit this to CLDR)//
    * be-tarask Belarusian (Taraškievica orthography) -> Wit-Russisch (Tarasjkevitsa orthografie) //Can be added to LocalNames//

    Where can I change this? Or can this be changed?

    -----
    Review June/July 2021:
    Task noted on [[https://www.wikidata.org/wiki/Wikidata:De_kroeg#language_names_in_Dutch | Wikidata:De_kroeg#language_names_in_Dutch]] (June 15)

    Other items that may be worth checking: [[https://www.wikidata.org/wiki/Q7411?uselang=nl|Q7411]], [[https://www.wikidata.org/wiki/Q55?uselang=nl|Q55]]

    Other missing languages to add:
    * abq Abaza -> Abazijns
    * abs Ambonese Malay -> Ambonees
    * aeb-arab Tunisian Arabic (Arabic script) -> Tunesisch Arabisch (Arabisch schrift) //Can only be added to LocalNames//
    * aoc Pemon -> Pemón
    * azb South Azerbaijani -> Zuid-Azerbeidzjaans //Can only be added to LocalNames//
    * bcl Central Bikol -> Centraal-Bikol
    * bfi British Sign Language -> Britse gebarentaal
    * bjn Banjar -> Bandjarees
    * bqi Bakhtiari -> Bachtiarisch
    * bsk Burushaski -> Burushaski
    * byq Basay -> Basay
    * bxr Russia Buriat -> Russisch-Boerjatisch
    * bzg Babuza -> Babuza
    * bzs Brazilian Sign Language -> Braziliaanse gebarentaal
    * cal Carolinian -> Caroliniaans
    * ckv Kavalan -> Kavalaans
    * cnr Montenegrin -> Montenegrijns
    * de-at Austrian German -> Oostenrijks Duits //(can also be submitted to CLDR)//
    * de-ch Swiss High German -> Zwitsers Hoogduits //(can also be submitted to CLDR)//
    * dlm Dalmatian -> Dalmatisch
    * eml Emilian-Romagnol -> Emiliaans-Romagnools
    * en-gb British English -> Brits-Engels //(can also be submitted to CLDR)//
    * en-ca Canadian English -> Canadees-Engels //(can also be submitted to CLDR)//
    * fr-be Belgian French -> Belgisch-Frans
    * gcf Guadeloupe-Martinique Creole French -> Guadeloups Creools
    * kae Ketagalan -> Ketangalaans
    * lat-vul Vulgar Latin -> Vulgair Latijn
    * sr-ec Serbian (Cyrillic script) -> Servisch (cyrillisch schrift) //Can only be added to LocalNames//
    * sr-el Serbian (Latin script) -> Servisch (Latijns schrift) //Can only be added to LocalNames//
    * tt-cyrl Tatar (Cyrillic script) -> Tataars (cyrillisch schrift) //Can only be added to LocalNames//
    * tzl Talossan -> Talossaans
    * wls Wallisian -> Wallisiaans
    * yrk Nenets -> Nenets
    * xpu Punic -> Punisch

    Changes go either into LocalNamesNl or into CLDR. For the latter, one can contact CLDR directly. An overview is at [[https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all/in_nl|Help:Wikimedia_language_codes/lists/all/in_nl]]
    • Task
    CLDR 35 was released in March 2019: http://cldr.unicode.org/index/downloads/cldr-35
    • Task
    In several places, the English name for [[https://www.wikidata.org/wiki/Q33357|Qırımtatarca]] is given as "Crimean Turkish", but "Crimean Tatar" is the preferred name in English. The English Wikipedia page on the language has "[[https://en.wikipedia.org/wiki/Crimean_Tatar_language|Crimean Tatar language]]" as its title (noting "also called Crimean Turkish"). [[ http://www.ethnologue.com/16/show_language/crh/ | Ethnologue ]] lists "Crimean Tatar" as the main name, as does the [[ http://id.loc.gov/authorities/subjects/sh85034019.html | Library of Congress Subject Headings ]].

    Two places where it occurs are the "Languages" links on English Wikipedia (e.g., on the [[ https://en.wikipedia.org/wiki/Crimean_Tatar_language | Crimean Tatar language ]] enwiki page), where the tooltip for "Qırımtatarca" says "Crimean Turkish", and the widget that lets you search for languages, where "Crimean" auto-completes to "Crimean Turkish" and "Crimean Tatar" doesn't match anything.

    The primary variant name should be "Crimean Tatar" (e.g., on the Qırımtatarca link under "languages") in both those places and elsewhere. It would be nice if both "Crimean Tatar" and "Crimean Turkish" worked in the search widget, but if there can be only one, then "Crimean Tatar" is the one. See also T240350.
    • Task
    Many language names in UploadWizard are not translated into Ukrainian. {F8926388} Originally reported at https://commons.wikimedia.org/wiki/Commons:Help_desk#Translations_of_languages_names_in_Upload_Wizard by @Tohaomg
    • Task
    13 months ago I asked CLDR to support the South Azerbaijani language (see [[http://unicode.org/cldr/trac/ticket/8437|ticket:8437 in CLDR]]), and it was added under the same Ethnologue code that we use here for Wikipedia (see [[http://unicode.org/cldr/trac/browser/trunk/seed/main/azb.xml?rev=11654|rev 11654 in CLDR]]). It turned out that Shervin Afshar, who has CLDR commit access, is also a Persian Wikipedia user (User:Shervinafshar). After a discussion started on Persian Wikipedia ([[https://fa.wikipedia.org/w/index.php?oldid=15440914#.DA.A9.D8.A7.D8.B1.D8.A8.D8.B1_.D8.B7.D8.B1.D8.AF_.D8.B4.D8.AF.D9.87_.D9.88_.D9.81.D8.B9.D8.A7.D9.84.DB.8C.D8.AA.E2.80.8C.D9.87.D8.A7.DB.8C.DB.8C_.D9.85.D8.B4.DA.A9.D9.88.DA.A9|see]]), which included being called a secessionist by User:Behaafarid, Shervin changed the code to az_Arab.

    Thanks to @Nikerabbit and @Reedy for fixing rebuild.php in caf6312ab36dec44977ac61a9b52d04eb58ba472 in the CLDR extension. However, as an example, in [[https://phabricator.wikimedia.org/diffusion/ECLD/change/master/CldrNames/CldrNamesCkb.php;caf6312ab36dec44977ac61a9b52d04eb58ba472|CldrNames/CldrNamesCkb.php]] the translation of the language name is still listed under az-arab. We need to either fix that or change the code. In addition, the conflicting codes end up matching nothing in the major translation engines. Thanks.
    • Task
    Hi. I have been waiting for this translation for probably more than half a year. Please correct this error: http://i57.fastpic.ru/big/2013/1117/8d/f0f4409164bfb947c0529fcc17e6c78d.png Thanks.
    --------------------------
    **Version**: unspecified
    **Severity**: normal
    • Task
    Right now, MediaWiki has no way to localize '1st', '2nd', '3rd', etc. The CLDR database includes rules for creating ordinal numbers, but these rules are not currently extracted or utilized by the cldr extension. The rules are in the ruleset with type="digits-ordinal-indicator" in core/common/rbnf. Here is a sample for English:

      <ruleset type="digits-ordinal-indicator" access="private">
        <rbnfrule value="0">th;</rbnfrule>
        <rbnfrule value="1">st;</rbnfrule>
        <rbnfrule value="2">nd;</rbnfrule>
        <rbnfrule value="3">rd;</rbnfrule>
        <rbnfrule value="4">th;</rbnfrule>
        <rbnfrule value="20">→→;</rbnfrule>
        <rbnfrule value="100">→→;</rbnfrule>
      </ruleset>
    --------------------------
    **Version**: unspecified
    **Severity**: enhancement
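To illustrate how the English sample above would be evaluated, here is a minimal sketch. It is not a general RBNF engine and not code from the cldr extension: the `RULES` table is transcribed by hand from the sample, `ordinal_suffix` and `ordinal` are made-up names, and the handling of "→→" follows my reading of RBNF (pick the rule with the largest base value not exceeding the number, then re-apply the ruleset to the remainder modulo the highest power of 10 not exceeding that base value).

```python
# (base value, suffix) pairs transcribed from the English sample.
# None stands for the "→→" rule: format the remainder with the same ruleset.
RULES = [(0, "th"), (1, "st"), (2, "nd"), (3, "rd"), (4, "th"),
         (20, None), (100, None)]


def ordinal_suffix(n):
    """Return the English ordinal indicator for a non-negative integer."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # Pick the rule with the largest base value that does not exceed n.
    base, suffix = max((rule for rule in RULES if rule[0] <= n),
                       key=lambda rule: rule[0])
    if suffix is not None:
        return suffix
    # "→→": divisor is the highest power of 10 not exceeding the base value
    # (10 for the rule at 20, 100 for the rule at 100).
    divisor = 10 ** (len(str(base)) - 1)
    return ordinal_suffix(n % divisor)


def ordinal(n):
    """Format n with its ordinal indicator, e.g. 2 -> '2nd'."""
    return f"{n}{ordinal_suffix(n)}"
```

Because 11, 12 and 13 fall under the rule at base value 4, they get "th", while 21 goes through the rule at 20 and ends up with "st".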
    • Task
    **Author:** `romaine.wiki`

    **Description:**
    Currently we can use {{#language:xx}} to insert the language name in pages, and {{#language:xx|en}} to show the name of language xx in English. Please make the same available for countries, as {{#country:XX}} and {{#country:XX|yy}}, with XX replaced by the ISO 3166-1 code of the country and yy by the language it should be shown in. The data is already present in CLDR.
    --------------------------
    **Version**: unspecified
    **Severity**: enhancement
    • Task
    Spun off from bug 34219.

    It would seem that for cases like PT, there is a base language file, and then some variant (is that the right word?) overrides for PT itself and for pt-br.

    Not using the base pt.xml means that the language name list for Portuguese is very sparse in its entries, as it only contains the "overrides" and not the base data.

    It wouldn't surprise me if this is also the case for other languages.

    (In reply to comment #7)
    > (In reply to comment #6)
    > > (In reply to comment #3)
    > > > For your information, the language names are taken from CLDR, which contains
    > > > relatively few language names for Portuguese:
    > > > http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/cldr/CldrNames/CldrNamesPt.php?view=markup
    > >
    > > I downloaded the core.zip file from CLDR and inside common/main, the pt.xml
    > > file contains more than 500 language names. It's possible that you're probably
    > > using just the pt-PT.xml, which seems to just contain the exceptions to the
    > > base pt.xml file.
    >
    > Similar looks to be for pt-br
    >
    > It would seem that cldr doesn't honour fallbacks then... ie It should pull in
    > pt.xml, and then overwrite any duplicate keys with those from the pt-pt file to
    > CldrNamesPt.php and pt-br file to CldrNamesPt_br.php
    --------------------------
    **Version**: unspecified
    **Severity**: normal
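The fallback behaviour proposed in the last comment (load pt.xml, then overwrite duplicate keys with the variant file) amounts to a dictionary merge. The sketch below uses made-up sample data and a hypothetical `merge_with_fallback` helper; it does not reflect how rebuild.php in the cldr extension is actually written.

```python
def merge_with_fallback(base, overrides):
    """Return base names with variant-specific overrides applied on top.

    Hypothetical helper: 'base' stands for names parsed from pt.xml,
    'overrides' for names parsed from a variant file such as pt_PT.xml.
    """
    merged = dict(base)       # start from the complete base list
    merged.update(overrides)  # variant entries win on duplicate keys
    return merged


# Illustrative data only; not taken from CLDR.
pt_base = {"en": "inglês", "cs": "tcheco", "de": "alemão"}
pt_pt_overrides = {"cs": "checo"}

pt_pt_names = merge_with_fallback(pt_base, pt_pt_overrides)
```

With this approach the generated variant list keeps every entry from the base file and differs only where the variant file says so, instead of containing the overrides alone.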