Sun, Apr 21
Apr 6 2019
Mar 26 2019
Also, somewhat related: when rendering the map at https://maps.wikimedia.org/?lang=zh#15/10.6986/106.7430 , the name of OpenStreetMap relation 7157197 was rendered without an appropriate character, resulting in a "tofu" empty-box character instead of the character that is supposed to appear. The text is 茹𦨭县, which should be rendered as https://i.imgur.com/g3gfUew.png but is currently rendered as https://i.imgur.com/S3AlPC2.png
Mar 21 2019
There would definitely be >100 languages affected if language fallback didn't work in any language, but if it only affects those that fall back to zh, plus a few other languages that use a script tag in MediaWiki projects but an alternative language-tagging style in OSM, then the scope of the task could be smaller than anticipated.
Mar 19 2019
See also https://github.com/gravitystorm/openstreetmap-carto/issues/2208
And this also affects some other languages; see the link for further details.
Is the system case-sensitive? Because the document uses "zh-hans" and "zh-hant", while the relevant keys in OSM are usually "zh-Hans" and "zh-Hant".
Mar 15 2019
Example link: https://maps.wikimedia.org/?lang=zh#12/24.4542/118.0906
Current text shown: https://i.imgur.com/dc5O9J2.png
Expected text shown: https://i.imgur.com/CqJmSlN.png
Jan 31 2019
Please consider the following email response given to @Liuxinyu970226 when they asked a certain linguistics expert for their opinion on the matter: https://imgur.com/a/YT8bnzJ (I am not sure whether sufficient permission has been obtained by the user for me to link the mail on the public internet, but let's just look at it for now)
Well, as mentioned, the code cmg previously suggested as a possible alternative is actually not appropriate according to the email exchanges you have conducted with professors who know more about this terminology. And given that the email exchange also confirmed that the current ISO language codes for the Mongolian languages don't really make much sense either, it would also be wrong to use an individual language code for this purpose. So following the convention already used by others would be the most sensible way to represent such text strings in the wiki. But then, if certain members of Langcom stand firm on their position and are unwilling to change, no amount of sensibility can force them to change.
Nov 26 2018
Yes, that could be a browser-specific conversion, as my Chrome browser converts "ß" into "ss". Then again, it shows that it is necessary to look at browsers' implementations of normalization instead of just the standards.
Some characters, like "ß", "。" and "｡" from the list linked above, would not normalize to regular ASCII characters even when NFKC normalization is applied, despite the fact that they can still be identified and converted into ASCII characters by browsers. And the list was not exhaustive, so more characters could escape NFKC normalization. Especially noteworthy is that the two ideographic full stops are treated as a dot by browsers, so they can still be used to bypass the blacklisting of almost any URL on the spam blacklist.
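For illustration, a minimal sketch using Python's standard unicodedata module (the character list is just the three examples above) showing that NFKC alone leaves these characters non-ASCII:

```python
import unicodedata

# NFKC does not turn these into plain ASCII, even though browsers may
# still treat them as "." or "ss" when processing a URL.
for ch in ["ß", "。", "｡"]:
    print(repr(ch), "->", repr(unicodedata.normalize("NFKC", ch)))
# 'ß' -> 'ß'    (no compatibility decomposition)
# '。' -> '。'  (U+3002 stays as-is)
# '｡' -> '。'   (U+FF61 maps to U+3002, still not an ASCII dot)
```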
And then, for the longer-term solutions, browsers may not fully follow the standardized approach in the RFCs and might have their own ways of normalizing links for users, so the behavior of each browser might need to be investigated.
Nov 17 2018
@Liuxinyu970226 Not sure why you are asking me this here when you yourself have stated that this seems to be an inappropriate place. Anyway, I think it depends on the project, and the communities of each project might also have different opinions.
Nov 14 2018
There are already some websites that use Chinese-based captchas, but it is a really, really bad idea to me personally. As someone who can read and write Chinese and is also a native speaker of one of the Chinese languages, it's very complicated for me to type Chinese text into a computer, depending on the environment and the input methods available. In the worst-case scenario I would have to use Google Translate or Google Search to find the matching characters on the internet and then copy them over in order to finish a localized captcha challenge. Please DO NOT implement such a troublesome thing.
Nov 11 2018
Nov 5 2018
Nov 1 2018
Oct 4 2018
@Popolon I believe Monguor and the like do not/no longer use the Mongolian script in writing, so that's not really relevant in this context.
Oct 2 2018
@Popolon According to my understanding, assuming it is correct, and using Arabic as an analogy, what you propose would be like making different monolingual values for "Libyan Modern Standard Arabic", "Egyptian Modern Standard Arabic" and "Tunisian Modern Standard Arabic". Yes, Libyan/Egyptian/Tunisian Arabic are all different and could be considered different languages, however there is only one single literary standard here. Surely, there are different phonetic literary standards that more closely reflect the individual languages, like the Cyrillic alphabet being used to spell the different Mongolic languages, which would warrant the establishment of a wiki for each of those individual languages; however there is only one Classical Mongolian script, just like there is only one Modern Standard Arabic. You can say mvf is closest to Classical Mongolian in the same way that Egyptian Arabic is closest to Modern Standard Arabic, however they are not equal.
Sep 29 2018
Sorry for late reply,
@Liuxinyu970226 If the concern of ISO 639's RA is that "users of the codes understand that part 2 of the standard has a code that includes several coded languages in part 3", then probably what can be done is to ask for the cancellation of the mvf and khk codes in ISO 639-3?
Sep 20 2018
I think I am able to edit the lead section on the mobile MediaWiki site without JS enabled, using the JS-less editor; I have just tried that on English Wikipedia and it seems to be working?
Sep 19 2018
It's also an annoyance to me, as my current user name is supposed to start with a lowercase letter.
All these characters can also be typed into Wikipedia as part of a URL, bypassing the blacklist, and then get accepted by browsers, which convert them into basic ASCII alphanumeric characters and send users to the blacklisted webpage.
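As a rough illustration of the browser-side conversion (a sketch relying on Python's built-in idna codec, which follows the RFC 3490 behavior; the hostname is made up):

```python
# RFC 3490 treats U+3002 / U+FF61 as label separators, so a hostname
# written with an ideographic full stop resolves to the ASCII-dot form.
hostname = "blacklisted-example。com"   # hypothetical blacklisted domain
print(hostname.encode("idna"))          # b'blacklisted-example.com'
```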
Aug 30 2018
Actually, my original ticket could have been a little clearer...
Like clarifying that the "example" there was meant to say that there are articles on the cdo/nan/hak Wikipedias that are written in an alternative script, and thus there should be related monolingual codes that would allow recording those article names in the Wikidata language field.
Thus I would like to bump the request for the monolingual language codes cdo-hani and nan-hani.
And then for hak... Can someone verify that "Hakka (Traditional Han script)" and "Hakka (Simplified Han script)" are proper ways to describe how Hakka speakers would write their language in Han scripts?
Aug 29 2018
- If a new site is to be created for each incubator site, how will WMF turn them into full sites once they become eligible? Last time I heard about it, such redesignation seemed to be very cumbersome, which is also why wp/yue and wp/nan still haven't been moved to their desired domain names almost a decade after their initial proposals. Will it also take a decade for any new project to get a full site if this proposal is adopted?
- Is it going to lengthen the entire wiki creation process, require more bureaucratic processes, and also require more manpower to handle each and every application? Now it is incubator→Full site; in the proposal it will be incubator→Experimental site→Full site.
- Are those goals unachievable by overhauling Incubator itself? It seems like Wikia is now going to change the URLs of their non-English wikis in order to save SSL certification costs, changing URLs in the format zh.community.wikia.com to community.wikia.com/zh, and each of these different language-edition sites is still independent. Is that not achievable in Incubator?
- Likewise, is it possible to create such new experimental sites as easily as creating a new wiki on Wikia?
Aug 20 2018
The wikidata property proposal https://www.wikidata.org/wiki/Wikidata:Property_proposal/coordinate_location_GCJ02 would depend on this property datatype.
If there's no way to fix the internationalized format now, then please change the format to the ISO date format as a temporary fix. There's currently no way for me to tell which day a date value actually represents without trying to edit it and seeing the calendar pop up.
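For illustration (the date here is arbitrary), this is the kind of ambiguity I mean and what the ISO 8601 form looks like:

```python
from datetime import date

d = date(2018, 5, 6)
print(d.strftime("%d/%m/%Y"))  # 06/05/2018 — is that 6 May or 5 June?
print(d.isoformat())           # 2018-05-06 — unambiguous ISO 8601 (YYYY-MM-DD)
```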
May 6 2018
Is it possible to use the language converter with a specific conversion group on most messages, and correct them only when the language converter gets them wrong?
Feb 24 2018
Dec 11 2017
If I understand correctly, while this API should be internal to applications and not used by users directly, it would be used by things like mobile clients, the visual editor, content scrapers and such to obtain information for users' viewing, which might still be subject to some of the limitations I mentioned above?
Removed the Japanese Kyujitai request, as using a variant subtag instead of a script subtag might be a better idea? Although there are also problems with using a variant subtag.
Dec 10 2017
Removed Nushu, as the use case related to the language and script can be covered by using the monolingual code mis, due to the lack of a language code for Tuhua.
Dec 9 2017
As mentioned by others, it has been almost a decade since the issue was raised.
Those "raising" actions are illegal, please see Bug management/Phabricator etiquette, especially:
Report status and priority fields summarize and reflect reality and do not cause it. Read about the meaning of the Priority field values and, when in doubt, do not change them, but add a comment suggesting the change and convincing reasons for it.
Sorry, a better term would be, "since the issue was submitted".
Is it possible to do the renaming task for those wikis in their current state first, and then deal with whatever bugs appear after the renaming is done? As mentioned by others, it has been almost a decade since the issue was <del>raised</del>submitted, and there will be more and more legacy issues to deal with the longer it drags on (CX didn't even exist back in the day). Things like CX would be broken, but those seem to be less important.
The Accept-Language header seems to be a bad idea for situations where the language or variant is not usually selectable in browser or OS settings (for example, you can't pick anything yue on Chrome's language settings page, nor in the Microsoft Windows settings which IE seems to read from, nor in the smartphone settings which mobile browsers and apps read from). So there is no way for users to configure these client programs to send an Accept-Language header with the language/variant that they would like to use to the server.
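To make the point concrete, here is a naive sketch of server-side detection from Accept-Language (the supported-variant list and the header values are made up for illustration):

```python
# Naive Accept-Language matching; a real implementation should follow the
# RFC 4647 lookup rules. The point: detection only works if the browser UI
# actually lets the user put the variant into the header.
def pick_variant(accept_language, supported=("yue", "zh-hant", "zh-hans")):
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip().lower()
        if tag in supported:
            return tag
    return None

print(pick_variant("yue,zh-Hant;q=0.9"))        # 'yue' — but no browser UI sends this
print(pick_variant("zh-TW,zh;q=0.9,en;q=0.8"))  # None — a typical header from a zh-TW user
```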
[Note: This is relevant as there are request to implement Hans-Hant conversion for yue.wp too]
[Note 2: It can be a way to detect what variant the user initially want, but probably not a good way to fixate the variant selection based on this]
- The linked discussion was about Hanmun = Classical Chinese documents, not documents written in Hanja-Hangul mixed script.
- The task is closed, as this does not seem to be a good way to word the request for now. I will probably make a post on the community discussion page when I can word it in a better way.
Dec 5 2017
Dec 2 2017
Nov 27 2017
Hmm, edited the task description accordingly.
Nov 26 2017
Almost all the Hani text being discussed and used in relation to the nan.wp project is currently Hant. Disregarding Hans for now and using Hani instead of Hant would probably do the job in the current setting, but what about when mainland China Hans users start visiting and editing the site?
Nov 25 2017
Nov 23 2017
It's now working on my end.
Because the language converter would need to cater to exceptions, the better way to do this is probably to just open up a language and then only translate the terms that are different, letting the other terms fall back to Tagalog, just like how zh_HK<>zh-Hant has been handled.
The existing syntax for special conversion rules is documented at https://www.mediawiki.org/wiki/Writing_systems/Syntax . Some of the concerns may already have been addressed in that link, and there is also no need to reinvent a new syntax instead of using one that is already established.
- British<>American English, Portuguese variants conversion
- Different options to enable/disable the system in various ways, with additional user settings that allow custom rules and separate sets of rules
- Some words on language converter in editing mode
- Sentence-based conversion tool
- Classical Chinese Kanbun conversion
- Multiple parallel conversions
Instead of "always Simplified Chinese", a more proper description would be "always the language variant of the article source". The conversion between the Simplified and Traditional Chinese variants on Chinese Wikipedia is achieved by the Language Converter. The language converter does not work in source editing mode, and does not work in Content Translation either.
Nov 17 2017
What is the rationale for a macrolanguage not being usable to identify text?
Are you implying that the monolingual language codes I'm submitting do not represent anything useful? nan/cdo/hak-Hant/hans are language-script combinations being used to write Wikipedia articles, and vi-hani, ko-kore and ja-Kyujitai are used to name people and things in the respective countries. How do you write the name of "Ho Chi Minh City" in Vietnamese Han Nom? The only place providing this info in Wikidata for now is the Japanese alias for the entry name. How about "Kim Jong-Il" in ko-Kore? Look at the Slovak alias. Is that better than having labels for each of these script variants?
Then mon; mon is an ISO 639-3 code.
Nov 13 2017
Nov 12 2017
mvf only refers to the Mongolian spoken in the central part of Inner Mongolia, while mn-Mong is written by all mn users.
Apr 7 2017
According to some pages I found via Google, it seems that in the US only the compilation of data is protected while the data itself is not, and the creation of a database also needs to involve some creativity for the database to fall under copyright law; in the EU there is additional protection for the investment put into collecting, arranging and presenting data. So it seems like it should not be a problem under US law in most cases, although it might be better to let a legal expert answer the question.
Mar 2 2017
Is it within the scope of this task that, on an ordinary Wikipedia with multiple pages for a single concept written in multiple scripts, those pages cannot be linked to the same Wikidata concept entry?
Dec 21 2016
Adding the tag because there's an intention to make lzh Wikipedia text run vertically.
Oct 8 2016
Oct 5 2016
@GerardM but the traditional Mongolian script is like Literary Chinese, which is universal to every language that used it as its written form, and thus it is invalid to say which language those texts belong to. Just as you can say the Nihon Shoki is written in Chinese but you can't say it is written in Mandarin or Hakka; the situation with the traditional Mongolian script is the same. And also, it would be incorrect [despite being a convention] to call those Mongolian texts Middle/Classical Mongolian language, just as you can't equate Literary Chinese with Old/Middle Chinese, as there are still changes being made to the written language that set the old language of that time apart from the written form that continues to be used.
Oct 4 2016
0. According to the "Requirements for a new language code" linked above, the WIP requirement for a new language code is a valid IETF tag, not a valid ISO code.
- Macrolanguages in ISO 639-3 are still individual languages in ISO 639-2, and the definition of a macrolanguage in ISO 639-3 is "clusters of closely-related language varieties that [...] can be considered distinct individual languages, yet in certain usage contexts a single language identity for all is needed", and thus macrolanguages should be treated as languages with valid language codes. And mn is a valid code and is currently used by the Mongolian Wikipedia, which also contains several articles written in the traditional Mongolian script.
- See BCP 47 section 2.1.1 for details about uppercasing (a small casing sketch follows after this list): https://tools.ietf.org/html/bcp47
- khk, mvf, bua and xal can all be written with Latn, Cyrl and Mong.
- mn-Mong is not only used for mvf.
- BCP 47 also states that a macrolanguage code can still be used instead of the code for an encompassed language.
- You can see that mn_Mong_CN is a likely subtag in http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/likely_subtags.html
- You can see mn-Mong listed in the IANA language subtag registry http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry (listed as redundant because it has the correct form and format defined by RFC 4646 and all the subtags it uses are defined in that document. See RFC 4645 for details.)
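To illustrate the casing point above, a minimal sketch of the conventional (non-mandatory) letter casing from BCP 47 section 2.1.1; the helper name and the simplistic subtag detection are my own:

```python
def conventional_case(tag):
    """Apply the conventional BCP 47 casing: language subtags lowercase,
    4-letter script subtags Title case, 2-letter region subtags UPPER case.
    Tags themselves compare case-insensitively, so this is cosmetic."""
    parts = tag.split("-")
    out = [parts[0].lower()]
    for p in parts[1:]:
        if len(p) == 4 and p.isalpha():
            out.append(p.title())      # script: Mong, Hant, Hans
        elif len(p) == 2 and p.isalpha():
            out.append(p.upper())      # region: CN, TW
        else:
            out.append(p.lower())
    return "-".join(out)

print(conventional_case("MN-mong-cn"))  # mn-Mong-CN
print(conventional_case("zh-hant"))     # zh-Hant
```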
Jul 8 2016
@Roytam1 There is no such thing as a unified font in the world, at least as of now, and fonts are not supposed to be unified. The alternative to a region-based font would be a font designed according to a particular regional standard or according to the font developer's habits. See Unicode's FAQ about CJK for further detail. I am not familiar with the server environment or the software's settings, but in a home environment, when an application uses a multi-region font without specifying a region, the result often defaults to China's standard.
May 25 2016
From what I was told, many articles on the Cebuano Wikipedia, as well as on some other Wikipedias with a very high article-per-speaker ratio, were created by bots from databases; for instance (just as an example) such bots could create a million articles for the 1st through 1,000,000th asteroids automatically, just by copying from a database according to a user-defined format (a rough sketch of this follows below). See https://ceb.wikipedia.org/w/index.php?limit=50&title=Espesyal%3AMga+Tampo&contribs=user&target=Lsjbot&namespace=&tagfilter=&newOnly=1&year=2016&month=-1 for an example. I don't think this should be taken into consideration when judging which languages would be useful to visitors.
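A rough sketch of the kind of generation I mean (the template text and field names are made up; the asteroid facts are just examples):

```python
# Purely illustrative: how a bot can mass-produce stub articles from
# database rows using one fixed template.
TEMPLATE = "{name} is an asteroid discovered in {year} by {discoverer}."

rows = [
    {"name": "(1) Ceres", "year": 1801, "discoverer": "Giuseppe Piazzi"},
    {"name": "(2) Pallas", "year": 1802, "discoverer": "Heinrich Olbers"},
]

for row in rows:
    print(TEMPLATE.format(**row))
```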
Mar 26 2016
Mar 14 2016
Is T353 a subtask or a duplicate of this task?
Mar 2 2016
Ah, the official Wikipedia Android app, 2.1.141-r-2016-02-10 as well as 2.0-r-2015-04-23.
Feb 23 2016
CSS3 vertical writing mode is now supported by 90%+ of browsers around the world, and by basically all non-Opera-Mini browsers, as per http://caniuse.com/#feat=css-writing-mode . Also note that, due to the limited support provided by MediaWiki for vertical scripts, some people have already created their own non-Wikimedia MediaWiki sites using their own hacks/methods.
Aug 6 2015
Jul 17 2015
bz9123000 should be part of the list too.
Jul 15 2015
I have just seen some notations like bz19986, bug #19986 or bug 19986 in some older issues, referring to issue numbers in Bugzilla. Is it possible for Phabricator to automatically redirect all of these to their issue numbers in Phabricator, like T21986 in this case?
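If it helps, the mapping implied by the example above (Bugzilla 19986 → T21986, i.e. an offset of 2000, which is my inference from that single example) could be expressed as simply as:

```python
import re

# Rewrite "bz19986", "bug #19986" or "bug 19986" to the migrated Phabricator
# task id, assuming task id = Bugzilla id + 2000 as in the example above.
def bz_to_phab(text):
    return re.sub(r"(?:bz|bug\s*#?\s*)(\d+)",
                  lambda m: "T{}".format(int(m.group(1)) + 2000),
                  text, flags=re.IGNORECASE)

print(bz_to_phab("see bz19986 and bug #19986"))  # see T21986 and T21986
```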
Jun 5 2015
While the problem has not been resolved yet and EasyTimelines are still displaying without text, I'd like to mention that, according to http://wenq.org/wqy2/index.cgi?HanziStyles , the WQY font uses the China version's glyphs. Once the issue is fixed, should another font with the Taiwan version's glyphs also be installed for users browsing Wikipedia in zh-tw/hk/mo, and is it technically viable to display different fonts for people requesting different editions of the page? (Actually, should I file another bug for this?)
I don't think Unifont should be used instead, as according to http://wiki.debian.org.hk/w/Fonts it looks like the WQY font is an improved version of Unifont in terms of Chinese support.
Feb 14 2015
Some other links on Wikipedia in general also act this way, like links generated via Template:link on English Wikipedia.