Georgian words are automatically (incorrectly) capitalized when entered
Closed, Resolved | Public | 5 Estimated Story Points

Description

Steps to reproduce

(added by @Urbanecm)

  1. Log in to Wikidata.org and change your language to ka
  2. Ensure kawiki is in ka for you; if not, switch it to ka
  3. Go to any page that isn't connected to a Wikidata item
  4. Click on "ბმულების დამატება"
  5. In the displayed form, enter "enwiki" in the first field
  6. Enter "ბმულების" in the second field
  7. See attachments for my output.

Screenshot from 2018-11-05 12-51-04.png (333×551 px, 57 KB)

Expected behaviour

It definitely shouldn't display any squares.

Original description

Hi to all!

I am Mehman, from Georgian Wikipedia. For almost a month now we have faced the problem that Georgian text is displayed on the screen as squares (see: https://ka.wikipedia.org/wiki/ვიკიპედია:ქალები,_რომლებსაც_არასოდეს_შეხვედრიხართ_2018). Unfortunately, that is not the only problem: we also cannot specify interlanguage links in browsers (Chrome, Mozilla). As far as I understand, the cause is that Unicode changed the properties of Georgian letters. Georgian words are generally not written with a capital letter; all the letters in a word must be the same size.

Please deal with the problem as quickly as possible, because it causes a great deal of inconvenience for us as active Wikipedians.

Thanks in advance!


Event Timeline

It looks like a font issue to me, which is always client side, not server side.

If the problem affected only one person, a font issue would be a plausible explanation, but it affects the whole community. To show the problem in detail, I ask everyone to do the following:

  1. Open the page: https://ka.wikipedia.org/wiki/სტამბოლის_ახალი_აეროპორტი
  2. Try to add an interlanguage link there
  3. Link it to the English page: https://en.wikipedia.org/wiki/Istanbul_New_Airport

I think you will then see the essence of the problem yourself.

P.S. You can try in any browser (Chrome, Mozilla, etc.).

@Urbanecm can you tell which browser was used?

This comment was removed by Mehman97.

w6dkvWV.png (580×717 px, 88 KB)
cdEi6La.png (565×601 px, 50 KB)

After about one hundred more Georgian letters were added to Unicode, the uppercase functions used in programming now operate on Georgian as well. Previously these functions changed only Latin lowercase letters; now Georgian letters change too. Because the functions used to have no effect on Georgian, programmers paid no attention to this and applied the same code to Georgian as to English, including titlecase/capitalization routines, which now capitalize the first letters of words, something that is not grammatical in Georgian.

All this leads to problems with the Visual Editor and automatic translation, as well as with searching for links in an article, adding interwiki links, etc.

For example, the Visual Editor replaces the first letter of the article title with another letter. With automatic translation, the first letter of every link inserted on the page is changed as well, so in the final article none of the links work and the article is incorrect.

In the sidebar on the left, the language links editor cannot attach the article to the corresponding articles in other languages, because it looks up a title that does not exist instead of the article's current name.

When adding a file or a link, the search either looks for a title with a different first letter or reports that no such article exists.

Aklapper changed the task status from Open to Stalled. Oct 27 2018, 6:19 PM
Aklapper raised the priority of this task from High to Needs Triage.

Resetting priority.

Please tell us which browser(s) and operating systems you have tested this with, and follow https://www.mediawiki.org/wiki/How_to_report_a_bug to clearly explain 1) clear steps to reproduce the problem, including links, click by click, 2) what you currently see and 3) what you expect to see, so we (who do not speak Georgian) don't have to guess. All this in separate sections. Thanks!

we also cannot specify Interlanguage links in browsers (Chrome, Mozilla)

Please file separate tasks for separate problems. One task for one problem. Not sure as there are no steps to reproduce, but maybe that's T208077?

How is that related to letters displayed as squares? None of the screenshots show any squares?

I (and most of the participants) use Windows. I tested both Chrome and Mozilla; the effect is the same when specifying interlanguage links. Unfortunately, I don't know much about working in Phabricator, so I ask for your help.

When we click to add interlanguage links in Georgian Wikipedia and indicate the article we want to link to, the first letter of the Georgian article title automatically turns into a square, and the program reports that the corresponding article does not exist.

FireShot Capture 11 - საფრანგეთის ეროვნული ბიბლიოთეკა - ვიკი_ - https___ka.wikipedia.org_wiki_%E1%.png (667×1 px, 213 KB)

Framawiki subscribed.

This last screenshot tells me that it's something with Wikibase.

@Mehman97: If I click on ბმულების რედაქტირება under სხვა ენებზე then I go to wikidata.org. I don't get a dialog like in the screenshot in your last comment. So I cannot reproduce the situation.
As asked several times before already, this task needs clear steps, step by step, click by click, as a list of steps; and what you expect to see; and what you see instead.

Based on the last screenshot it has something to do with the API response for the add interwiki dialogue, but I'm not sure what exact steps are followed there. It's easier to debug which API response is returned when every step in the add interwiki dialogue is listed. Based on the screenshot, the user is adding an interwiki link and this is the confirmation screen of the dialogue, but it's unclear how to get the API error (and that's partially due to the fact I can't read Georgian).

@Mehman97: Please see the last two comments and provide clear steps to reproduce. Thanks!

I can reproduce this error. Steps to reproduce follow:

  1. Log in to Wikidata.org and change your language to ka
  2. Ensure kawiki is in ka for you; if not, switch it to ka
  3. Go to any page that isn't connected to a Wikidata item
  4. Click on "ბმულების დამატება"
  5. In the displayed form, enter "enwiki" in the first field
  6. Enter "ბმულების" in the second field
  7. See attachments for my output.

Screenshot from 2018-11-05 12-51-04.png (333×551 px, 57 KB)

Expected behaviour

It definitely shouldn't display any squares.

Definitely not a site request. Was anybody able to reproduce the bug using my steps to reproduce?

The trouble is that, after Unicode 11 added Georgian uppercase letters, the wiki software automatically capitalizes Georgian titles of articles, while editing something.
But in Georgian, we don't use capitalization (except all caps) and there are not any articles with capitalized titles in kawiki, so the software fails to select an existing article.

Confirmed same behavior, user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0

GeoUppercase.png (558×1 px, 77 KB)

Expected behaviour

It shouldn't capitalize the title.
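
The expected behaviour can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not MediaWiki's actual code: `ucFirstKeepMkhedruli` and `isMtavruli` are hypothetical helper names, and the only fact relied on is that the Georgian uppercase letters added in Unicode 11 occupy U+1C90..U+1CBF. The helper uppercases the first character as usual but keeps the original character whenever the result would land in that block.

```javascript
// Range of the Georgian uppercase (Mtavruli) letters added in Unicode 11.
const MTAVRULI_START = 0x1c90;
const MTAVRULI_END = 0x1cbf;

// True if the first code point of `text` lies in the Mtavruli block.
function isMtavruli(text) {
  const cp = text.codePointAt(0);
  return cp >= MTAVRULI_START && cp <= MTAVRULI_END;
}

// Hypothetical helper: capitalise the first character of a title,
// but leave it alone if uppercasing would produce a Mtavruli letter.
function ucFirstKeepMkhedruli(title) {
  if (title === '') {
    return title;
  }
  const first = String.fromCodePoint(title.codePointAt(0));
  const upper = first.toUpperCase();
  const head = isMtavruli(upper) ? first : upper;
  return head + title.slice(first.length);
}

console.log(ucFirstKeepMkhedruli('\u10ef\u10d4\u10db\u10d8')); // Georgian title
console.log(ucFirstKeepMkhedruli('istanbul'));
```

Because the check is applied to the result of `toUpperCase()`, the helper returns the same title whether or not the runtime already knows the Unicode 11 case mappings.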

Aklapper renamed this task from Problem with Georgian writing to Georgian words are automatically (incorrectly) capitalized when entered. Nov 28 2018, 5:10 PM
Aklapper changed the task status from Stalled to Open.
Aklapper added a project: I18n.

Dear Developers!

Please solve this problem as soon as possible; it makes working in the Georgian Wikipedia almost impossible. We cannot use the HotCat gadget, because when entering category names the first letter automatically becomes a square and the category is not entered correctly.

ილჰან ომარი.png (1×1 px, 489 KB)

It is also not possible to bind Georgian articles to Wikidata through Wikipedia and we have to do it manually from Wikidata.

Addshore set the point value for this task to 5. Edited Feb 12 2019, 3:58 PM
Addshore subscribed.

My hunch is that this actually has to do with the API module that the widgety thing is calling.
wblinktitles I believe? Could be a good starting point... Or perhaps wbsetsitelink?
(TODO look at the call that is made)

I can reproduce the error. Just go to any article listed on Special:UnconnectedPages in kawiki and try to connect it to Wikidata using the widget.
For example in case https://ka.wikipedia.org/wiki/%E1%83%AF%E1%83%94%E1%83%9B%E1%83%98
The response I got was:

{"error":{"code":"no-external-page","info":"The external client site \"kawiki\" did not provide page information for page \"\u1caf\u10d4\u10db\u10d8\".","messages":[{"name":"wikibase-api-no-external-page","parameters":["kawiki","\u1caf\u10d4\u10db\u10d8"],"html":{"*":"The external client site \"kawiki\" did not provide page information for page \"\u1caf\u10d4\u10db\u10d8\"."}}],"*":"See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."},"servedby":"mw1316"}

The page title is parsed as \u1caf\u10d4\u10db\u10d8, while escaping the actual title gives \u10ef\u10d4\u10db\u10d8. I think it's pretty obvious what's wrong. But the params being sent are also malformed: {"labels":{"ka":{"language":"ka","value":"Ჯემი"},"en":{"language":"en","value":"Kenneth+Jackson+(sportsman)"}},"sitelinks":{"kawiki":{"site":"kawiki","title":"Ჯემი"},"enwiki":{"site":"enwiki","title":"Kenneth+Jackson+(sportsman)"}}}. I'll dig deeper soon.
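
For anyone who wants to check the mismatch themselves, here is a small sketch that compares the title from the API error with the title taken from the page URL, code point by code point. The helper names `codePoints` and `diffPositions` are made up for this illustration; the two input strings are copied from the error response and the URL above.

```javascript
// Title as reported in the API error, and title decoded from the page URL.
const fromApiError = '\u1caf\u10d4\u10db\u10d8';
const fromUrl = decodeURIComponent('%E1%83%AF%E1%83%94%E1%83%9B%E1%83%98');

// Format every code point of `text` as U+XXXX.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// List the positions where the two strings differ.
function diffPositions(a, b) {
  const ca = codePoints(a);
  const cb = codePoints(b);
  const out = [];
  for (let i = 0; i < Math.max(ca.length, cb.length); i++) {
    if (ca[i] !== cb[i]) {
      out.push({ index: i, a: ca[i], b: cb[i] });
    }
  }
  return out;
}

console.log(codePoints(fromApiError).join(' '));
console.log(codePoints(fromUrl).join(' '));
console.log(diffPositions(fromApiError, fromUrl));
```

Only the first character differs: U+1CAF in the request versus U+10EF in the real title, which is the uppercase/lowercase pair that Unicode 11 introduced for this letter.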

Found it, caused by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/314725
To reproduce. Go to https://ka.wikipedia.org/wiki/%E1%83%AF%E1%83%94%E1%83%9B%E1%83%98 and do this:

( new mw.Title(mw.config.get( 'wgPageName' )) ).getPrefixedText()

(From https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/client/resources/wikibase.client.linkitem.init.js#L34)
I think we can just replace everything with mw.config.get( 'wgPageName' ), but this is one of those cases where, if we touch it, it would probably explode. I'll let @Esanders and @matmarex take a look first.

This is definitely not caused by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/314725. That patch actually fixed a similar issue to this, but with characters from other alphabets (T147646). It does nothing about Georgian scripts though.

The problem here is that web browsers have already been mostly updated to support Unicode 11, while the PHP running on Wikimedia servers has not. And like @Alan.H says above, Unicode 11 introduced a new set of Georgian characters (Mtavruli script) and assigned them as uppercase variants of the existing Georgian characters (Mkhedruli script).

Some relevant Wikipedia articles about the scripts, which clearly state that we should not be using Mtavruli characters for the first letter of titles written in Mkhedruli:

In Mkhedruli, there is no case. Sometimes, however, a capital-like effect, called Mtavruli, "title" or "heading", is achieved by modifying the letters so that their vertical sizes are identical and they rest on the baseline with no descenders. These capital-like letters are often used in page headings, chapter titles, monumental inscriptions, and the like.

Mtavruli letters were added in Unicode version 11.0 in June 2018.[86] (…)
Mtavruli is defined as the upper case, but not title case, of Mkhedruli (…)

JavaScript code not matching PHP code because of this is one problem, but another is that PHP will also eventually be updated, and then all of the page titles on Georgian Wikipedia are going to change to the incorrect form! We have time to fix this though: Unicode 11 support is in PHP 7.3 (https://3v4l.org/5CG8Z), while we're currently still working on upgrading to PHP 7.2.

Seems like we should probably make a plan to proactively manage Unicode transitions on our three platforms (browsers, PHP, server-side JS).

The plan in T219279 is to write a compatibility mapping for PHP7 to behave like PHP5. We could add per-language exceptions to that list as well, which would trigger updates to mw.Title.js, as that has a maintenance script that looks for differences between PHP and JS capitalisation.
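
A rough sketch of how such a difference-driven mapping could work, written in JavaScript for illustration. Everything here is an assumption: `legacyUpper` and `unicode11Upper` are stand-ins that simulate a server without and a client with the Unicode 11 Georgian mappings, and `buildOverrides` / `upperCompat` are hypothetical names, not the real maintenance script or the actual mw.Title data format.

```javascript
// Stand-in for a server whose case tables predate Unicode 11 (assumption):
// Georgian letters in U+10D0..U+10FF are left unchanged.
function legacyUpper(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0x10d0 && cp <= 0x10ff ? ch : ch.toUpperCase();
}

// Stand-in for a Unicode 11 client: Mkhedruli letters map to the
// Mtavruli block, which sits at an offset of 0xBC0.
function unicode11Upper(ch) {
  const cp = ch.codePointAt(0);
  const hasMtavruli =
    (cp >= 0x10d0 && cp <= 0x10fa) || (cp >= 0x10fd && cp <= 0x10ff);
  return hasMtavruli ? String.fromCodePoint(cp + 0xbc0) : ch.toUpperCase();
}

// Record the server's answer for every character in [from, to]
// where server and client disagree.
function buildOverrides(serverUpper, clientUpper, from, to) {
  const overrides = {};
  for (let cp = from; cp <= to; cp++) {
    const ch = String.fromCodePoint(cp);
    const expected = serverUpper(ch);
    if (expected !== clientUpper(ch)) {
      overrides[ch] = expected;
    }
  }
  return overrides;
}

// Client-side uppercase that consults the override table first.
function upperCompat(ch, overrides, clientUpper) {
  return Object.prototype.hasOwnProperty.call(overrides, ch)
    ? overrides[ch]
    : clientUpper(ch);
}

const georgianOverrides =
  buildOverrides(legacyUpper, unicode11Upper, 0x10d0, 0x10ff);
console.log(Object.keys(georgianOverrides).length);
```

With a table like this, the client gives the same answer as the server for every listed character, and the table can be regenerated whenever either side changes its Unicode version.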

The actual reason why it's broken is much simpler: the value of capitalizePageNames is true.

Fixing it also required changing something in the master HotCat code.

Oh, now I understand that this bug is about a bunch of different bugs with a related reason.

The HotCat issue is probably fixed now, but adding links in Wikidata and some other things are distinct.

(Sorry, I was not actually working on this, I think I claimed it accidentally.)

I think that this is actually fixed, though. I can't reproduce the issue with the Wikidata dialog. Most problems with code handling page titles in JavaScript should have been fixed by rMW416895821fdb: Update phpCharToUpper.json based on current differences (September 2019).

Some more work is needed on the PHP side, but that is not causing issues right now (not until we upgrade PHP), and that is happening on T219279.