
Incorrect text conversion in Kazakh Wikipedia.
Closed, Invalid | Public | BUG REPORT

Description

The Kazakh Wikipedia uses a converter from Cyrillic to Latin and Arabic scripts. However, text that is already in Latin script is sometimes converted to Cyrillic. For example, "Elizabeth II" turns into "Елізабетһ II", as shown in the screenshot.

photo1668304320.jpeg (1×591 px, 78 KB)


Steps to replicate the issue (include links if applicable):

Current behavior (What happens?):

  • kk (unconverted): Elizabeth II
  • kk-cyrl (converted to Cyrillic): Елізабетһ II

Expected behavior (What should have happened instead?):

  • kk (unconverted): Elizabeth II
  • kk-cyrl (English text shouldn't be converted): Elizabeth II

Event Timeline

I believe this is the intended behavior for LanguageConverter.

This issue should be fixed in the template:

Changing the wikitext from

{{lang|{{{1}}}|{{{4}}}}}

to

{{lang|{{{1}}}|-{kk|{{{4}}}}-}}

should fix the issue.

In this example it seems that the problem is in the template, but in fact this also happens with plain text outside the template.

For example, in this screenshot every "II" and "VII" turns into "ЫЫ" and "ВЫЫ", and "heir presumptive" turns into "һеір пресұмптіве":

photo5190843606922018613.jpg (1×591 px, 105 KB)

Oh yeah, there are some conversion issues with Roman numerals.

No, it’s not only Roman numerals. If you look closely, you can see English words that were turned into Cyrillic too. I noticed this error in the Wikipedia mobile app as well as in the phone browser.

I think you misunderstand how the conversion works.

What you expect is kk (unconverted, mixed-script) rather than kk-Cyrl (converted to Cyrl). The converter can't know whether each Latn word needs to be converted or not, so you have to mark every Latn word that should not be converted.
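
A minimal sketch of why this happens, written in Python rather than taken from the LanguageConverter source. The mapping table is derived only from the Latin/Cyrillic pairs reported in this task, and `to_cyrl` and `convert_plain` are illustrative names, not MediaWiki functions. A letter-by-letter converter cannot tell an English word from a Kazakh word written in Latin script, so it converts everything unless the wikitext wraps a span in `-{ }-`:

```python
import re

# Latin/Cyrillic pairs reported in this task; the table below is derived
# from them and is NOT the real kk conversion table.
EXAMPLES = [
    ("Elizabeth", "Елізабетһ"),
    ("heir presumptive", "һеір пресұмптіве"),
]
LATN_TO_CYRL = {
    latn_ch: cyrl_ch
    for latn, cyrl in EXAMPLES
    for latn_ch, cyrl_ch in zip(latn, cyrl)
    if latn_ch != cyrl_ch
}

# -{ ... }- marks a span that must not be converted.
PROTECTED = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def convert_plain(text: str) -> str:
    """Map every known Latin letter; leave all other characters untouched."""
    return "".join(LATN_TO_CYRL.get(ch, ch) for ch in text)


def to_cyrl(wikitext: str) -> str:
    """Convert text outside -{ }- and copy protected spans verbatim."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(wikitext):
        out.append(convert_plain(wikitext[pos:match.start()]))
        out.append(match.group(1))  # protected span: markers dropped, text kept
        pos = match.end()
    out.append(convert_plain(wikitext[pos:]))
    return "".join(out)


print(to_cyrl("heir presumptive"))      # unprotected: converted
print(to_cyrl("-{heir presumptive}-"))  # protected: left in Latin script
```

The first call reproduces the conversion reported above; the second leaves the English text alone because it is explicitly marked.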

Yes, I know about that, but I don't know where I can change it.

So the issue is there's no place to change the language variant to kk in Wikipedia mobile app, right?

LGoto triaged this task as Medium priority. Dec 5 2022, 5:28 PM
Dbrant subscribed.

It sounds like this might be a duplicate of T305383?
I noticed that the above task is closed, but the last comment from Anthony is that it's still reproducible.

So the issue is there's no place to change the language variant to kk in Wikipedia mobile app, right?

No, there is no such need. English text is automatically converted to Cyrillic, but it should not change at all.

It sounds like this might be a duplicate of T305383?
I noticed that the above task is closed, but the last comment from Anthony is that it's still reproducible.

Sounds like a different issue from this one.

English text is automatically converted to Cyrillic, but it should not change at all.

Based on your description:

Please deal with this issue by fixing https://kk.wikipedia.org/wiki/Template:Lang

Үлгі:Lang
<span lang="{{{1}}}" xml:lang="{{{1}}}">{{#ifeq:{{lc:{{{1}}}}}|kk|{{{2}}}|-{{{{2}}}}-}}</span><noinclude>{{Doc}}</noinclude>

Example with LanguageConverter -{ syntax: https://zh.wikipedia.org/wiki/Template:Lang?action=edit

As the template https://kk.wikipedia.org/wiki/Template:Lang is fully protected through cascading protection, I cannot fix it directly.

By the way, there's another task about marking text as not-to-convert by detecting the lang= attribute:

T39617: Do not convert text marked as being in another language with a lang attribute
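
A rough sketch of the idea behind that task, not its implementation. It assumes well-formed HTML with paired tags, treats `kk` as the page language, and uses a one-word table derived from an example in this task; `LangAwareConverter` and `convert_html` are hypothetical names. Text is converted only while the nearest enclosing `lang` attribute is `kk`:

```python
from html.parser import HTMLParser

# One pair reported in this task; NOT the real kk conversion table.
PAIRS = [("heir", "һеір")]
TABLE = {a: b for latn, cyrl in PAIRS for a, b in zip(latn, cyrl)}


class LangAwareConverter(HTMLParser):
    """Convert text nodes unless an enclosing element sets a non-kk lang."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.lang_stack = ["kk"]  # assumed page language

    def handle_starttag(self, tag, attrs):
        # Inherit the parent's language when the tag has no lang attribute.
        lang = dict(attrs).get("lang") or self.lang_stack[-1]
        self.lang_stack.append(lang.lower())
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if len(self.lang_stack) > 1:
            self.lang_stack.pop()
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.lang_stack[-1] == "kk":
            data = "".join(TABLE.get(ch, ch) for ch in data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def convert_html(html: str) -> str:
    parser = LangAwareConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(convert_html('heir <span lang="en">heir</span> heir'))
```

In this sketch only the two occurrences outside the `lang="en"` span are converted. Unclosed void tags such as a bare `<br>` would break the language stack, which is one reason this is only an illustration.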

So the problem is not only in the template but also in plain text: in the same article about Elizabeth II, the text ''Tree Tops Hotel'' is turned into a Cyrillic version.

So the problem is not only in the template but also in plain text: in the same article about Elizabeth II, the text ''Tree Tops Hotel'' is turned into a Cyrillic version.

As in the reply above: you need to manually mark them as non-kk text, because the converter can't know whether they are kk or not.

Removing the app tags, as this is a bug/feature of the legacy language converter. This isn't a Content-Transform-Team bug until/unless we switch to Parsoid read views. Hopefully the Language team can take a look at this.

Things to follow up on:

  1. Template:Lang needs to be fixed on kkwiki to protect the argument from conversion, as @Winston_Sung wrote above.
  2. "Special casing Roman numerals" is a bug/feature found in a number of Cyrillic converters. It could be turned off on kkwiki, which would force editors to use -{}- around legit Roman numerals, but that's a separate discussion.
  3. @mukhamejan says that "Tree Tops Hotel" is also being affected. We need to look at that markup to see what's going on there; probably it also needs -{}- protection in the wikitext.

1 and (probably) 3 are content issues, not software issues. 2 might be a software fix. T39617: Do not convert text marked as being in another language with a lang attribute would also be a (future) software fix.
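
To illustrate item 2, here is a small sketch of what special-casing Roman numerals could look like. The rule used here (skip any whole word made only of the letters I, V, X, L, C, D, M) is an assumed heuristic for illustration, not the rule the Kazakh converter actually used, and the table is derived only from the pairs reported in this task:

```python
import re

# Pairs reported in this task; NOT the real kk conversion table.
PAIRS = [("II", "ЫЫ"), ("VII", "ВЫЫ")]
TABLE = {a: b for latn, cyrl in PAIRS for a, b in zip(latn, cyrl)}

# Assumed heuristic: a whole word made only of Roman numeral letters.
ROMAN = re.compile(r"[IVXLCDM]+")


def convert_word(word: str, special_case_roman: bool = True) -> str:
    """Convert one word, optionally leaving Roman-numeral-like words alone."""
    if special_case_roman and ROMAN.fullmatch(word):
        return word
    return "".join(TABLE.get(ch, ch) for ch in word)


def convert(text: str, special_case_roman: bool = True) -> str:
    # Split into word and non-word runs so spaces and punctuation survive.
    parts = re.findall(r"\w+|\W+", text)
    return "".join(convert_word(p, special_case_roman) for p in parts)


print(convert("II. VII"))                            # numerals kept
print(convert("II. VII", special_case_roman=False))  # numerals converted
```

With the special case switched off, the numerals are converted like any other Latin letters, which is the situation where editors would have to write `-{II}-` by hand. The heuristic also has false positives: any all-caps word built from those seven letters, such as "MIX", is skipped too, which is why it is described as a bug/feature.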

Another screenshot with the same issue:

image.png (181×783 px, 14 KB)

Bugreporter subscribed.

Closed as invalid since the old Kazakh converter no longer exists.