'ஆாி' and 'ஆரி' are not treated the same anymore in Tamil Wikipedia, leading to duplicated articles
Closed, Declined. Public.

Description

Since July, several new people have started editing Tamil Wikipedia pages, and we have observed an issue with the 'Ra series' of letters, such as 'ஆாி' and 'ஆரி'. Technically both are the same, but the wiki treats them differently and creates multiple pages. Can someone fix this as soon as possible? Otherwise it is going to mean a lot of cleanup work in the future. This affects not just a single character but the entire series.

See the links below. Both refer to the same title; however, due to the fonts, the wiki creates two different pages. Kindly fix this, and let me know if you need further details.

https://ta.wikipedia.org/s/6jos
https://ta.wikipedia.org/s/6j4e

Event Timeline

Aklapper renamed this task from Font issues in Ta.Wiki, creates multiple issues. to 'ஆாி' and 'ஆரி' are not treated the same anymore in Tamil Wikipedia, leading to duplicated articles. Jul 22 2017, 10:10 AM

Hi @Dineshkumar, what makes you sure that this is a problem with Wikipedia itself, instead of for example the used browser or operating system?

I am damn sure it is not a problem with the OS or browsers. In Tamil, a few people write the first form and the rest write the second. Until last month (if I remember correctly) we used only the second one. It creates a lot of confusion and duplicate articles. I think the translated articles got different fonts. But we should fix it so that only one character is used, not both. Thanks.

Note: I have also tested this to confirm on Windows, Mac OS, and Linux with Chrome, Safari, and Firefox browsers.

Aklapper added subscribers: Amire80, santhosh.

Thanks for clarifying!
@Amire80, @santhosh: Any ideas how to track this down? :-/

Thanks for asking, @Aklapper.

@santhosh can probably give a much better answer, but I suspect that the title "இந்திய ஆாியா்கள்" (the first) is invalid in terms of language and shouldn't be created at all. The pages that were created should be merged into valid pages and then deleted or converted into redirects.

Most of all, I suspect that this issue may have something to do with the keyboard that the editor is using. Does it happen only with particular users or with a large group of different users? Does it happen to experienced editors?

I'd try asking the users who are making such edits which operating system they are using and which keyboard they use to type the Tamil text: the operating system's keyboard, the keyboard that is provided by ULS and jquery.ime (the little keyboard icon that appears next to the search and editing boxes), or something else (for example, I know that some people find it comfortable to type Tamil text in Google Translate and then copy the text to Wikipedia).

If the invalid titles were created by just one user or a small number of users, I suggest explaining to them how to use the keyboard correctly, and documenting it on a persistent help page somewhere.

Another thing I should mention is that in early June I merged a change to the ta-99 jquery.ime input method submitted by User:Balajijagadesh. It may or may not have something to do with this issue.

(I might be wrong about all of the above! Please don't make decisions based only on what I say unless you're really sure about what you're doing.)

I'm also adding @Ravidreams, who may help in understanding what's going on.

ஆாி is Vowel AA, Vowel Sign AA, and Vowel Sign I, and of course it is a nonsense word that uses the look-alike character Vowel Sign AA ா in place of Consonant RA ர. I agree with @Amire80 that this might be caused by a buggy keyboard used by a presumably small set of users. It is also possible that mobile users who touch-type on a visual keyboard use ா instead of ர. I can't think of any technical solution to address this. Educating the editors who make this mistake is the best way.
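To make the difference between the two look-alike strings concrete, here is a minimal Python sketch that lists the code points of each one using the standard unicodedata module. The helper names describe and has_suspect_vowel_sign_aa are hypothetical, and the check is only a rough heuristic based on the description above (Vowel Sign AA directly after an independent vowel, or directly before another combining sign). It is not a complete validator for Tamil orthography and is not anything MediaWiki itself does.

```python
import unicodedata

TAMIL_VOWEL_SIGN_AA = "\u0BBE"  # ா, the look-alike of TAMIL LETTER RA (U+0BB0, ர)
# Independent vowels அ .. ஔ (the range also spans a few unassigned code points).
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0B85, 0x0B95)}


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def has_suspect_vowel_sign_aa(text):
    """Heuristic (an assumption, not a full validator): flag a Vowel Sign AA
    that directly follows an independent vowel or directly precedes another
    combining mark, as in the invalid titles discussed in this task."""
    for i, ch in enumerate(text):
        if ch != TAMIL_VOWEL_SIGN_AA:
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        after_vowel = prev_ch in INDEPENDENT_VOWELS
        before_mark = next_ch != "" and unicodedata.category(next_ch) in ("Mn", "Mc")
        if after_vowel or before_mark:
            return True
    return False


invalid = "\u0B86\u0BBE\u0BBF"  # ஆ + ா + ி (AA, VOWEL SIGN AA, VOWEL SIGN I)
valid = "\u0B86\u0BB0\u0BBF"    # ஆ + ர + ி (AA, LETTER RA, VOWEL SIGN I)

for label, word in (("invalid", invalid), ("valid", valid)):
    print(label, describe(word), has_suspect_vowel_sign_aa(word))
```

A check like this could at most help editors find suspicious existing titles for manual review; it does not change how the wiki compares titles.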

I think this is caused by the Bamini keyboard provided by jquery.ime

I will explore this and share the results.

In the Bamini layout, I get the following results.

Tamil word - key combination
பரி - gup
பாி - ghp
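
The two rows above differ by a single key. As a rough illustration, the following Python sketch uses a hypothetical, partial key map inferred only from those two rows (it is not the actual jquery.ime Bamini rule set, and BAMINI_KEYS and type_keys are made-up names) to show which code points each key sequence produces.

```python
# Hypothetical, partial key map inferred only from the two rows above.
# The real jquery.ime Bamini rules are not reproduced here.
BAMINI_KEYS = {
    "g": "\u0BAA",  # ப TAMIL LETTER PA
    "u": "\u0BB0",  # ர TAMIL LETTER RA
    "h": "\u0BBE",  # ா TAMIL VOWEL SIGN AA
    "p": "\u0BBF",  # ி TAMIL VOWEL SIGN I
}


def type_keys(keys):
    """Map each typed key to its Tamil character and join the results."""
    return "".join(BAMINI_KEYS[key] for key in keys)


for keys in ("gup", "ghp"):
    word = type_keys(keys)
    print(keys, word, [f"U+{ord(ch):04X}" for ch in word])
```

Under this assumed mapping, typing h instead of u is enough to produce the look-alike Vowel Sign AA in place of the consonant RA.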

I have to check whether the original Bamini layout and the Bamini layout in jquery.ime are the same.

The cause may be an issue in the jquery.ime layout, or people confusing the similar-looking letters ா and ர.

I think this is caused by the Bamini keyboard provided by jquery.ime

If this turns out to be a jquery.ime issue, see https://www.mediawiki.org/wiki/Upstream_projects#Invented_Here for links on where to report the problem. In this case: https://github.com/wikimedia/jquery.ime/issues

I can't think of any technical solution to address this.

Setting task status to declined.