Turkish letter replaced with Latin
Closed, Resolved, Public

Description

In this edit https://en.wikipedia.org/w/index.php?title=Bosphorus_Bridge&diff=1031235567&oldid=1030392807, AWB's genfixes replaced

[[İstanbul Cup|Istanbul Cup]]

with

[[Istanbul Cup]]

It appears to have treated the Turkish letter İ as a typo.

I think that this happened a few hours before I upgraded from AWB version 6.1.0.0 to version 6.2.0.0, but I can't swear to that.

Event Timeline

It appears to have treated the Turkish letter İ as equivalent to the Latin letter I, and simplified the link per [[Wikipedia:AutoWikiBrowser/General_fixes#Simplify_links_(SimplifyLinks)]].

You can duplicate this with [[User:GoingBatty/sandbox]] using AWB 6.2.0.0 SVN 12469

The reason is that, to simplify both [[Dog|dog]] and [[dog|dog]] correctly (that is, to allow the first letter of a link to be upper or lower case), we convert the first letter of both parts of the link to lower case. That means that both İ and I become i and so are treated as the same letter, which is wrong.

rev 12500 SimplifyLinks: fix handling of diacritics on page's first letter
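
To illustrate the comparison described above, here is a minimal Python sketch. It is not the AWB code and not the rev 12500 change. All function names are hypothetical, and `simple_lower` is an assumption that mimics a one-character lowercase mapping in which İ (U+0130) becomes a plain i. The "safer" check shown is just one possible approach: it treats two first letters as case variants only when the case mapping round-trips cleanly in both directions.

```python
def simple_lower(ch: str) -> str:
    """Hypothetical one-character lowercase mapping.

    Python's str.lower() maps U+0130 to "i" followed by U+0307; keeping only
    the first code point mimics a mapping that yields a plain "i".
    """
    return ch.lower()[:1]


def naive_same_title(target: str, label: str) -> bool:
    """Problematic comparison: lowercase the first letter of both parts."""
    if not target or not label:
        return False
    return (simple_lower(target[0]) + target[1:]
            == simple_lower(label[0]) + label[1:])


def first_letters_match(a: str, b: str) -> bool:
    """True if a and b are identical or a clean upper/lower pair."""
    if a == b:
        return True
    lower, upper = (a, b) if a.islower() else (b, a)
    # Require the mapping to round-trip, so U+0130 and "I" do not match.
    return lower.upper() == upper and upper.lower() == lower


def safer_same_title(target: str, label: str) -> bool:
    """Compare everything after the first letter exactly."""
    if not target or not label or target[1:] != label[1:]:
        return False
    return first_letters_match(target[0], label[0])


def simplify_link(target: str, label: str, same_title=safer_same_title) -> str:
    """Return [[label]] when the piped link is redundant, else keep the pipe."""
    if same_title(target, label):
        return f"[[{label}]]"
    return f"[[{target}|{label}]]"


print(simplify_link("\u0130stanbul Cup", "Istanbul Cup", naive_same_title))
print(simplify_link("\u0130stanbul Cup", "Istanbul Cup"))
print(simplify_link("Dog", "dog"))
```

With the naive comparison the first call collapses the link to [[Istanbul Cup]], reproducing the reported behaviour. The round-trip check keeps the piped link intact while still simplifying [[Dog|dog]] to [[dog]].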

I understand the conflation of uppercase "I" and lowercase "i". That's correct, and it is a helpful fix: it simplifies the markup and causes no change to the displayed text or the link target.

But something is going wrong here, when the Turkish uppercase letter is converted to the unaccented Latin uppercase letter.

As far as I remember this problem was not only connected to AWB but it was more general. Even when I was searching for the Turkish letter, I was getting the Latin "I" too.
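
The thread does not say how the search in question was implemented. One common mechanism that would produce this effect is accent folding, where text is decomposed and combining marks are dropped before matching. The sketch below is only an assumption about that kind of behaviour, with a hypothetical `fold_accents` helper. It shows that İ (U+0130) decomposes to I plus a combining dot above, so a folded index cannot tell it apart from the Latin I.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose text (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# After folding, the Turkish and Latin spellings compare equal.
print(fold_accents("\u0130stanbul") == fold_accents("Istanbul"))
```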

Rjwilmsi claimed this task.

The issue as reported has been fixed. @BrownHairedGirl, you say "the Turkish uppercase letter is converted to the unaccented Latin uppercase letter." I agree: that was the result of the problem, and it is now fixed.