cosmetic_changes bug on citation's number and punctuation
Open, Low, Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1483/
Reported by: reza1615
Created on: 2012-07-02 10:01:55
Subject: cosmetic_changes bug on citation's number and punctuation
Original description:
fixArabicLetters() changes the digits and punctuation (,) in Latin citations to Persian digits and punctuation (،), which is not correct. Please change it so that digits are not converted when the text around them is in Latin script.

http://fa.wikipedia.org/w/index.php?title=%D8%A7%D8%B1%DB%8C%DA%A9_%D8%AA%D8%B1%DB%8C%D9%86%DA%A9%D8%A7%D8%B3&diff=7277416&oldid=7277411

The Arabic digit fix was disabled with https://www.mediawiki.org/wiki/Special:Code/pywikipedia/10451 (071f768)
An additional related fix was https://www.mediawiki.org/wiki/Special:Code/pywikipedia/10788 (7c13ecbb2)

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 2:25 AM)
bzimport set Reference to bz55185.

On fa.wiki we have a gadget that works fine. It has a function, digits(), that converts numbers correctly; maybe it will be useful for solving this bug.

http://fa.wikipedia.org/wiki/%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%8C:Gadget-Extra-Editbuttons-Functions.js

Is there any regularity for these citations, e.g. "\(<en-fullmonthname> \d{2}, \d{4}\)"?
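Written out in Python, the pattern asked about here might look roughly like the sketch below. The month list and the names MONTHS, EN_DATE and find_english_dates are assumptions made for illustration; none of them come from cosmetic_changes.py.

```python
import re

# Hypothetical expansion of <en-fullmonthname>; not taken from pywikibot.
MONTHS = ('January|February|March|April|May|June|July|'
          'August|September|October|November|December')

# Matches e.g. "(July 02, 2012)": full month name, 2-digit day, 4-digit year.
EN_DATE = re.compile(r'\((?:%s) \d{2}, \d{4}\)' % MONTHS)


def find_english_dates(text):
    """Return all parenthesised English dates found in text."""
    return EN_DATE.findall(text)
```

A compiled pattern of this kind could presumably be treated as a span to leave untouched during digit replacement, but as the following comments note, real citations are rarely this regular.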

Defining a regular pattern for dates or addresses in external URLs is not simple.

The best rule is: when a number is inside English or other Latin text it should keep English digits, and numbers inside Farsi text should be converted to Farsi digits.
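One possible reading of this rule, as a rough sketch only: look at the nearest letter on each side of a digit run and keep the Latin digits if either neighbour is a Latin letter. The function names and the "nearest letter" heuristic are assumptions for illustration; this is neither the fa.wiki gadget's algorithm nor pywikibot code.

```python
import re

# Persian digits U+06F0..U+06F9, built from code points to avoid typos.
PERSIAN_DIGITS = ''.join(chr(0x06F0 + i) for i in range(10))
TO_PERSIAN = str.maketrans('0123456789', PERSIAN_DIGITS)

LATIN_LETTER = re.compile(r'[A-Za-z]')
ANY_LETTER = re.compile(r'[^\W\d_]')  # any Unicode letter


def _nearest_letter(chunk, reverse=False):
    """Return the first letter in chunk (scanning backwards if reverse)."""
    if reverse:
        chunk = chunk[::-1]
    match = ANY_LETTER.search(chunk)
    return match.group() if match else None


def convert_digits_by_context(text):
    """Convert ASCII digits to Persian unless a neighbouring letter is Latin."""
    def repl(match):
        before = _nearest_letter(text[:match.start()], reverse=True)
        after = _nearest_letter(text[match.end():])
        neighbours = [c for c in (before, after) if c]
        if any(LATIN_LETTER.match(c) for c in neighbours):
            return match.group()  # Latin context: leave digits untouched
        return match.group().translate(TO_PERSIAN)
    return re.sub(r'[0-9]+', repl, text)
```

The heuristic is deliberately conservative: a number sitting between Farsi and Latin text is left alone, since a wrongly converted citation is the failure this task is about.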

I would like to propose disabling this function for now to avoid unintended vandalism by bots. Later, we need to translate the JS code reza1615 mentioned into Python and incorporate it into the bot.

fixArabicLetters() disabled in r10451 for now

Change 246793 had a related patch set uploaded (by John Vandenberg):
Add CC -experimental to use disabled fixes

https://gerrit.wikimedia.org/r/246793

A page which can be used to show digits are still problematic is https://fa.wikipedia.org/wiki/.aq
If cosmetic changes (cc) are run on it normally, there are no problems.
With my patch, enabling -experimental -simulate makes it try to convert the Latin digits in the dates to Farsi digits. To fix this, we need to convert the Latin-based dates into other calendar dates.

If we enable a feature like -experimental, we should advise users that they can now try these features on their wiki, and provide feedback so we can learn how to improve the algorithms with the goal of the features becoming enabled by default.

Ladsgroup removed a project: User-Ladsgroup.

Change 478641 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [doc] Fix task number of T57185

https://gerrit.wikimedia.org/r/478641

Change 478641 merged by jenkins-bot:
[pywikibot/core@master] [doc] Fix task number of T57185

https://gerrit.wikimedia.org/r/478641

Deactivated code due to T57185

# valid digits
digits = {
    'ckb': ('٠١٢٣٤٥٦٧٨٩', 'fa'),
    'fa': ('۰۱۲۳۴۵۶۷۸۹', 'ckb'),
}

# For replacing old digits by new one
new, old_key = digits[self.site.code]
old = digits[old_key][0]

# FIXME: split this function into two.
# replace persian/arabic digits
# deactivated due to bug T57185
for i in range(0, 10):
    text = textlib.replaceExcept(text, old[i], new[i], exceptions)
# do not change digits in class, style and table params
pattern = re.compile(r'\w+=(".+?"|\d+)', re.UNICODE)
exceptions.append(pattern)
# do not change digits inside html-tags
pattern = re.compile('<[/]*?[^</]+?[/]*?>', re.UNICODE)
exceptions.append(pattern)
exceptions.append('table')  # exclude tables for now
# replace digits
for i in range(0, 10):
    text = textlib.replaceExcept(text, str(i), new[i], exceptions)
return text
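The first loop of the deactivated code swaps each digit of the other script (Arabic-Indic for fa, Persian for ckb) for the site's own digits. A standalone sketch of just that mapping, using str.translate instead of textlib.replaceExcept and therefore without any exception handling, might look like this; normalize_digits is a hypothetical name, not a pywikibot function.

```python
# Digit strings copied from the deactivated code above.
DIGITS = {
    'ckb': ('٠١٢٣٤٥٦٧٨٩', 'fa'),
    'fa': ('۰۱۲۳۴۵۶۷۸۹', 'ckb'),
}


def normalize_digits(text, site_code):
    """Replace the other script's digits with those used on site_code."""
    new, old_key = DIGITS[site_code]
    old = DIGITS[old_key][0]
    # One translation table replaces the ten single-character passes.
    return text.translate(str.maketrans(old, new))
```

Latin digits pass through unchanged here; converting them is the part of the original code that caused this bug.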

Change 246793 abandoned by Xqt:
Add CC -experimental to use disabled fixes

Reason:
-experimental does not solve that issue

https://gerrit.wikimedia.org/r/246793

The experimental patch has been abandoned.