Update Arabic (ar) for fixes.py
Closed, Resolved, Public

Description

Background

  • The Arabic language has a complex system of diacritics (Example images on Commons) that changes both the appearance and meaning of words.
  • Strictly speaking, every Arabic word should contain diacritics.
  • However, due to technical limitations (specifically, the lack of proper diacritic support in widely used fonts and the poor handling of Arabic diacritics by search engines), most Arabic websites don't use diacritics widely.
  • On Arabic Wikipedia (arwiki), most articles don't contain diacritics, but many do. This is particularly true for featured and good articles on arwiki. (As those articles undergo thorough review, diacritics are usually added to them to ensure they are correct from a linguistic point of view.)

fixes.py

  • Former arwiki admin Alnokta (currently inactive) is the original author of a (non-regex based) replacement dictionary that was added to the Pywikibot file fixes.py ("correct-ar", line 446; see for example r4726).
  • The idea of the code is that we (arwiki bot operators) use Pywikibot to make automatic typographic corrections from a predefined list of typographic errors. (That is, the bot replaces only words that we know are wrong.) This ensures that the code will be 100% accurate when running in bot mode.

Problem

  • In r5942 the code was changed to use regex.
  • The problem here is that the current code uses \b (word boundary) and treats every Arabic diacritic as a word boundary. (Meaning that if an Arabic word contains diacritics, the code will treat it as two words and will apply the regex-based corrections on each word separately.)
  • Now this introduces a huge number of errors when running the code. (A test run of 90 articles resulted in 4 mistakes for an error percentage of 4/90 = 4.4% which is of course not acceptable for bot operation.)
  • The current (regex-based) code is useless: it can't be run in bot mode because it introduces many errors, and of course we (on arwiki) don't have enough manpower to run the code in human-assisted mode on the ~630k articles that we have.
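
The boundary behaviour described above can be reproduced with Python's standard re module. This is a minimal sketch, not the Pywikibot code path: the sample word and the variable names are chosen only for illustration, and the rule is copied from the regex-based "correct-ar" list.

```python
import re

FATHA = "\u064e"  # ARABIC FATHA, a combining mark (Unicode category Mn)

# A vocalised word whose tail, after the last diacritic, spells "الى".
# (Illustrative sample, not taken from the 90-article test run.)
vocalised = "ت" + FATHA + "ع" + FATHA + "الى"

# Rule as written in the regex-based list.
pattern, replacement = r"\bالى\b", "إلى"

# \w does not match the diacritic, so \b sees a boundary right after it.
diacritic_is_word_char = re.fullmatch(r"\w", FATHA) is not None

corrupted = re.sub(pattern, replacement, vocalised)

print(diacritic_is_word_char)  # False
print(corrupted != vocalised)  # True: the rule fired inside the word
```

Because the diacritic is not a word character, the tail of the vocalised word is treated as a standalone word and gets "corrected", which is the kind of error the test run surfaced.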

Solution

  • There are two possible solutions:
    1. Modify the Pywikibot regex handling to take Arabic diacritics into account (which I doubt a non-native Arabic speaker will be able to implement correctly, since implementing such a feature requires understanding the semantics of the language), OR
    2. Make a partial revert of r5942 (which is the solution that I prefer here).
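
For comparison, option 1 could look roughly like the sketch below, which replaces \b with lookarounds that treat diacritics as part of the word. This is only an illustration under stated assumptions: the helper name whole_word is hypothetical, it is not part of Pywikibot, and the range U+064B to U+0652 covers only the basic diacritics, so a real implementation would need review by an Arabic speaker.

```python
import re

# Assumption: U+064B..U+0652 (fathatan through sukun) covers the common
# diacritics; other combining marks are deliberately left out of this sketch.
WORD_OR_MARK = r"[\w\u064b-\u0652]"


def whole_word(word):
    """Compile a pattern matching `word` only when no letter or diacritic touches it."""
    return re.compile(
        rf"(?<!{WORD_OR_MARK}){re.escape(word)}(?!{WORD_OR_MARK})"
    )


rule = whole_word("الى")

plain = "ذهب الى البيت"
vocalised = "ت" + "\u064e" + "ع" + "\u064e" + "الى"

fixed_plain = rule.sub("إلى", plain)          # the standalone typo is corrected
fixed_vocalised = rule.sub("إلى", vocalised)  # the vocalised word is left alone
```

Unlike a whitespace-based rule, the lookarounds consume nothing, so a word next to punctuation such as "(الى)" would still be matched.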

Request

My request is to apply this clean patch (against Pywikibot core):

diff --git a/pywikibot/fixes.py b/pywikibot/fixes.py
index 5922b06..ffd2ab0 100644
--- a/pywikibot/fixes.py
+++ b/pywikibot/fixes.py
@@ -28,7 +28,7 @@ parameter_help = """
 in German
 * music - Links auf Begriffsklärungen in German
 * datum - specific date formats in German
- * correct-ar - Corrections for Arabic Wikipedia and any
+ * correct-ar - Typo corrections for Arabic Wikipedia and any
 Arabic wiki.
 * yu-tld - Fix links to .yu domains because it is
 disabled, see:
@@ -440,8 +440,8 @@ fixes = {
 }
 },

- # Corrections for Arabic Wikipedia and any Arabic wiki.
- # python pwb.py replace -fix:correct-ar -start:! -always
+ # Typo corrections for Arabic Wikipedia and any Arabic wiki.
+ # python pwb.py replace -fix:correct-ar -start:! -always

 'correct-ar': {
 'regex': True,
@@ -452,110 +452,138 @@ fixes = {
 # FIXME: Do not replace comma in non-Arabic text,
 # interwiki, image links or <math> syntax.
 # (u' ,', u' ،'),
- # TODO: Basic explanation in English what it does
- (r'\bإمرأة\b', 'امرأة'),
- (r'\bالى\b', 'إلى'),
- (r'\bإسم\b', 'اسم'),
- (r'\bالأن\b', 'الآن'),
- (r'\bالة\b', 'آلة'),
- (r'\bفى\b', 'في'),
- (r'\bإبن\b', 'ابن'),
- (r'\bإبنة\b', 'ابنة'),
- (r'\bإقتصاد\b', 'اقتصاد'),
- (r'\bإجتماع\b', 'اجتماع'),
- (r'\bانجيل\b', 'إنجيل'),
- (r'\bاجماع\b', 'إجماع'),
- (r'\bاكتوبر\b', 'أكتوبر'),
- (r'\bإستخراج\b', 'استخراج'),
- (r'\bإستعمال\b', 'استعمال'),
- (r'\bإستبدال\b', 'استبدال'),
- (r'\bإشتراك\b', 'اشتراك'),
- (r'\bإستعادة\b', 'استعادة'),
- (r'\bإستقلال\b', 'استقلال'),
- (r'\bإنتقال\b', 'انتقال'),
- (r'\bإتحاد\b', 'اتحاد'),
- (r'\bاملاء\b', 'إملاء'),
- (r'\bإستخدام\b', 'استخدام'),
- (r'\bأحدى\b', 'إحدى'),
- (r'\bلاكن\b', 'لكن'),
- (r'\bإثنان\b', 'اثنان'),
- (r'\bإحتياط\b', 'احتياط'),
- (r'\bإقتباس\b', 'اقتباس'),
- (r'\bادارة\b', 'إدارة'),
- (r'\bابناء\b', 'أبناء'),
- (r'\bالانصار\b', 'الأنصار'),
- (r'\bاشارة\b', 'إشارة'),
- (r'\bإقرأ\b', 'اقرأ'),
- (r'\bإمتياز\b', 'امتياز'),
- (r'\bارق\b', 'أرق'),
- (r'\bاللة\b', 'الله'),
- (r'\bإختبار\b', 'اختبار'),
- (r'== ?روابط خارجية ?==', '== وصلات خارجية =='),
- (r'\bارسال\b', 'إرسال'),
- (r'\bإتصالات\b', 'اتصالات'),
- (r'\bابو\b', 'أبو'),
- (r'\bابا\b', 'أبا'),
- (r'\bاخو\b', 'أخو'),
- (r'\bاخا\b', 'أخا'),
- (r'\bاخي\b', 'أخي'),
- (r'\bاحد\b', 'أحد'),
- (r'\bاربعاء\b', 'أربعاء'),
- (r'\bاول\b', 'أول'),
- (r'\b(ال|)اهم\b', r'\1أهم'),
- (r'\b(ال|)اثقل\b', r'\1أثقل'),
- (r'\b(ال|)امجد\b', r'\1أمجد'),
- (r'\b(ال|)اوسط\b', r'\1أوسط'),
- (r'\b(ال|)اشقر\b', r'\1أشقر'),
- (r'\b(ال|)انور\b', r'\1أنور'),
- (r'\b(ال|)اصعب\b', r'\1أصعب'),
- (r'\b(ال|)اسهل\b', r'\1أسهل'),
- (r'\b(ال|)اجمل\b', r'\1أجمل'),
- (r'\b(ال|)اقبح\b', r'\1أقبح'),
- (r'\b(ال|)اطول\b', r'\1أطول'),
- (r'\b(ال|)اقصر\b', r'\1أقصر'),
- (r'\b(ال|)اسمن\b', r'\1أسمن'),
- (r'\b(ال|)اذكى\b', r'\1أذكى'),
- (r'\b(ال|)اكثر\b', r'\1أكثر'),
- (r'\b(ال|)افضل\b', r'\1أفضل'),
- (r'\b(ال|)اكبر\b', r'\1أكبر'),
- (r'\b(ال|)اشهر\b', r'\1أشهر'),
- (r'\b(ال|)ابطأ\b', r'\1أبطأ'),
- (r'\b(ال|)اماني\b', r'\1أماني'),
- (r'\b(ال|)احلام\b', r'\1أحلام'),
- (r'\b(ال|)اسماء\b', r'\1أسماء'),
- (r'\b(ال|)اسامة\b', r'\1أسامة'),
- (r'\bابراهيم\b', 'إبراهيم'),
- (r'\bاسماعيل\b', 'إسماعيل'),
- (r'\bايوب\b', 'أيوب'),
- (r'\bايمن\b', 'أيمن'),
- (r'\bاوزبكستان\b', 'أوزبكستان'),
- (r'\bاذربيجان\b', 'أذربيجان'),
- (r'\bافغانستان\b', 'أفغانستان'),
- (r'\bانجلترا\b', 'إنجلترا'),
- (r'\bايطاليا\b', 'إيطاليا'),
- (r'\bاوربا\b', 'أوروبا'),
- (r'\bأوربا\b', 'أوروبا'),
- (r'\bاوغندة\b', 'أوغندة'),
- (r'\b(ال|)ا(لماني|فريقي|سترالي)(ا|ة|تان|ان|ين|ي|ون|و|ات|)\b',
- r'\1أ\2\3'),
- (r'\b(ال|)ا(وروب|مريك)(ا|ي|ية|يتان|يان|يين|يي|يون|يو|يات|)\b',
- r'\1أ\2\3'),
- (r'\b(ال|)ا(ردن|رجنتين|وغند|سبان|وكران|فغان)'
- r'(ي|ية|يتان|يان|يين|يي|يون|يو|يات|)\b',
- r'\1أ\2\3'),
- (r'\b(ال|)ا(سرائيل|يران|مارات|نكليز|نجليز)'
- r'(ي|ية|يتان|يان|يين|يي|يون|يو|يات|)\b',
- r'\1إ\2\3'),
- (r'\b(ال|)(ا|أ)(رثوذكس|رثوذوكس)(ي|ية|يتان|يان|يين|يي|يون|يو|يات|)'
- r'\b',
- r'\1أرثوذكس\4'),
- (r'\bإست(عمل|خدم|مر|مد|مال|عاض|قام|حال|جاب|قال|زاد|عان|طال)'
- r'(ت|ا|وا|)\b',
- r'است\1\2'),
- (r'\bإست(حال|قال|طال|زاد|عان|قام|راح|جاب|عاض|مال)ة\b', r'است\1ة'),
+ (r'(\A|\s)إمرأة(\Z|\s)', '\\1امرأة\\2'),
+ (r'(\A|\s)الى(\Z|\s)', '\\1إلى\\2'),
+ (r'(\A|\s)إسم(\Z|\s)', '\\1اسم\\2'),
+ (r'(\A|\s)الأن(\Z|\s)', '\\1الآن\\2'),
+ (r'(\A|\s)اول(\Z|\s)', '\\1أول\\2'),
+ (r'(\A|\s)الة(\Z|\s)', '\\1آلة\\2'),
+ (r'(\A|\s)فى(\Z|\s)', '\\1في\\2'),
+ (r'(\A|\s)اثقل(\Z|\s)', '\\1أثقل\\2'),
+ (r'(\A|\s)إبن(\Z|\s)', '\\1ابن\\2'),
+ (r'(\A|\s)إبنة(\Z|\s)', '\\1ابنة\\2'),
+ (r'(\A|\s)إقتصاد(\Z|\s)', '\\1اقتصاد\\2'),
+ (r'(\A|\s)إجتماع(\Z|\s)', '\\1اجتماع\\2'),
+ (r'(\A|\s)انجيل(\Z|\s)', '\\1إنجيل\\2'),
+ (r'(\A|\s)اجماع(\Z|\s)', '\\1إجماع\\2'),
+ (r'(\A|\s)امريكا(\Z|\s)', '\\1أمريكا\\2'),
+ (r'(\A|\s)اوروبا(\Z|\s)', '\\1أوروبا\\2'),
+ (r'(\A|\s)انجلترا(\Z|\s)', '\\1إنجلترا\\2'),
+ (r'(\A|\s)اكتوبر(\Z|\s)', '\\1أكتوبر\\2'),
+ (r'(\A|\s)اسرائيل(\Z|\s)', '\\1إسرائيل\\2'),
+ (r'(\A|\s)المانيا(\Z|\s)', '\\1ألمانيا\\2'),
+ (r'(\A|\s)ايطاليا(\Z|\s)', '\\1إيطاليا\\2'),
+ (r'(\A|\s)ايران(\Z|\s)', '\\1إيران\\2'),
+ (r'(\A|\s)إستخراج(\Z|\s)', '\\1استخراج\\2'),
+ (r'(\A|\s)إستعمال(\Z|\s)', '\\1استعمال\\2'),
+ (r'(\A|\s)إستبدال(\Z|\s)', '\\1استبدال\\2'),
+ (r'(\A|\s)إشتراك(\Z|\s)', '\\1اشتراك\\2'),
+ (r'(\A|\s)إستعادة(\Z|\s)', '\\1استعادة\\2'),
+ (r'(\A|\s)إستقلال(\Z|\s)', '\\1استقلال\\2'),
+ (r'(\A|\s)إنتقال(\Z|\s)', '\\1انتقال\\2'),
+ (r'(\A|\s)إتحاد(\Z|\s)', '\\1اتحاد\\2'),
+ (r'(\A|\s)املاء(\Z|\s)', '\\1إملاء\\2'),
+ (r'(\A|\s)إستخدام(\Z|\s)', '\\1استخدام\\2'),
+ (r'(\A|\s)أحدى(\Z|\s)', '\\1إحدى\\2'),
+ (r'(\A|\s)لاكن(\Z|\s)', '\\1لكن\\2'),
+ (r'(\A|\s)الاردن(\Z|\s)', '\\1الأردن\\2'),
+ (r'(\A|\s)إثنان(\Z|\s)', '\\1اثنان\\2'),
+ (r'(\A|\s)شيئ(\Z|\s)', '\\1شيء\\2'),
+ (r'(\A|\s)إحتياط(\Z|\s)', '\\1احتياط\\2'),
+ (r'(\A|\s)إقتباس(\Z|\s)', '\\1اقتباس\\2'),
+ (r'(\A|\s)الامارات(\Z|\s)', '\\1الإمارات\\2'),
+ (r'(\A|\s)اكثر(\Z|\s)', '\\1أكثر\\2'),
+ (r'(\A|\s)افضل(\Z|\s)', '\\1أفضل\\2'),
+ (r'(\A|\s)اكبر(\Z|\s)', '\\1أكبر\\2'),
+ (r'(\A|\s)اشهر(\Z|\s)', '\\1أشهر\\2'),
+ (r'(\A|\s)ادارة(\Z|\s)', '\\1إدارة\\2'),
+ (r'(\A|\s)ابناء(\Z|\s)', '\\1أبناء\\2'),
+ (r'(\A|\s)الانصار(\Z|\s)', '\\1 الأنصار\\2'),
+ (r'(\A|\s)اشارة(\Z|\s)', '\\1إشارة\\2'),
+ (r'(\A|\s)إقرأ(\Z|\s)', '\\1اقرأ\\2'),
+ (r'(\A|\s)إمتياز(\Z|\s)', '\\1امتياز\\2'),
+ (r'(\A|\s)ارق(\Z|\s)', '\\1أرق\\2'),
+ (r'(\A|\s)أرثوذوكس(\Z|\s)', '\\1أرثوذكس\\2'),
+ (r'(\A|\s)الأرثوذوكس(\Z|\s)', '\\1الأرثوذكس\\2'),
+ (r'(\A|\s)أرثوذوكسية(\Z|\s)', '\\1أرثوذكسية\\2'),
+ (r'(\A|\s)الأرثوذوكسية(\Z|\s)', '\\1الأرثوذكسية\\2'),
+ (r'(\A|\s)الأرثوذوكسي(\Z|\s)', '\\1الأرثوذكسي\\2'),
+ (r'(\A|\s)ارثوذوكس(\Z|\s)', '\\1أرثوذكس\\2'),
+ (r'(\A|\s)ارثوذوكسي(\Z|\s)', '\\1أرثوذكسي\\2'),
+ (r'(\A|\s)ارثوذوكسية(\Z|\s)', '\\1أرثوذكسية\\2'),
+ (r'(\A|\s)الارثوذوكسية(\Z|\s)', '\\1الأرثوذكسية\\2'),
+ (r'(\A|\s)اللة(\Z|\s)', '\\1الله\\2'),
+ (r'(\A|\s)إختبار(\Z|\s)', '\\1اختبار\\2'),
+ (r'(\A|\s)== روابط خارجية ==(\Z|\s)', '\\1== وصلات خارجية ==\\2'),
+ (r'(\A|\s)==روابط خارجية==(\Z|\s)', '\\1== وصلات خارجية ==\\2'),
+ (r'(\A|\s)ارسال(\Z|\s)', '\\1إرسال\\2'),
+ (r'(\A|\s)إتصالات(\Z|\s)', '\\1اتصالات\\2'),
+ (r'(\A|\s)اسامة(\Z|\s)', '\\1أسامة\\2'),
+ (r'(\A|\s)ابراهيم(\Z|\s)', '\\1إبراهيم\\2'),
+ (r'(\A|\s)اسماعيل(\Z|\s)', '\\1إسماعيل\\2'),
+ (r'(\A|\s)ايوب(\Z|\s)', '\\1أيوب\\2'),
+ (r'(\A|\s)ايمن(\Z|\s)', '\\1أيمن\\2'),
+ (r'(\A|\s)ابو(\Z|\s)', '\\1أبو\\2'),
+ (r'(\A|\s)ابا(\Z|\s)', '\\1أبا\\2'),
+ (r'(\A|\s)اخو(\Z|\s)', '\\1أخو\\2'),
+ (r'(\A|\s)اخا(\Z|\s)', '\\1أخا\\2'),
+ (r'(\A|\s)اخي(\Z|\s)', '\\1أخي\\2'),
+ (r'(\A|\s)احد(\Z|\s)', '\\1أحد\\2'),
+ (r'(\A|\s)اربعاء(\Z|\s)', '\\1أربعاء\\2'),
+ (r'(\A|\s)اهم(\Z|\s)', '\\1أهم\\2'),
+ (r'(\A|\s)اوزبكستان(\Z|\s)', '\\1أوزبكستان\\2'),
+ (r'(\A|\s)اذربيجان(\Z|\s)', '\\1أذربيجان\\2'),
+ (r'(\A|\s)افغانستان(\Z|\s)', '\\1أفغانستان\\2'),
+ (r'(\A|\s)امجد(\Z|\s)', '\\1أمجد\\2'),
+ (r'(\A|\s)اوسط(\Z|\s)', '\\1أوسط\\2'),
+ (r'(\A|\s)اشقر(\Z|\s)', '\\1أشقر\\2'),
+ (r'(\A|\s)انور(\Z|\s)', '\\1أنور\\2'),
+ (r'(\A|\s)اصعب(\Z|\s)', '\\1أصعب\\2'),
+ (r'(\A|\s)اسهل(\Z|\s)', '\\1أسهل\\2'),
+ (r'(\A|\s)اجمل(\Z|\s)', '\\1أجمل\\2'),
+ (r'(\A|\s)اقبح(\Z|\s)', '\\1أقبح\\2'),
+ (r'(\A|\s)اطول(\Z|\s)', '\\1أطول\\2'),
+ (r'(\A|\s)اقصر(\Z|\s)', '\\1أقصر\\2'),
+ (r'(\A|\s)اسمن(\Z|\s)', '\\1أسمن\\2'),
+ (r'(\A|\s)اذكى(\Z|\s)', '\\1أذكى\\2'),
+ (r'(\A|\s)اماني(\Z|\s)', '\\1أماني\\2'),
+ (r'(\A|\s)احلام(\Z|\s)', '\\1أحلام\\2'),
+ (r'(\A|\s)اسماء(\Z|\s)', '\\1أسماء\\2'),
+ (r'(\A|\s)ابطأ(\Z|\s)', '\\1أبطأ\\2'),
+ (r'(\A|\s)اوربا(\Z|\s)', '\\1أوروبا\\2'),
+ (r'(\A|\s)أوربا(\Z|\s)', '\\1أوروبا\\2'),
+ (r'(\A|\s)امريكي(\Z|\s)', '\\1أمريكي\\2'),
+ (r'(\A|\s)امريكية(\Z|\s)', '\\1أمريكية\\2'),
+ (r'(\A|\s)امريكيان(\Z|\s)', '\\1أمريكيان\\2'),
+ (r'(\A|\s)امريكيتان(\Z|\s)', '\\1أمريكيتان\\2'),
+ (r'(\A|\s)امريكيون(\Z|\s)', '\\1أمريكيون\\2'),
+ (r'(\A|\s)امريكيات(\Z|\s)', '\\1أمريكيات\\2'),
+ (r'(\A|\s)الامريكي(\Z|\s)', '\\1الأمريكي\\2'),
+ (r'(\A|\s)الامريكية(\Z|\s)', '\\1الأمريكية\\2'),
+ (r'(\A|\s)الامريكيان(\Z|\s)', '\\1الأمريكيان\\2'),
+ (r'(\A|\s)الامريكيتان(\Z|\s)', '\\1الأمريكيتان\\2'),
+ (r'(\A|\s)الامريكيون(\Z|\s)', '\\1الأمريكيون\\2'),
+ (r'(\A|\s)الامريكيات(\Z|\s)', '\\1الأمريكيات\\2'),
+ (r'(\A|\s)اوروبي(\Z|\s)', '\\1أوروبي\\2'),
+ (r'(\A|\s)اوروبية(\Z|\s)', '\\1أوروبية\\2'),
+ (r'(\A|\s)اوروبيان(\Z|\s)', '\\1أوروبيان\\2'),
+ (r'(\A|\s)اوروبيتان(\Z|\s)', '\\1أوروبيتان\\2'),
+ (r'(\A|\s)اوروبيون(\Z|\s)', '\\1أوروبيون\\2'),
+ (r'(\A|\s)اوروبيات(\Z|\s)', '\\1أوروبيات\\2'),
+ (r'(\A|\s)الاوروبي(\Z|\s)', '\\1الأوروبي\\2'),
+ (r'(\A|\s)الاوروبية(\Z|\s)', '\\1الأوروبية\\2'),
+ (r'(\A|\s)الاوروبيان(\Z|\s)', '\\1الأوروبيان\\2'),
+ (r'(\A|\s)الاوروبيتان(\Z|\s)', '\\1الأوروبيتان\\2'),
+ (r'(\A|\s)الاوروبيون(\Z|\s)', '\\1الأوروبيون\\2'),
+ (r'(\A|\s)الاوروبيات(\Z|\s)', '\\1الأوروبيات\\2'),
+ (r'(\A|\s)اسرائيلي(\Z|\s)', '\\1إسرائيلي\\2'),
+ (r'(\A|\s)اسرائيلية(\Z|\s)', '\\1إسرائيلية\\2'),
+ (r'(\A|\s)اسرائيليان(\Z|\s)', '\\1إسرائيليان\\2'),
+ (r'(\A|\s)اسرائيليتان(\Z|\s)', '\\1إسرائيليتان\\2'),
 ],
 'exceptions': {
 'inside-tags': [
+ 'gallery', # because of filenames
 'interwiki',
 'math',
 'ref',
@@ -566,6 +594,7 @@ fixes = {
 'specialpages': {
 'regex': False,
 'msg': {
+ 'ar': 'روبوت: إصلاح حالة حروف الصفحات الخاصة',
 'en': 'Robot: Fixing special page capitalisation',
 'fa': 'ربات: تصحیح بزرگی و کوچکی حروف صفحه‌های ویژه',
 },
@@ -597,6 +626,7 @@ fixes = {
 'regex': False,
 'nocase': True,
 'msg': {
+ 'ar': 'روبوت: إصلاح الوصلات إلى نطاقات .yu',
 'de': 'Bot: Ersetze Links auf .yu-Domains',
 'en': 'Robot: Replacing links to .yu domains',
 'fa': 'ربات: جایگزینی پیوندها به دامنه‌ها با پسوند yu',

The patch does the following:

  1. Partial revert of r5942
  2. Adds "gallery" to "exceptions" (because of file names).
  3. Adds some Arabic (ar) translations of edit summaries.
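
The whitespace-anchored form used by the replacement rules in the patch can be tried in isolation with Python's re module. The sketch below uses one rule from the patch; the sample strings and variable names are illustrative assumptions, and it does not go through Pywikibot's replace machinery.

```python
import re

# Rule in the form used by the patch: the neighbouring whitespace is
# captured and written back through \1 and \2.
pattern = re.compile(r"(\A|\s)الى(\Z|\s)")
replacement = "\\1إلى\\2"

sentence = pattern.sub(replacement, "ذهب الى البيت")

# A vocalised word that merely ends in the same letters is left alone,
# because a diacritic is neither the start of the text nor whitespace.
vocalised = "ت" + "\u064e" + "ع" + "\u064e" + "الى"
untouched = pattern.sub(replacement, vocalised)

# Caveat: the trailing whitespace is consumed by the match, so a second
# typo right after a single space is not seen in the same pass.
repeated = pattern.sub(replacement, "الى الى")
```

With these inputs, the standalone typo is corrected and the vocalised word is unchanged. The last case shows a limitation of this form: only the first of two adjacent typos is corrected in a single pass, and words directly next to punctuation are not matched at all.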

(By the way, if you are wondering why it took me so long to report this issue, this is because I myself was inactive as a bot operator on arwiki. But recently I decided to resume operating the bot again, so, I need this issue fixed.)

Thank you

Event Timeline

Xqt triaged this task as Medium priority. Dec 10 2018, 12:30 PM

@Meno25: could you tag some samples where the current implementation fails? I think the patch made by @OsamaK was intended to ensure that only whole words are replaced, not word fragments.

Could you check whether \s solves this instead of that \b?

About the patch itself: Pywikibot now uses Gerrit. Would it be possible to upload your patch there? See https://www.mediawiki.org/wiki/Gerrit for general information. As a last option, you can use the https://tools.wmflabs.org/gerrit-patch-uploader/ tool.

Patch isn’t in gerrit for review now

Change 481256 had a related patch set uploaded (by Gerrit Patch Uploader; owner: Zoranzoki21):
[pywikibot/core@master] Update Arabic (ar) at fixes.py

https://gerrit.wikimedia.org/r/481256

Ok, I made https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/481256/ per P7897. (Just to test Gerrit Patch Uploader and to resolve this :D)

There are a few problems with this patch, @Meno25 please update the patch

Framawiki renamed this task from Update Arabic (ar) for fixes.py (Patch against Pywikibot core included) to Update Arabic (ar) for fixes.py. Dec 23 2018, 1:03 PM

> There are a few problems with this patch, @Meno25 please update the patch

I fixed it.

Meno25 changed the task status from Open to Stalled. Dec 23 2018, 3:54 PM

> @Meno25: could you tag some samples where the current implementation fails? I think the patch made by @OsamaK was intended to ensure that only whole words are replaced, not word fragments.
>
> Could you check whether \s solves this instead of that \b?

Sure. I am going to try using \s and see if it will work. Let's mark this as stalled for now.

> Sure. I am going to try using \s and see if it will work. Let's mark this as stalled for now.

I made another proposal here. Could you check it?

>> Sure. I am going to try using \s and see if it will work. Let's mark this as stalled for now.

> I made another proposal here. Could you check it?

Sure. Just please give me some time to test it as I am busy right now with other stuff in real life.

Change 481284 had a related patch set uploaded (by Meno25; owner: Meno25):
[pywikibot/core@master] [i18n] Update Arabic (ar) at fixes.py

https://gerrit.wikimedia.org/r/481284

Meno25 changed the task status from Stalled to Open. Edited Dec 24 2018, 6:38 AM

>>> Sure. I am going to try using \s and see if it will work. Let's mark this as stalled for now.

>> I made another proposal here. Could you check it?

> Sure. Just please give me some time to test it as I am busy right now with other stuff in real life.

And @Xqt saves the day!

I ran the new code as suggested by Xqt on all the featured and good articles and featured lists on arwiki (a total of 568+593+159=1320 pages) and all is OK.

I uploaded a new patch to Gerrit https://gerrit.wikimedia.org/r/#/c/481284/
based on the earlier patch https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/481256/
Please add the new patch (but note that I am new to Gerrit, so, please review the patch carefully before adding it).
I also updated the patch in P7897 with the new Xqt code.
Thank you.

Change 481256 abandoned by Xqt:
[i18n] Update Arabic (ar) at fixes.py

Reason:
See new ps

https://gerrit.wikimedia.org/r/481256

Change 481314 had a related patch set uploaded (by Meno25; owner: Meno25):
[pywikibot@refs/meta/config] [i18n] Update Arabic (ar) at fixes.py

https://gerrit.wikimedia.org/r/481314

Change 481314 abandoned by Meno25:
[i18n] Update Arabic (ar) at fixes.py

https://gerrit.wikimedia.org/r/481314

Change 481315 had a related patch set uploaded (by Meno25; owner: Meno25):
[pywikibot/core@master] [i18n] Update Arabic (ar) at fixes.py

https://gerrit.wikimedia.org/r/481315

Change 481284 abandoned by Meno25:
[i18n] Update Arabic (ar) at fixes.py

https://gerrit.wikimedia.org/r/481284

Change 481315 merged by jenkins-bot:
[pywikibot/core@master] [i18n] Update Arabic (ar) at fixes.py

https://gerrit.wikimedia.org/r/481315