
Redirect in the search box for Arabic projects
Closed, Resolved · Public

Description

This task is one of three tasks split off from the larger task T115561.

Approval:
We voted on Arabic Wikipedia for the exclusive use of Western Arabic numerals (WAN: 0, 1, 2) when editing, instead of also allowing Eastern Arabic numerals (EAN: ٠, ١, ٢).
Result: 21 for, none against.
https://ar.wikipedia.org/wiki/%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9%8A%D8%A7:%D8%A7%D9%84%D9%85%D9%8A%D8%AF%D8%A7%D9%86/%D8%B3%D9%8A%D8%A7%D8%B3%D8%A7%D8%AA/10/2015

The related policy has been updated accordingly:
https://ar.wikipedia.org/w/index.php?title=%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9%8A%D8%A7%3A%D8%AF%D9%84%D9%8A%D9%84_%D8%A7%D9%84%D8%A3%D8%B3%D9%84%D9%88%D8%A8&type=revision&diff=17231844&oldid=16952006

Task:
Currently, if someone searches with EAN for an article whose title contains both words and numerals, the article (titled with WAN) does not appear.
For example, when we search Arabic Wikipedia with EAN for the Indian movie https://en.wikipedia.org/wiki/Love_Story_2050, the article doesn't appear in the search results. The same problem may also affect other sister projects.
https://ar.wikipedia.org/w/index.php?search=%D9%82%D8%B5%D8%A9+%D8%AD%D8%A8+%D9%A2%D9%A0%D9%A5%D9%A0+%28%D9%81%D9%8A%D9%84%D9%85+%D9%87%D9%86%D8%AF%D9%8A%29&title=%D8%AE%D8%A7%D8%B5%3A%D8%A8%D8%AD%D8%AB&go=%D8%A7%D8%B0%D9%87%D8%A8

This happens even though the article about the movie does exist, titled with WAN:
https://ar.wikipedia.org/wiki/%D9%82%D8%B5%D8%A9_%D8%AD%D8%A8_2050_%28%D9%81%D9%8A%D9%84%D9%85_%D9%87%D9%86%D8%AF%D9%8A%29


Event Timeline

Helmoony created this task. · Oct 30 2015, 3:04 PM
Helmoony set the priority of this task to Needs Triage.
Helmoony updated the task description.
Helmoony added a subscriber: Helmoony.
Restricted Application added a subscriber: Aklapper. · Oct 30 2015, 3:04 PM
Meno25 added a subscriber: Meno25. · Oct 31 2015, 9:25 AM

Is there anything new?

Restricted Application added a subscriber: StudiesWorld. · Nov 20 2015, 12:26 AM

Is there anything new?

If there were any news, it would be posted here. This task is under "MediaWiki-Internationalization (Backlog)" and has not been prioritized. Anybody is welcome to provide patches to speed up a fix.

FShbib added a subscriber: FShbib. · Mar 26 2016, 1:02 PM
Restricted Application added a subscriber: alanajjar. · Apr 23 2018, 1:00 PM
Restricted Application added a project: Discovery-Search. · Jul 18 2018, 9:07 AM
Amire80 moved this task from Untriaged to Search on the I18n board. · Jul 18 2018, 9:08 AM

Maybe @TJones has some insight?

I think the primary part of this is fixed: the example query works now, probably as the result of an Elasticsearch update that added the automatic transformation. I checked my local installation with Arabic configured: if I put in the text 2050 ٢٠٥٠, I get back two tokens, 2050 and 2050 again. The same happens for 1984 and ١٩٨٤.
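The transformation described here can be pictured as a toy analyzer: a whitespace tokenizer followed by a filter that maps every Unicode decimal digit (general category Nd, which includes EAN) to its ASCII form. This is a minimal sketch under that assumption, not the Elasticsearch or CirrusSearch code; `fold_digits` and `analyze` are hypothetical names. Elasticsearch does ship a `decimal_digit` token filter that performs this kind of folding.

```python
import unicodedata


def fold_digits(token: str) -> str:
    """Replace every Unicode decimal digit (category Nd) with its ASCII 0-9 form."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in token
    )


def analyze(text: str) -> list[str]:
    """Toy analyzer: split on whitespace, then fold digits in each token."""
    return [fold_digits(tok) for tok in text.split()]


print(analyze("2050 ٢٠٥٠"))  # ['2050', '2050']
print(analyze("1984 ١٩٨٤"))  # ['1984', '1984']
```

Because both indexed text and queries pass through the same folding, mixed input such as ٥5٥ also reduces to 555.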

Searching with quotes only matches the exact characters you typed, so searching for "قصة حب ٢٠٥٠" gets no results because it is looking for exactly that string. Searching without quotes works fine. I think that's probably the desired behavior, because it lets you readily find EAN characters in the text.
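A toy model of the quoted versus unquoted behavior: a quoted query is compared character for character against the raw title, while an unquoted query is folded and compared token by token. The helper names are hypothetical, and this only illustrates the behavior described above; it is not how CirrusSearch implements phrase search.

```python
import unicodedata


def fold_digits(text: str) -> str:
    # Map every Unicode decimal digit to ASCII 0-9; leave other characters alone.
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


def matches(query: str, title: str) -> bool:
    """Quoted queries compare raw characters; unquoted ones compare folded tokens."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Phrase search: look for exactly the characters typed, with no folding.
        return query[1:-1] in title
    # Plain search: every folded query token must appear among the folded title tokens.
    title_tokens = set(fold_digits(title).split())
    return all(tok in title_tokens for tok in fold_digits(query).split())


title = "قصة حب 2050 (فيلم هندي)"
print(matches("قصة حب ٢٠٥٠", title))    # True
print(matches('"قصة حب ٢٠٥٠"', title))  # False
```

The design trade-off is the one described in the comment: folding helps ordinary searches, while leaving phrase searches unfolded keeps it possible to locate text that still contains EAN characters.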

The only other place where this isn't happening is the completion suggester (which provides suggestions as you type in the search box). It's possible to equate the characters there, too, for all Arabic language wikis. (We've done something similar for Russian.) I can do that if it would be useful there, too.

Helmoony added a comment (edited). · Jul 26 2018, 7:03 PM

@TJones, yes please: the completion suggester change would be useful in Arabic projects. Thank you.

For now, we have used a bot to create redirects from EAN titles to WAN titles for articles about years (https://ar.wikipedia.org/w/index.php?title=%D9%A2%D9%A0%D9%A5%D9%A0&redirect=no), but we probably need to do the same for all articles with WAN in their titles.
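The bot's code is not part of this task. As a hypothetical sketch, the redirect title and its wikitext could be derived from a WAN title as follows, using the standard `#REDIRECT [[...]]` magic word. `ean_redirect` and `WAN_TO_EAN` are illustrative names, and saving the page through the MediaWiki API is deliberately left out.

```python
from typing import Optional

# Hypothetical mapping table: ASCII digits 0-9 to Eastern Arabic digits ٠-٩.
WAN_TO_EAN = str.maketrans("0123456789", "٠١٢٣٤٥٦٧٨٩")


def ean_redirect(title: str) -> Optional[tuple[str, str]]:
    """Return (EAN redirect title, redirect wikitext), or None if the title has no ASCII digits."""
    ean_title = title.translate(WAN_TO_EAN)
    if ean_title == title:
        return None  # nothing to convert, so no redirect is needed
    return ean_title, f"#REDIRECT [[{title}]]"


print(ean_redirect("2050"))  # ('٢٠٥٠', '#REDIRECT [[2050]]')
```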

I'll add updating the completion suggester for Arabic to my list. I may not get to it for a while.

I don't think you will need to add more redirects. The internal representation of the numbers is WAN, for articles, titles, and queries. So searching for ٥5٥ finds 555, even without a redirect for "٥5٥".

Once the change is made for the completion suggester, ٥5٥ will match both 555 and ٥٥٥ for suggestions.
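The effect of folding in the completion suggester can be illustrated with a toy prefix matcher that folds digits both in the indexed titles and in the typed prefix. This is a sketch under that assumption only: `suggest` is a hypothetical stand-in for the Elasticsearch completion suggester, not the actual CirrusSearch change.

```python
import unicodedata


def fold_digits(text: str) -> str:
    # Fold all Unicode decimal digits (EAN included) to ASCII 0-9.
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


def suggest(prefix: str, titles: list[str]) -> list[str]:
    """Return the original titles whose folded form starts with the folded prefix."""
    folded_prefix = fold_digits(prefix)
    return [t for t in titles if fold_digits(t).startswith(folded_prefix)]


titles = ["555", "٥٥٥", "556"]
print(suggest("٥5٥", titles))  # ['555', '٥٥٥']
```

Folding is applied only for comparison, so the suggestions themselves are shown with their original digits.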

TJones claimed this task. · Jul 26 2018, 7:19 PM
TJones triaged this task as Medium priority. · Apr 26 2019, 5:29 PM
TJones updated the task description. · May 6 2019, 5:03 PM

Change 510953 had a related patch set uploaded (by Tjones; owner: Tjones):
[mediawiki/extensions/CirrusSearch@master] Fold Eastern Arabic Numerals to Western in the Completion Suggester for Arabic

https://gerrit.wikimedia.org/r/510953

Change 510953 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Fold Eastern Arabic Numerals to Western in the Completion Suggester for Arabic

https://gerrit.wikimedia.org/r/510953

debt closed this task as Resolved. · May 28 2019, 11:42 PM
Meno25 removed a subscriber: Meno25. · May 31 2019, 11:49 AM