Redirect in the search box for Arabic projects
Closed, ResolvedPublic

Description

This task is one of three tasks split from the larger task T115561.

Approval:
We voted on Arabic Wikipedia for the exclusive use of Western Arabic numerals (WAN) (0, 1, 2) instead of allowing the use of Eastern Arabic numerals (EAN) (٠, ١, ٢) when editing.
21 for / none against
https://ar.wikipedia.org/wiki/%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9%8A%D8%A7:%D8%A7%D9%84%D9%85%D9%8A%D8%AF%D8%A7%D9%86/%D8%B3%D9%8A%D8%A7%D8%B3%D8%A7%D8%AA/10/2015

The related policy has been updated:
https://ar.wikipedia.org/w/index.php?title=%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9%8A%D8%A7%3A%D8%AF%D9%84%D9%8A%D9%84_%D8%A7%D9%84%D8%A3%D8%B3%D9%84%D9%88%D8%A8&type=revision&diff=17231844&oldid=16952006

Task:
Currently, if someone searches for an article whose title contains both words and numerals, but types the numerals in EAN, the article titled with WAN does not appear.
For example, when we search Arabic Wikipedia using EAN for the Indian movie Love Story 2050 (https://en.wikipedia.org/wiki/Love_Story_2050), the article doesn't appear in the search results. The problem may also exist in other sister projects.
https://ar.wikipedia.org/w/index.php?search=%D9%82%D8%B5%D8%A9+%D8%AD%D8%A8+%D9%A2%D9%A0%D9%A5%D9%A0+%28%D9%81%D9%8A%D9%84%D9%85+%D9%87%D9%86%D8%AF%D9%8A%29&title=%D8%AE%D8%A7%D8%B5%3A%D8%A8%D8%AD%D8%AB&go=%D8%A7%D8%B0%D9%87%D8%A8

Yet the article about the movie does exist, titled with WAN:
https://ar.wikipedia.org/wiki/%D9%82%D8%B5%D8%A9_%D8%AD%D8%A8_2050_%28%D9%81%D9%8A%D9%84%D9%85_%D9%87%D9%86%D8%AF%D9%8A%29

Event Timeline

Helmoony raised the priority of this task from to Needs Triage.
Helmoony updated the task description.
Helmoony added a subscriber: Helmoony.

Anything new?

If there were any news, it would be listed here. This task is under "MediaWiki-Internationalization (Backlog)" and is not prioritized. Anybody is welcome to provide patches to speed up a fix.

I think the primary part of this is fixed—the example query works now—probably as the result of an update from Elasticsearch that included the automatic transformation. I checked my local installation with Arabic configured, and if I put in the text 2050 ٢٠٥٠, I get back two tokens: 2050 and 2050 again. Same for 1984 and ١٩٨٤.
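To illustrate the folding described above, here is a minimal Python sketch. It is not the Elasticsearch analyzer itself; the names `fold_digits` and `tokenize` are made up for this example, and the whitespace tokenizer is a simplifying assumption.

```python
# Illustrative sketch only, not the actual Elasticsearch/CirrusSearch code.
# Eastern Arabic numerals occupy U+0660..U+0669; build the table from code
# points so no hidden directional marks sneak into the mapping.
EAN_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
WAN_DIGITS = "0123456789"
EAN_TO_WAN = str.maketrans(EAN_DIGITS, WAN_DIGITS)


def fold_digits(text: str) -> str:
    """Replace every Eastern Arabic digit with its Western counterpart."""
    return text.translate(EAN_TO_WAN)


def tokenize(text: str) -> list[str]:
    """Hypothetical analyzer: fold digits, then split on whitespace."""
    return fold_digits(text).split()


print(tokenize("2050 ٢٠٥٠"))  # ['2050', '2050']
print(tokenize("1984 ١٩٨٤"))  # ['1984', '1984']
```

Because both the indexed text and the query would pass through the same folding, an EAN query and a WAN title end up as identical tokens.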

Searching with quotes only matches the exact characters you typed, so searching for "قصة حب ٢٠٥٠" gets no results because it is looking for exactly that string. Searching without quotes works fine. I think that's probably the desired behavior, because it lets you readily find EAN characters in the text.

The only other place where this isn't happening is the completion suggester (which provides suggestions as you type in the search box). It's possible to equate the characters there, too, for all Arabic language wikis. (We've done something similar for Russian.) I can do that if it would be useful there, too.

@TJones, yes please, the completion suggester change would be useful in Arabic projects. Thank you.

So far, we have used a bot to create redirects from Eastern to Western numerals for articles about years (https://ar.wikipedia.org/w/index.php?title=%D9%A2%D9%A0%D9%A5%D9%A0&redirect=no), but we probably need to do this for all articles with WAN.

I'll add updating the completion suggester for Arabic to my list. I may not get to it for a while.

I don't think you will need to add more redirects. The internal representation of the numbers is WAN, for articles, titles, and queries. So searching for ٥5٥ finds 555, even without a redirect for "٥5٥".

Once the change is made for the completion suggester, ٥5٥ will match both 555 and ٥٥٥ for suggestions.
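A rough sketch of what that matching could look like, assuming a naive prefix scan over a short list of titles. The real completion suggester is built differently; `suggest` and the sample titles are hypothetical.

```python
# Illustrative sketch only: a linear prefix scan, not the real suggester.
# Map each Eastern Arabic digit code point (U+0660..U+0669) to "0".."9".
EAN_TO_WAN = {0x0660 + i: str(i) for i in range(10)}


def fold_digits(text):
    """Replace Eastern Arabic digits with Western ones."""
    return text.translate(EAN_TO_WAN)


def suggest(prefix, titles):
    """Return titles whose folded form starts with the folded prefix."""
    folded_prefix = fold_digits(prefix)
    return [t for t in titles if fold_digits(t).startswith(folded_prefix)]


titles = ["555", "٥٥٥", "556"]  # hypothetical page titles
print(suggest("٥5٥", titles))  # ['555', '٥٥٥']
```

The titles are returned unchanged; only the comparison uses the folded form, so a mixed query such as ٥5٥ can still surface both the WAN and the EAN title.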

TJones triaged this task as Medium priority. Apr 26 2019, 5:29 PM

Change 510953 had a related patch set uploaded (by Tjones; owner: Tjones):
[mediawiki/extensions/CirrusSearch@master] Fold Eastern Arabic Numerals to Western in the Completion Suggester for Arabic

https://gerrit.wikimedia.org/r/510953

Change 510953 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Fold Eastern Arabic Numerals to Western in the Completion Suggester for Arabic

https://gerrit.wikimedia.org/r/510953