At Vietnamese wikis, Special:Search should not redirect based on case-folding
Open, Low, Public

Description

At the Vietnamese Wiktionary, searching for “trường hộp” redirects to “trường hợp”, which is incorrect and potentially confusing to readers (because they might not notice the circumflex being replaced by a horn). At Vietnamese wikis, the search engine should fold diacritics (what this task’s title calls case-folding) only for search suggestions, results, and Did You Mean; it should never redirect the user to a page that matches only because diacritics were folded. (There is one case where this behavior is useful: spellings like “xóa” and “xoá” are interchangeable. But we already have redirect pages for all these cases.)

The impact on Vietnamese wikis is high because most words have completely unrelated lookalikes when ignoring diacritics.
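
To make the requested behavior concrete, here is a minimal sketch of the distinction, not the actual CirrusSearch code. All names (`foldCase`, `foldDiacritics`, `decide`) are hypothetical, and the list of titles stands in for whatever the real index would return. It assumes that only a match on letter case may redirect, while a match that depends on dropped diacritics may only feed suggestions or Did You Mean.

```typescript
// Hypothetical helpers for illustration only; none of this is CirrusSearch API.

/** Folds letter case only: "Xóa" matches "xóa", but "hộp" does not match "hợp". */
function foldCase(s: string): string {
  return s.normalize("NFC").toLowerCase();
}

/** Folds case and diacritics: decomposes, drops combining marks, maps đ to d. */
function foldDiacritics(s: string): string {
  return foldCase(s)
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // tone marks, circumflex, breve, horn
    .replace(/đ/g, "d"); // đ has no canonical decomposition
}

type SearchAction =
  | { kind: "redirect"; title: string }
  | { kind: "results"; didYouMean: string[] };

/** Redirects only on a case-insensitive exact match; lookalikes become suggestions. */
function decide(query: string, titles: string[]): SearchAction {
  const exact = titles.find((t) => foldCase(t) === foldCase(query));
  if (exact !== undefined) {
    return { kind: "redirect", title: exact };
  }
  const lookalikes = titles.filter(
    (t) => foldDiacritics(t) === foldDiacritics(query)
  );
  return { kind: "results", didYouMean: lookalikes };
}
```

Under these assumptions, `decide("trường hộp", ["trường hợp"])` shows a results page that offers “trường hợp” as a suggestion instead of jumping straight to it.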

mxn created this task. Dec 13 2014, 11:39 PM
mxn updated the task description.
mxn raised the priority of this task from to Needs Triage.
mxn added a project: Wikimedia-Site-requests.
mxn changed Security from none to None.
mxn renamed this task from Special:Search should not conflate diacritics at Vietnamese wikis to At Vietnamese wikis, Special:Search should not redirect based on case-folding.
mxn added a subscriber: mxn.
mxn updated the task description. Dec 13 2014, 11:42 PM
TTO added a subscriber: TTO.
TTO added a subscriber: Manybubbles.
mxn updated the task description. Dec 14 2014, 8:14 PM
Aklapper triaged this task as Low priority.
mxn updated the task description. Dec 14 2014, 10:28 PM
mxn updated the task description. Dec 14 2014, 10:31 PM
Dereckson added a subscriber: Dereckson.
Dereckson added a subscriber: Nikerabbit.
mxn added a comment. Jul 18 2015, 8:24 PM

Just some examples to illustrate the severity of this issue:

  • Searching for “bác bỏ” (abandonment) takes you to “bắc bộ” (northern region), which redirects to “Bắc Bộ Việt Nam” (Northern Vietnam). If you’re a reader unfamiliar with MediaWiki, this may look like a political statement to you.
  • Searching for “khóa học” (academic course) takes you to “khoa học” (science). If you’re a reader unfamiliar with the ~ prefix that CirrusSearch supports, it seems impossible to use the search bar to find information on academic offerings at universities.
  • Searching for “truyền thống” (tradition) takes you to “truyền thông” (communication). If you’re the same reader as above, it seems impossible to find information on traditions, and it’s kind of insulting that the site takes you to something random instead.

Of course, searching for “bác bỏ” wouldn’t take you to “Bắc Bộ Việt Nam” if the Vietnamese Wikipedia had an article on “bác bỏ”, but there are so many potential cases for confusion that the 40-some active editors cannot possibly write away the problem.

Restricted Application added a project: Discovery. Jul 18 2015, 8:24 PM
mxn added a subscriber: santhosh. Jul 18 2015, 9:31 PM
mxn added a comment. Jul 18 2015, 9:35 PM

I’m considering working around this issue at the Vietnamese Wikipedia with a gadget that prepends ~ to any search from the search box that contains Vietnamese diacritics. But it’s a sledgehammer, and I’d much prefer to get proper language support into ElasticSearch or to turn diacritic folding off entirely.
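
A rough sketch of the core of such a gadget, for illustration only. The function names `hasDiacritics` and `forceResultsPage` are hypothetical, and the sketch assumes the check can be approximated by looking for combining marks after NFD decomposition plus the letter đ. That matches diacritics from any Latin-script language, not only Vietnamese, which is part of why the approach is a sledgehammer.

```typescript
// Hypothetical gadget core; wiring it to the search box is left out.

/** True if the query contains a combining mark after NFD decomposition, or đ/Đ. */
function hasDiacritics(query: string): boolean {
  // U+0300–U+036F covers the Vietnamese tone and vowel marks once decomposed.
  // đ/Đ do not decompose, so they are checked explicitly.
  return /[\u0300-\u036f]|[đĐ]/.test(query.normalize("NFD"));
}

/** Prepends "~" so the search lands on a results page instead of redirecting. */
function forceResultsPage(query: string): string {
  const trimmed = query.trim();
  if (trimmed === "" || trimmed.startsWith("~")) {
    return trimmed; // nothing to do, or the prefix is already there
  }
  return hasDiacritics(trimmed) ? "~" + trimmed : trimmed;
}
```

A gadget built on this would rewrite the search box value with `forceResultsPage` just before the form is submitted, so “khóa học” would be sent as “~khóa học” while an unaccented query such as “khoa hoc” would pass through unchanged.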

Ironholds moved this task from Needs triage to Search on the Discovery board. Aug 4 2015, 8:18 AM