Fulltext search in PostgreSQL for non latin keywords is broken
Closed, Resolved (Public)

Description

Author: baldin

Description:
Fulltext search for non-Latin keywords (for example, Russian) is broken: searching
for Russian words returns no hits, while searching for English words works fine.

PostgreSQL version is 8.1.4 (backport for Debian GNU/Linux stable (Sarge)).
MediaWiki version is 1.8.2.

In includes/SearchPostgres.php, the parseQuery function uses the following statement:

$searchon .= $terms[1] . $wgContLang->stripForSearch( $terms[2] );

The stripForSearch() function is located in languages/Language.php.

The comment in that function reads:

MySQL fulltext index doesn't grok utf-8, so we need to fold cases and convert to hex

But if I rewrite this function in the simplest possible way:

function stripForSearch( $string ) {
	return $string;
}

Russian full-text search starts working. So I think this is a solution for PostgreSQL.

P.S. (btw) PostgreSQL 8.2 comes with the tsearch2 extension, which has full multibyte
(UTF-8) support. So it's possible to initialize the database with Unicode locales. I
checked it with the ru_RU.UTF-8 locale.


Version: 1.8.x
Severity: normal
OS: Linux
Platform: PC

Details

Reference
bz8470

Event Timeline

bzimport raised the priority of this task from to High. Nov 21 2014, 9:32 PM
bzimport set Reference to bz8470.
bzimport added a subscriber: Unknown Object (MLST).

Thanks, applied a simple db check and quick return in r18791.
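
For readers following along, here is a rough, stand-alone sketch of what a "db check and quick return" could look like. It is not the code committed in r18791. The function name stripForSearchSketch and the $dbType parameter are made up for illustration (a real patch would more likely read the configured database type from a global setting), and the MySQL branch only approximates the "fold cases and convert to hex" behaviour described in the quoted comment.

```php
<?php
// Hypothetical sketch only; not the MediaWiki source and not the r18791 patch.
// $dbType is an assumed parameter standing in for the configured database type.
function stripForSearchSketch( string $string, string $dbType ): string {
	// Quick return: only the MySQL fulltext index needs the workaround,
	// so every other backend (e.g. PostgreSQL) gets the text unchanged.
	if ( $dbType !== 'mysql' ) {
		return $string;
	}

	// Fold case; fall back to the ASCII-only strtolower() when mbstring is missing.
	$lower = function_exists( 'mb_strtolower' )
		? mb_strtolower( $string, 'UTF-8' )
		: strtolower( $string );

	// Replace each multibyte UTF-8 sequence (lead byte plus continuation bytes)
	// with an ASCII token: the prefix "U8" followed by the bytes in hex.
	return preg_replace_callback(
		'/[\xc0-\xff][\x80-\xbf]*/',
		static function ( array $m ): string {
			return 'U8' . bin2hex( $m[0] );
		},
		$lower
	);
}
```

With a check like this, a Russian search term passed for a PostgreSQL backend reaches the query untouched, while the MySQL path keeps its existing encoding step, which matches the behaviour the reporter saw after stubbing out the function.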