
PostgreSQL searches do not treat Unicode full width characters as their normal counterparts
Open, NormalPublic

Description

The search engines for MySQL and SQLite treat "ＡＺ" (that is, U+FF21 and U+FF3A) as "AZ" (cf. [[Halfwidth and fullwidth forms]]). PostgreSQL does not, and thus fails testFullWidth().

One idea would be to TRANSLATE() them in ts2_page_text() and ts2_page_title() and to use a similar technique in SearchPostgres::parseQuery(). If we do that, we need either to describe in the release notes how to regenerate the tsvectors after an update, or to detect whether ts2_page_text() or ts2_page_title() has changed and then regenerate them ourselves (I prefer the former).
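
To illustrate the character mapping that such a TRANSLATE() call would have to perform, here is a minimal Python sketch. It assumes that only the fullwidth ASCII variants (U+FF01 through U+FF5E) need folding, and that each maps to its ASCII counterpart by subtracting 0xFEE0. The names fold_fullwidth and translate_arguments are hypothetical and not part of MediaWiki. Whether this covers everything the MySQL and SQLite backends normalize is not established here.

```python
# Fullwidth ASCII variants occupy U+FF01..U+FF5E; each one corresponds to
# the ASCII character U+0021..U+007E at a fixed offset of 0xFEE0.
FULLWIDTH_START = 0xFF01
FULLWIDTH_END = 0xFF5E
OFFSET = 0xFEE0

# Code point table for str.translate(), playing the role of the
# "from" and "to" arguments of SQL TRANSLATE().
FULLWIDTH_TO_ASCII = {
    cp: cp - OFFSET for cp in range(FULLWIDTH_START, FULLWIDTH_END + 1)
}


def fold_fullwidth(text: str) -> str:
    """Replace fullwidth ASCII variants with their normal counterparts."""
    return text.translate(FULLWIDTH_TO_ASCII)


def translate_arguments() -> tuple[str, str]:
    """Build the two equal-length strings a TRANSLATE(s, from, to) call
    would need (hypothetical helper, for illustration only)."""
    from_chars = "".join(chr(cp) for cp in sorted(FULLWIDTH_TO_ASCII))
    to_chars = "".join(chr(FULLWIDTH_TO_ASCII[cp]) for cp in sorted(FULLWIDTH_TO_ASCII))
    return from_chars, to_chars
```

The same folding would have to be applied both when the tsvectors are built and when the query is parsed, otherwise indexed text and search terms would no longer match.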

Of course, another conceivable approach would be to try to push this normalization into a text search configuration for to_tsvector(), but I don't know whether this is even possible.


Version: 1.21.x
Severity: enhancement

Details

Reference
bz40821


Event Timeline

bzimport raised the priority of this task to Normal. Nov 22 2014, 1:04 AM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz40821.
bzimport added a subscriber: Unknown Object (MLST).
scfc created this task. Oct 6 2012, 5:06 PM
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.

Restricted Application added projects: Discovery, Discovery-Search. Nov 2 2016, 8:16 PM
Deskana set Security to None.
Deskana added a subscriber: Deskana.

Removing Discovery and Discovery-Search; our primary responsibility is to users of the Wikimedia sites, and we do not use Postgres there.