
Apostrophes should not be indexed as part of words
Closed, Resolved, Public

Description

Author: gildas.ribot

Description:
Hello everybody!

I'm French, and I use MediaWiki for my own purposes.

I have a problem with it.

For example:

I made a page "la Manoeuvre d'ortolani"

and wrote this sentence:

"la manoeuvre d'ortolani est destinée à tester la hanche des nourissons" (in English: "the Ortolani maneuver is used to test the hips of newborns")

Well, OK, it's in French ;-)

The big problem is:

when I search for the word ORTOLANI, MediaWiki finds... nothing, because for
MediaWiki
"D'Ortolani" is ONE WORD, but in French "D'ortolani" is a contraction of TWO WORDS:
"DE + ORTOLANI" = "D'ORTOLANI"

I'm tired of editing every page to work around this by writing:

"Manoeuvre d'ortolani (ortolani)"

Could you try to fix it?

Thank you,

and sorry, my English is really bad.


Version: unspecified
Severity: normal

Details

Reference
bz9598

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 9:39 PM
bzimport set Reference to bz9598.
bzimport added a subscriber: Unknown Object (MLST).

The search engine uses the MySQL full-text search engine. According to
the documentation [1], the apostrophe (') and the underscore (_) are considered
part of a word.
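To illustrate why a search for "ortolani" alone never matches, here is a minimal Python sketch that approximates the rule quoted above by treating letters, digits, underscores and apostrophes as word characters. The function name `naive_tokens` is hypothetical, and the regex is only an approximation of that rule, not MySQL's actual full-text parser.

```python
import re

# Approximation (an assumption, not MySQL's real parser) of the documented rule:
# letters, digits, underscores and apostrophes all count as word characters,
# so an elided article such as "d'" stays glued to the word that follows it.
WORD_WITH_APOSTROPHE = re.compile(r"[\w']+")


def naive_tokens(text: str) -> list[str]:
    """Lowercase the text and return every run of word characters or apostrophes."""
    return WORD_WITH_APOSTROPHE.findall(text.lower())


tokens = naive_tokens("la manoeuvre d'ortolani est destinée à tester la hanche des nourissons")
print(tokens)
print("ortolani" in tokens)  # False: only "d'ortolani" ends up in the index
```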

The bug was fixed in MySQL 5.1.6 [2].

Marking bug as

[1] MySQL documentation: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
[2] MySQL bug report: http://bugs.mysql.com/bug.php?id=14194

ayg wrote:

That's MySQL full-text search. Wikimedia doesn't use it, so I doubt the bug was filed
with that in mind. This should presumably be filed against Lucene. I don't
know whether this is fixable on our end, though.

Lucene doesn't have this problem. Reclosing.

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]

This bug is still present.

For example, a search for "Arc-en-Ciel" (https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=Arc-en-Ciel&fulltext=Search) doesn't directly match any "L'Arc-en-Ciel" (with the L').

Compare with a search for "L'Arc-en-Ciel": https://en.wikipedia.org/w/index.php?search=L%27Arc-en-Ciel&title=Special%3ASearch&fulltext=1

The "L'" should not be part of the word; it's a separate word.
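One way to get the behavior described above is to split a leading French elided article off the following word, at both indexing and query time. The sketch below assumes a hypothetical, deliberately short list of elided forms (`ELISIONS`) and a hypothetical `split_elisions` function; it is not how MediaWiki or lucene-search2 actually tokenize text. Lucene ships a similar `ElisionFilter`, which removes the article instead of keeping it as its own token.

```python
import re

# Hypothetical, illustrative list of French elided forms; a real analyzer
# would need a fuller, language-specific list.
ELISIONS = {"l", "d", "j", "m", "n", "s", "t", "c", "qu", "jusqu", "lorsqu", "puisqu"}

# Same approximate word rule as a naive tokenizer: word characters plus apostrophes.
TOKEN = re.compile(r"[\w']+")


def split_elisions(text: str) -> list[str]:
    """Tokenize, then split a leading elided article such as "l'" or "d'" off the word."""
    out: list[str] = []
    for tok in TOKEN.findall(text.lower()):
        head, sep, tail = tok.partition("'")
        if sep and head in ELISIONS and tail:
            out.extend([head + "'", tail])  # article becomes a separate token
        else:
            out.append(tok)  # no known elision: keep the token whole
    return out


print(split_elisions("la manoeuvre d'ortolani"))  # ['la', 'manoeuvre', "d'", 'ortolani']
print(split_elisions("L'Arc-en-Ciel"))  # ["l'", 'arc', 'en', 'ciel']
```

Words such as "aujourd'hui", whose first part is not in the list, are left intact; that is why the sketch checks a language-specific list rather than splitting on every apostrophe.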

Akeron: Could you please file a new bug report? This one was closed six years ago, and the reasons this problem occurs today are likely different. Thanks!