Lucene search for simple text misses some results
Closed, Declined (Public)


I searched for the word πράγμα on (no other preferences, main namespace, no punctuation or anything). It returned only a few results, which did not include, for example, θησαυρός, which contains the word in its 6th definition:
ένα πρόσωπο ή πράγμα που διαθέτει σε μεγάλο βαθμό κάτι πολύτιμο

Note that this is not part of a template parameter or anything, it's just plain text. It's not even in bold or italics or anything.

Also the history of the page indicates that the word has been in the article for over three years, so it can't be an "indices haven't been rebuilt since then" sort of issue.

Any thoughts?

I ran into a similar issue on officewiki a while back and dismissed it then as a fluke; I now don't think it was.

Version: unspecified
Severity: major



Event Timeline

bzimport raised the priority of this task from to High. Nov 21 2014, 11:11 PM
bzimport set Reference to bz25404.

rainman wrote:

The page is up-to-date in the index, and the word is not in a template, so in theory it should work.

Must be some kind of article parsing issue.

This still seems to be occurring after the update was run (bug #28605). Could you confirm, Ariel?

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]

Bug is still valid.

Going to , entering πράγμα into the search field, and making the results display ALL results: θησαυρός is missing from the results, even though it still includes πράγμα in the sixth definition.

This bug really shouldn't exist in CirrusSearch, as it has much better support for languages other than English. As Lucene is reaching the end of its life and we'll soon be migrating fully over to CirrusSearch, I'm changing this to RESOLVED WONTFIX.

If this bug does still exist in CirrusSearch, feel free to re-file it with the new test case under MediaWiki Extensions -> CirrusSearch.