
Bad search highlighting and unwanted results
Closed, Declined
Public

Description

If I search for the word "rest" in the English Wikipedia (url: http://en.wikipedia.org/wiki/Special:Search?search=rest&fulltext=Search ) some distance down, but still on the front page, I get the item [[Virginia]]. The context was this:

... nps.gov/shen/naturescience/forests.htm |title=Shenandoah National Park - Forests |publisher=National Park Service |accessdate=2007-09-10}}</ref> ... legislation, and the jointly run [[Chesapeake Bay Program]] which conducts restoration on the bay and its watershed. The [[Great Dismal Swamp National Wil ...

The words "forests" and "restoration" were highlighted, yet neither is wanted. (The article contains the word "rest" only once, so it really shouldn't be on the first page.) Continue through the search results for more instances.

If a word contains "rest" then it should only be returned and/or highlighted if the full word is related to rest, e.g. "rests", "resting", etc.

Some sort of JDBC-accessible database containing lists of related words is probably the most efficient solution.
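
To make the request concrete, here is a minimal sketch of the behaviour described above: only whole words that appear in a list of related words are highlighted. The `RELATED_WORDS` table, the `highlight` function, and the `**` markers are all hypothetical names chosen for illustration; an in-memory dictionary stands in for the proposed database, and none of this reflects how the actual search backend works.

```python
import re

# Hypothetical related-word list. In the proposal above this would be
# loaded from a database rather than hard-coded.
RELATED_WORDS = {
    "rest": {"rest", "rests", "rested", "resting"},
}


def highlight(text, query, related=RELATED_WORDS):
    """Wrap whole words related to `query` in ** markers."""
    # Fall back to the query itself when no related-word list exists.
    accepted = related.get(query.lower(), {query.lower()})

    def mark(match):
        word = match.group(0)
        return f"**{word}**" if word.lower() in accepted else word

    # \w+ matches whole words, so "forests" is tested as a single unit
    # and never as "fo" + "rest" + "s".
    return re.sub(r"\w+", mark, text)


print(highlight("forests and restoration need rest; she rests", "rest"))
```

With this approach "forests" and "restoration" are left alone because neither appears in the list for "rest", while "rest" and "rests" are marked.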


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/wiki/Special:Search?search=rest&fulltext=Search

Details

Reference
bz14152

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:11 PM
bzimport set Reference to bz14152.

rainman wrote:

The solution is to have Lucene do the highlighting, so that the highlighter knows exactly which words or stems match the query. This has been implemented, but is awaiting new hardware before it can go live on Wikimedia sites.

I've also added a less naive highlighting approach to MediaWiki core (wgAdvancedSearchHighlighting), but sysadmins are reluctant to turn it on because of possible performance issues.
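
As a rough illustration of what "knowing which stems match the query" means, the sketch below highlights a word only when its stem equals the stem of a query term. The suffix list and the `toy_stem` and `highlight_by_stem` functions are assumptions made up for this example; this is a toy suffix stripper, not Lucene's analyzer or the MediaWiki core highlighter.

```python
import re

# Toy suffix list, assumed purely for illustration. A real stemmer
# handles far more cases than this.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one known suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlight_by_stem(text, query):
    """Wrap words whose stem matches a query term's stem in <b> tags."""
    query_stems = {toy_stem(term) for term in query.split()}

    def mark(match):
        word = match.group(0)
        return f"<b>{word}</b>" if toy_stem(word) in query_stems else word

    return re.sub(r"\w+", mark, text)


print(highlight_by_stem("forests, restoration, resting", "rest"))
```

Here "forests" stems to "forest" and "restoration" is left unchanged, so neither matches the stem "rest", while "resting" does.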

Mass close as WONTFIX of open Lucene Search issues, because the Lucene Search extension was removed and replaced by MWSearch. Please set to REOPENED if the behaviour still exists with another component, and update the domain.

Mass REOPEN after discussion with Robert. Domain: Wikimedia/lucene-search-2. Assigned to maintainer.

firstpeterfourten wrote:

If I search for "slovenia slovenian" (to see what someone from the place is called), I get results that include both words, but only the first one is marked. Where the second word is shown, its last letter is in plain type.
This seems like a simple bug in the highlighter code.
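
One plausible way to get this symptom, offered only as a guess and not as a diagnosis of the actual highlighter, is matching query terms in the order given: "slovenia" then matches the first eight letters of "slovenian" and the final "n" is left plain. The hypothetical `highlight_terms` function below reproduces that effect with a regular-expression alternation and shows that trying longer terms first avoids it.

```python
import re


def highlight_terms(text, terms, longest_first):
    """Wrap each occurrence of any term in <b> tags (case-insensitive)."""
    if longest_first:
        # Try longer terms first so a short term cannot claim the
        # prefix of a longer one.
        terms = sorted(terms, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in terms)
    return re.sub(
        pattern,
        lambda match: f"<b>{match.group(0)}</b>",
        text,
        flags=re.IGNORECASE,
    )


terms = ["slovenia", "slovenian"]
print(highlight_terms("A Slovenian from Slovenia", terms, longest_first=False))
print(highlight_terms("A Slovenian from Slovenia", terms, longest_first=True))
```

In the first call the trailing "n" of "Slovenian" falls outside the tags; in the second call the whole word is wrapped.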

Unable to reproduce; works for me.