
CirrusSearch: Fix highlighting for phrase prefix queries
Closed, Resolved, Public

Description

In https://gerrit.wikimedia.org/r/#/c/197397/ it was observed that highlighting isn't working correctly for phrase prefix queries.

To do:

  • Define desired highlighting behavior for phrase prefix queries
  • Make it so, Number One

Event Timeline

Jdouglas claimed this task.
Jdouglas raised the priority of this task to Medium.
Jdouglas updated the task description.
Jdouglas subscribed.

Here's what I see with the current patch in https://gerrit.wikimedia.org/r/#/c/197397/:

2015-03-17-140557_708x290_scrot.png (290×708 px, 32 KB)

Searching for "programming is ref*" highlights the phrase "programming is referential" in "Functional programming is referential transparency."
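To make the first to-do item ("define desired highlighting behavior") concrete, the result above can be modeled directly. The Python sketch below is a simplified, hypothetical model, not the search-highlighter's actual algorithm: every query term except the last must match a whole token exactly and in order, the last term (with its trailing "*" removed) matches as a prefix, and the whole matched run is wrapped in tags. Real Elasticsearch phrase prefix queries run the text through an analyzer and expand the last term to a bounded set of indexed terms (max_expansions); both are ignored here, and the <em> tags are an assumed default.

```python
import re


def highlight_phrase_prefix(text, query, pre="<em>", post="</em>"):
    """Wrap each occurrence of a phrase prefix query in pre/post tags.

    Simplified model (an assumption, not Elasticsearch's analysis chain):
    all terms but the last must equal whole tokens, in order and with no
    gaps; the last term matches as a prefix. Matching is case-insensitive,
    but the original casing is preserved in the output.
    """
    terms = query.lower().rstrip("*").split()
    if not terms:
        return text
    # Word tokens with their character offsets in the original text.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    n = len(terms)
    spans = []
    i = 0
    while i + n <= len(tokens):
        window = tokens[i:i + n]
        exact_ok = all(tok == term
                       for (tok, _, _), term in zip(window[:-1], terms[:-1]))
        if exact_ok and window[-1][0].startswith(terms[-1]):
            spans.append((window[0][1], window[-1][2]))
            i += n  # skip past this match so spans never overlap
        else:
            i += 1
    # Rebuild the string, inserting tags around each matched span.
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


print(highlight_phrase_prefix(
    "Functional programming is referential transparency.",
    "programming is ref*"))
# -> Functional <em>programming is referential</em> transparency.
```

Highlighting the whole run as one span is a design choice made for readability here; a highlighter could equally tag each matched term separately.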

Oh, actually, that looks like it's working... Are you using the experimental highlighter?

On enwiki, this does indeed appear to be problematic:

2015-03-31-101523_765x459_scrot.png (459×765 px, 61 KB)

OK, so I think this is a bug in the highlighter: it must not know what to do with these queries. It's actually a reasonable fix: we reproduce it with the unit tests in https://github.com/wikimedia/search-highlighter, then fix it and backport the fix to the 1.3 branch there, because we're still running Elasticsearch 1.3.

Now I'm having trouble reproducing this locally with the patch in https://gerrit.wikimedia.org/r/#/c/197397/:

2015-03-31-105056_774x273_scrot.png (273×774 px, 32 KB)

This problem appears to be twofold:

  1. The search highlighter seems to work only when we run our phrase_prefix query against text.plain, not when we run it against title.plain, all.plain, etc.
  2. With the new support for phrase_prefix queries, we can sometimes lose part of the query when highlight_query is also set elsewhere in Searcher.php.
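One way to address both points is to run the phrase prefix clause against every field being highlighted, and to add to an existing highlight_query instead of replacing it. The Python sketch below assumes an Elasticsearch 1.x-style request body built as plain dicts; the helper names (phrase_prefix_query, add_highlight_query), the PLAIN_FIELDS list and the bool/should merging strategy are hypothetical illustrations, not what Searcher.php actually does.

```python
import copy

# Hypothetical field list; CirrusSearch's real highlighted fields may differ.
PLAIN_FIELDS = ["title.plain", "text.plain", "all.plain"]


def phrase_prefix_query(text, fields):
    """Build a multi_match query of type phrase_prefix over the given fields."""
    return {"multi_match": {"query": text,
                            "type": "phrase_prefix",
                            "fields": list(fields)}}


def add_highlight_query(highlight, field, clause):
    """Attach clause to one field's highlight_query without losing an existing one."""
    field_cfg = highlight.setdefault("fields", {}).setdefault(field, {})
    existing = field_cfg.get("highlight_query")
    if existing is None:
        field_cfg["highlight_query"] = copy.deepcopy(clause)
    else:
        # OR the old and new queries together instead of overwriting,
        # so neither part of the query is dropped.
        field_cfg["highlight_query"] = {
            "bool": {"should": [existing, copy.deepcopy(clause)]}}
    return highlight


# Usage: text.plain already has a highlight_query set somewhere else.
highlight = {"fields": {"text.plain": {
    "highlight_query": {"match": {"text.plain": "functional"}}}}}
for f in PLAIN_FIELDS:
    add_highlight_query(highlight, f, phrase_prefix_query("programming is ref", [f]))
```

After the loop, title.plain and all.plain each get their own phrase prefix clause, while text.plain keeps its earlier query alongside the new one. Building a per-field clause keeps each highlight_query scoped to the field it highlights.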

This gets us halfway there: https://gerrit.wikimedia.org/r/#/c/201350/

2015-04-01-142115_829x277_scrot.png (277×829 px, 33 KB)

Still to figure out: why phrase prefix highlighting works for the title but not for the text.

Jdouglas renamed this task from "Fix highlighting for phrase prefix queries" to "CirrusSearch: Fix highlighting for phrase prefix queries". (Apr 2 2015, 9:55 PM)
Jdouglas set Security to None.
Jdouglas updated the task description.
dcausse assigned this task to Jdouglas.