
Stemming highlighter doesn't entirely recognize the "exact phrase" search
Open, Low, Public

Description

Highlighting should work with the query, showing why the page matched. It should not work in addition to the query by also showing where a word is stemmed when the search query syntax explicitly requested a non-stemmed match.

When stemming is turned off by placing a word in double quotes, the pages are correctly listed, but on the search results page, sometimes stems in the snippet are still indicated in bold. This only happens when turning off stemming of the base word.

Matching works perfectly when turning off stemming for the word that is not a base word.

Searching for "clouds" (in quotation marks) gives a snippet with only clouds in bold:

cloud clouded and clouds

as it should. But searching for "cloud" (in quotation marks) shows the same snippet with the stemmed forms also in bold:

cloud clouded and clouds

as seen at "cloud" prefix:user:cpiral.
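The expected behavior can be illustrated with a minimal Python sketch. This is not CirrusSearch or Elasticsearch code: `toy_stem`, `parse_query`, and `highlight` are hypothetical names, and the stemmer is a crude suffix stripper standing in for a real analyzer. The only point is that a double-quoted term should be highlighted by exact word match, while an unquoted term may be highlighted by stem.

```python
import re


def toy_stem(word: str) -> str:
    """Crude suffix stripper, for illustration only (not a real analyzer)."""
    word = word.lower()
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def parse_query(query: str):
    """Split a query into (term, is_exact) pairs; double-quoted terms are exact."""
    terms = []
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query):
        if quoted:
            terms.extend((w.lower(), True) for w in quoted.split())
        else:
            terms.append((bare.lower(), False))
    return terms


def highlight(snippet: str, query: str) -> str:
    """Wrap matching words in <b>...</b>, stemming only for unquoted terms."""
    terms = parse_query(query)
    exact = {t for t, is_exact in terms if is_exact}
    stems = {toy_stem(t) for t, is_exact in terms if not is_exact}

    def mark(match):
        word = match.group(0)
        lower = word.lower()
        # Quoted terms match the literal word; unquoted terms match by stem.
        if lower in exact or toy_stem(lower) in stems:
            return f"<b>{word}</b>"
        return word

    return re.sub(r"\w+", mark, snippet)


snippet = "cloud clouded and clouds"
print(highlight(snippet, '"cloud"'))
print(highlight(snippet, '"clouds"'))
print(highlight(snippet, "cloud"))
```

In this sketch the quoted query "cloud" marks only the word cloud, whereas the behavior reported above corresponds to the unquoted case, where every word sharing the stem is marked.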

Because match highlighting is used not only for locating matches, but also for learning and teaching, documenting, and bug-finding, this will eventually need fixing in order to teach what "exact phrase" means. Match highlighting in general is especially important for trials in a sandbox.

Event Timeline

Cpiral raised the priority of this task from to Needs Triage.
Cpiral updated the task description.
Cpiral added a project: CirrusSearch.
Cpiral subscribed.
Restricted Application added a subscriber: Aklapper.

Could you give an example query where stemmed terms are highlighted please?

Concerning the "researchers" use case, could you elaborate a bit more? Is it just counting word frequencies (we can extract a top term for some wikis if needed)?
We plan to replicate our production indices into a lab instance suited for experimentation; that may be more appropriate for this kind of usage.

Cpiral set Security to None.
This comment was removed by Cpiral.
Cpiral renamed this task from Stemming highlights the wrong terms in search results to Stemming highlighter doesn't understand "exact phrase" search. Jan 22 2016, 8:19 PM
Cpiral updated the task description.

Concerning the "researchers" use case, could you elaborate a bit more? Is it just counting word frequencies (we can extract a top term for some wikis if needed)?
We plan to replicate our production indices into a lab instance suited for experimentation; that may be more appropriate for this kind of usage.

Thanks. I've added a new opening sentence in the description to clarify this possible interpretation.

Cpiral renamed this task from Stemming highlighter doesn't understand "exact phrase" search to Stemming highlighter doesn't entirely recognize the "exact phrase" search. Jan 22 2016, 8:45 PM
Cpiral updated the task description.
Aklapper removed a project: Discovery-ARCHIVED.