
CirrusSearch: Replace double quotes with spaces in queries
Open, Medium, Public

Description

Based on the research documented in this comment, T149143 and the research documented here that was done on Relevance Forge, we'd like to go forward with replacing double quotes with a space in search queries.

This replacement will help with queries like

albert"einstein" house

or

"albert einstein"house

which are currently treated as three separate words. Using spaces instead of stripping out the double quotes will keep these types of queries as three words. We don't feel that there is any downside to having extra spaces in the query.
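A minimal sketch of the difference between the two approaches, in Python rather than the PHP that CirrusSearch is written in. The function names are hypothetical, only the ASCII double quote is handled, and collapsing the extra whitespace is an optional tidy-up rather than something this task requires:

```python
def strip_double_quotes(query: str) -> str:
    """Remove double quotes outright, which glues adjacent words together."""
    return query.replace('"', "")


def replace_double_quotes(query: str) -> str:
    """Replace each double quote with a space, then collapse runs of whitespace."""
    return " ".join(query.replace('"', " ").split())


if __name__ == "__main__":
    for q in ('albert"einstein" house', '"albert einstein"house'):
        # Compare the word lists produced by each approach.
        print(q, "|", strip_double_quotes(q).split(), "|", replace_double_quotes(q).split())
```

For both example queries, stripping leaves two words (one of them a run-together token), while replacing with spaces leaves `albert`, `einstein`, and `house`.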

We'll need to check the edge cases both for languages that separate words with spaces and for languages that don't.

Likely prerequisite: T156019: Develop plan for dealing with numerous second-try searches, aka "So Many Search Options"

Event Timeline

debt triaged this task as Medium priority. Oct 27 2016, 8:39 PM
debt updated the task description.
debt moved this task from needs triage to This Quarter on the Discovery-Search board.

Not sure what the plan is on this one. It looks like FullTextQueryStringQueryBuilder has quite extensive handling for quotes with regard to fuzzy/phrase queries, and we probably don't want to break it. So I wonder what we are going to do here: do we remove quotes only in some specific cases? In which cases?

@TJones do you have more context on this?

Unlike the question mark situation, where we jump in before the query is run, in this case we want to wait for it to "fail" (0 results? < 3 results?), and then try an alternate query, without quotes.
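The "wait for it to fail, then retry" flow could be sketched roughly as follows. This is an illustration in Python, not CirrusSearch code: `search_with_quote_fallback`, the `search` callable, and `min_results` are all hypothetical, and the threshold is left as a parameter because the 0 vs. < 3 question is still open.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for whatever actually runs the query.
SearchFn = Callable[[str], List[str]]


def search_with_quote_fallback(
    query: str, search: SearchFn, min_results: int = 3
) -> Tuple[str, List[str]]:
    """Run the query as typed; if it "fails", retry with quotes replaced by spaces."""
    results = search(query)
    # Good enough, or nothing to rewrite: keep the original query and results.
    if len(results) >= min_results or '"' not in query:
        return query, results
    fallback = " ".join(query.replace('"', " ").split())
    fallback_results = search(fallback)
    # Only switch to the alternate query if it actually finds more.
    if len(fallback_results) > len(results):
        return fallback, fallback_results
    return query, results


def fake_search(query: str) -> List[str]:
    """Toy index for demonstration: only one exact query has a hit."""
    return ["Albert Einstein House"] if query == "albert einstein house" else []


if __name__ == "__main__":
    print(search_with_quote_fallback('albert"einstein" house', fake_search))
```

Setting `min_results=1` gives the "zero results only" variant. Returning the query that was actually used lets the caller tell the user which query produced the results.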

This gets into the same problem you mentioned in T138958: we have to figure out how to order and control the flow of the alternatives for dealing with failed/poorly-performing queries.

I could put together a straw man proposal for how to deal with suggestions, quotes, "wrong keyboard", and language ID, and have a more co-ordinated conversation about all this.

Yes, I think it's time we had some orderly discussion and proper design on how to handle such things. We are getting more and more of these, and we need to introduce some order into the madness :)

I have put together a straw man proposal for how to deal with suggestions, quotes, "wrong keyboard", and language ID, so that we can have a more co-ordinated conversation about all this. Please direct comments about the bigger picture to the talk page of the link above.

debt added a subscriber: CKoerner_WMF.

Merged T169104 in with this ticket...here are some salient notes:

@TJones did a great write-up here: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Quotes_and_Questions

The question for quotes is whether we want to automatically do the search without quotes, or suggest the search without quotes, or both, and under what conditions—zero results, < 3 results, always? Is it configurable by wiki or not? And the big one: How does it interact with language ID, wrong keyboard detection, and "Did you mean" spelling suggestions—which is all the stuff in T156019.

Our options seem to be to sort out the "So many search options" problem, which is complicated, or wedge in one or two more features, which would incur a lot of technical debt and might make it impossible to add something more important in the future, or put off the whole problem to next quarter. Punting to next quarter has been very popular. :]