
Investigate what we'd need to do to ignore double quotes in search queries
Closed, Resolved (Public)


Based on the research done in T136377, we'd like to go forward with a deeper investigation into removing, ignoring, or otherwise handling double quotes in search queries.

Note: see this comment for more stats

Event Timeline

debt triaged this task as Medium priority. Oct 25 2016, 10:44 PM

I suggest that the fallback be to replace double quotes with spaces. Most of the time it won't matter, but it would help with queries like albert"einstein" house or "albert einstein"house, which currently are treated as three words. Using spaces instead of stripping will keep them as three words, and I don't think there's any downside to having extra spaces in the query.
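To make the difference between the two options concrete, here is a minimal Python sketch. It is only an illustration, not the actual search code: the function names, the `search` callable, and the `min_results` threshold are all hypothetical, and it assumes only the ASCII double quote (`"`) needs handling. Collapsing the extra whitespace is optional, since extra spaces shouldn't hurt the query.

```python
from typing import Callable, List


def quotes_to_spaces(query: str) -> str:
    """Replace each ASCII double quote with a space, then collapse whitespace."""
    return " ".join(query.replace('"', " ").split())


def strip_quotes(query: str) -> str:
    """Delete double quotes outright (shown only for comparison)."""
    return " ".join(query.replace('"', "").split())


def search_with_fallback(
    query: str,
    search: Callable[[str], List[str]],  # hypothetical search backend
    min_results: int = 3,                # assumed threshold for a poor result set
) -> List[str]:
    """Run the query as typed; retry without quotes only if it does poorly."""
    results = search(query)
    if len(results) < min_results and '"' in query:
        return search(quotes_to_spaces(query))
    return results


if __name__ == "__main__":
    for q in ['albert"einstein" house', '"albert einstein"house']:
        print(repr(q), "->", repr(quotes_to_spaces(q)), "vs", repr(strip_quotes(q)))
```

With spaces, both example queries stay as three words (`albert einstein house`), whereas stripping would glue two of the words together (`alberteinstein house` and `albert einsteinhouse`).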

@TJones is this something we can test on Relevance Forge?

@mpopov, we already did! See my write up.

Changing double quotes to spaces cuts the zero results rate (ZRR) for poorly performing queries (i.e., fewer than 3 results) with double quotes almost in half. The overall ZRR impact was smaller, only a 0.1% decrease among poorly performing queries, but that's to be expected, since most queries don't have quotes.

debt claimed this task.