Based on the research done in T136377, we'd like to go forward with a deeper investigation into how to handle queries that contain double quotes: removing the quotes, otherwise ignoring them, or something else.
Note: see this comment for more stats
I suggest that the fallback be to replace double quotes with spaces. Most of the time it won't matter, but it would help with queries like albert"einstein" house or "albert einstein"house, which are currently treated as three words. Using spaces instead of stripping will keep them as three words, and I don't think there's any downside to having extra spaces in the query.
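To make the difference concrete, here is a minimal sketch comparing the two rewrites. The helper names are hypothetical, and splitting on whitespace is only a stand-in for the real search tokenizer, which may behave differently.

```python
def quotes_to_spaces(query: str) -> str:
    """Hypothetical helper: replace each double quote with a space."""
    return query.replace('"', " ")


def strip_quotes(query: str) -> str:
    """Hypothetical helper: delete double quotes outright."""
    return query.replace('"', "")


def words(query: str) -> list[str]:
    # Assumption: whitespace splitting approximates word tokenization.
    # str.split() with no argument also collapses runs of spaces, so the
    # extra spaces introduced by quotes_to_spaces() are harmless here.
    return query.split()


if __name__ == "__main__":
    for q in ['albert"einstein" house', '"albert einstein"house']:
        print(q)
        print("  spaces:  ", words(quotes_to_spaces(q)))
        print("  stripped:", words(strip_quotes(q)))
```

Under these assumptions, replacing quotes with spaces yields three words for both example queries, while stripping fuses two of the words together.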
@mpopov, we already did! See my write-up.
Changing double quotes to spaces cuts the zero results rate (ZRR) for poorly performing queries (i.e., fewer than 3 results) with double quotes almost in half. The overall ZRR impact was smaller, only a 0.1% decrease among poorly performing queries, but that's to be expected, since most queries don't have quotes.