We've been patching and patching and patching to work around Lucene's query_string being finicky for a year and a half now. It must stop. We should write our own query parser which replicates the old behavior but supports things like "functio* progra*" and doesn't blow up on "foo OR".
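To make "doesn't blow up" concrete, here is a rough sketch of the kind of lenient parsing this is after. It is not a design for the plugin and is in Python only for brevity. The node types (`Term`, `Phrase`, `And`, `Or`) and the function names are made up for illustration and aren't part of Lucene or Elasticsearch. It assumes whitespace-separated terms, double-quoted phrases, a trailing `*` meaning prefix, and a literal uppercase `OR`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical AST nodes, for illustration only.
@dataclass
class Term:
    text: str
    prefix: bool = False  # True when the term was written with a trailing '*'

@dataclass
class Phrase:
    terms: List[Term]

@dataclass
class And:
    clauses: list

@dataclass
class Or:
    clauses: list

# A quoted phrase (the closing quote is optional, so an unclosed quote is
# tolerated) or a bare run of non-space, non-quote characters.
TOKEN = re.compile(r'"([^"]*)"?|([^\s"]+)')

def make_term(word: str) -> Optional[Term]:
    text = word.rstrip("*")
    if not text:
        return None  # a bare '*' carries no term, so drop it
    return Term(text, prefix=word.endswith("*"))

def parse(query: str):
    groups = [[]]  # OR-separated groups of implicitly ANDed clauses
    for match in TOKEN.finditer(query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            terms = [t for t in map(make_term, phrase.split()) if t]
            if terms:
                groups[-1].append(Phrase(terms))
        elif word == "OR":
            if groups[-1]:  # ignore a leading or repeated OR
                groups.append([])
        else:
            term = make_term(word)
            if term:
                groups[-1].append(term)
    groups = [g for g in groups if g]  # drops the empty group after a trailing OR
    alternatives = [g[0] if len(g) == 1 else And(g) for g in groups]
    if not alternatives:
        return None
    return alternatives[0] if len(alternatives) == 1 else Or(alternatives)
```

With this, `parse("foo OR")` degrades to the single term `foo` instead of raising, and `parse('"functio* progra*"')` gives a phrase whose two terms are both marked as prefixes. Turning that tree into actual Elasticsearch queries is the part that needs the plugin work.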
I _think_ the right place to do this is an Elasticsearch plugin. And it's a good choice after Elasticsearch 1.6 gets index sealing. I _think_. We could also do this in PHP, but there are few good parser generators in PHP and in Java they are legion. Also, even in PHP we'd still have to do Elasticsearch work to expose any extra kinds of queries. OTOH, doing it in a plugin means we'll have to decide how (or if) we are going to support clients that don't install the plugin.
This is a good chunk of work - at least a month. It pays off a huge chunk of technical debt. But ultimately it only fixes a few super duper corner cases.
- Cirrus engineers
- Search power users
- Elasticsearch community at large
- Cirrus issues are easier to fix
- Certain queries that look like they should work will work/can be made to work. For example "flat fo*" OR "bumby fo*". This is not hypothetical. People try these and they don't work for them and we _can't_ fix them with the current setup. Most of them are fixed simply by integrating with this project.
- We use query_string and we're public about how our setup works. And query_string is a trap that no one should use. People will copy our mistakes.
Estimate: One or Two Months