When the query contains a lot of words (e.g. questions), using AND as the default operator is not appropriate, because a single missing stopword could hide a good result. We could use the minimum_should_match parameter instead, which sets a minimum number of query terms that must match (e.g. 90% of the query terms).
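Here is a minimal Python sketch of both sides of this. It shows the arithmetic behind a percentage value and the body of a match query that uses it. ES documents that the number computed from a positive percentage is rounded down. The field name `body` and the helper names are hypothetical placeholders, and the whitespace split only approximates what the real analyzer would produce.

```python
import json


def required_matches(num_terms: int, percent: int) -> int:
    """Number of terms that must match for a positive percentage.

    The result is rounded down, e.g. 90% of 7 terms -> 6 terms.
    """
    return num_terms * percent // 100


def match_query(field: str, text: str, msm: str = "90%") -> dict:
    """Build a match query body requiring `msm` of the query terms."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "minimum_should_match": msm,
                }
            }
        }
    }


if __name__ == "__main__":
    question = "What's the connection between power laws and zipf distribution"
    terms = question.split()  # rough stand-in for the analyzer
    print(len(terms), "terms ->", required_matches(len(terms), 90), "required")
    print(json.dumps(match_query("body", question), indent=2))
```

With the 9-term question above, 90% works out to 8.1, so 8 terms are required and one term (for example a stopword) may be missing.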
There's also another interesting query that does this "stopword stripping" automagically: the common terms query.
In short, this query detects stopwords at query time by looking at how frequent each term is in the index, so the query:
What's the connection between power laws and zipf distribution
will be split into two clauses:
- low-frequency terms: connection power laws zipf distribution
- high-frequency terms (stopwords): what's the between and
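To make the split concrete, here is a toy Python simulation (not Elasticsearch's actual code). It classifies each query term by its relative document frequency against a cutoff, in the spirit of the common terms query's `cutoff_frequency` parameter. The document counts in `DOC_FREQ` are made up for illustration, and lowercasing plus a whitespace split stands in for the real analyzer.

```python
from typing import Dict, List, Tuple

# Hypothetical index statistics: number of documents containing each term.
TOTAL_DOCS = 1000
DOC_FREQ: Dict[str, int] = {
    "what's": 300, "the": 950, "between": 400, "and": 900,
    "connection": 8, "power": 9, "laws": 5, "zipf": 1, "distribution": 7,
}


def split_by_frequency(
    terms: List[str], doc_freq: Dict[str, int], total_docs: int, cutoff: float
) -> Tuple[List[str], List[str]]:
    """Return (low_freq, high_freq) terms, keeping the original query order.

    A term is high-frequency when the fraction of documents containing it
    is above `cutoff`; unknown terms count as frequency 0 (low).
    """
    low: List[str] = []
    high: List[str] = []
    for term in terms:
        relative_freq = doc_freq.get(term, 0) / total_docs
        (high if relative_freq > cutoff else low).append(term)
    return low, high


if __name__ == "__main__":
    query = "What's the connection between power laws and zipf distribution"
    low, high = split_by_frequency(query.lower().split(), DOC_FREQ, TOTAL_DOCS, 0.01)
    print(low)   # ['connection', 'power', 'laws', 'zipf', 'distribution']
    print(high)  # ["what's", 'the', 'between', 'and']
```

The cutoff of 0.01 (1% of documents) is an arbitrary choice for this toy index. Raising it moves more terms into the low-frequency group.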
We can then control the boolean operator of each clause independently, e.g. OR for the high-frequency words and AND for the low-frequency words. Or we can use even more complex minimum_should_match rules like "3<80%": with 3 terms or fewer, all of them are required; with more than 3, only 80% of them are required.
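Below is a Python sketch of the request body for this setup. It uses the common terms query's `cutoff_frequency`, `low_freq_operator` and `high_freq_operator` parameters, plus a small resolver for a single conditional spec like "3<80%". The field name `body`, the 0.001 cutoff and the helper names are illustrative assumptions. The resolver only handles the one `N<P%` form with a positive percentage, rounded down as ES documents it. Note that the common terms query was deprecated in Elasticsearch 7.3, so check which version you are running.

```python
def common_terms_query(field: str, text: str, cutoff: float = 0.001) -> dict:
    """Common terms query body: AND for rare terms, OR for frequent ones."""
    return {
        "query": {
            "common": {
                field: {
                    "query": text,
                    "cutoff_frequency": cutoff,
                    "low_freq_operator": "and",
                    "high_freq_operator": "or",
                }
            }
        }
    }


def resolve_conditional_msm(spec: str, num_terms: int) -> int:
    """Resolve a single 'N<P%' spec, e.g. '3<80%'.

    With N terms or fewer, all of them are required; otherwise P% of them
    are required, rounded down.
    """
    threshold, _, percent = spec.partition("<")
    if num_terms <= int(threshold):
        return num_terms
    return num_terms * int(percent.rstrip("%")) // 100


if __name__ == "__main__":
    for n in (2, 3, 5, 9):
        print(n, "terms ->", resolve_conditional_msm("3<80%", n), "required")
```

For 2, 3, 5 and 9 terms this prints 2, 3, 4 and 7 required terms respectively. Past the threshold, 80% of 9 is 7.2, which rounds down to 7.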
Here's a more readable blog post about the common terms query. And, for reference, ES ships with predefined stopword lists for more than 30 languages.