We need to extract a set of ambiguous queries, i.e. queries that return more than 1000 results on enwiki.
Ideally we need:
- a set with basic queries (no special syntax, no phrase search)
- a set with single-word queries
- a set with multi-word queries
We should take care to exclude queries from the WikipediaApp: it searches as you type, so its queries include partial words that would pollute the set.
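
A minimal sketch of the filtering described above, in Python. Everything named here is an assumption made for illustration: the `LoggedQuery` record and its `query`, `source` and `hits` fields, the `"app"` source label, and the regular expression used to detect special syntax are hypothetical and do not reflect the real log schema. The sketch also assumes that the single-word and multi-word sets are drawn from the basic queries.

```python
import re
from dataclasses import dataclass

# Assumed markers of a non-basic query: quotes (phrase search), "*" and "~"
# operators, a leading "-" or "!" on a term, and keyword prefixes such as
# "intitle:". This list is illustrative, not exhaustive.
SPECIAL_SYNTAX = re.compile(r'["*~]|(^|\s)[-!]\S|\b\w+:')

# "Ambiguous" here means strictly more than this many results.
MIN_HITS = 1000


@dataclass(frozen=True)
class LoggedQuery:
    """Hypothetical log record; the field names are assumptions."""
    query: str   # raw query string
    source: str  # where the query came from, e.g. "web" or "app"
    hits: int    # number of results returned on enwiki


def is_basic(query: str) -> bool:
    """True if the query uses no special syntax and no phrase search."""
    return SPECIAL_SYNTAX.search(query) is None


def build_sets(records, excluded_sources=frozenset({"app"})):
    """Split ambiguous, basic queries into single-word and multi-word sets."""
    single, multi = set(), set()
    for record in records:
        # Collapse runs of whitespace so duplicates compare equal.
        query = " ".join(record.query.split())
        if not query or record.source in excluded_sources:
            continue  # drop empty queries and search-as-you-type traffic
        if record.hits <= MIN_HITS:
            continue  # not ambiguous enough
        if not is_basic(query):
            continue  # special syntax or phrase search
        (single if len(query.split()) == 1 else multi).add(query)
    return {
        "basic": single | multi,
        "single_word": single,
        "multi_word": multi,
    }
```

Filtering on the source alone does not remove partial words typed through other clients, so the resulting sets may still need a manual check.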