Introduce a parser class that holds all the regexes needed to parse
the fulltext query into a simplified AST.
This parser is not yet plugged into production code and is effectively
a no-op, since it still needs the Elasticsearch query building code.
The parsing logic remains very similar, except that we do not destroy
the input. This may produce different queries in some cases, such as
unbalanced quotes.
The advantage is that we can trace features of the query directly back
to the original user query, which opens the possibility (for example)
of doing DYM suggestions properly without relying on
addPrefixes/addSuffixes.
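
To illustrate why keeping offsets into the untouched input helps with
DYM ("did you mean") suggestions, here is a minimal Python sketch. It is
not the code in this patch: WordNode and apply_suggestions are
hypothetical names, and the only assumption is that each parsed node
records where it starts and ends in the original query string.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WordNode:
    """Hypothetical AST leaf: a span of the original, unmodified query."""
    start: int  # offset of the first character in the original query
    end: int    # offset one past the last character


def apply_suggestions(query: str, nodes: List[WordNode],
                      suggestions: Dict[str, str]) -> str:
    """Rebuild the query, replacing only the spans covered by nodes.

    Everything outside the nodes (prefixes, quotes, whitespace) is copied
    verbatim from the original input.
    """
    out = []
    cursor = 0
    for node in sorted(nodes, key=lambda n: n.start):
        out.append(query[cursor:node.start])       # untouched text before the node
        original = query[node.start:node.end]
        out.append(suggestions.get(original, original))
        cursor = node.end
    out.append(query[cursor:])                      # untouched tail
    return "".join(out)


query = 'intitle:"hello" wrold'
nodes = [WordNode(16, 21)]  # span of the bare word "wrold"
fixed = apply_suggestions(query, nodes, {"wrold": "world"})
```

Because the replacement is spliced in by offset, the keyword prefix and
the quoted phrase survive unchanged without any prefix/suffix
bookkeeping.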
The AST only contains the nodes needed to parse what we currently
support and still delegates all the boolean logic to the Lucene
QueryString. The AST model definition will be expanded in the future
once we properly control the boolean logic.
I've included the fixture tests in this patch, but I can extract them
into a separate one if that makes the review easier. Sadly, it was
nearly impossible to split this code into smaller chunks.
This is still fragile, but the previous implementation is also very
fragile. I've removed most of the negative lookbehinds used to handle
escape sequences, "(?<!\\\\)", in favor of consuming "\\\\.".
The new strategy uses the \G anchor to simulate ^ without substr.
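
The two regex choices above can be sketched as follows. This is an
illustrative Python analogue, not the patch's PHP code: Python's re
module has no \G, so the sketch uses the pos argument of a compiled
pattern's match(), which likewise anchors the match at the current
offset without slicing the string. The PHRASE, WORD and SPACE patterns
and the tokenize function are hypothetical.

```python
import re

# Phrase body: either an escaped character (backslash + any character) or
# any character that is neither a quote nor a backslash. Because the
# escape is consumed as a unit, an escaped quote can never be taken for
# the closing quote, so no negative lookbehind is needed.
PHRASE = re.compile(r'"((?:\\.|[^"\\])*)"')
WORD = re.compile(r'[^\s"]+')
SPACE = re.compile(r'\s+')


def tokenize(query):
    """Return (kind, start, end) tuples with offsets into the original query."""
    tokens = []
    pos = 0
    while pos < len(query):
        # Pattern.match(string, pos) only matches starting exactly at pos,
        # which plays the role of \G here: no substring copy is made.
        m = SPACE.match(query, pos)
        if m:
            pos = m.end()
            continue
        m = PHRASE.match(query, pos)
        if m:
            tokens.append(("phrase", m.start(), m.end()))
            pos = m.end()
            continue
        m = WORD.match(query, pos)
        if m:
            tokens.append(("word", m.start(), m.end()))
            pos = m.end()
            continue
        # Nothing matched (for example an unbalanced quote): record the
        # single character and move on; the input itself is never modified.
        tokens.append(("other", pos, pos + 1))
        pos += 1
    return tokens


balanced = tokenize(r'a "b \" c" d')
unbalanced = tokenize('x "y')
```

In this sketch the unbalanced quote is kept as its own token rather than
being stripped from the input, which is one possible way to handle the
case mentioned earlier.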