As discovered in T232760, Method 1 (M1) suggestions can and should be improved before deployment or A/B testing. Tasks include:
[x] Improving the edit distance computation:
  [x] special case costs
    [x] discount cost for swaps (ab vs ba)
    [x] discount cost for duplicate letters (ab vs abb)
    [x] add increased cost for editing digits
  [x] take tokens into account
    [x] add per token edit limits
    [x] add a penalty for differing numbers of tokens
      [x] decreased token delta penalty if strings differ only by spaces
    [x] add a penalty for changing the first letter of a token
      [x] except for letter/space swaps (ab cdef vs abc def)
  [x] Optimizations
    [x] early termination when over the edit distance limit
    [x] early termination when over the token delta limit
    [x] Investigate using Jaccard similarity on characters in the strings to terminate early (can't use just string length because of duplicate letter discount)
  [x] ensure support for 32-bit characters. (Ugh.)
[] optimize penalties and discounts for the various cases above (based on M1 training data)
[] Filter out M1 queries that contain search syntax (esp. negated queries)
[] Incorporate edit distance into M1 suggestion selection
[] Investigate weighting of edit distance vs number of results
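
For illustration only, here is a minimal Python sketch of the special case costs and the edit-distance-limit early termination listed above. It is not the implementation used for M1. The function name `weighted_distance` and every cost value are hypothetical placeholders, since the actual penalties and discounts are still to be optimized against M1 training data. Token-level limits and penalties are left out.

```python
from typing import Optional

# Hypothetical costs, chosen only for this sketch; real values are still to be tuned.
DEFAULT_COST = 1.0     # plain insert, delete, or substitute
SWAP_COST = 0.5        # discounted: adjacent swap (ab vs ba)
DUPLICATE_COST = 0.25  # discounted: duplicated letter (ab vs abb)
DIGIT_EXTRA = 1.0      # added whenever an edit touches a digit


def _cost(base: float, *chars: str) -> float:
    """Add the digit surcharge if any character involved in the edit is a digit."""
    return base + (DIGIT_EXTRA if any(c.isdigit() for c in chars) else 0.0)


def weighted_distance(a: str, b: str, limit: float) -> Optional[float]:
    """Weighted edit distance from a to b, or None if it exceeds limit."""
    prev2 = None
    prev = [0.0]
    for j in range(1, len(b) + 1):
        prev.append(prev[j - 1] + _cost(DEFAULT_COST, b[j - 1]))

    for i in range(1, len(a) + 1):
        ca = a[i - 1]
        cur = [prev[0] + _cost(DEFAULT_COST, ca)]
        for j in range(1, len(b) + 1):
            cb = b[j - 1]
            same = ca == cb
            # Inserting or deleting a copy of the letter at the end of the
            # other prefix is treated as a cheap "duplicate letter" edit.
            gap = DUPLICATE_COST if same else DEFAULT_COST
            best = min(
                prev[j - 1] + (0.0 if same else _cost(DEFAULT_COST, ca, cb)),
                prev[j] + _cost(gap, ca),     # delete ca
                cur[j - 1] + _cost(gap, cb),  # insert cb
            )
            # Adjacent transposition (ab vs ba) at a discount.
            if i > 1 and j > 1 and not same and ca == b[j - 2] and a[i - 2] == cb:
                best = min(best, prev2[j - 2] + _cost(SWAP_COST, ca, cb))
            cur.append(best)
        # Early termination: later rows are built only from the last two rows,
        # and all costs are non-negative, so nothing can drop below the limit again.
        if min(prev) > limit and min(cur) > limit:
            return None
        prev2, prev = prev, cur

    return prev[-1] if prev[-1] <= limit else None
```

With these placeholder costs, `weighted_distance("ab", "ba", 2)` gives 0.5, `weighted_distance("ab", "abb", 2)` gives 0.25, and `weighted_distance("a1", "a2", 3)` gives 2.0. Python strings are sequences of code points, so characters outside the Basic Multilingual Plane count as single characters here; an implementation working on UTF-16 code units would need separate handling.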