As discovered in T232760, Method 1 (M1) suggestions can and should be improved before deployment or A/B testing.

Tasks include:
[] Improving the edit distance computation:
  [x] discount cost for swaps (ab vs ba)
  [x] discount cost for duplicate letters (ab vs abb)
  [x] add per token edit limits
  [x] add a penalty for differing numbers of tokens
  [x] add a penalty for changing the first letter of a token
    [x] except for letter/space swaps (ab cdef vs abc def)
  [] Optimizations
    [x] early termination when over the edit distance limit
    [x] early termination when over the token delta limit
    [] Investigate using Jaccard similarity on characters in the strings to terminate early (can't use just string length because of duplicate letter discount)
  [] optimize penalties and discounts for the various cases above (based on M1 training data)
[] Filter M1 queries with search syntax in them (esp. negated queries)
[] Incorporate edit distance into M1 suggestion selection
  [] Investigate weighting of edit distance vs number of results
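
To make the swap discount, the duplicate-letter discount, and the early termination on the edit distance limit concrete, here is a minimal Python sketch. It is not the actual M1 code: the function names, the 0.5 discount values, and the `limit` parameter are all hypothetical, and the per-token limits, token-count penalty, and first-letter penalty are left out. `char_jaccard` only illustrates the not-yet-investigated Jaccard idea; because it compares character sets, it is unaffected by duplicate letters, unlike string length.

```python
def char_jaccard(a, b):
    """Jaccard similarity of the character sets of two strings (hypothetical helper)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def discounted_edit_distance(a, b, limit=None, swap_cost=0.5, dup_cost=0.5):
    """Optimal-string-alignment distance with discounted swaps and duplicates.

    Returns None once the distance is known to exceed `limit`.
    All cost values are placeholder assumptions, not tuned penalties.
    """
    def indel(s, k):
        # Inserting or deleting s[k] is cheaper when it repeats the previous letter.
        return dup_cost if k > 0 and s[k] == s[k - 1] else 1.0

    # d[i][j] = cheapest cost of turning a[:i] into b[:j]
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for j in range(1, len(b) + 1):
        d[0][j] = d[0][j - 1] + indel(b, j - 1)

    prev_row_min = 0.0  # minimum of row 0 is d[0][0]
    for i in range(1, len(a) + 1):
        d[i][0] = d[i - 1][0] + indel(a, i - 1)
        for j in range(1, len(b) + 1):
            best = min(
                d[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else 1.0),  # match or substitute
                d[i - 1][j] + indel(a, i - 1),  # delete a[i-1]
                d[i][j - 1] + indel(b, j - 1),  # insert b[j-1]
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                best = min(best, d[i - 2][j - 2] + swap_cost)  # adjacent swap (ab vs ba)
            d[i][j] = best
        row_min = min(d[i])
        # A swap reaches back two rows, so both recent rows must be over the limit.
        if limit is not None and min(row_min, prev_row_min) > limit:
            return None
        prev_row_min = row_min

    result = d[len(a)][len(b)]
    return None if limit is not None and result > limit else result
```

With these placeholder costs, `discounted_edit_distance("ab", "ba")` and `discounted_edit_distance("ab", "abb")` both give 0.5 rather than 1.0, and `discounted_edit_distance("abcdef", "uvwxyz", limit=2)` returns `None` without filling in the whole table.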