
Analysis of Method 1 Suggestion results
Closed, Resolved (Public)

Description

Gather suggestion output from Elastic-based suggestions and Method 1 suggestions for a collection of data, and analyze the results.

When we did this for M0, we used 2 months of enwiki data to build the model and evaluated the results on 1 month of enwiki data. Something similar would be fine this time, too.

Analysis will include counting how often Elastic-based suggestions are made, how often Method 1 suggestions are made, and how often both are made, plus a manual review of a sample of cases where both are made to see which does better. This is the same approach we used for M0.
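The counting step could be sketched roughly as follows. This is a minimal illustration, not the script used for the analysis: the column names `query`, `elastic`, and `m1` are hypothetical, and the rows are made-up placeholders rather than real query data.

```python
import csv
import io
from collections import Counter


def count_suggestions(rows):
    """Tally which suggester(s) produced a suggestion for each query."""
    counts = Counter()
    for row in rows:
        has_elastic = bool(row["elastic"].strip())
        has_m1 = bool(row["m1"].strip())
        if has_elastic and has_m1:
            counts["both"] += 1          # candidates for manual review
        elif has_elastic:
            counts["elastic_only"] += 1
        elif has_m1:
            counts["m1_only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Made-up placeholder rows; an empty field means "no suggestion made".
SAMPLE = """\
query,elastic,m1
aaa,aab,aac
bbb,bbc,
ccc,,ccd
ddd,,
eee,eef,eeg
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = count_suggestions(rows)
print(dict(counts))
```

The rows counted under `both` are the ones a manual review would sample from.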

There is some concern that Method 1 results may be lower quality for shorter strings. If that looks to be a problem, because of a high volume or lower quality of Method 1 suggestions for shorter queries (or both), we may look into shorter queries more carefully.

Event Timeline

TJones renamed this task from Analysis of M1 results to Analysis of M1 Suggestion results. Sep 12 2019, 4:50 PM
TJones renamed this task from Analysis of M1 Suggestion results to Analysis of Method 1 Suggestion results.
TJones created this task.
TJones moved this task from needs triage to elastic / cirrus on the Discovery-Search board.
TJones updated the task description.

Samples are available here: notebook1004:/home/dcausse/phrase_suggester_vs_glent_m1.csv

TJones claimed this task. Oct 11 2019, 7:24 PM

I completed my analysis of Method 1, and it performs significantly worse than the current production DYM. I think we should improve Method 1 before considering an A/B test. Full details on MediaWiki.

Summary:

Method 1 Anti-Patterns:

  • over-emphasis of result counts:
    • creating negated queries, like fogus to -ous, which gets 5.9M results.
    • changing letters or adding spaces to create a very common word (cf. gene to a gene) or a duplicated word (rattle battle to battle battle).
  • overly drastic changes:
    • edit distance limits should be per-token, not per-string (cf. gene to a gene again)
    • changing a letter to a space should have a higher cost (abbys to a b s)
    • changing the first letter of a word/token should have a higher cost (cia assassinations to mi6 assassinations)
  • using weird stemming edge cases to increase result counts:
    • e.g., godness stems to god so it beats goddess; hering stems to here so it replaced herring in red herring
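To illustrate the per-token versus per-string point, here is a rough sketch. It is not how Method 1 works; `max_edits` and `within_per_token_limit` are hypothetical helpers, and the length thresholds are an assumption modeled loosely on Elasticsearch's AUTO fuzziness. Rejecting any suggestion that changes the number of tokens is a deliberate simplification that would also block legitimate word splits.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def max_edits(token: str) -> int:
    """Hypothetical per-token budget: shorter tokens tolerate fewer edits."""
    if len(token) <= 2:
        return 0
    if len(token) <= 5:
        return 1
    return 2


def within_per_token_limit(query: str, suggestion: str) -> bool:
    """Check each query token against its own edit budget."""
    q_tokens = query.split()
    s_tokens = suggestion.split()
    # Simplifying assumption: token count must be unchanged.
    if len(q_tokens) != len(s_tokens):
        return False
    return all(
        levenshtein(q, s) <= max_edits(q)
        for q, s in zip(q_tokens, s_tokens)
    )


for query, suggestion in [
    ("gene", "a gene"),
    ("cia assassinations", "mi6 assassinations"),
    ("red hering", "red herring"),
]:
    print(query, "->", suggestion,
          "| per-string distance:", levenshtein(query, suggestion),
          "| per-token ok:", within_per_token_limit(query, suggestion))
```

Under these assumed thresholds, cia to mi6 is only 2 edits across an 18-character string, but 2 edits within a 3-letter token, so the per-token check rejects it. Note that rattle battle to battle battle would still pass, since that case is driven by result counts rather than edit distance.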

Reinforcing Positive Method 1 Patterns:

  • Edit distance cost should be decreased for double-letter to single-letter change (or vice versa)
  • Edit distance cost should be decreased for swapped letters, possibly including swapped with a letter in between (levasimole vs levamisole)
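The cost adjustments suggested in both lists could be combined into a weighted edit distance along these lines. This is only a sketch under assumed cost values (all the constants are hypothetical and untuned), and it makes no claim about how Method 1 actually scores candidates.

```python
SWAP = 0.5          # adjacent letters swapped
GAP_SWAP = 1.0      # letters swapped around one unchanged letter
DOUBLE = 0.5        # double letter <-> single letter
DEFAULT = 1.0       # ordinary insert, delete, or substitute
# Kept at 2.0: anything higher would be undercut by delete + insert.
FIRST_LETTER = 2.0  # substituting the first letter of a token
TO_SPACE = 2.0      # substituting a letter with a space


def weighted_distance(src: str, dst: str) -> float:
    """Edit distance over whole strings with the assumed costs above."""
    n, m = len(src), len(dst)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + DEFAULT
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + DEFAULT
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = src[i - 1], dst[j - 1]
            if a == b:
                sub = 0.0
            elif b == " ":
                sub = TO_SPACE
            elif i == 1 or src[i - 2] == " ":
                sub = FIRST_LETTER
            else:
                sub = DEFAULT
            # Deleting a letter that repeats the previous source letter.
            delete = DOUBLE if i > 1 and src[i - 2] == a else DEFAULT
            # Inserting a letter that repeats the previous target letter.
            insert = DOUBLE if j > 1 and dst[j - 2] == b else DEFAULT
            best = min(d[i - 1][j - 1] + sub,
                       d[i - 1][j] + delete,
                       d[i][j - 1] + insert)
            # Adjacent swap, e.g. "ie" <-> "ei".
            if (i > 1 and j > 1 and a != b
                    and a == dst[j - 2] and src[i - 2] == b):
                best = min(best, d[i - 2][j - 2] + SWAP)
            # Swap with one unchanged letter in between, e.g. "sim" <-> "mis".
            if (i > 2 and j > 2 and a != b
                    and a == dst[j - 3] and src[i - 3] == b
                    and src[i - 2] == dst[j - 2]):
                best = min(best, d[i - 3][j - 3] + GAP_SWAP)
            d[i][j] = best
    return d[n][m]


for src, dst in [
    ("hering", "herring"),
    ("levasimole", "levamisole"),
    ("cia assassinations", "mi6 assassinations"),
    ("abbys", "a b s"),
]:
    print(f"{src} -> {dst}: {weighted_distance(src, dst)}")
```

With these assumed costs, the plausible corrections (hering to herring, levasimole to levamisole) come out cheaper than under plain Levenshtein, while cia assassinations to mi6 assassinations comes out more expensive.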

I realize that I've assumed that edit distance plays a role in the weighting of suggestions, but I'm not sure that's the case. If not, it probably should be, rather than letting result count reign supreme.

Gehel closed this task as Resolved. Oct 29 2019, 5:51 PM