Add new columns for Glent Method 1
Closed, Resolved, Public

Assigned To

Authored By

	TJones
	Mar 17 2020, 7:44 PM

Description

We need a new column parallel to q1q2LevenDist for M1 that holds the token-aware edit distance value. "q1q2TokenAwareDist" might work as a name, though it's a bit long. Maybe "q1q2TokAwareDist" or "q1q2TAEDist"?

We need a new column parallel to queryNorm for M1 that holds the deduped version of the normalized query (i.e., with repeated characters removed). "queryNormDedupe" sounds good.

Details

	Subject	Repo	Branch	Lines +/-
	similar queries: Apply all-pairs matching against character deduplicated queries	search/glent	master	+34 -16


Related Objects

Status	Assigned	Task
Open	None	T212884 [EPIC] Improve Search Suggestions with NLP (Did You Mean / Glent)
Open	None	T212889 [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages
Resolved	TJones	T238151 Tune Glent Method 1 algorithm
Resolved	EBernhardson	T247898 Add new columns for Glent Method 1

Event Timeline

TJones created this task. Mar 17 2020, 7:44 PM

TJones updated the task description.

TJones mentioned this in T238151: Tune Glent Method 1 algorithm. Mar 17 2020, 7:49 PM

I'm sure these would be needed/created in the SimilarQueriesSuggester when generating suggestions. Do we also need to write them to the suggestions table for use by SuggestionAggregator? Essentially, are these needed when merging suggestions from many algorithms and deciding on the best single suggestion?

Per discussion at last week's Wednesday meeting:

Rename q1q2LevenDist to q1q2EditDist and change its type to float. This change will need to be applied to the data already stored in Hive, along with adjusting the appropriate bits of glent.

The deduped version of the norm query won't need to be saved; it can be constructed on demand from the data already stored.
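For illustration, building the deduped form on demand could look roughly like the sketch below. It assumes "repeated characters removed" means collapsing each run of identical adjacent characters to a single character; the function name `dedupe_chars` is hypothetical, and this is Python rather than the code in search/glent, so treat it as a sketch of the idea and not the merged implementation.

```python
from itertools import groupby


def dedupe_chars(query_norm: str) -> str:
    """Collapse each run of identical adjacent characters to one character.

    Assumption: only consecutive repeats are removed, so non-adjacent
    repeats of the same character are kept.
    """
    # groupby yields one (char, run) pair per run of equal adjacent characters.
    return "".join(ch for ch, _run in groupby(query_norm))


if __name__ == "__main__":
    for q in ["bannana", "google", "banana"]:
        print(q, "->", dedupe_chars(q))
```

Because the result depends only on the stored normalized query, it can be recomputed whenever it is needed instead of being kept in its own column.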

Change 583491 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/glent@master] similar queries: Apply all-pairs matching against character deduplicated queries

https://gerrit.wikimedia.org/r/583491

gerritbot added a project: Patch-For-Review. Mar 26 2020, 12:25 AM

Change 583491 merged by Tjones:
[search/glent@master] similar queries: Apply all-pairs matching against character deduplicated queries

https://gerrit.wikimedia.org/r/583491

Maintenance_bot removed a project: Patch-For-Review. Apr 9 2020, 7:10 PM