Add optimized Elasticsearch field mappings for autocomplete searches
Closed, Resolved, Public

Description

The implementation of T294932 (Lists: Auto-complete/lookahead search for adding tools) uses a simple_query_string full-text search. It differs from the site search implementation only in the fields scanned for matches and the lack of computed facets.

We may get both better performance and more exhaustive matching by changing our toolinfo document mapping to add specially typed subfields to name and title, and by changing the query type to something like multi_match using N-gram tokenizers (one with shingle size 2 and another with shingle size 3). N-gram searches work a bit like a double wildcard search (*something*) but perform better at query time, at the cost of larger indexes.
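A minimal sketch of what such a mapping and query could look like, written as Python dictionaries in the shape of Elasticsearch request bodies. The analyzer name autocomplete_ngram, the tokenizer name ngram_2_3, the subfield name ngram, and the gram sizes are assumptions chosen for illustration; they are not taken from the Toolhub code.

```python
# Hypothetical index settings: an n-gram tokenizer wrapped in a custom analyzer.
# min_gram=2 / max_gram=3 stays within the default index.max_ngram_diff of 1.
SETTINGS = {
    "analysis": {
        "tokenizer": {
            "ngram_2_3": {
                "type": "ngram",
                "min_gram": 2,
                "max_gram": 3,
                "token_chars": ["letter", "digit"],
            }
        },
        "analyzer": {
            "autocomplete_ngram": {
                "type": "custom",
                "tokenizer": "ngram_2_3",
                "filter": ["lowercase"],
            }
        },
    }
}


def ngram_text_field():
    """A text field with an extra n-gram analyzed subfield (name assumed)."""
    return {
        "type": "text",
        "fields": {
            "ngram": {"type": "text", "analyzer": "autocomplete_ngram"},
        },
    }


# Hypothetical mapping fragment covering only the two fields named in the task.
MAPPING = {
    "properties": {
        "name": ngram_text_field(),
        "title": ngram_text_field(),
    }
}


def autocomplete_query(text):
    """Build a multi_match query body against the n-gram subfields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["name.ngram", "title.ngram"],
                # Require every gram of the input to match, to cut down noise.
                "operator": "and",
            }
        }
    }
```

The existing name and title fields keep their current analysis, so site search is unaffected; only the autocomplete query would target the new subfields.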

Event Timeline

Elasticsearch 7.x has a dedicated field type, search_as_you_type, intended to optimize autocomplete-style searches where the user is typing and the application shows possible completions. Using it would, however, be blocked on an Elasticsearch cluster upgrade.
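For reference, a sketch of how that field type is typically used once an ES 7.x cluster is available. The field type creates ._2gram and ._3gram subfields automatically, and the usual query is a multi_match of type bool_prefix across the root field and those subfields. The helper names and the choice of name and title as fields are assumptions for illustration.

```python
def search_as_you_type_mapping(field_names):
    """Mapping fragment declaring each field as search_as_you_type."""
    return {
        "properties": {
            name: {"type": "search_as_you_type"} for name in field_names
        }
    }


def bool_prefix_query(text, field_names):
    """multi_match bool_prefix query over each field and its shingle subfields."""
    fields = []
    for name in field_names:
        # ._2gram and ._3gram are generated by the search_as_you_type field type.
        fields.extend([name, f"{name}._2gram", f"{name}._3gram"])
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": fields,
            }
        }
    }
```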

@bd808 Is this the responsibility of the search team? I can inquire whether this is on their roadmap or not.

No, it is our responsibility. Using any ES 7.x features would be blocked on the Search team upgrading the Elasticsearch servers that host our index, but the configuration and content of the index is managed by us.

Sorry, yes I should have specified upgrading the ES servers.

@MPhamWMF Can I ask if you all have any plans in the near-term to upgrade to ES 7.x?

@sdkim, we are planning to upgrade to ES 7.10 starting in Q3; the current estimate is up to ~2 quarters of work.

Calling out dependency on https://phabricator.wikimedia.org/T263142 for this task

That is a dependency for an implementation based on the upstream search-as-you-type feature, but not for generally addressing the current implementation deficiencies. There are a number of currently available field analysis and query type changes which could be made.

There are a number of currently available field analysis and query type changes which could be made.

@bd808 Please enumerate the changes and the outcome they would provide.

If search-as-you-type is separate from what this task was intended for, I can create a separate task for it, if that works for you.

  • Can you elaborate more on this? I'm interested in understanding your point. @bd808, can you weigh in on this too? Right now we are approaching this problem through stemming, but maybe we can do more?

I think that T295942: Add optimized Elasticsearch field mappings for autocomplete searches is basically talking about the same issue. There is a fancy new "search-as-you-type" feature for this in Elasticsearch 7.x which we do not currently have access to. The new feature, however, is really just some nice syntactic sugar for creating a custom field type that uses multiple N-gram tokenizers (one with shingle size 2 and another with shingle size 3). N-gram searches work a bit like a double wildcard search (*something*) but perform better at query time, at the cost of larger indexes.
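To make the wildcard comparison concrete, here is a toy, in-memory illustration of the idea using character n-grams of size 2 and 3. It is not how Elasticsearch is implemented; the function names and sample tool names are assumptions for illustration only.

```python
from collections import defaultdict


def char_ngrams(text, sizes=(2, 3)):
    """Return the set of lowercase character n-grams of the given sizes."""
    text = text.lower()
    return {text[i:i + n] for n in sizes for i in range(len(text) - n + 1)}


def build_index(names):
    """Inverted index: n-gram -> set of names containing it."""
    index = defaultdict(set)
    for name in names:
        for gram in char_ngrams(name):
            index[gram].add(name)
    return index


def search(index, query):
    """Names that contain every n-gram of the query (an AND match)."""
    grams = char_ngrams(query)
    if not grams:
        # Queries shorter than the smallest gram size produce no grams.
        return set()
    result = None
    for gram in grams:
        hits = index.get(gram, set())
        result = set(hits) if result is None else result & hits
    return result


names = ["Toolhub", "Toolforge", "Quarry"]
index = build_index(names)
matches = search(index, "ool")  # behaves like *ool* without scanning every name
```

Each lookup is a handful of dictionary reads rather than a scan over all names, which is the query-time gain. The trade-off shows up in the index, which holds many more entries than there are names. Requiring all grams to match approximates substring matching; it can admit a name that contains the same grams in a different order.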

bd808 renamed this task from Consider adding optimized Elasticsearch field mappings for autocomplete searches to Add optimized Elasticsearch field mappings for autocomplete searches. Jan 28 2022, 9:09 PM
bd808 raised the priority of this task from Low to Medium.
bd808 updated the task description.
bd808 moved this task from Backlog to Groomed/Ready on the Toolhub board.

Change 761691 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[wikimedia/toolhub@main] Search: Add optimized Elasticsearch field mappings for autocomplete searches

https://gerrit.wikimedia.org/r/761691

Raymond_Ndibe moved this task from Groomed/Ready to Review on the Toolhub board.
Raymond_Ndibe moved this task from Backlog to In Review on the User-Raymond_Ndibe board.

Change 761691 merged by jenkins-bot:

[wikimedia/toolhub@main] Search: Add optimized field mappings for autocomplete searches

https://gerrit.wikimedia.org/r/761691

Change 770638 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Bump container version to 2022-03-15-002555-production

https://gerrit.wikimedia.org/r/770638

Change 770638 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Bump container version to 2022-03-15-002555-production

https://gerrit.wikimedia.org/r/770638