
[Story] Introduce EntitySearch service
Closed, ResolvedPublic


The EntitySearch service offers a way to search for entities by labels (and aliases).
The EntitySearch service must support prefix matches, and should optionally support fuzzy (case-insensitive) matches.

The initial implementation should be based on the wb_terms table, and largely re-use the code in the TermTable.
The intent is to eventually implement this on top of Elasticsearch.
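
As a rough illustration of the contract described above, here is a minimal Python sketch of a prefix search over labels and aliases. It is not the Wikibase implementation: the `Term` record, the `InMemoryEntitySearch` class, its `search` signature and the entity IDs are all hypothetical, and an in-memory list stands in for the wb_terms table.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Term:
    """One label or alias of an entity (hypothetical record shape)."""
    entity_id: str
    language: str
    term_type: str  # "label" or "alias"
    text: str


class InMemoryEntitySearch:
    """Hypothetical stand-in for a lookup backed by the wb_terms table."""

    def __init__(self, terms: Iterable[Term]) -> None:
        self._terms = list(terms)

    def search(self, text: str, language: str,
               case_sensitive: bool = True, limit: int = 10) -> List[str]:
        """Return IDs of entities with a label or alias starting with `text`."""
        needle = text if case_sensitive else text.casefold()
        found: List[str] = []
        for term in self._terms:
            if term.language != language:
                continue
            candidate = term.text if case_sensitive else term.text.casefold()
            # Prefix match; each entity is reported once, in first-seen order.
            if candidate.startswith(needle) and term.entity_id not in found:
                found.append(term.entity_id)
                if len(found) >= limit:
                    break
        return found
```

With terms such as `Term("Q1", "en", "label", "Berlin")`, calling `search("Ber", "en")` matches the entity, while `search("ber", "en")` only matches when `case_sensitive=False` is passed.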

Event Timeline

daniel created this task.Jan 15 2015, 7:31 PM
daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a subscriber: daniel.
Restricted Application added a subscriber: Aklapper.Jan 15 2015, 7:31 PM
daniel added a comment.Feb 4 2015, 1:17 PM

(after discussion with adrian)
The baseline implementation of EntitySearch would be based on wb_terms, and would initially use the search code that currently resides in the TermIndex class. The code and the database structure could however be simplified if we reduce the baseline feature set as follows:

  • all term matches are case sensitive
  • no ranking is applied
daniel added a comment.Feb 4 2015, 1:32 PM

If we do the above, we can:

  • drop term_weight
  • drop term_search (and change the collation of term_text to be case insensitive)
  • remove the ranking code that sorts by weight
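
To make the simplification above concrete, here is a small sketch using Python's built-in `sqlite3` module instead of MySQL. The table name `terms_sketch` and all columns other than `term_text` are assumptions for illustration, and SQLite's `NOCASE` collation (ASCII-only case folding) stands in for a case-insensitive MySQL collation. There is no `term_weight` or `term_search` column, and no ranking.

```python
import sqlite3


def open_simplified_terms_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like the reduced feature set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE terms_sketch (
            term_entity_id TEXT NOT NULL,
            term_language  TEXT NOT NULL,
            term_type      TEXT NOT NULL,
            -- Case-insensitive collation replaces a separate search column.
            term_text      TEXT NOT NULL COLLATE NOCASE
        )
        """
    )
    conn.execute(
        "CREATE INDEX terms_sketch_text ON terms_sketch (term_language, term_text)"
    )
    return conn


def prefix_search(conn: sqlite3.Connection, prefix: str,
                  language: str, limit: int = 10) -> list:
    """Unranked, case-insensitive prefix search on term_text."""
    # Escape LIKE wildcards so the user's prefix is matched literally.
    escaped = (prefix.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT DISTINCT term_entity_id FROM terms_sketch "
        "WHERE term_language = ? AND term_text LIKE ? ESCAPE '\\' "
        # Ordering by ID only makes the output deterministic; it is not a ranking.
        "ORDER BY term_entity_id LIMIT ?",
        (language, escaped + "%", limit),
    )
    return [row[0] for row in rows]
```

The query needs neither a weight column nor a sort on weight, which is the point of the reduced feature set.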
daniel added a comment.Feb 4 2015, 1:42 PM

Implementation note: Ranking is currently done by an in-memory sort on 5000 prefix matches. This is needed to avoid a filesort on the MySQL server: a composite index covering the text (or search_key) field plus the weight cannot use the weight part of the index for prefix matches. The later columns of a composite index can only be used for ordering when the leading columns are matched exactly. A multidimensional index would be needed to do ranked prefix matches in MySQL.

The proposed solution is to drop support for ranking in the SQL-based implementation; ranking would be fully supported in the implementation based on Elasticsearch.
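
For reference, the in-memory ranking described above can be sketched roughly as follows. This is an illustration, not the existing code: the function name, the tuple layout and the sample weights are made up, and a list sorted by text stands in for an index on the text column. Within a prefix range the rows come back ordered by text, not by weight, so the candidates have to be collected (up to a cap) and sorted afterwards.

```python
import bisect
from typing import List, Tuple

# (text, weight, entity_id); this layout is an assumption for illustration.
TermRow = Tuple[str, float, str]

MAX_CANDIDATES = 5000  # cap on prefix matches pulled into memory


def ranked_prefix_matches(sorted_terms: List[TermRow], prefix: str,
                          limit: int = 10,
                          max_candidates: int = MAX_CANDIDATES) -> List[TermRow]:
    """Collect prefix matches in text order, then rank them by weight in memory."""
    # Range scan: jump to the first row whose text is >= prefix.
    start = bisect.bisect_left(sorted_terms, (prefix,))
    candidates: List[TermRow] = []
    for i in range(start, len(sorted_terms)):
        row = sorted_terms[i]
        if not row[0].startswith(prefix) or len(candidates) >= max_candidates:
            break
        candidates.append(row)
    # The scan yields rows in text order, so the weight sort happens here.
    candidates.sort(key=lambda row: row[1], reverse=True)
    return candidates[:limit]
```

Note that the cap makes the ranking approximate: a heavily weighted term that sorts late within the prefix range is never seen once the cap is reached.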

Lydia_Pintscher triaged this task as Medium priority.Feb 19 2015, 4:03 PM
Lydia_Pintscher added a subscriber: Lydia_Pintscher.
Jonas renamed this task from Introduce EntitySearch service to [Story] Introduce EntitySearch service.Aug 13 2015, 4:43 PM
Jonas set Security to None.
Lydia_Pintscher closed this task as Resolved.Dec 13 2017, 9:08 AM
Lydia_Pintscher claimed this task.