
[Story] Implement EntitySearch service on top of Elastic
Closed, Resolved (Public)

Description

The user-facing problem:

  • When a user adds a property to an item and starts typing the name of the item they want to link to, a query is run against the wb_terms table to find and rank results, e.g. https://www.wikidata.org/w/api.php?action=wbsearchentities&search=united&language=en
  • The current system uses in-memory sorting and very basic scoring for ranking.
  • As a result, search is suboptimal for users: results are sometimes displayed in the wrong order, or important entries are missing.
  • It also causes high database load, with an increasing number of timeouts (and thus no search results).
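
For reference, the example request above can be assembled programmatically. This is only a minimal sketch: the helper name `build_search_url` is hypothetical, and it uses just the three parameters that appear in the example URL (action, search, language).

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in the task description.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(search, language="en", endpoint=API_ENDPOINT):
    """Build a wbsearchentities request URL (hypothetical helper).

    Only the parameters shown in the example URL are included.
    """
    params = {
        "action": "wbsearchentities",
        "search": search,
        "language": language,
    }
    # urlencode escapes the values and keeps the dict's insertion order.
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("united"))
```

Each keystroke in the entity selector would issue one such request, which is why the cost of the backing query matters.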

Technical issues:

  • The current implementation of the backend for this query is not performant
  • The current implementation lacks a good mechanism for updating term weights (boosts)

Proposed solution:

  • For large wikis, searching for entities by label should be backed by EntitySearch (or Cirrus).
  • An SQL-based search should remain as a fallback/baseline.

This could be implemented using the mechanism proposed in T89733: Allow ContentHandler to expose structured data to the search engine. However, if we don't want to block on this, it may be simpler to just implement the relevant hook in Cirrus.


Implementation notes, from a brief discussion with Nik:

  • Cirrus already stores and maintains the number of incoming links for all entity pages, using the same standard mechanism it uses for wikitext pages.
  • Labels and aliases should go into new custom fields.
  • Until T89733 is implemented, we can introduce custom fields using the CirrusSearchBuildDocumentParse hook.
  • Support for per-language field values can be spoofed by putting the language code as a prefix into the field value (with a separator, perhaps a pipe or even a line break).
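
The language-prefix idea from the last note could look roughly like the sketch below. It is illustrative only: the function names are hypothetical, the pipe separator is just one of the options mentioned above, and nothing here reflects how Cirrus or Wikibase actually store these fields.

```python
# Assumption: a pipe separates the language code from the term.
SEPARATOR = "|"


def encode_terms(terms_by_language):
    """Flatten {language: [terms]} into 'language|term' field values."""
    return [
        f"{language}{SEPARATOR}{term}"
        for language, terms in terms_by_language.items()
        for term in terms
    ]


def decode_term(value):
    """Split a field value back into (language, term).

    Only the first separator is significant, so the term itself
    may contain the separator character.
    """
    language, _, term = value.partition(SEPARATOR)
    return language, term


def terms_for_language(values, language):
    """Return the terms stored for exactly one language code."""
    prefix = language + SEPARATOR
    return [v[len(prefix):] for v in values if v.startswith(prefix)]


if __name__ == "__main__":
    labels = {"en": ["United States", "USA"], "de": ["Vereinigte Staaten"]}
    encoded = encode_terms(labels)
    print(encoded)
    print(terms_for_language(encoded, "en"))
```

Because the separator is part of the prefix being matched, a lookup for "en" does not pick up values stored under a longer code such as "en-gb".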

See also: T99899: [Story] Looking up entities by external identifiers

Event Timeline

daniel created this task. Feb 4 2015, 12:43 PM
daniel raised the priority of this task from to High.
daniel updated the task description.
daniel set Security to None.
daniel updated the task description. May 19 2015, 6:48 PM
Deskana updated the task description. May 19 2015, 7:05 PM
Deskana added a subscriber: Deskana.
daniel updated the task description. May 19 2015, 7:13 PM
daniel updated the task description. May 20 2015, 4:27 PM
Jonas renamed this task from Implement EntitySearch service on top of Elastic to [Story] Implement EntitySearch service on top of Elastic. Aug 13 2015, 4:43 PM
Addshore removed a subscriber: Addshore. Apr 24 2017, 3:58 PM

This seems to be done, isn't it?

Lydia_Pintscher closed this task as Resolved. Dec 13 2017, 9:08 AM
Lydia_Pintscher claimed this task.

Yes :)