
Wikidata autocomplete service should do token search and not prefix search
Open, Low, Public

Description

(Reported against Wikidata Reconcile https://github.com/wetneb/openrefine-wikibase/issues/93, Antonin Delpeuch suggested that I report it here).

I'm trying to find the property "MIC market code" (P7534, "ISO 10383 market identifier code").
The autocomplete service (Antonin said it is https://wikidata.org/w/api.php?action=help&modules=wbsearchentities) finds it with "MIC" and "MIC market", but not with any of these queries:

  • mic iso
  • mic code
  • iso market
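To reproduce this, the requests can be built against the wbsearchentities module. This is a minimal Python sketch that assumes English labels and a property-only search (`type=property`). The helper name `wbsearchentities_url` is hypothetical, and the script only builds the URLs without calling the API.

```python
from urllib.parse import urlencode

# Base endpoint of the Wikidata action API.
API = "https://www.wikidata.org/w/api.php"


def wbsearchentities_url(query: str, language: str = "en") -> str:
    """Build a wbsearchentities request URL that searches properties only."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "type": "property",  # restrict results to properties (P-ids)
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


# The first two queries find P7534; the last three do not.
for q in ["MIC", "MIC market", "mic iso", "mic code", "iso market"]:
    print(wbsearchentities_url(q))
```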

From my observation, it does a case-insensitive substring match against the label or description.
It would be better if it searched like a full-text search engine: take all words from the label and description, match the query words in any order, and rank matches higher when the words appear in the query's order and close together.
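The difference between the observed behaviour and the requested one can be sketched in Python, using the P7534 label and description text from this report as the only document. This is an illustration under stated assumptions, not how Wikibase or CirrusSearch actually scores results: words are split on non-word characters, the label and description are concatenated into one token list, and the score prefers more query words in query order, then a shorter span.

```python
import re
from itertools import product

# P7534 label and description text as quoted in this report.
LABEL = "MIC market code"
DESC = "ISO 10383 market identifier code"


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def substring_match(query: str, label: str, description: str) -> bool:
    """Observed behaviour: the whole query must occur as one substring."""
    q = query.lower()
    return q in label.lower() or q in description.lower()


def token_score(query: str, label: str, description: str):
    """Requested behaviour: every query word must occur somewhere, in any order.

    Returns None when a word is missing; otherwise a tuple where larger is
    better: (number of consecutive query-word pairs found in query order,
    negated span length), maximised over all possible word positions.
    """
    doc = tokens(label) + tokens(description)
    hits = [[i for i, t in enumerate(doc) if t == w] for w in tokens(query)]
    if not hits or any(not h for h in hits):
        return None
    best = None
    for positions in product(*hits):
        in_order = sum(a < b for a, b in zip(positions, positions[1:]))
        span = max(positions) - min(positions) + 1
        score = (in_order, -span)
        if best is None or score > best:
            best = score
    return best


for q in ["MIC market", "mic iso", "mic code", "iso market"]:
    print(q, substring_match(q, LABEL, DESC), token_score(q, LABEL, DESC))
```

Under these assumptions, all three failing queries get a score, and "mic market" outranks "market mic" because its words appear in the query's order.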

The main search box does what I need: e.g. searching for "property:iso mic" returns only the property I'm looking for.

Event Timeline

Restricted Application added a project: Wikidata. Oct 22 2020, 7:39 AM
Restricted Application added a subscriber: Aklapper.
VladimirAlexiev renamed this task from "improve Wikidata" to "improve Wikidata autocomplete service". Oct 22 2020, 7:40 AM
VladimirAlexiev updated the task description.
Pintoch added a subscriber: Pintoch.
Thadguidry added a subscriber: Thadguidry. Edited Oct 22 2020, 10:54 AM

In Freebase, we offered three match modes: word, phrase, and full (exact match). I think the wbsearchentities API could offer something similar, at a slight cost in indexing.
Besides name, we also supported alias{full}. Using alias: matched both the name and the aliases; using name: matched only the name.

Old docs archived here:
https://web.archive.org/web/20160731201411/http://wiki.freebase.com/wiki/Search_Cookbook

In addition to specifying what text fields should be matched it is also possible to specify how the match should occur by inserting one of the following modifiers between the operand and the text field:

{word} : require that the words in the string match words in the corresponding text field in the document. (default) 
{phrase} : require that the words occur next to each other in the same order in the corresponding text field in the document. 
{full} : like {phrase} but also require that the phrase exactly match the text field, not just words within the text field. Known as a "full match".
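The three modifiers quoted above, together with the name/alias field distinction from the comment, can be sketched in Python. This is a hypothetical illustration, not Freebase's implementation: text is lowercased and split into words, and "exact" for {full} is assumed to mean the same word sequence after that normalization. The example entity is made up.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def matches(query: str, value: str, mode: str = "word") -> bool:
    """Apply one of the {word}, {phrase} or {full} modifiers to one text value."""
    q, v = words(query), words(value)
    if mode == "word":    # every query word occurs somewhere in the value
        return all(w in v for w in q)
    if mode == "phrase":  # query words occur next to each other, in order
        n = len(q)
        return any(v[i:i + n] == q for i in range(len(v) - n + 1))
    if mode == "full":    # the whole value is exactly the phrase
        return q == v
    raise ValueError(f"unknown mode: {mode}")


def entity_matches(entity: dict, field: str, mode: str, query: str) -> bool:
    """name: checks only the name; alias: checks the name and every alias."""
    if field == "name":
        values = [entity["name"]]
    elif field == "alias":
        values = [entity["name"], *entity.get("aliases", [])]
    else:
        raise ValueError(f"unknown field: {field}")
    return any(matches(query, v, mode) for v in values)


# Hypothetical entity, used only for illustration.
song = {"name": "Home", "aliases": ["Home (single)"]}

print(matches("mic code", "MIC market code", "word"))    # True
print(matches("mic code", "MIC market code", "phrase"))  # False
print(entity_matches(song, "name", "full", "home"))      # True
```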

For example, to find the musical single called Home by Marc Broussard, you would use a filter like this:

filter: "(all type:/music/single name{full}:"home" /music/track/artist:"Marc Broussard")"

{word} is essentially what @VladimirAlexiev is asking for here, I think.
These search-control parameters were among the most powerful features that Andi Vajda built into the Freebase Search service while it was operational.

Addshore added a project: Discovery-Search.
Addshore added a subscriber: Gehel.
Gehel renamed this task from "improve Wikidata autocomplete service" to "Wikidata autocomplete service should do token search and not prefix search". Nov 9 2020, 4:35 PM
Gehel triaged this task as Low priority.
Gehel moved this task from needs triage to Feature Requests on the Discovery-Search board.