
property search in entity selector should not be prefix-only
Open, Medium, Public

Description

The search in the property name box seems to match only on the first word of the property label. For example, "place of birth" and "place of death" show up only if you type "place", but they should also be shown when you type "birth" (maybe because you were looking for "birthplace").
Adding aliases will help only partially. For example, I would like to type "birth" and be shown "place of birth" but also "date of birth"; adding "birthplace" as an alias helps for "place of birth", but for other properties that may not be the case. Creating aliases only to help searches doesn't look like The Right Thing To Do(TM) to me. To clarify, I'm perfectly fine with having "sound" aliases such as "birthplace" for "place of birth" or "birthday" for "date of birth"; what I'm not so sure is a good thing is using aliases to provide search shortcuts. Improving property search will improve user friendliness.
Further examples and discussion here: https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=6006546#Improve_property_search

Details

Reference
bz44773

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:41 AM
bzimport set Reference to bz44773.
bzimport added a subscriber: Unknown Object (MLST).

I don't think it makes sense to break down the phrase into words and search on them for those lists. It could be that I'm wrong, but I find the usefulness of this proposal to be highly language specific.

As long as the list of properties is limited this could work, but imagine searching for properties that share their initial letters with common words like "of" and "in", or whose label is itself one of those common words. That will make the list explode.

I fail to see how it would be language specific. I'm sure there are many cases in English where no alias for a given property has the most significant word in first position. Just another example: if I type "language" I would like to be shown both "native language" and "official language". Heck, I would like to type just "lang" and be shown those two.

I agree that making lists (and maintaining them) makes very little sense. I would change the search algorithm to match *any* substring in the property name. I think it's better to trade off a little speed to be sure that the property one is looking for is found.

Katie, Daniel: What's the status of this with the new search backend? Can we change this to a non-prefix-only search?

@Lydia: Eventually, yes, but it needs more thinking, coding, backend processes, etc.

I already worked on that code in Gerrit 114165 and started doing more refactoring including a possible solution for this request in Gerrit 114748 (currently a draft).

As Daniel said, this needs a lot more thinking. My change set is far from being a solution but I hope it could be a step in the right direction.

I finished refactoring the related code in several patches but unfortunately had to abandon the draft that was supposed to fix this issue. My initial idea was to simply run an additional WHERE term LIKE '%<search term>%' query if the three queries that are currently run do not return enough results (basically WHERE id = '<search term>' concatenated with WHERE term = '<search term>' concatenated with WHERE term LIKE '<search term>%').

This is a bad idea for multiple reasons:

  1. Ranking will be bad. The "contains" results will always be hidden behind the "equals to" and "starts with" results.
  2. It should probably be different for Items and Properties.
  3. LIKE queries don't use any indexes if they start with a placeholder.
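
For illustration, the cascade described above can be sketched against a simplified table. This is a minimal sketch, not the Wikibase code: the `terms` table, its two columns, the helper names, and the sample rows are assumptions made up for this example, and SQLite stands in for the real database. It shows reason 1 in practice: substring matches only ever appear after the exact and prefix matches.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with a few illustrative property labels."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (entity_id TEXT, term TEXT)")
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?)",
        [
            ("P19", "place of birth"),
            ("P20", "place of death"),
            ("P569", "date of birth"),
            ("P1477", "birth name"),
        ],
    )
    return conn


def search_terms(conn, query, limit=5):
    """Exact id, exact term, prefix, then a substring fallback.

    Each step only runs while fewer than `limit` rows have been collected.
    Note: '%' and '_' inside `query` are not escaped in this sketch.
    """
    steps = [
        ("SELECT entity_id, term FROM terms WHERE entity_id = ?", query),
        ("SELECT entity_id, term FROM terms WHERE term = ?", query),
        ("SELECT entity_id, term FROM terms WHERE term LIKE ?", query + "%"),
        # Leading wildcard: this is the step that cannot be served by an
        # ordinary index on the term column.
        ("SELECT entity_id, term FROM terms WHERE term LIKE ?", "%" + query + "%"),
    ]
    results, seen = [], set()
    for sql, param in steps:
        if len(results) >= limit:
            break
        for row in conn.execute(sql + " ORDER BY term", (param,)):
            if row not in seen:
                seen.add(row)
                results.append(row)
    return results[:limit]


demo = make_demo_db()
print(search_terms(demo, "birth"))
```

With these sample rows, "birth name" (a prefix match) is listed before "date of birth" and "place of birth" (substring matches), regardless of which one the user is more likely to want.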

To make this a proper solution, the least we need to do is split labels into words (or come up with a more clever solution, like identifying common prefixes such as "date of" and turning such labels into "birth, date of"). Then we can add these individual words to our term index.
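
A rough sketch of both ideas, assuming an in-memory dictionary instead of the real term index. The function names, the stop-word set, and the prefix list are hypothetical and chosen only for this example; the stop-word filter is one possible answer to the concern raised earlier about words like "of" and "in" making the list explode.

```python
from collections import defaultdict

# Assumption: a tiny, English-only stop-word list for illustration.
STOP_WORDS = {"of", "in", "the"}

LABELS = {
    "P19": "place of birth",
    "P569": "date of birth",
    "P103": "native language",
    "P37": "official language",
}


def build_word_index(labels):
    """Map every non-stop word of every label to the ids that contain it."""
    index = defaultdict(set)
    for entity_id, label in labels.items():
        for word in label.lower().split():
            if word not in STOP_WORDS:
                index[word].add(entity_id)
    return index


def search_by_word_prefix(index, labels, query):
    """Return labels in which any indexed word starts with the query."""
    query = query.lower()
    hits = set()
    # Linear scan for brevity; in a database the words would be rows
    # queried with an indexed prefix match.
    for word, ids in index.items():
        if word.startswith(query):
            hits |= ids
    return sorted(labels[i] for i in hits)


def rotate_common_prefix(label, prefixes=("date of", "place of")):
    """Turn 'date of birth' into 'birth, date of' for known prefixes."""
    for prefix in prefixes:
        if label.startswith(prefix + " "):
            return f"{label[len(prefix) + 1:]}, {prefix}"
    return label


index = build_word_index(LABELS)
print(search_by_word_prefix(index, LABELS, "lang"))
print(rotate_common_prefix("date of birth"))
```

Typing "lang" finds both language properties, and "birth" finds both birth properties, without anyone having to add an alias for each of them.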

The current solution to do exactly that is very intuitive and simple: add aliases.

The whole wb_terms table is a bit of a performance bottleneck, so we should probably find a way to use another database backend service for this (can we "abuse" CirrusSearch for this somehow?).

What could be done with our current setup is to add a new field to wb_terms, term_back (or so), which holds a reversed version of the term. With that we could also search for terms ending with a given word...
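
A minimal sketch of the reversed-column idea, again using SQLite and a simplified `terms` table as stand-ins (only the `term_back` name comes from the comment above; everything else is assumed for illustration). Reversing the query turns an "ends with" search into a pattern without a leading wildcard, which is the shape an index can serve.

```python
import sqlite3


def make_reversed_db(terms):
    """Store each term together with its reversed form in term_back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (entity_id TEXT, term TEXT, term_back TEXT)")
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?, ?)",
        # Naive character reversal; combining characters would need more care.
        [(entity_id, term, term[::-1]) for entity_id, term in terms],
    )
    conn.execute("CREATE INDEX idx_term_back ON terms (term_back)")
    return conn


def search_suffix(conn, suffix):
    """Find terms ending with `suffix` via a prefix match on term_back."""
    pattern = suffix[::-1] + "%"  # 'birth' becomes 'htrib%'
    rows = conn.execute(
        "SELECT term FROM terms WHERE term_back LIKE ? ORDER BY term",
        (pattern,),
    )
    return [row[0] for row in rows]


reversed_db = make_reversed_db(
    [
        ("P19", "place of birth"),
        ("P569", "date of birth"),
        ("P1477", "birth name"),
    ]
)
print(search_suffix(reversed_db, "birth"))
```

This only covers terms that end with the query. A word in the middle of a label would still be missed, so it complements the word-splitting approach rather than replacing it.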

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Lydia_Pintscher lowered the priority of this task from Medium to Lowest. Apr 23 2017, 3:24 PM

We have an ElasticSearch implementation now; do we want to do anything with this?

CBogen raised the priority of this task from Lowest to Medium. Aug 27 2020, 8:22 PM