Reconsider how apostrophes are handled in completion search for wikidata
Open, Medium, Public


Observed Behavior:
Users are often surprised by missed results in the Quick Search input box when a phrase contains an apostrophe. The phrase is missed because people often type simple titles or names without the apostrophe, like so:

"no mans sky"
(no results)

"no man's sky"

But after clicking "containing... no mans sky", the results page displays and shows the video game correctly.

Users do not expect that the Completion search (Quick Search input box) would work as an "Exactly this text" mode.

But in most cases that is indeed how it acts.

Expected Behavior:
Apostrophes for English language users should not be considered as part of the phrase in the Quick Search input box.
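
To make the expected behavior concrete, here is a minimal Python sketch of apostrophe-insensitive prefix matching. It is only an illustration of the idea, not how Wikidata's completion search is implemented; the `normalize` and `complete` helpers and the sample title list are hypothetical.

```python
# Hypothetical illustration: treat apostrophes as if they were not typed.
# The set below covers the ASCII apostrophe and a few common typographic variants.
APOSTROPHES = {"'", "\u2019", "\u2018", "\u02bc"}


def normalize(text):
    """Lowercase the text, drop apostrophes, and collapse runs of whitespace."""
    stripped = "".join(ch for ch in text if ch not in APOSTROPHES)
    return " ".join(stripped.lower().split())


def complete(prefix, titles):
    """Return the titles whose normalized form starts with the normalized prefix."""
    key = normalize(prefix)
    return [title for title in titles if normalize(title).startswith(key)]


# Sample data, made up for this example.
titles = ["No Man's Sky", "No Man's Land", "Nomad"]

print(complete("no mans sky", titles))   # query typed without the apostrophe
print(complete("no man's sky", titles))  # query typed with the apostrophe
```

With this normalization, both spellings of the query match the same title, which is the behavior requested above.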

Event Timeline

Restricted Application added a subscriber: Aklapper. Nov 7 2019, 2:35 PM
Thadguidry updated the task description. Nov 7 2019, 2:40 PM
dcausse added a comment. Edited Nov 7 2019, 2:50 PM

Adding a new preference is not really possible, as this would mean indexing the data twice, which we generally don't do for such narrow use cases.
When you say "simple search", do you mean running a full text search in the Quick Search input box? I'm afraid this is not possible either: fulltext searches are too costly to be run on every character typed.
As for meeting the majority of users' expectations, do you have any numbers or studies we could look at that support your statement?
About "Works like Google search": while I believe most users expect any search engine to be as smart and powerful as Google, this statement is too vague to be useful in this ticket.

I suggest that you reframe your feature request by simply stating that Wikidata completion search should reconsider how apostrophes are treated, and provide a few examples.

Hi @Thadguidry, thanks for taking the time to report this!
If you have time and can still reproduce the problem: Please add a more complete description to this task which includes a clear list of specific steps to reproduce the situation, describing actual results and expected results after performing the steps to reproduce, and/or a link to a public website where the issue can be seen.
You can edit the task description by clicking Edit Task. Ideally, exact and clear steps to reproduce should allow any other person to follow these steps (without having to interpret those steps) and see the same results. Problems that others can reliably reproduce can get fixed faster. Thanks!

Thadguidry added a comment. Edited Nov 7 2019, 6:28 PM

@dcausse Yes, I mean running a full text search. "Simple search" is a term sometimes used by Blazegraph. Fulltext searches are cheap when you index terms in multiple ways. Why would you not want to index terms in multiple ways? Freebase was able to leverage this quite easily with Lucene/Solr indexes and provided great results in its search box on each character typed. Are you hurting for RAM to store the cached inverted indexes, or is it something else with the infra? My quick calculation for one simple in-memory index of all the terms (not just label/alias) in Wikidata: current stats say 78 billion x 10 bytes per term = 78 gigs. Does Wikidata not have hardware to support multiple indexes, for example 1TB RAM (16x 64GB)?
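
A quick check of the arithmetic in the comment above, taking its inputs at face value (78 billion terms, 10 bytes per term, 16 modules of 64 GB). None of these figures are verified here, and the variable names are made up for the sketch. Multiplying the two quoted numbers gives 780 GB rather than 78, so one of them appears to be off by a factor of ten, although the result would still fit under the 1 TB mentioned.

```python
# Inputs as quoted in the comment; treated as assumptions, not verified facts.
TERMS = 78_000_000_000      # "78 billion" terms
BYTES_PER_TERM = 10         # "10 bytes per term"
RAM_MODULES = 16            # "16x 64GB"
GB_PER_MODULE = 64

total_bytes = TERMS * BYTES_PER_TERM
total_gb = total_bytes / 10**9           # decimal gigabytes
ram_gb = RAM_MODULES * GB_PER_MODULE     # total RAM in the quoted configuration

print(f"index size: {total_gb:.0f} GB")
print(f"quoted RAM: {ram_gb} GB")
print(f"fits in RAM: {total_gb <= ram_gb}")
```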

Thadguidry updated the task description. Nov 7 2019, 7:19 PM

We do already index terms in multiple ways, but I don't think duplicating a field just to let users select how apostrophes should be handled is a good use of server resources. That's why I suggest reframing this ticket in a simple Actual behavior/Expected behavior form and pointing out that the apostrophe is the likely cause. Framed as it is at the moment, it somewhat dictates a solution that we are unlikely to implement.

Thadguidry updated the task description. Nov 12 2019, 3:42 PM

Thanks, updated ticket.

dcausse triaged this task as Medium priority. Nov 12 2019, 4:25 PM
dcausse edited projects, added Discovery-Search; removed MediaWiki-User-preferences.


dcausse renamed this task from "Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF." to "Reconsider how apostrophes are handled in completion search for wikidata". Nov 12 2019, 4:26 PM
dcausse moved this task from needs triage to Wikidata Search on the Discovery-Search board.