Stemming for item suggestions, e.g. "the" vs. no "the" on Wikidata
Closed, Invalid; Public

Description

Story: "As a user, I want to use an item like 'Harvard Monthly' without knowing its ID."

Problem: The user can't find "Harvard Monthly" because the item's actual title is "The Harvard Monthly"; searching without "The" returns no matches at all.

Notes:

  • Strangely, it works with "Beatles" and "The Beatles" (band).
  • It does not work well with "Zeit" and "Die Zeit" (newspaper): there are matches for "Zeit", but after clicking "more" three times, "Die Zeit" still does not appear, while many other items that contain the string "zeit" do, such as "Zeitschrift für Kunstgeschichte" or "Zeitakubyō".

Bildschirmfoto vom 2017-03-24 20-29-44.png (96×471 px, 4 KB)

Event Timeline

"The Beatles" has "Beatles" as alias.

We cannot solve this with the current approach, which relies on MySQL prefix search. It should be resolved more or less automatically once this service switches to Elasticsearch, which is already tracked in various tickets.
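To illustrate why a prefix search behaves as reported, here is a minimal Python sketch. It is not the actual Wikidata or Elasticsearch implementation: the `ITEMS` data, the `STOP_WORDS` list, and the function names are all hypothetical, and the token-based variant uses simple stop word removal rather than true stemming.

```python
import re

# Hypothetical mini-index: label -> aliases (illustrative data only).
ITEMS = {
    "The Harvard Monthly": [],
    "The Beatles": ["Beatles"],
    "Die Zeit": [],
    "Zeitschrift für Kunstgeschichte": [],
}

# Assumed stop word list for this sketch; a real analyzer is configured per language.
STOP_WORDS = {"the", "die", "der", "das"}


def prefix_search(query):
    """Return labels where the label or an alias starts with the query."""
    q = query.casefold()
    return [
        label
        for label, aliases in ITEMS.items()
        if any(name.casefold().startswith(q) for name in [label, *aliases])
    ]


def tokens(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"\w+", text.casefold()) if t not in STOP_WORDS]


def token_search(query):
    """Return labels whose leading tokens (stop words removed) equal the query tokens."""
    q = tokens(query)
    if not q:
        return []
    return [
        label
        for label, aliases in ITEMS.items()
        if any(tokens(name)[: len(q)] == q for name in [label, *aliases])
    ]
```

In this sketch, `prefix_search("Harvard Monthly")` finds nothing because the label starts with "The", `prefix_search("Beatles")` succeeds only through the alias, and `prefix_search("Zeit")` returns "Zeitschrift für Kunstgeschichte" but not "Die Zeit". The token-based variant finds "The Harvard Monthly" and "Die Zeit" for the same queries.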

Deskana renamed this task from Stemming for item suggestions, e.g. "the" vs. no "the" to Stemming for item suggestions, e.g. "the" vs. no "the" on Wikidata. Mar 30 2017, 5:02 PM
Deskana moved this task from needs triage to search-icebox on the Discovery-Search board.

There's nothing specific to do here, since as noted above this problem will be solved when Wikidata eventually begins using Elasticsearch as a backend. This could be declined, or merged into the relevant tasks.

Marking this as invalid, as I don't think we need to keep it around. The move to Elasticsearch is in progress.