
Wildcard search returns wrong count of found pages
Closed, Declined | Public | BUG REPORT

Description

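In the German Wiktionary:
a search for "bl*" reports 3,746 pages
a search for "blu*" reports 4,214 pages
The narrower pattern "blu*" should never report more pages than "bl*", because every page that matches "blu*" also matches "bl*".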

Steps to Reproduce:
https://de.wiktionary.org/w/index.php?search=bl%2A&title=Spezial:Suche&profile=advanced&fulltext=1&ns0=1
https://de.wiktionary.org/w/index.php?search=blu%2A&title=Spezial:Suche&profile=advanced&fulltext=1&ns0=1

The exact counts vary over time because Wiktionary changes. At the very least, if a wildcard search ends prematurely or is canceled on timeout, this should be reported to the user. See the discussion:
https://de.wiktionary.org/w/index.php?title=Wiktionary:Fragen_zum_Wiktionary&diff=7862515&oldid=7862157

Event Timeline

dcausse triaged this task as Medium priority. Jun 3 2020, 1:59 PM
dcausse moved this task from needs triage to elastic / cirrus on the Discovery-Search board.
dcausse added a subscriber: dcausse.

This is unfortunately a limitation of most search engines. The reason is that search is token-based, so a wildcard search has to:

  1. find all the words that match the wildcard
  2. find the pages that match these words

Because the first step is limited to 1024 words, there is no guarantee that bl* and blu* will select the same set of words, which leads to inconsistent and misleading total hit counts.
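
As a rough illustration of the two steps and the cap, here is a toy sketch in Python. It is not how Elasticsearch is implemented: the inverted index, the helper names `expand_prefix` and `count_pages`, the alphabetical truncation order, and the tiny limit of 3 (standing in for 1024) are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def expand_prefix(terms: Iterable[str], prefix: str, limit: int) -> List[str]:
    """Step 1: collect at most `limit` index terms that start with `prefix`.

    Truncating in alphabetical order is an assumption of this sketch;
    a real engine may pick the retained terms differently.
    """
    matches = [t for t in sorted(terms) if t.startswith(prefix)]
    return matches[:limit]


def count_pages(index: Dict[str, Set[int]], prefix: str, limit: int) -> int:
    """Step 2: count the distinct pages containing any expanded term."""
    pages: Set[int] = set()
    for term in expand_prefix(index.keys(), prefix, limit):
        pages |= index[term]
    return len(pages)


# Hypothetical inverted index: term -> set of page ids.
index: Dict[str, Set[int]] = {
    "black": {1},
    "blank": {2},
    "bleak": {3},
    "blue": {4, 5, 6},
    "blur": {7, 8},
}

LIMIT = 3  # stands in for the 1024-word limit

# "bl" matches 5 terms but only 3 are kept, so "blue" and "blur" are dropped.
print(count_pages(index, "bl", LIMIT))   # 3
# "blu" matches 2 terms, both kept, so the narrower query reports more pages.
print(count_pages(index, "blu", LIMIT))  # 5
```

With a limit large enough to hold every matching term, the same toy index reports 8 pages for "bl", which is the count a user would expect.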

I don't think that elasticsearch can give us the information that the first step has been limited.

Gehel added a subscriber: Gehel.

Search is not meant to provide an exact count of results, and there is zero chance that we will add that feature in Elasticsearch, so I am declining this. Feel free to reopen if you feel strongly about it.