
prefixsearch does not include result whose title is exactly the same as the search string
Open, Low, Public

Event Timeline

Anomie subscribed.

Sure it does. See https://it.wikipedia.org/w/api.php?action=query&prop=info|pageprops&generator=prefixsearch&gpssearch=12%20dicembre&gpslimit=10&ppprop=disambiguation for example: "12 dicembre" is in the result.

As noted in the module's documentation, "Depending on the search engine backend, this might include typo correction, redirect avoidance, or other heuristics." For some reason CirrusSearch is choosing to return the redirect at "1 Dicembre" rather than the redirect at "1 dicembre" or the actual target of both redirects at "1º dicembre" for this particular example.

And at any rate it's not anything being done by the API itself; the API is just returning what the underlying SearchEngine is giving it. You'd want to file a bug against CirrusSearch if you find the behavior here buggy.

BTW, it looks like you can get the results you expect here, at least for this specific query, by [[https://it.wikipedia.org/w/api.php?action=query&prop=info%7Cpageprops&generator=prefixsearch&gpssearch=1%20dicembre&gpslimit=max&ppprop=disambiguation&gpsprofile=classic|adding gpsprofile=classic to the query]]. Of course, that wouldn't work on a wiki where the search backend doesn't support gpsprofile=classic, so it's of limited use if you care about supporting non-Wikimedia wikis that aren't using CirrusSearch.
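For anyone scripting this, here is a minimal sketch of building that query in Python. The helper name `build_prefixsearch_url` and the added `format=json` parameter are my own choices for illustration, not something from the linked query. The other parameters mirror the link above. It only constructs the URL, so it makes no assumption about what a given backend returns.

```python
from urllib.parse import urlencode

# Endpoint taken from the example link; swap in another wiki's api.php as needed.
API_ENDPOINT = "https://it.wikipedia.org/w/api.php"


def build_prefixsearch_url(search, profile=None, limit="max"):
    """Build a generator=prefixsearch query URL (hypothetical helper).

    `profile` is passed as gpsprofile only when given, since a backend
    other than CirrusSearch may not support values such as "classic".
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "info|pageprops",
        "generator": "prefixsearch",
        "gpssearch": search,
        "gpslimit": limit,
        "ppprop": "disambiguation",
    }
    if profile is not None:
        params["gpsprofile"] = profile
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_prefixsearch_url("1 dicembre", profile="classic"))
```

Leaving `profile` unset falls back to whatever the wiki's default profile is, which is the behavior the original report was about.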

dbarratt edited projects, added CirrusSearch; removed MediaWiki-Action-API.

The default prefix search is heavily tuned to finding content articles. It considers a redirect and the page it redirects to to be a single entity; the version of the title chosen for display amounts to a heuristic that tries to decide between showing something closer to what you typed that exists as a redirect, or the original page title. Additionally, this system considers two versions of a string that differ only in casing to be the same string, so only one cased version (chosen fairly randomly) is available to find.
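To make the casing point concrete, here is a toy model in Python. It is emphatically not how CirrusSearch is implemented: the function names are made up, the "keep the first title seen" rule is just a stand-in for the arbitrary choice described above, and the merging of redirects with their targets is not modeled at all. It only shows why a title that exactly matches the query can be missing once case variants are collapsed.

```python
def collapse_by_case(titles):
    """Keep one title per case-insensitive key (toy rule: the first one seen)."""
    kept = {}
    for title in titles:
        kept.setdefault(title.casefold(), title)
    return list(kept.values())


def prefix_search(titles, query):
    """Case-insensitive prefix match over the given titles."""
    q = query.casefold()
    return [t for t in titles if t.casefold().startswith(q)]


if __name__ == "__main__":
    titles = ["1 Dicembre", "1 dicembre", "1º dicembre", "12 dicembre"]
    index = collapse_by_case(titles)
    # "1 dicembre" was collapsed into "1 Dicembre", so the exact title
    # typed by the user is no longer available to return.
    print(prefix_search(index, "1 dicembre"))
```

In this toy, searching for "1 dicembre" yields only "1 Dicembre", even though a page titled exactly "1 dicembre" was in the input list.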

If you are not looking for content articles, then the default search profile is likely undesirable. The classic profile, mentioned by Anomie, gives you extremely strict prefix matching, which seems to be the desired use here.

debt subscribed.

This really isn't a bug per se, so now that @EBernhardson has answered in an above comment, we'll move this to watching/waiting.

MPhamWMF subscribed.

Closing out low/est priority tasks over 6 months old with no activity within last 6 months in order to clean out the backlog of tickets we will not be addressing in the near term. Please feel free to reopen if you think a ticket is important, but bear in mind that given current priorities and resourcing, it is unlikely for the Search team to pick up these tasks for the indefinite future. We hope that the requested changes have either been addressed by or made irrelevant by work the team has done or is doing -- e.g. upgrading Elasticsearch to a newer version will solve various ES-related problems -- or will be subsumed by future work in a more generalized way.

RhinosF1 removed a project: Discovery-Search.
RhinosF1 subscribed.

Re-opening tasks and removing from team workboard per IRC feedback given yesterday and discussion with MPham.