
prefixsearch does not include result whose title is exactly the same as the search string
Open, Low, Public

Event Timeline

dbarratt created this task. Feb 7 2019, 3:56 PM
Restricted Application added subscribers: MGChecker, Aklapper. Feb 7 2019, 3:56 PM
Anomie closed this task as Invalid. Feb 7 2019, 6:06 PM
Anomie added a subscriber: Anomie.

Sure it does. See https://it.wikipedia.org/w/api.php?action=query&prop=info|pageprops&generator=prefixsearch&gpssearch=12%20dicembre&gpslimit=10&ppprop=disambiguation for example: "12 dicembre" is in the result.

As noted in the module's documentation, "Depending on the search engine backend, this might include typo correction, redirect avoidance, or other heuristics." For some reason, in this particular example CirrusSearch is choosing to return the redirect at 1 Dicembre rather than the redirect at 1 dicembre or the actual target of both redirects, 1º dicembre.

At any rate, this isn't anything being done by the API itself; the API is just returning what the underlying SearchEngine gives it. You'd want to file a bug against CirrusSearch if you find the behavior here buggy.

BTW, it looks like you can get the results you expect here, at least for this specific query, by adding gpsprofile=classic to the query. Of course, that wouldn't work on a wiki where the search backend doesn't support gpsprofile=classic, so it's of limited use if you care about supporting non-Wikimedia wikis that aren't using CirrusSearch.
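A minimal sketch of building such a request, for illustration only. The parameter names and endpoint come from the example URL earlier in this thread plus gpsprofile; the function name `build_prefixsearch_url` is hypothetical, and `format=json` is added here as an assumption so the response is machine-readable. Whether gpsprofile=classic is accepted still depends on the wiki's search backend.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL earlier in this thread.
API_ENDPOINT = "https://it.wikipedia.org/w/api.php"


def build_prefixsearch_url(search, profile=None, limit=10, endpoint=API_ENDPOINT):
    """Build a generator=prefixsearch query URL (hypothetical helper).

    `profile` is only sent when given, since a backend other than
    CirrusSearch may not support gpsprofile at all.
    """
    params = {
        "action": "query",
        "format": "json",  # assumption: not in the original URL
        "prop": "info|pageprops",
        "generator": "prefixsearch",
        "gpssearch": search,
        "gpslimit": str(limit),
        "ppprop": "disambiguation",
    }
    if profile is not None:
        params["gpsprofile"] = profile
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Default profile vs. the classic profile for the same search string.
    print(build_prefixsearch_url("12 dicembre"))
    print(build_prefixsearch_url("12 dicembre", profile="classic"))
```

Leaving `profile` unset keeps the request portable to wikis whose backend doesn't know gpsprofile, which is the limitation noted above.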

dbarratt reopened this task as Open. Feb 7 2019, 6:08 PM
dbarratt edited projects, added CirrusSearch; removed MediaWiki-API.
Restricted Application added a project: Discovery-Search. Feb 7 2019, 6:08 PM
This comment was removed by EBernhardson.

The default prefix search is heavily tuned to finding content articles. It considers a redirect and the page it redirects to to be a single entity; the version of the string chosen for display amounts to a heuristic that tries to decide between showing something closer to what you typed that exists as a redirect, or the original page title. Additionally, this system considers two versions of a string with different casing to be the same string, so only one cased version (chosen fairly randomly) is available to find.
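A toy model of the casing point, for illustration only. This is not CirrusSearch's implementation: the function name `collapse_case_variants` is hypothetical, it ignores the redirect merging described above, and it deterministically keeps the first variant it sees, whereas the real choice is described as fairly random.

```python
def collapse_case_variants(titles):
    """Keep one title per case-insensitive spelling (toy model only).

    Titles that differ only by casing share a key, so only the first
    one encountered remains findable.
    """
    kept = {}
    for title in titles:
        key = title.casefold()
        if key not in kept:
            kept[key] = title
    return list(kept.values())


if __name__ == "__main__":
    # Titles mentioned earlier in this thread.
    print(collapse_case_variants(["1 Dicembre", "1 dicembre", "1º dicembre"]))
```

With this input order, "1 dicembre" is dropped in favor of "1 Dicembre", which is the kind of outcome reported in this task.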

If you are not looking for content articles, then the default search profile is likely undesirable. The classic profile, mentioned by Anomie, gives you extremely strict prefix matching, which seems to be what is wanted here.

debt added a subscriber: debt.

This really isn't a bug per se, so now that @EBernhardson has answered in a comment above, we'll move this to watching/waiting.