Generator prefix search ignores gpsnamespace prop for some queries
Closed, Invalid · Public

Description

Repro steps:

  1. Navigate to this API query.

Expected: Given the gpsnamespace prop is set to 0 (the main namespace), all results should have an ns of 0.

Actual: All results have an ns of 4.

Other repro queries: replacing Wikipedia: with Talk: and replacing gpsnamespace with 1.

jwngr created this task. Mar 7 2018, 6:01 PM
Restricted Application added a subscriber: Aklapper. Mar 7 2018, 6:01 PM
Anomie added a subscriber: Anomie.

This may or may not be intended behavior of the search engine, but either way it's part of the search engine code (both core and CirrusSearch) rather than the API itself.

Restricted Application added projects: Discovery, Discovery-Search. Mar 8 2018, 4:14 PM
dcausse added a comment. Edited Mar 8 2018, 6:00 PM

I may be wrong, but I think this is the expected behavior. One can override the list of namespaces set in the API params (or the namespace filter in Special:Search) by prefixing the query with a namespace:

  • file:foo will search for foo in the File namespace, disabling any namespace selection made with API params

Since your queries do not include any words to search for, the search engine will assume that you are searching for an empty string. That would be the sole inconsistency I see here: by default we return nothing on empty strings, but here with the namespace prefix we return all pages in the namespace.
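The override rule described above can be illustrated with a toy model. This is only a sketch of the described behavior, not the actual SearchEngine or CirrusSearch code; the function name `resolve_search` and the small `NAMESPACE_IDS` table are hypothetical and exist purely for illustration.

```python
# Toy model of the namespace-prefix override (an assumption-based sketch,
# not the real MediaWiki implementation).

# A few namespace IDs as used on English Wikipedia; deliberately incomplete.
NAMESPACE_IDS = {"talk": 1, "wikipedia": 4, "file": 6, "category": 14}


def resolve_search(query, requested_namespaces):
    """Return (namespaces_to_search, search_term) for a prefix search.

    If the query starts with a known namespace prefix such as "file:",
    that namespace replaces the ones requested through the API params.
    """
    prefix, sep, rest = query.partition(":")
    ns_id = NAMESPACE_IDS.get(prefix.strip().lower())
    if sep and ns_id is not None:
        # Prefix wins: the requested namespaces are ignored entirely.
        return [ns_id], rest.strip()
    # No recognized prefix: keep the namespaces from the API params.
    return list(requested_namespaces), query
```

Under this model, `resolve_search("Wikipedia:", [0])` yields namespace 4 with an empty search term, which matches the reported case where every result had an ns of 4.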

EDIT: we actually return nothing with Special:Search when searching Talk:, but with the API we return something. We should probably make the SearchEngine API integration behave like Special:Search.

jwngr added a comment. Mar 8 2018, 7:58 PM

Thanks for the details. Just to be clear, is this expected even if the search string is not a prefix like Wikipedia: or Talk:? For example, here is a search for Raw egg dishes which returns just a single result, which is in namespace 14 even though gpsnamespace is set to 0:

Query:

https://en.wikipedia.org/w/api.php?action=query&format=json&gpssearch=raw%20egg%20dishes&generator=prefixsearch&prop=pageprops&redirects=&ppprop=displaytitle&gpsnamespace=0&gpslimit=5&origin=*

Response:

{
  "batchcomplete": "",
  "query": {
    "redirects": [{
        "index": 1,
        "from": "Raw egg dishes",
        "to": "Category:Raw egg dishes"
      }],
    "pages": {
      "56025925": {
        "pageid": 56025925,
        "ns": 14,
        "title": "Category:Raw egg dishes",
        "index": 1
      }
    }
  }
}
dcausse added a comment. Edited Mar 8 2018, 8:24 PM

Oh, my bad! This is using prefixsearch, so please forget what I said about the inconsistency between Special:Search and the API.

Back to your question:
Yes, Category:Raw egg dishes is expected to be found when searching namespace 0 (and when you ask the API to resolve redirects). That is because Raw egg dishes is a cross-namespace redirect: https://en.wikipedia.org/w/index.php?title=Raw_egg_dishes&redirect=no
So basically the title belongs to NS_MAIN but it redirects to Category:Raw egg dishes.

Anomie added a comment. Mar 8 2018, 8:43 PM

For that one, prefixsearch is returning the mainspace page "Raw egg dishes". But since you included redirects= in your query, the API follows the cross-namespace redirect after generating the page. Try it without to see the difference.
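To compare the two cases, the same query can be built with and without the redirects parameter. The following is a minimal sketch that only constructs the URLs from the parameters shown in the query above; the helper name `build_prefixsearch_url` is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_prefixsearch_url(search, namespace=0, limit=5, follow_redirects=True):
    """Build a generator=prefixsearch query URL (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "prefixsearch",
        "gpssearch": search,
        "gpsnamespace": namespace,
        "gpslimit": limit,
        "prop": "pageprops",
        "ppprop": "displaytitle",
    }
    if follow_redirects:
        # An empty "redirects=" asks the API to resolve redirects after
        # the generator has produced its pages.
        params["redirects"] = ""
    return f"{API_ENDPOINT}?{urlencode(params)}"


with_redirects = build_prefixsearch_url("raw egg dishes")
without_redirects = build_prefixsearch_url("raw egg dishes", follow_redirects=False)
```

Fetching `without_redirects` should return the mainspace redirect page itself, while `with_redirects` returns its target in the Category namespace, as described above.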

jwngr added a comment. Mar 9 2018, 1:13 AM

I see, that makes some sense. For my use case, I want to both follow redirects and only get things in namespace 0, but I ended up just filtering the results on the client.
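A client-side filter of this kind might look like the sketch below. It assumes the response shape shown earlier in this task (a "pages" object keyed by page ID, each entry carrying an "ns" field); the function name `pages_in_namespace` is made up for illustration and is not necessarily the code actually used.

```python
def pages_in_namespace(response, namespace=0):
    """Keep only the pages whose final (post-redirect) namespace matches."""
    pages = response.get("query", {}).get("pages", {})
    return [page for page in pages.values() if page.get("ns") == namespace]


# The response from the "raw egg dishes" query quoted in this task.
sample_response = {
    "batchcomplete": "",
    "query": {
        "redirects": [
            {"index": 1, "from": "Raw egg dishes", "to": "Category:Raw egg dishes"}
        ],
        "pages": {
            "56025925": {
                "pageid": 56025925,
                "ns": 14,
                "title": "Category:Raw egg dishes",
                "index": 1,
            }
        },
    },
}

main_namespace_pages = pages_in_namespace(sample_response, 0)
```

For the sample response, the filter drops the only result, because the redirect target is in namespace 14.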

dcausse closed this task as Invalid. Mar 9 2018, 9:36 AM

Closing as invalid since I don't see anything here that is not expected.

  • with prefixsearch, the namespaces selected using API params can be overridden by a namespace prefix in the search query
  • when asking the API to follow redirects, the results may not be in the requested namespace in the case of cross-namespace redirects

Please feel free to reopen the ticket if you think that I overlooked or misunderstood something.

Thanks!

Anomie added a comment. Mar 9 2018, 7:46 PM

I'm not going to reopen, but I will submit a patch to update the documentation to take note of this.

Change 418031 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] API: Update prefixsearch/opensearch docs

https://gerrit.wikimedia.org/r/418031

Change 418031 merged by jenkins-bot:
[mediawiki/core@master] API: Update prefixsearch/opensearch docs

https://gerrit.wikimedia.org/r/418031