Offer interwiki search with language detection functionality over the API
Closed, Resolved. Public

Description

Presently, interwiki search with language detection (TextCat) functionality is only offered in desktop search. API clients such as mobile web, mobile apps, and others, may wish to use the TextCat language switching functionality. We should figure out what needs to happen in order to offer this functionality to API clients.

Deskana created this task. Aug 12 2016, 12:52 AM
Restricted Application added a subscriber: Aklapper. Aug 12 2016, 12:52 AM

This might be covered by the rewrite parameter, but I would have to double-check.

debt triaged this task as Low priority. Aug 18 2016, 10:15 PM
debt moved this task from Needs triage to Later on the Discovery-Search board.
debt added a subscriber: debt.

Moving this to Later. It should be doable; I'm just not sure how urgently (if at all) it is needed.

EBernhardson added a project: Easy. Edited Aug 18 2016, 10:44 PM

It looks like the detection is occurring and the search is being performed (when srenablerewrites=1 is passed), but the API is not including the results in the output because they are put in a slightly different part of the ResultSet object. As such, this should be a relatively easy fix (compared to working out the whole chain of things).

Restricted Application added a subscriber: TerraCodes. Aug 18 2016, 10:44 PM

It sounds like the ResultSet object needs to be changed on the backend and the apps need to use srenablerewrites. Is there any reason the apps wouldn't want to turn on srenablerewrites now?

The ResultSet object itself is fine, and carries all necessary data. The API class needs to be updated to actually take that data from the ResultSet and add it to the API response. Might require some debate over what the format should be.

There is also a separate, and completely unrelated, parameter: srinterwiki=1. This has no effect on the language detection portion, but controls whether results from sister wikis will be provided. For extra fun, the way results are represented is not quite the same between the main results and the interwiki results :S. Standard results are in a list at query.search and have the keys: ns, title, snippet, size, wordcount, timestamp. Sister-wiki results are in a map at query.interwikisearch. Each entry in the map is keyed by the local interwiki prefix (e.g. wikt for Wiktionary) and holds a list of results. Those result items have the keys: namespace, title, url.
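To make the mismatch concrete, here is a minimal client-side sketch that flattens both shapes into one list, based only on the key names listed above. The function name flatten_results and the "source" field are hypothetical, introduced just for illustration, and the sample URL in the usage is a placeholder.

```python
from typing import Any, Dict, List


def flatten_results(query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Merge main and sister-wiki results into one list of uniform rows.

    `query` is assumed to be the "query" object of a list=search response.
    """
    rows: List[Dict[str, Any]] = []

    # Main results: a list at query.search, with the namespace under "ns".
    for item in query.get("search", []):
        rows.append({
            "source": None,  # None marks the local wiki (our own convention)
            "namespace": item.get("ns"),
            "title": item.get("title"),
            "url": None,  # main results carry no url key
        })

    # Sister-wiki results: a map at query.interwikisearch, keyed by the
    # local interwiki prefix, with the namespace under "namespace".
    for prefix, items in query.get("interwikisearch", {}).items():
        for item in items:
            rows.append({
                "source": prefix,
                "namespace": item.get("namespace"),
                "title": item.get("title"),
                "url": item.get("url"),
            })
    return rows
```

A caller would pass response["query"] and could then treat rows with a non-None "source" as coming from another project.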

The results from textcat should, imo, be added to the main result set at query.search. Part of the reason is that we have no guarantee that an interwiki prefix even exists for the detected-language wiki, so the results could not be integrated correctly into the current format. The other part is that query.interwikisearch is really about sister wikis (same language, different project) in the WMF context.

From the perspective of the apps, what would be necessary to add to the API response so you can properly handle having results returned for an alternate wiki? Information necessary to tell the user what happened, and so you can take the user to the result when it is selected.

And for additional clarity, two different things happen when turning on srenablerewrites:

  • If there are no results but a suggestion is provided[1], then the suggestion will be run and the results for it returned[2]. Note that in [1] you have query.searchinfo.suggestion, whereas in [2] that has changed to query.searchinfo.rewrittenquery.
  • If there are < 3 results (after possibly rewriting with the suggestion), and we detect a language other than the primary language[3] of the wiki, the query will be run against the alternate-language wiki. The results of this query are not currently returned by the API (but will need to be in order to close this ticket).

[1] https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=orangutan+symfony
[2] https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=orangutan+symfony&srenablerewrites=1
[3] https://en.wikipedia.org/w/index.php?title=Special:Search&search=вместе+с+которым+написал
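As a rough illustration of the first case, the sketch below builds the request URLs from [1] and [2] and reports which of the two searchinfo keys came back. The helper names build_search_url and rewrite_status, and the status labels they return, are hypothetical; only the parameter and key names are taken from the comment above.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(term: str, enable_rewrites: bool = False) -> str:
    """Build a list=search URL, optionally with srenablerewrites=1."""
    params = {"action": "query", "list": "search", "srsearch": term}
    if enable_rewrites:
        params["srenablerewrites"] = "1"
    # urlencode encodes spaces as "+", matching the example URLs.
    return API_URL + "?" + urlencode(params)


def rewrite_status(query: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Report whether the query was rewritten, merely suggested, or neither.

    `query` is assumed to be the "query" object of the response.
    """
    info = query.get("searchinfo", {})
    if "rewrittenquery" in info:
        return ("rewritten", info["rewrittenquery"])
    if "suggestion" in info:
        return ("suggested", info["suggestion"])
    return ("unchanged", None)
```

A client could use the "rewritten" status to tell the user that results are shown for a different query than the one they typed.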

@EBernhardson, thanks for the thoroughly detailed response and sorry for the slow reply.

srinterwiki=1
From the perspective of the apps, what would be necessary to add to the API response so you can properly handle having results returned for an alternate wiki? Information necessary to tell the user what happened, and so you can take the user to the result when it is selected.

I think the Android (and probably iOS) apps would want to be careful about enabling results from sister projects like Wiktionary and Commons. We have a difficult time providing a good native-ish experience for just Wikipedia across locales, and adding extra project sites should probably be done on a case-by-case basis, if at all (maybe the user taps the result and it just opens in a browser window if the site is unsupported?).

srenablerewrites=1

Cool, I've made a new task, T151238, to track this. Seems like a pretty awesome feature we'd want on!

Change 324652 had a related patch set uploaded (by Smalyshev):
[WIP] Enable supplying inline interwiki results

https://gerrit.wikimedia.org/r/324652

So as of now, we have two sets of interwiki results - one that is enabled by interwiki config (so far happens only on Italian wiki AFAIK) and one that is enabled by secondary results from other wikis if primary query does not return anything.

In the current code, srinterwiki=1 enables the former, but not the latter. So the question is how we enable the latter. I'd propose srinterwiki=1&srenablerewrites=1 (it is similar to how Special:Search works), but other options are possible.

Also, do we keep the new results in the same place as before:

"interwikisearch": {
   "en": [
// English wiki results
         ]
}

Or do we have two separate spaces for the two kinds of interwiki search?

Just noticed:

The results from textcat should, imo, be added to the main result set at query.search

Do we want some mark that shows they come from another wiki then?

Users of the API should be able to distinguish between what we call "interwiki search" (i.e. queries issued to same-language sister projects from Wikipedia) and TextCat-based results (i.e. queries issued to different-language Wikipedias when your query returns no results on one Wikipedia). I am agnostic as to the actual result format as long as that is possible.

Change 324652 merged by jenkins-bot:
Enable supplying inline interwiki results

https://gerrit.wikimedia.org/r/324652

Smalyshev moved this task from Backlog to Done on the Discovery-Search (Current work) board.
Smalyshev claimed this task.
Smalyshev renamed this task from Offer TextCat functionality over the API to Offer interwiki search with langage detection functionality over the API. Dec 15 2016, 10:57 PM
Smalyshev renamed this task from Offer interwiki search with langage detection functionality over the API to Offer interwiki search with language detection functionality over the API.
Smalyshev updated the task description.

Should we document it somewhere? I couldn't find a place where we document search result formats...

Mini-docs here until I find where to document it:

It adds an additionalsearch key (and an additionalsearchinfo key), which contain secondary results, e.g. interwiki results from language detection. Requires enablerewrites to be on (passed as srenablerewrites=1) to work.
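A minimal sketch of how a client might read the new key follows. The exact shape of additionalsearch is not spelled out in this task, so the code assumes it mirrors interwikisearch (a map from interwiki prefix to a list of result items with a title key); that shape, and the function name secondary_results, are assumptions to verify against a real response.

```python
from typing import Any, Dict, List, Tuple


def secondary_results(query: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Collect (prefix, title) pairs from query.additionalsearch.

    ASSUMPTION: additionalsearch is a map of interwiki prefix -> list of
    result items, like interwikisearch. Check a live response before
    relying on this.
    """
    found: List[Tuple[str, str]] = []
    for prefix, items in query.get("additionalsearch", {}).items():
        for item in items:
            found.append((prefix, item.get("title", "")))
    return found
```

An empty list here means either that no secondary search ran or that enablerewrites was not turned on for the request.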

Deskana closed this task as Resolved. Jan 6 2017, 10:51 PM