Restore descriptions in opensearch API
Closed, Declined · Public

Description

Descriptions have been removed from the action=opensearch API due to performance concerns and lack of use cases (see T240691: TextExtracts extension frequent slows down opensearch API by several seconds, especially T240691#5757544 for performance and T240691#5742010 for how this is (not) used by browsers).

(See API:Opensearch#Response for how the results used to look; the second array now holds empty strings.)
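For clients that still parse the action=opensearch response, a minimal Python sketch of handling the now-empty second array might look like the following. The sample payload and its URLs are illustrative, and the function name parse_opensearch is hypothetical; only the four-element layout (search term, titles, descriptions, URLs) comes from the API documentation referenced above.

```python
import json

# Illustrative payload shaped like an action=opensearch response after the change:
# [search term, titles, descriptions, URLs]. The values are made up for the example.
SAMPLE = json.dumps([
    "Hampi",
    ["Hampi", "Hampigny"],
    ["", ""],  # descriptions are now empty strings
    [
        "https://en.wikipedia.org/wiki/Hampi",
        "https://en.wikipedia.org/wiki/Hampigny",
    ],
])


def parse_opensearch(body):
    """Return one dict per result; description is None when the API sent ''."""
    _term, titles, descriptions, urls = json.loads(body)
    return [
        {"title": title, "description": desc or None, "url": url}
        for title, desc, url in zip(titles, descriptions, urls)
    ]


results = parse_opensearch(SAMPLE)
for row in results:
    print(row["title"], row["description"], row["url"])
```

Treating an empty string as "no description" lets a client hide the description line instead of rendering a blank one.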

While action=opensearch is an endpoint meant specifically for browsers, and only end-of-life browsers with low usage seem to make use of descriptions, it seems many other clients did use this endpoint, judging from the feedback on various support channels (e.g. 1, 2, 3, 4). Since many mobile apps cannot be easily updated for existing users, there might be value in restoring descriptions to action=opensearch if there's a performant way to do it. Two options came up:

  • Use Wikidata descriptions, which are stored in page_props so can be retrieved with a single query. This is a different type of description but arguably more suitable to a search API (the Wikimedia apps display Wikidata descriptions for search, for example).
  • Wait for T213505: RfC: OpenGraph descriptions in wiki pages, then build on it to use descriptions from the Page Content Service.

Clients which can update their logic should use the prefixsearch API:

  • For extracts (plain-text page intros from TextExtracts), something like action=query&generator=prefixsearch&gpssearch=<search term>&prop=extracts&exintro=1&explaintext=1&redirects=1 (sandbox) which will give results like
{
    "batchcomplete": true,
    "continue": {
        "gpsoffset": 10,
        "continue": "gpsoffset||"
    },
    "query": {
        "pages": [
            {
                "pageid": 10165010,
                "ns": 0,
                "title": "Harry Sassounian",
                "index": 7,
                "extract": "Harry M. Sassounian, also known as Hampig Sassounian, is serving a life sentence for the 1982 assassination of Turkish Consul General Kemal Arıkan (or Arikan) at a street intersection in Los Angeles, California, United States. He was born in Beirut, Lebanon."
            },
            {
                "pageid": 15432681,
                "ns": 0,
                "title": "Hampigny",
                "index": 4,
                "extract": "Hampigny  is a commune in the Aube department in north-central France."
            }
        ]
    }
}
  • For short descriptions, something like action=query&generator=prefixsearch&gpssearch=Hampi&prop=description&redirects=1 (sandbox) which will give results like
{
    "batchcomplete": true,
    "continue": {
        "gpsoffset": 10,
        "continue": "gpsoffset||"
    },
    "query": {
        "pages": [
            {
                "pageid": 10165010,
                "ns": 0,
                "title": "Harry Sassounian",
                "index": 7,
                "description": "American assassin",
                "descriptionsource": "central"
            },
            {
                "pageid": 15432681,
                "ns": 0,
                "title": "Hampigny",
                "index": 4,
                "description": "Commune in Grand Est, France",
                "descriptionsource": "local"
            }
        ]
    }
}
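
The two queries above could be built and consumed roughly as in the following Python sketch. The endpoint URL, the helper names, and the choice of format=json with formatversion=2 are assumptions for illustration (the sample responses above have the shape that formatversion=2 produces). Note that in the samples the pages are not listed in ranking order, so the sketch sorts on the "index" field.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint; use your wiki's api.php


def prefixsearch_url(term, prop, limit=10):
    """Build a generator=prefixsearch URL for prop=description or prop=extracts."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "prefixsearch",
        "gpssearch": term,
        "gpslimit": str(limit),
        "prop": prop,
        "redirects": "1",
    }
    if prop == "extracts":
        # Plain-text intro only, as in the extracts query above.
        params.update({"exintro": "1", "explaintext": "1"})
    return API + "?" + urlencode(params)


def ranked_pages(body):
    """Return the pages sorted by their search rank (the 'index' field)."""
    pages = json.loads(body)["query"]["pages"]
    return sorted(pages, key=lambda page: page["index"])


# Trimmed copy of the prop=description sample response shown above.
SAMPLE = """{"batchcomplete": true, "query": {"pages": [
  {"pageid": 10165010, "ns": 0, "title": "Harry Sassounian", "index": 7,
   "description": "American assassin", "descriptionsource": "central"},
  {"pageid": 15432681, "ns": 0, "title": "Hampigny", "index": 4,
   "description": "Commune in Grand Est, France", "descriptionsource": "local"}
]}}"""

url = prefixsearch_url("Hampi", "description")
ordered = [page["title"] for page in ranked_pages(SAMPLE)]
print(url)
print(ordered)
```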

Event Timeline

Restricted Application added a subscriber: Aklapper.

Unfortunately, the TextExtracts feature was obsoleted many years ago and is no longer actively maintained. It was intentionally removed from the OpenSearch API because of severe performance problems.

See T240691: TextExtracts extension frequent slows down opensearch API by several seconds for more details.

From the main consumers of this API that we looked at, only the titles and urls were used. As such, on short notice (over the holidays) it was preferred to remove the (mostly) unused descriptions from this otherwise unrelated query.

You can continue to fetch these old-style TextExtracts descriptions directly, using the prefixsearch generator with prop=extracts. This API remains available and returns the same data that we used to inject into the OpenSearch API.

In the future, depending on priority and resourcing, we may be able to re-introduce this if performance can be improved. Alternatively, it might make sense to provide descriptions from a different API instead, such as the Page Previews API (as used by the Popups feature on desktop) or Wikibase descriptions (used by the mobile interface at en.m.wikipedia.org).

Using Wikibase descriptions or the Page Previews API would be fine with me; I am more interested in functionality than identical results.

We use this to display results in the iOS app Wikipanion, and while I can make an update to the client, this leaves people running older versions in the dark (those using very old versions of iOS).

I’m also not the only one running into the problem; someone else on the mediawiki-api list also just reported the issue.

It would be nice if this breaking change were announced on the list like previous breaking changes, with a several-month transition period so at least we can provide a seamless transition to our users.

I should also note that the prefixsearch API returns results in a different order than the opensearch API, although I haven’t spent time evaluating which ordering is better.

It would be nice if this breaking change were announced on the list like previous breaking changes, with a several-month transition period so at least we can provide a seamless transition to our users.

I don't think a several-month transition period would have been reasonable for this. It's kind of unclear how the deprecation policy would apply here (see https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap#Deprecation_process). Personally, I think an announcement to mediawiki-api would have been nice, in retrospect.

There was no breaking change - this is an API returning results according to the OpenSearch spec; it still does that. It is also explicitly documented as being meant for web browsers, and might change at any point in ways that make it better for browsers. (This was such a change, as the overwhelming majority of browsers do not use the descriptions, but speed does matter for them.) If that negatively affects other clients, that's unfortunate, but it's not the API maintainers' responsibility to accommodate them.

That said, there are two ways to re-add descriptions:

  • Use Wikidata descriptions. That's a single SQL query (they are duplicated to page_props, at least in the content language), and the API already queries the database, so performance should not be affected significantly. There's already a helper class (and there also seems to be some way to make the search engine do the job via AugmentPageProps, although I have no idea how that works), so that's probably less than a day of work and doable by a motivated volunteer.
  • Build on T213505: RfC: OpenGraph descriptions in wiki pages to add PCS descriptions. I think the Product Infrastructure team has some interest in moving that RfC forward, but it's a big task so I wouldn't expect it to be done any time soon.
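
To illustrate why the first option is cheap, here is a rough Python sketch of fetching descriptions for a whole batch of results with one query. It runs against an in-memory SQLite table that only mimics the shape of page_props; the column names, the property name wikibase-shortdesc, and the helper name descriptions_for are assumptions for the example, not the helper class mentioned above.

```python
import sqlite3

# In-memory stand-in for the page_props table (assumed columns: page ID,
# property name, property value). The rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT)"
)
conn.executemany(
    "INSERT INTO page_props VALUES (?, ?, ?)",
    [
        (10165010, "wikibase-shortdesc", "American assassin"),
        (15432681, "wikibase-shortdesc", "Commune in Grand Est, France"),
        (15432681, "some-other-prop", "ignored"),  # unrelated property
    ],
)


def descriptions_for(db, page_ids, propname="wikibase-shortdesc"):
    """Fetch descriptions for all given page IDs with one SELECT."""
    if not page_ids:
        return {}
    placeholders = ",".join("?" for _ in page_ids)
    rows = db.execute(
        "SELECT pp_page, pp_value FROM page_props "
        f"WHERE pp_propname = ? AND pp_page IN ({placeholders})",
        [propname, *page_ids],
    ).fetchall()
    return dict(rows)


# Page 1 has no description row, so it is simply absent from the result.
found = descriptions_for(conn, [10165010, 15432681, 1])
print(found)
```

Because the lookup is keyed on the page IDs the search already returned, the cost stays at one extra query per request regardless of how many results are shown.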

If someone wants to do the first, I can give pointers; not sure how / by whom the decision should be made that that is the right path to take, though.

Tgr renamed this task from "api.php?action=opensearch missing search result descriptions" to "Restore descriptions in opensearch API". · Jan 1 2020, 8:10 PM
Tgr updated the task description.

Using Wikibase descriptions or the Page Previews API would be fine with me; I am more interested in functionality than identical results.

Each of these three systems (Wikibase descriptions, legacy TextExtracts, and PagePreviews) has a public API readily available to you for immediate querying.

However, descriptions will not be coming back to the OpenSearch API for the foreseeable future, as the vast majority of clients for that API do not need them.
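
A client that wants to keep the opensearch ordering but still show descriptions could, as one possible approach, take the titles from opensearch and look up their descriptions in a second request (action=query with prop=description and the titles joined by "|"). The Python sketch below shows only the merge step plus the URL for the second request; the endpoint, the helper names, and the sample data are assumptions. Matching by exact title is naive and would miss titles that the API normalizes or redirects.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def description_lookup_url(titles):
    """URL for fetching short descriptions of the given titles in one request."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "description",
        "titles": "|".join(titles),
    }
    return API + "?" + urlencode(params)


def backfill_descriptions(opensearch, pages):
    """Return an opensearch-style array with the description slot filled in."""
    term, titles, _empty, urls = opensearch
    by_title = {page["title"]: page.get("description", "") for page in pages}
    return [term, titles, [by_title.get(title, "") for title in titles], urls]


# Illustrative inputs: an opensearch result and the pages from the lookup.
opensearch = [
    "Hampi",
    ["Hampi", "Hampigny"],
    ["", ""],
    ["https://en.wikipedia.org/wiki/Hampi", "https://en.wikipedia.org/wiki/Hampigny"],
]
pages = [{"title": "Hampigny", "description": "Commune in Grand Est, France"}]

lookup = description_lookup_url(opensearch[1])
merged = backfill_descriptions(opensearch, pages)
print(lookup)
print(merged[2])
```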