
Reader gets page description in search results
Closed, Resolved | Public | 3 Estimated Story Points

Description

"As a Reader, I want to read a description of each page in a search result set, so I can quickly evaluate if the page is relevant to my search topic."

Proposal: add a description property to the Page object in the schema.

For WMF sites, the description should be the description from Wikidata, in the wiki's local language (so en for en:wp, fr for fr:wp, etc.)

Event Timeline

BPirkle added subscribers: Anomie, BPirkle.

Feedback from @Anomie :

This would probably need a hook to allow TextExtracts or Wikibase to add the data to the result. Given T240691, Wikibase would probably be the better bet.

You can probably get it from the Action API already with generator=search&prop=extracts (but I don't see any way to use the Wikibase description instead). However, the Action API doesn't like giving the search metadata when search is used as a generator, so you'd have to do list=search&generator=search and hope the two don't wind up returning slightly different result sets. I think at least one of the mobile apps already does this, or did at one point.

Note: to get the data for both this task and T245673 from the Action API, you'd use generator=search&prop=pageimages|extracts

Update: @Anomie also said "Action API has prop=description for getting the description from Wikibase"

Info on the TextExtracts extension is here:
https://www.mediawiki.org/wiki/Extension:TextExtracts

From that page:

The TextExtracts extension provides an API which allows to retrieve plain-text or limited HTML (HTML with content for some CSS classes removed) extracts of page content.

and

For obtaining summaries in production environments, the Page Content Service is recommended and used by Wikimedia products.

For reference, here are some examples from the Action API against enwiki. The generator examples use prop=description, so presumably the descriptions come from Wikibase and not TextExtracts.

This is a normal search. There are no descriptions in the results:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=Craig%20Noone

This is a search for the same string, but using a generator:
https://en.wikipedia.org/w/api.php?action=query&format=json&gsrsearch=Craig%20Noone&generator=search&prop=description
Notice that these results include a "description" field.

This includes both "list=search" and "generator=search" as suggested by @Anomie
https://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=Craig%20Noone&gsrsearch=Craig%20Noone&generator=search&prop=description
Notice that these results include both a "pages" and a "search" block under "query".
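To make the combined request above more concrete, here is a minimal Python sketch that builds the same URL and then joins the "search" block with the "pages" block by page ID. The function names and the sample data are made up for illustration, and the response shape is assumed from the format=json examples linked above; this is not how the REST endpoint is implemented.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(term, prop="description"):
    """Build an Action API URL that runs the same search as a list and as a generator."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",       # returns the search metadata (snippets etc.)
        "srsearch": term,
        "generator": "search",  # feeds the same search into prop=...
        "gsrsearch": term,
        "prop": prop,
    }
    return API_URL + "?" + urlencode(params)


def merge_descriptions(query):
    """Attach each page's description to the matching entry in the search list."""
    pages = query.get("pages", {})
    by_id = {page["pageid"]: page for page in pages.values()}
    merged = []
    for hit in query.get("search", []):
        # The two result sets may differ slightly, so a page can be missing here.
        page = by_id.get(hit["pageid"], {})
        merged.append({
            "title": hit["title"],
            "snippet": hit.get("snippet", ""),
            "description": page.get("description"),
        })
    return merged


# Illustrative data only, shaped like the "query" block of a format=json response.
sample_query = {
    "search": [
        {"pageid": 1, "title": "Example A", "snippet": "snippet text"},
        {"pageid": 2, "title": "Example B", "snippet": "snippet text"},
    ],
    "pages": {
        "1": {"pageid": 1, "title": "Example A", "description": "first example"},
    },
}

if __name__ == "__main__":
    print(build_search_url("Craig Noone"))
    print(merge_descriptions(sample_query))
```

Passing prop="pageimages|extracts" to build_search_url would give the variant mentioned for T245673. Entries that appear in only one of the two result sets end up with a description of None, which is the mismatch risk @Anomie pointed out.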

There's also a summary endpoint that provides a better text extract and was built specifically to satisfy the requirements of the web team. I would guess that eventually they will want the results from it. As an MVP, using TextExtracts should be OK, but we should also think about how to include the summary if needed.

On that note, see also T214000: Evaluate difficulty of porting PCS summary logic to PHP, T213505: RfC: OpenGraph descriptions in wiki pages, T241437: Restore descriptions in opensearch API, and some of the discussion on T240691.

WDoranWMF set the point value for this task to 3. (Feb 26 2020, 8:09 PM)
eprodromou triaged this task as Medium priority. (Feb 26 2020, 8:12 PM)
daniel added a subscriber: daniel. (Apr 6 2020, 7:20 PM)

This enhancement already exists in the action API: the TextExtracts extension does this kind of thing for ApiOpenSearch via the ApiOpenSearchSuggest hook. Wikidata descriptions are injected via a different mechanism, using a generator IIRC. It would be nice to share code instead of implementing the same thing twice. There are several possible strategies for this.

We could try to use the same hook, but we'd have to mangle the input and output structure.
Or we create a new hook that could be used in both APIs.
Or we could put a new hook into the application logic that is used by both APIs. That is the nicest option, but requires more design and thought. We'd probably end up extending the SearchSuggestion class (and maybe also SearchResult), and putting a hook into SearchEngine::processCompletionResults. There is already a SearchResultsAugment hook, but it's not for suggestions, and does not explicitly model thumbnails...
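As a rough, language-neutral illustration of the third strategy, here is a small Python sketch of a hook that lives in shared application logic and is consumed by two different API front ends. Every name in it (Suggestion, register_augmenter, process_completion_results, the two view functions) is hypothetical and only loosely mirrors the MediaWiki classes mentioned above; the real design would be in PHP and would need the outline requested below.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Suggestion:
    """Hypothetical stand-in for a search suggestion that can carry extra fields."""
    title: str
    description: Optional[str] = None


# Hypothetical hook registry: handlers receive the suggestions and augment them in place.
_handlers: List[Callable[[List[Suggestion]], None]] = []


def register_augmenter(handler):
    """Register a handler to run on every batch of suggestions."""
    _handlers.append(handler)


def process_completion_results(titles):
    """Shared application logic: build suggestions once, then run every handler."""
    suggestions = [Suggestion(title=t) for t in titles]
    for handler in _handlers:
        handler(suggestions)
    return suggestions


def add_descriptions(suggestions):
    """Example handler an extension might register; the lookup table is made up."""
    known = {"Example A": "first example"}
    for s in suggestions:
        s.description = known.get(s.title)


register_augmenter(add_descriptions)


def action_api_view(titles):
    """One front end: formats the shared results as rows."""
    return [[s.title, s.description] for s in process_completion_results(titles)]


def rest_api_view(titles):
    """Another front end: formats the same shared results as objects."""
    return [{"title": s.title, "description": s.description}
            for s in process_completion_results(titles)]
```

The point of the sketch is only that the description lookup is written once, in the handler, and both output formats pick it up without mangling each other's structures.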

Basically, before implementation on this starts, I'd like to see a rough outline of how this is supposed to work, with the aspect of consistency and code sharing between the action API and the REST API at least considered.

I updated this with the note that the correct value of the description should be the one from Wikidata, in the wiki-local language.

Why is the description in the wiki's content language, rather than the user's interface language?

I can see reasons for this. I'm just pointing out that the Wikidata description is designed to be in the UI language, and we should make a conscious choice here.
By the way, if the description is in the UI language, language fallback should apply just like for UI messages (Wikidata has this built in).
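To illustrate what language fallback means for a description lookup, here is a minimal Python sketch. It is not Wikidata's implementation; the function name, the sample descriptions, and the fallback chain are all assumptions made up for the example.

```python
def pick_description(descriptions, fallback_chain):
    """Return (language, text) for the first language in the chain that has a description."""
    for lang in fallback_chain:
        text = descriptions.get(lang)
        if text:
            return lang, text
    # No language in the chain has a description.
    return None


# Made-up descriptions keyed by language code.
descriptions = {"en": "example description", "de": "Beispielbeschreibung"}
# Assumed fallback chain for illustration; real chains are defined by the software.
chain = ["de-at", "de", "en"]

if __name__ == "__main__":
    print(pick_description(descriptions, chain))
```

With a chain like this, a reader whose UI language has no description still gets one in a related language rather than nothing.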

@daniel I talked it through with Olga, and we decided that using the wiki language was a good first effort. It's the description that appears under the title of an article in the mobile view, for example. Also, for Search results and "Nearby" search in the mobile Web view, I see the wiki language in the interface, not my browser language or the language preference in my user account.

At some point in the Core REST API we'll deal with having end-user language that's different from the wiki language; for example, Accept-Language headers, or the stored user preference. I'd prefer to keep the behaviour as wiki language by default, and only show the user language if requested by the app (maybe with a parameter). I think this would be one case where we'd consider using it.

For right now, we've been using the wiki language for most endpoints, and I can't come up with an exception. It also makes caching more efficient.
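A tiny Python sketch of that default, assuming a hypothetical optional language parameter (nothing like it is specified in this task): the wiki language is used unless the client explicitly asks for another one, so requests that don't ask all share a single cache entry.

```python
def response_language(wiki_language, requested_language=None):
    """Default to the wiki language; use another language only when explicitly requested."""
    return requested_language or wiki_language


def cache_key(path, wiki_language, requested_language=None):
    """Build a cache key; it varies by language only when a language was requested."""
    lang = response_language(wiki_language, requested_language)
    return f"{path}|lang={lang}"


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(cache_key("/search/page?q=example", "fr"))
    print(cache_key("/search/page?q=example", "fr", "de"))
```

If the language instead came from every reader's Accept-Language header or stored preference by default, the same search would fan out into one cache entry per language.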

eprodromou closed this task as Resolved. (Jun 15 2020, 3:41 PM)

This is done and in production.

Aklapper removed a subscriber: Anomie. (Oct 16 2020, 5:41 PM)