Page MenuHomePhabricator

Reader gets page thumbnail with search results
Closed, ResolvedPublic3 Estimated Story Points

Description

"As a Reader, I want to see a thumbnail image of each page in a search result set, so I can visually identify the topic of the page."

To do this, we can add a thumbnail element to the pages in the search results, using the FileRepresentation object schema (see https://www.mediawiki.org/wiki/Core_Platform_Team/Initiatives/Core_REST_API_in_MediaWiki/Schema for object types in the REST API so far).

MediaWiki core doesn't have a representation of a thumbnail image. The default implementation should return null for the thumbnail, but allow for extensions to add a thumbnail (for example, using a hook).

The thumbnail size should be the biggest thumbnail that can fit in a 200x200px square.

The PageImages extension us using the ApiOpenSearchSuggest hook to attach thumbnails to search results returned by ApiOpenSearch.

Related Objects

Event Timeline

Feedback from @Anomie :

would probably need a hook to allow the PageImages extension to add the data to the result. Or else enhancement of CirrusSearch to obtain and store the PageImages data and enhancement of the result interface to be able to expose it. Either way, IMO we should also add the same to the Action API.

you can probably get it from the Action API already with generator=search&prop=pageimages. But the Action API doesn't like giving the search metadata when used as a generator, so you'd have to do list=search&generator=search and hope they don't wind up returning slightly different result sets. I think at least one of the mobile apps already does this, or did at one point.

Note: to get the data for both this task and T245674 from the Action API, you'd use generator=search&prop=pageimages|extracts

Info on the PageImages extension is here:
https://www.mediawiki.org/wiki/Extension:PageImages

From that page:

The PageImages extension provides image information by adding a prop=pageimages to the properties API for action=query

For reference, here's an example from Action API against enwiki:

This is a normal search. There are no thumbnails in the results:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=Craig%20Noone

This is a search for the same string, but using a generator:
https://en.wikipedia.org/w/api.php?action=query&format=json&gsrsearch=Craig%20Noone&generator=search&prop=pageimages
Notice that these results include a "thumbnail"

This includes both "list=search" and "generator=search" as suggested by @Anomie
https://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=Craig%20Noone&gsrsearch=Craig%20Noone&generator=search&prop=pageimages
Notice that these results include both a "pages" and a "search" block under "query".

When the thumbnails were included for apps and page previews, we used the following format:

"thumbnail": {
  "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/b/ba/Mark_I_series_tank.jpg/320px-Mark_I_series_tank.jpg",
  "width": 320,
  "height": 194
},
"originalimage": {
  "source": "https://upload.wikimedia.org/wikipedia/commons/b/ba/Mark_I_series_tank.jpg",
  "width": 5376,
  "height": 3264
},

The problem with thumbnails is always which size to return. We don't really have a clear thumnailing API, so in case we didn't guestimate the one-size-fits-all in the result, clients have to guess how to restructure the URI to get a bigger or smaller picture. This comes with problems if you don't know the original image size. I don't specifically remember what the problems are, but I can ask around and find out if you're curious.

So, I have some proposals/questions below:

  1. We probably should include the dimensions into the response. This doesn't cost a lot of traffic, but can provide a lot of benefit to the client.
  2. We should really think about what size thumbnail to return. Given that this is a search endpoint, I would guess.. small? But for retina displays, you'd need double-small if you want it to look nice and crisp. @EvanProdromou should probably describe the issue to the web team and expand on the requirements.
  3. We should think about how this endpoint plays with the media info endpoint. We did have a nice pattern of HATEAOS-style links, so we could think about including a link to the file information endpoint if the client wants to get more details on it.
  4. We should think about including original dimensions of the image, or at least leave clear space to add them later. I would assume that frontend would want to do some image scaling anyway, and knowing the original size is crucial there - you can scale image to be bigger then it originally was.

We don't really have a clear thumnailing API, so in case we didn't guestimate the one-size-fits-all in the result, clients have to guess how to restructure the URI to get a bigger or smaller picture.

See also T66214: Define an official thumb API. There are several use cases that could use one, instead of trying to build it into every API endpoint that links to an image.

WDoranWMF set the point value for this task to 3.Feb 26 2020, 8:09 PM

This enhancement already exists in the action API, the PageImages extension does this kind of thing for ApiOpenSearch via the ApiOpenSearchSuggest hook. It would be nice to share code, instead of implementing the same thing twice. There are several possible strategies for this.

We could try to use the same hook, but we'd have to mangle the input and output structure.
Or we create a new hook that could be used in both APIs.
Or we could put a new hook into the application logic that is used by both APIs. That is the nicest option, but requires more design and though. We'd probably end up extending the SearchSuggestion class (and maybe also SearchResult), and putting a hook into SearchEngine::processCompletionResults. There is already SearchResultsAugment hook, but it's not for suggestions, and does not explicitly model thumbnails....

Basically, before implementation on this starts, I'd like to see a rough outline of how this is supposed to work, with the aspect of consistency and code sharing between the action API and the REST API at least considered.

I updated this with a preferred size: 512x512; or, rather, 512 on one side, but preferably square.

I updated this with a preferred size: 512x512; or, rather, 512 on one side, but preferably square.

Could you please say where 512 number came from?

When the images are uploaded, we pre-render a number of thumbnails, controlled by a setting

// Thumbnail prerendering at upload time
'wgUploadThumbnailRenderMap' => [
	'default' => [ 320, 640, 800, 1024, 1280, 1920 ],
	'private' => [],
],

that are then used throughout the system. It would be nice to have the thumbnails retuned here match one of the prerendered sizes. If there is a serious reason to select not prep rendered size, that size has to be added to prerendered list. But please be aware, expanding the list implies significant resource allocation for rendering.

@daniel

First, don't think that extending SearchEngine with hook is a good idea right now:

  1. SearchEngine is quite complicated and I'm afraid of getting unexpected side effects there.
  2. Different strategies for searching are used. Say SearchHandler combines titles and texts into one array, like below:
	$titleSearch = $searchEngine->searchTitle( $query );
	$textSearch = $searchEngine->searchText( $query );
	$titleSearchResults = $this->getSearchResultsOrThrow( $titleSearch );
	$textSearchResults = $this->getSearchResultsOrThrow( $textSearch );

	$mergedResults = array_merge( $titleSearchResults, $textSearchResults );
	return $this->buildOutputFromSearchResults( $mergedResults );
  1. We have to take into consideration the fact that searchText is used in many places

https://codesearch.wmflabs.org/search/?q=-%3EsearchText%5C(&i=nope&files=&repos=

So we'll need to extend a class with functionality that is needed for a few of its clients only. Moreover, having this very generic hook we don't control changes it might perform. Say, unclear how to make sure that they add 'thumbnail' and 'description' fields.

I offer the following instead:

  1. Add additional configuration for SearchEngine that introduce additional possible fields. like $additioanFields = ['description', 'thumbnail'].
  2. Dynamically introduce hooks with names like 'RestSearchResultAugmentDescription', 'RestSearchResultAugmentThumbnail'.
  3. Use the naming convention above to be possible to use
$results = $this->doSearch(...);
...
foreach( $additioanFields as $field ) {
	$pageIds = $this->getEmptyIds(); // creates an array of [ pageId => null ]
	Hooks::run( 'RestSearchResultAugment' . ucfirst($field), [ &$pageIds ] ); // Proccess hooks. Note that extension will handle hook for the specific field olnly
	$this->agmentResults($results, $field, $fieldIds); // If nobody handles the hood we simply add $field => null, so it's present in result anyway. 
}
  1. Implement given hooks in certain extension (TextExtras, PageImage). They're pretty similar to how extensions work now so it will be possible to resue the code without significant dangarous changes.

@daniel

First, don't think that extending SearchEngine with hook is a good idea right now:

  1. SearchEngine is quite complicated and I'm afraid of getting unexpected side effects there.

Yes, it's complicated and a bit scary. I think that putting the hooks there would be the right thing in theory, and I'd like to explore it a bit more. You are probably right that it's too complicated for now, but I have not yet given up on the idea.

  1. Different strategies for searching are used. Say SearchHandler combines titles and texts into one array, like below:

searchTitle() and searchText() both use augmentSearchResults() internally, which is indeed intended for this kind of thing. However, completionSearch() doesn't use augmentSearchResults() I think. I have no idea why.

  1. We have to take into consideration the fact that searchText is used in many places

SearchEngine can be controlled by calling setFeatureData(). I think that would the correct way to turn parts of the search results on and off. I may be wrong though, I have only looked at the code very briefly.

So we'll need to extend a class with functionality that is needed for a few of its clients only.

Since we are not hard-coding it, but only providing a hook point for injecting the functionality, this doesn't seem so terrible.

Moreover, having this very generic hook we don't control changes it might perform. Say, unclear how to make sure that they add 'thumbnail' and 'description' fields.

Yes, it would be good to have more specific hooks, or at least to define e.g. how setFeatureData() controls what hooks can do.

I offer the following instead:

  1. Add additional configuration for SearchEngine that introduce additional possible fields. like $additioanFields = ['description', 'thumbnail'].
  2. Dynamically introduce hooks with names like 'RestSearchResultAugmentDescription', 'RestSearchResultAugmentThumbnail'.

If the hooks are only for the REST api, why do we need configuration settings for SearchEngine? Why does SearchEngine have to know about this at all? The REST api itself knows exactly what additional fields it wants to allow, this doesn't need to be configurable.

  1. Use the naming convention above to be possible to use
	Hooks::run( 'RestSearchResultAugment' . ucfirst($field), [ &$fieldIds ] ); // Proccess hooks. Note that extension will handle hook for the specific field olnly
	$this->agmentResults($results, $field, $fieldIds); // If nobody handles the hood we simply add $field => null, so it's present in result anyway. 

The hooks needs some data to operate on. It needs a list of titles as input, and needs to return a list of... well, of what? Arbitrary array structures?

  1. Implement given hooks in certain extension (TextExtras, PageImage). They're pretty similar to how extensions work now so it will be possible to resue the code without significant dangarous changes.

This sounds like we would duplicate the hook handler code in the extensions, not re-use it.

@EvanProdromou this ticket asks for the thumbnails in the search result to follow the FileRepresentation spec. I don't see this spec used anywhere in the REST API yet, and I have some questions about it:

  • "mediatype": MediaWiki distinguishes between MediaType and MimeType. MediaType is something like "VIDEO" or "ARCHIVE" or "DRAWING" or "BITMAP", see the MEDIATYPE_XXX constants in defines.php. I assume you want the mime type here? To avoid confusion with the usage of the "mediatype" key in the action Api, I suggest to use a different name here, e.g. "mime" or "mimetype" or "contenttype" or something.
  • "size": Is this a hard requirement? Is it really needed? It's expensive to do, we can't easily know this in advance for a thumbnail. The scaled file may be generated by a 404 handler on the fly only after the client tries to read it.

Also, it would be nice if the thumbnail would provide an easy way to find the full version of the file, and meta-data about it. I suppose it should be a link to the /file/{title} endpoint. The key could be "fileinfo" perhaps.

The FileRepresentation spec calls for 'size' property. However, sine we're doing the thumbnailg lazily it is impossible to know at the time of returning the response. Switching to non-lazy generation of thumbnails just to know the size is not acceptable from the performance perspective. We will be dropping the 'size' key from the response.

@Pchelolo please leave the size property and set it to null. Per the design principles, "Empty properties should have the value "null". If a property has no value, we should include the property name in the JSON output, with the value null."

@daniel I like the idea of being able to get the File easily. I added an optional title property to the FileRepresentation in the schema.

Validated in production.