
[investigation] Add parsed main image to article response (timebox ~ 2 day(s))
Closed, Resolved | Public | 5 Estimated Story Points

Description

Investigate what it would take to add the main image (a string URL pointing to Commons) to the article response. Review the Summary endpoint to evaluate its solution.

Acceptance Criteria

  1. Thumbnail and main image added to the schema based on the schema.org criteria.
  2. Estimate the effort of either replicating the code from the summary endpoint or writing new code.
  3. Create new ticket(s) with implementation details on how this task needs to be completed.

Notes
Link to the summary code.
Link to the metadata collection; it looks like this is where the API call for the image happens.

Event Timeline

HShaikh triaged this task as High priority. Mar 1 2023, 2:51 PM
HShaikh updated the task description.
HShaikh renamed this task from [investigation] Add parsed main image to article response to [investigation] Add parsed main image to article response (timebox ~ day(s)). Mar 9 2023, 2:19 PM
HShaikh renamed this task from [investigation] Add parsed main image to article response (timebox ~ day(s)) to [investigation] Add parsed main image to article response (timebox ~ 2 day(s)). Mar 9 2023, 2:30 PM
Felixejofre set the point value for this task to 5.

According to the investigation, we can use the Schema.org representation of an image for an Article: https://schema.org/Article. An image can be represented as an ImageObject, which includes the content URL (the original image URL), the original image's width and height, and a thumbnail that is itself represented as an ImageObject. The JSON Schema representation looks like this:

{
 "$schema": "https://json-schema.org/draft/2020-12/schema",
 "$id": "image.json",
 "title": "Image",
 "description": "Representation of the image entity",
 "type": "object",
 "properties": {
     "contentUrl": {
         "type": "string"
     },
     "width": {
         "type": "number"
     },
     "height": {
         "type": "number"
     },
     "thumbnail": {
         "$ref": "file:image.json"
     }
 }
}

AVRO Schema

package schema

import "github.com/hamba/avro"

// ConfigImage schema configuration for Image.
var ConfigImage = &Config{
	Type: ConfigTypeValue,
	Name: "Image",
	Schema: `{
		"type": "record",
		"name": "Image",
		"namespace": "wikimedia_enterprise.general.schema",
		"fields": [
			{
				"name": "contentUrl",
				"type": "string"
			},
			{
				"name": "thumbnail",
				"type": [
					"null",
					"Image"
				]
			},
			{
				"name": "width",
				"type": "int"
			},
			{
				"name": "height",
				"type": "int"
			}
		]
	}`,
	Reflection: Image{},
}

// NewImageSchema creates new article image avro schema.
func NewImageSchema() (avro.Schema, error) {
	return New(ConfigImage)
}

// Image schema for article image.
// Compliant with https://schema.org/ImageObject.
type Image struct {
	ContentUrl string `json:"contentUrl,omitempty" avro:"contentUrl"`
	Thumbnail  *Image `json:"thumbnail,omitempty" avro:"thumbnail"`
	Width      int    `json:"width,omitempty" avro:"width"`
	Height     int    `json:"height,omitempty" avro:"height"`
}
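
To illustrate what the Image struct above would contribute to the JSON article response, here is a minimal, standalone sketch. It copies the struct with only its json tags (the avro tags are dropped so the example needs no external package); encodeImage is a hypothetical helper and not part of the proposed schema package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Image is a copy of the proposed struct with the avro tags removed.
type Image struct {
	ContentUrl string `json:"contentUrl,omitempty"`
	Thumbnail  *Image `json:"thumbnail,omitempty"`
	Width      int    `json:"width,omitempty"`
	Height     int    `json:"height,omitempty"`
}

// encodeImage (hypothetical helper) serializes an Image to compact JSON.
func encodeImage(img Image) (string, error) {
	out, err := json.Marshal(img)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// A thumbnail is itself an Image; its own Thumbnail stays nil and is
	// therefore omitted from the output because of omitempty.
	img := Image{
		ContentUrl: "https://upload.wikimedia.org/wikipedia/commons/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
		Width:      2523,
		Height:     3313,
		Thumbnail: &Image{
			ContentUrl: "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg/300px-Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
			Width:      300,
			Height:     394,
		},
	}
	out, err := encodeImage(img)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
}
```

Because every field carries omitempty, an article without a thumbnail would serialize with no thumbnail key at all rather than with a null value.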

Source of the data (MediaWiki Action API)

According to the MediaWiki Action API docs, both the original image and the thumbnail can be retrieved by requesting the pageimages property, that is, by adding the following parameter to the API request:

prop=pageimages

To get both the original image and the thumbnail, extend the request with the piprop parameter:

piprop=thumbnail|original

By default, the Action API returns a thumbnail scaled to a default size (which differs from page to page). If a specific thumbnail size is needed, it can be requested by adding another parameter to the API request:

pithumbsize=500 (The value “500” is the width of the thumbnail in pixels.)

Example request (Click to open in a browser):

GET https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=original|thumbnail&pithumbsize=300&titles=Albert%20Einstein&pilicense=any

Example Response:

{
 "batchcomplete": true,
 "query": {
   "pages": [
     {
       "pageid": 736,
       "ns": 0,
       "title": "Albert Einstein",
       "thumbnail": {
         "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg/300px-Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
         "width": 300,
         "height": 394
       },
       "original": {
         "source": "https://upload.wikimedia.org/wikipedia/commons/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
         "width": 2523,
         "height": 3313
       },
       "terms": {
         "alias": [
           "Einstein",
           "A. Einstein"
         ],
         "label": [
           "Albert Einstein"
         ],
         "description": [
           "German-born theoretical physicist; developer of the theory of relativity (1879–1955)"
         ]
       }
     }
   ]
 }
}
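
One way the response above might be mapped onto the proposed Image struct is sketched below. The types apiImage and apiResponse and the function imageFromResponse are hypothetical names introduced for illustration; they decode only the original and thumbnail objects and assume a formatversion=2 response, where pages is an array. The sample body in main uses placeholder example.org URLs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Image is a copy of the proposed struct with the avro tags removed.
type Image struct {
	ContentUrl string `json:"contentUrl,omitempty"`
	Thumbnail  *Image `json:"thumbnail,omitempty"`
	Width      int    `json:"width,omitempty"`
	Height     int    `json:"height,omitempty"`
}

// apiImage (hypothetical) matches the "thumbnail" and "original" objects.
type apiImage struct {
	Source string `json:"source"`
	Width  int    `json:"width"`
	Height int    `json:"height"`
}

// apiResponse (hypothetical) covers only the fields needed here.
type apiResponse struct {
	Query struct {
		Pages []struct {
			Thumbnail *apiImage `json:"thumbnail"`
			Original  *apiImage `json:"original"`
		} `json:"pages"`
	} `json:"query"`
}

// imageFromResponse (hypothetical) converts the first page of the response
// into an Image. It returns nil, nil when the page has no original image.
func imageFromResponse(body []byte) (*Image, error) {
	var res apiResponse
	if err := json.Unmarshal(body, &res); err != nil {
		return nil, err
	}
	if len(res.Query.Pages) == 0 || res.Query.Pages[0].Original == nil {
		return nil, nil
	}
	page := res.Query.Pages[0]
	img := &Image{
		ContentUrl: page.Original.Source,
		Width:      page.Original.Width,
		Height:     page.Original.Height,
	}
	if page.Thumbnail != nil {
		img.Thumbnail = &Image{
			ContentUrl: page.Thumbnail.Source,
			Width:      page.Thumbnail.Width,
			Height:     page.Thumbnail.Height,
		}
	}
	return img, nil
}

func main() {
	// Trimmed-down body with placeholder URLs, shaped like the response above.
	body := []byte(`{"query":{"pages":[{
		"thumbnail":{"source":"https://example.org/thumb.jpg","width":300,"height":394},
		"original":{"source":"https://example.org/original.jpg","width":2523,"height":3313}}]}}`)
	img, err := imageFromResponse(body)
	if err != nil || img == nil {
		fmt.Println("no image:", err)
		return
	}
	fmt.Println(img.ContentUrl, img.Width, img.Height, img.Thumbnail.ContentUrl)
}
```

Pointer fields are used for thumbnail and original so that a page without a main image, where both keys are absent from the response, can be told apart from one with zero-sized dimensions.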

Implementation ticket: https://phabricator.wikimedia.org/T333145