
Feature request: ability to include labels for related entities in wbgetentities responses
Open, Needs Triage · Public

Description

To add structured data to new APIs, the Reading Infrastructure team would like wbgetentities responses to be able to include labels for related entities, which would avoid a second request for that information.

Current behavior:

A request like https://commons.wikimedia.org/w/api.php?action=wbgetentities&languages=en&formatversion=2&ids=M41837276&format=json returns:

{
  "entities": {
    "M41837276": {
      "pageid": 41837276,
      "ns": 6,
      "title": "File:Pluto-01 Stern 03 Pluto Color TXT.jpg",
      "lastrevid": 349086472,
      "modified": "2019-05-07T13:02:06Z",
      "type": "mediainfo",
      "id": "M41837276",
      "labels": {
        "en": {
          "language": "en",
          "value": "High-resolution MVIC image of Pluto in enhanced color to bring out differences in surface composition."
        }
      },
      "descriptions": {},
      "statements": {
        "P180": [
          {
            "mainsnak": {
              "snaktype": "value",
              "property": "P180",
              "hash": "37091d3741ed1d6b19bf23f5947366650883be7d",
              "datavalue": {
                "value": {
                  "entity-type": "item",
                  "numeric-id": 339,
                  "id": "Q339"
                },
                "type": "wikibase-entityid"
              }
            },
            "type": "statement",
            "id": "M41837276$31a303cd-44a6-e116-09ab-fb02d71f2aef",
            "rank": "preferred"
          }
        ]
      }
    }
  },
  "success": 1
}
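
For illustration, here is a minimal sketch of the client-side work the current behavior implies: walk the first response, collect the related entity IDs, and build the parameters for a second wbgetentities request. The function name `collect_related_ids` is hypothetical, the sample is a trimmed copy of the response above, and the sketch only looks at main snaks (qualifiers and references are ignored).

```python
from typing import Any, Dict, Set


def collect_related_ids(response: Dict[str, Any]) -> Set[str]:
    """Collect property IDs and entity-valued main snak IDs from a response."""
    related: Set[str] = set()
    for entity in response.get("entities", {}).values():
        for prop_id, statements in entity.get("statements", {}).items():
            related.add(prop_id)  # the property itself needs a label
            for statement in statements:
                datavalue = statement.get("mainsnak", {}).get("datavalue", {})
                if datavalue.get("type") == "wikibase-entityid":
                    related.add(datavalue["value"]["id"])
    return related


# Trimmed copy of the response shown above.
sample = {
    "entities": {
        "M41837276": {
            "type": "mediainfo",
            "id": "M41837276",
            "statements": {
                "P180": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P180",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 339,
                                    "id": "Q339",
                                },
                                "type": "wikibase-entityid",
                            },
                        },
                        "type": "statement",
                        "rank": "preferred",
                    }
                ]
            },
        }
    },
    "success": 1,
}

related_ids = collect_related_ids(sample)

# Parameters for the follow-up label request (no network call is made here).
second_request_params = {
    "action": "wbgetentities",
    "ids": "|".join(sorted(related_ids)),
    "props": "labels",
    "languages": "en",
    "format": "json",
}
```

The second request can only be built once the first response has been parsed, which is the extra step this task asks to remove.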

Desired behavior:

A parameter that would alter the response to include labels of related entities (P180 and Q339 in this example):

{
  "entities": {
    "M41837276": {
      "pageid": 41837276,
      "ns": 6,
      "title": "File:Pluto-01 Stern 03 Pluto Color TXT.jpg",
      "lastrevid": 349086472,
      "modified": "2019-05-07T13:02:06Z",
      "type": "mediainfo",
      "id": "M41837276",
      "labels": {
        "en": {
          "language": "en",
          "value": "High-resolution MVIC image of Pluto in enhanced color to bring out differences in surface composition."
        }
      },
      "descriptions": {},
      "statements": {
        "P180": {
          "labels": {
            "en": {
              "language": "en",
              "value": "depicts"
            }
          },
          "values": [
            {
              "mainsnak": {
                "snaktype": "value",
                "property": "P180",
                "hash": "37091d3741ed1d6b19bf23f5947366650883be7d",
                "datavalue": {
                  "value": {
                    "entity-type": "item",
                    "numeric-id": 339,
                    "id": "Q339",
                    "labels": {
                      "en": {
                        "language": "en",
                        "value": "Pluto"
                      }
                    }
                  },
                  "type": "wikibase-entityid"
                }
              },
              "type": "statement",
              "id": "M41837276$31a303cd-44a6-e116-09ab-fb02d71f2aef",
              "rank": "preferred"
            }
          ]
        }
      }
    }
  },
  "success": 1
}

Open to suggestions about structure of the response or other ways to achieve this.
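
As a rough sketch of the transformation being requested, the following merges a label lookup into a response of the current shape to produce the desired shape. It is a client-side emulation under stated assumptions, not a proposal for how Wikibase would implement it; `embed_labels` and the `labels` mapping are hypothetical names.

```python
import copy
from typing import Any, Dict

# Hypothetical lookup: entity ID -> language code -> {"language", "value"}
LabelMap = Dict[str, Dict[str, Dict[str, str]]]


def embed_labels(response: Dict[str, Any], labels: LabelMap) -> Dict[str, Any]:
    """Return a copy of the response with labels embedded as in the desired shape."""
    result = copy.deepcopy(response)  # leave the caller's data untouched
    for entity in result.get("entities", {}).values():
        statements = entity.get("statements", {})
        for prop_id, values in list(statements.items()):
            for statement in values:
                datavalue = statement.get("mainsnak", {}).get("datavalue", {})
                if datavalue.get("type") == "wikibase-entityid":
                    value = datavalue["value"]
                    value["labels"] = labels.get(value["id"], {})
            # Wrap the statement list so the property can carry its own labels.
            statements[prop_id] = {
                "labels": labels.get(prop_id, {}),
                "values": values,
            }
    return result


current = {
    "entities": {
        "M41837276": {
            "id": "M41837276",
            "statements": {
                "P180": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P180",
                            "datavalue": {
                                "value": {"entity-type": "item", "id": "Q339"},
                                "type": "wikibase-entityid",
                            },
                        },
                        "type": "statement",
                    }
                ]
            },
        }
    }
}

labels = {
    "P180": {"en": {"language": "en", "value": "depicts"}},
    "Q339": {"en": {"language": "en", "value": "Pluto"}},
}

desired = embed_labels(current, labels)
```

Note that wrapping each property's statement list in an object changes the type of `statements[...]`, so existing clients would only be unaffected if the new shape is opt-in.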

Event Timeline

JoeWalsh created this task. May 30 2019, 4:25 PM
Restricted Application added a subscriber: Aklapper. May 30 2019, 4:25 PM
JoeWalsh renamed this task from Feature request: ability to include labels for related entities in wbgetentities response to Feature request: ability to include labels for related entities in wbgetentities responses. May 30 2019, 4:26 PM

hey @JoeWalsh

We had a brief discussion about this in the Wikidata team. In terms of solutions, another option we thought of, in addition to adding a parameter to wbgetentities, is to introduce another endpoint, say wbgetentitiesfordisplay, that provides textual representations (maybe only labels, maybe also aliases, etc.) for any entities that are part of the response.

We would like to understand the size of the issue and the value of such a feature, especially since it might not be cheap to develop and maintain. Would you be able to share some use cases with us, along with the size of the problem this is trying to solve? Useful numbers/details would be:

  • how many use-cases are there where wbgetentities response is used directly to display labels of linked/related entities, requiring a second request to get those labels?
  • how big (nr. of entities) is a typical second request to get labels of linked/related entities?
  • how complicated does the second request make the client dealing with it?
  • is the second request a complete blocker in any of those use-cases?
  • any useful links to existing tasks of work-arounds for or discussions of this issue in other projects?
alaa_wmde claimed this task. Jun 6 2019, 3:06 PM

will take care of communication on this until we have a decision on if/what/when we implement any solution

@alaa_wmde thanks for the response. A separate endpoint like wbgetentitiesfordisplay would also be a good solution; it doesn't need to be a parameter on wbgetentities. I've added the other answers inline below.

how many use-cases are there where wbgetentities response is used directly to display labels of linked/related entities, requiring a second request to get those labels?

The use case is displaying image metadata from the structured data on Commons. The image metadata would be displayed in the image gallery view on articles. It would also be displayed in an "edit suggestions" feature that allows users to discover and edit images that need structured data or translations of structured data. I'm not aware of other use cases at the moment, but will provide them here if I come across any more.

how big (nr. of entities) is a typical second request to get labels of linked/related entities?

Currently, it'd be one entity lookup for each statement plus any entities referenced within the statements. These numbers will increase as new statements are added to images for other pieces of metadata, likely settling somewhere in the range of 10-20 entities per item (an estimate from @MarkTraceur based on this statistic from Wikidata). It's possible that these requests would also be batched for all the Commons images in a given article, which would multiply that 10-20 number by the average number of images in an article.
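
To make the batching arithmetic concrete, here is a small sketch that deduplicates related entity IDs across an article's images and splits them into request-sized chunks. The per-request cap of 50 IDs is an assumption (it matches the usual Action API multi-value limit for ordinary clients, but should be verified for the target wiki), and the article with 8 images and 15 distinct entities per image is made up for illustration.

```python
from typing import Iterable, List

# Assumption: at most 50 IDs per wbgetentities request for ordinary clients.
MAX_IDS_PER_REQUEST = 50


def batch_ids(ids: Iterable[str], batch_size: int = MAX_IDS_PER_REQUEST) -> List[List[str]]:
    """Deduplicate IDs and split them into chunks of at most batch_size."""
    unique = sorted(set(ids))
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]


# Hypothetical article: 8 images, each referencing 15 distinct entities.
article_ids = [f"Q{image * 100 + n}" for image in range(1, 9) for n in range(15)]

batches = batch_ids(article_ids)
batch_sizes = [len(batch) for batch in batches]
```

With these made-up numbers, 120 distinct IDs need three follow-up requests; in practice images in one article are likely to share entities, so deduplication should reduce the total.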

how complicated does the second request make the client dealing with it?

It's not that complicated for the clients to make two requests, but the bigger downside is having to make two network requests that can't be parallelized.
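
The dependency between the two requests can be sketched as follows. The `fetch` callable is a hypothetical stand-in for whatever HTTP client is used, and `fake_fetch` replaces the network with canned data so the ordering is visible; none of these names come from an existing library.

```python
from typing import Any, Callable, Dict, List

Fetch = Callable[[Dict[str, str]], Dict[str, Any]]


def fetch_entity_with_labels(entity_id: str, fetch: Fetch, language: str = "en") -> Dict[str, Any]:
    """Fetch an entity, then fetch labels for the entities it references."""
    # Round trip 1: the entity itself.
    first = fetch({"action": "wbgetentities", "ids": entity_id,
                   "languages": language, "format": "json"})

    # The IDs for round trip 2 are only known after round trip 1 returns.
    related = set()
    for entity in first["entities"].values():
        for prop_id, statements in entity.get("statements", {}).items():
            related.add(prop_id)
            for statement in statements:
                datavalue = statement.get("mainsnak", {}).get("datavalue", {})
                if datavalue.get("type") == "wikibase-entityid":
                    related.add(datavalue["value"]["id"])

    # Round trip 2: labels only, for the related entities.
    second = fetch({"action": "wbgetentities", "ids": "|".join(sorted(related)),
                    "props": "labels", "languages": language, "format": "json"})
    return {"entity": first, "labels": second}


calls: List[Dict[str, str]] = []


def fake_fetch(params: Dict[str, str]) -> Dict[str, Any]:
    """Canned responses standing in for the network; records call order."""
    calls.append(params)
    if params["ids"] == "M41837276":
        return {"entities": {"M41837276": {"statements": {"P180": [
            {"mainsnak": {"datavalue": {"value": {"id": "Q339"},
                                        "type": "wikibase-entityid"}}}
        ]}}}}
    return {"entities": {
        "P180": {"labels": {"en": {"language": "en", "value": "depicts"}}},
        "Q339": {"labels": {"en": {"language": "en", "value": "Pluto"}}},
    }}


result = fetch_entity_with_labels("M41837276", fake_fetch)
```

Because the `ids` of the second call are derived from the body of the first, the two calls are strictly sequential for a single image; only follow-up requests for different images could run concurrently.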

is the second request a complete blocker in any of those use-cases?

It's not a complete blocker, but it would slow down results due to the additional network round trip.

any useful links to existing tasks of work-arounds for or discussions of this issue in other projects?

Suggested edit feature that will show the image metadata: T224051
New image metadata screen: T223132
Current middleware request to support these features: T224132

Let me know if you have any other questions. Thanks!

alaa_wmde moved this task from Backlog to Todo: Focus on the User-Alaa board.