
API to efficiently format large numbers of entity IDs
Closed, Resolved · Public

Description

As a Wikidata tool author, I often need to render a lot of entity ID references to the user in a readable form (e.g. using their labels).

Problem:

Currently, I have two options for this, each with advantages and drawbacks.

  • I can use the wbgetentities API to fetch the relevant information for all the entities I need. The big advantage of this API is that it lets me retrieve information for up to 50 entities at once, which cuts down on a lot of network requests. But all I get is raw entity data, nothing I can directly show to the user. This is bad enough if I’m just interested in items and properties (it means I have to go through the labels and implement language fallbacks myself), and even worse if I want to support all entity types (which means I need to rebuild Wikibase’s logic to render lexemes using their lemmas, forms using their representations, senses using their glosses and their lexeme’s lemmas (did I even download these?), etc.).
  • I can use the wbformatvalue API to render a single datavalue into wikitext or HTML. Here, Wikibase does all the work for me – I just have to deal with the slightly baroque input format (provide a full datavalue instead of a plain entity ID), and I still need to know which entity type the ID refers to in order to provide the datatype argument. But the big problem with this is that I can only render one entity ID per API call: if I want to render 100 entity IDs, I need to make 100 network requests.
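
To make the trade-off concrete, here is a minimal Python sketch of the request parameters each option needs. It only builds parameter dictionaries and sends nothing over the network. The helper names (`chunked`, `wbgetentities_params`, `wbformatvalue_params`) and the ID-pattern table are my own illustration, not part of Wikibase, and the mapping from ID shape to datatype is an assumption based on the standard entity types named above.

```python
import json
import re

# Batch limit for wbgetentities as described above.
MAX_IDS_PER_REQUEST = 50

# Hypothetical lookup table: ID pattern -> (entity type, datatype).
# The lexeme pattern is anchored, so it cannot swallow form/sense IDs.
_ENTITY_KINDS = [
    (re.compile(r"^Q\d+$"), "item", "wikibase-item"),
    (re.compile(r"^P\d+$"), "property", "wikibase-property"),
    (re.compile(r"^L\d+-F\d+$"), "form", "wikibase-form"),
    (re.compile(r"^L\d+-S\d+$"), "sense", "wikibase-sense"),
    (re.compile(r"^L\d+$"), "lexeme", "wikibase-lexeme"),
]


def chunked(ids, size=MAX_IDS_PER_REQUEST):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def wbgetentities_params(ids, language="en"):
    """Option 1: one request for up to 50 entities, but only raw labels."""
    return {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels",
        "languages": language,
        "languagefallback": "1",
        "format": "json",
    }


def wbformatvalue_params(entity_id):
    """Option 2: rendered HTML, but one entity ID per request."""
    for pattern, entity_type, datatype in _ENTITY_KINDS:
        if pattern.match(entity_id):
            break
    else:
        raise ValueError(f"unrecognised entity ID: {entity_id!r}")
    # The API wants a full datavalue, not a plain ID.
    datavalue = {
        "type": "wikibase-entityid",
        "value": {"entity-type": entity_type, "id": entity_id},
    }
    return {
        "action": "wbformatvalue",
        "datavalue": json.dumps(datavalue),
        "datatype": datatype,
        "generate": "text/html",
        "format": "json",
    }


if __name__ == "__main__":
    ids = [f"Q{n}" for n in range(1, 101)]
    bulk = [wbgetentities_params(batch) for batch in chunked(ids)]
    single = [wbformatvalue_params(entity_id) for entity_id in ids]
    print(len(bulk), "wbgetentities requests vs", len(single), "wbformatvalue requests")
```

For 100 item IDs this plans 2 wbgetentities requests against 100 wbformatvalue requests, which is the gap the proposed API is meant to close.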

I think a combination of these – an API that accepts a list of entity IDs and formats them like entity ID data values – would be useful to a lot of tools.

(wb_terms is another option for server-side Toolforge-based tools, but we’re migrating away from that anyway, see T198866.)

Example:

Tools that I think could benefit from this include:

  • QuickStatements (batch view)
  • Wikidata Graph Builder (graph node labels)
  • Wikidata Recent Changes (“title” of changed pages)
  • Wikidata Vandalism Dashboard (ditto)
  • Wikidata Reconciliation / OpenRefine (reconciliation results)
  • TABernacle (entity ID cells)
  • Wikidata Image Positions (depicted items)
  • possibly some other tools that currently use wb_terms for this, see T197161

Open questions:

  • Should this API also support mass-rendering other types of datavalues (quantities, dates, etc.)? But I don’t see how to accommodate that with a simple API.

Event Timeline

Change 471247 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Add API module to format entity IDs in bulk

https://gerrit.wikimedia.org/r/471247

Change 473266 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Use RemexHtml to make links absolute

https://gerrit.wikimedia.org/r/473266

Change 471247 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add API module to format entity IDs in bulk

https://gerrit.wikimedia.org/r/471247

Change 473266 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Use RemexHtml to make links absolute

https://gerrit.wikimedia.org/r/473266
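
For tool authors who want to try the merged module, here is a rough Python sketch of how a client might plan bulk requests and combine the results. This page does not state the module name, its parameters or its response format, so everything here is an assumption: that the module is exposed as `wbformatentities`, that it takes a pipe-separated `ids` parameter, that it shares the 50-ID batch limit of wbgetentities, and that the response maps each ID to an HTML snippet under a `wbformatentities` key. The function names are hypothetical; check the API help on the target wiki before relying on any of this.

```python
ASSUMED_MODULE = "wbformatentities"  # assumption: not named on this page
ASSUMED_BATCH_SIZE = 50              # assumption: same limit as wbgetentities


def plan_requests(ids, batch_size=ASSUMED_BATCH_SIZE):
    """Deduplicate IDs (keeping first-seen order) and build one
    parameter dict per batch."""
    unique = list(dict.fromkeys(ids))
    return [
        {
            "action": ASSUMED_MODULE,
            "ids": "|".join(unique[start:start + batch_size]),
            "format": "json",
        }
        for start in range(0, len(unique), batch_size)
    ]


def merge_formatted(responses, key=ASSUMED_MODULE):
    """Combine the per-batch ID -> HTML mappings into one dict.
    The response shape is assumed, see above."""
    merged = {}
    for response in responses:
        merged.update(response.get(key, {}))
    return merged


if __name__ == "__main__":
    plan = plan_requests(["Q1", "Q2", "Q1", "P31"])
    print(plan[0]["ids"])
    # Stand-in response for illustration only; not real API output.
    fake = [{ASSUMED_MODULE: {"Q1": "<a href=\"...\">...</a>"}}]
    print(sorted(merge_formatted(fake)))
```

Deduplicating before batching matters for views such as recent changes, where the same entity ID tends to appear many times on one page.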

Looks good to me on beta


Moving to “needs announcement”, as this is the sort of thing that could be announced.
Thoughts, @Lucas_Werkmeister_WMDE @Lea_Lacroix_WMDE @Lydia_Pintscher?

This probably won’t be deployed until the week of the 26th of November, though, due to train freezes.

I think it would be good to announce this, yes, but I’m not sure which developer-oriented channels we have besides wikidata-tech-l.

Yeah, please do a post on wikidata-tech and then add a link to it to the weekly summary.


@Lucas_Werkmeister_WMDE thank you very much for that!

@Pintoch thanks for reminding me that I still needed to send the announcement ;) closing now.