
Add API (option) to format Wikidata entities as plain text in bulk
Closed, Resolved · Public · Feature

Description

Feature summary (what you would like to be able to do and where):
As a developer using Wikidata data, I would like an API that allows me to display entities using their labels (or lemmas) in plain text.
None of the current options are fully adequate:

  • wbgetentities with props=labels, languages= + languagefallback= comes close, but is inefficient (it always loads the full entity data, rather than using the term store) and requires users to implement their own handling for entities without labels (such as lexemes)
  • wbformatvalue supports all entity types and several output formats, including plain text, but can only format one value at a time
  • wbformatentities (T207484) can format up to 50 entities at a time, but only supports HTML output
  • the REST API, as far as I can tell, can only process one entity at a time, doesn’t support lexemes, and has separate URL paths for different entity types
  • the Wikidata Query Service can handle more than 50 entities at a time, but its efficiency is mixed, it is already overloaded, language fallback takes extra work (all fallback languages must be listed explicitly), and a WDQS-based solution still requires custom handling for entities without labels (such as lexemes)

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
Tool-wdactle is one project that needs this. Currently, I use wbformatentities and parse the returned HTML to strip it back down to plain text, but this is inefficient and unsatisfying.
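
To illustrate the workaround described above, here is a minimal Python sketch of stripping a formatted HTML fragment back down to plain text. It is not the tool's actual code: the tool runs in the browser and extracts the label element with a selector, whereas this sketch simply drops every tag. The helper names and the sample markup are hypothetical, and the real wbformatentities HTML may look different.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain_text(fragment: str) -> str:
    """Strip all tags from an HTML fragment and return its text content."""
    extractor = _TextExtractor()
    extractor.feed(fragment)
    extractor.close()  # flush any text still buffered by the parser
    return "".join(extractor.parts).strip()


# Hypothetical markup, for illustration only.
sample = '<a title="Q42" href="/wiki/Q42">Douglas Adams</a>'
print(html_to_plain_text(sample))
```

Every entity label has to go through this round trip (format as HTML on the server, parse it again on the client), which is the overhead a plain-text output mode would avoid.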

Benefits (why should this be implemented?):
More developers could use an API offering that puts less strain on server resources.

Event Timeline

Change #1143558 had a related patch set uploaded (by Lucas Werkmeister; author: Lucas Werkmeister):

[mediawiki/extensions/Wikibase@master] repo: Add generate= parameter to wbformatentities API

https://gerrit.wikimedia.org/r/1143558

Change #1143558 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] repo: Add generate= parameter to wbformatentities API

https://gerrit.wikimedia.org/r/1143558

Mentioned in SAL (#wikimedia-cloud) [2025-05-23T16:47:55Z] <wmbot~lucaswerkmeister@tools-bastion-13> deployed df1752303e (use action=wbformatentities with generate=text/plain, cc T393691)

LucasWerkmeister claimed this task.

The commit mentioned above seems to work like a charm \o/ I haven’t benchmarked the API rigorously, but it looks like it’s slightly more efficient both server-side (formatting plain text is less work than formatting HTML) and client-side (I no longer have to feed the HTML to a DOMParser to extract the label with a querySelector).

I think we can go ahead and close this.
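
For readers who want to try the new parameter, the following is a rough sketch of how a client might batch entity IDs into wbformatentities requests with generate=text/plain. Only action=wbformatentities, generate=text/plain, and the limit of 50 entities per request come from this task. The ids parameter name and the response shape (a "wbformatentities" object mapping each ID to its formatted text) are assumptions to check against the API help, and both function names are hypothetical. The sketch only builds request parameters and merges decoded responses; sending the requests is left to whatever HTTP client is in use.

```python
from typing import Dict, Iterable, List

MAX_IDS_PER_REQUEST = 50  # limit stated in the task description


def build_format_requests(
    entity_ids: Iterable[str], batch_size: int = MAX_IDS_PER_REQUEST
) -> List[Dict[str, str]]:
    """Split entity IDs into batches and build one parameter dict per batch."""
    ids = list(dict.fromkeys(entity_ids))  # de-duplicate, keep first-seen order
    requests = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        requests.append({
            "action": "wbformatentities",
            "generate": "text/plain",
            "ids": "|".join(batch),  # assumed parameter name
            "format": "json",
        })
    return requests


def merge_responses(responses: Iterable[dict]) -> Dict[str, str]:
    """Merge decoded JSON responses into one ID-to-text mapping.

    Assumes each response looks like {"wbformatentities": {"Q42": "..."}}.
    """
    merged: Dict[str, str] = {}
    for response in responses:
        merged.update(response.get("wbformatentities", {}))
    return merged
```

Because the returned values are already plain text under this assumption, no HTML parsing step is needed on the client.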