Provide an API for accessing alt text (and possibly other structured data) stored with MediaWiki file uploads
Open, Needs Triage, Public

Description

Wikimedia Commons stores alt text in the form of WikibaseMediaInfo claims (see T166094: Allow editors to provide default alt text on Wikimedia Commons file description pages). We don't want clients to access that directly because

  1. the Wikibase APIs are complex and not that developer-friendly;
  2. we want to provide stability guarantees for APIs and community-owned ontology makes that difficult;
  3. as much as possible we want to hide ontology changes and keep the API the same (e.g. if a migration happens because of T325944: Use a multilingual Wikibase property for storing alt text on Commons);
  4. we want to make alt-text features available on developer setups and third-party MediaWiki setups without forcing a Wikibase / WikibaseMediaInfo dependency.

So there should be an internal PHP API (presumably a core hook or service, maybe along the lines of the GetExtendedMetadata hook) that WikibaseMediaInfo implements by looking up the relevant Wikibase property, but that other extensions can implement in different ways. There should probably also be a web API exposing the same functionality; probably even two web APIs, one in the action API for batching (this probably fits into the imageinfo API), and one in the REST API for caching (maybe fits into the /file/{title} endpoint?).

There will probably be plenty of other similar use cases (e.g. getting information about the license of the file), so we probably want an extensible mechanism where it's easy to add more kinds of data later. Eventually it should probably replace GetExtendedMetadata.
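
To make the "extensible mechanism" idea a bit more concrete, here is a rough, language-agnostic sketch (written in Python rather than PHP, purely for illustration). Everything in it is an assumption: `FileMetadataService`, the `"alt-text"` kind name, and `mediainfo_alt_text` are hypothetical names, not existing MediaWiki or WikibaseMediaInfo APIs, and the real thing would more likely be a core service or hook.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical provider signature: (file_title, language) -> value or None.
Provider = Callable[[str, str], Optional[str]]


class FileMetadataService:
    """Hypothetical registry mapping a kind of data (e.g. "alt-text") to providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Provider]] = {}

    def register(self, kind: str, provider: Provider) -> None:
        # Extensions register a provider for each kind of data they can supply.
        self._providers.setdefault(kind, []).append(provider)

    def get(self, kind: str, file_title: str, language: str) -> Optional[str]:
        # Ask providers in registration order; the first non-None answer wins.
        for provider in self._providers.get(kind, []):
            value = provider(file_title, language)
            if value is not None:
                return value
        return None


def mediainfo_alt_text(file_title: str, language: str) -> Optional[str]:
    # Stand-in for a WikibaseMediaInfo-backed lookup; the data here is made up.
    fake_claims = {("Example.jpg", "en"): "A red apple on a table"}
    return fake_claims.get((file_title, language))


service = FileMetadataService()
service.register("alt-text", mediainfo_alt_text)
```

Adding another kind of data later (say, license information) would then only need another `register` call, without changing the callers or the web API shape.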

Some other considerations:

  • there should be a way to get the alt text for a specific file in a specific language (probably with language fallback?)
  • there should be a way to get the alt text for all files on a page and/or a batch of images (for the InstantCommons use case)
  • is there a use case for getting all languages for a file?
  • the web API should probably be cacheable for a short time
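
The first two considerations could look roughly like the sketch below (Python, for illustration only). The fallback chains in `FALLBACKS` and the function names are assumptions made up for the example; in MediaWiki the chains would come from the existing language fallback configuration, and the storage would not be a plain dict.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed fallback chains, hard-coded only for the sake of the example.
FALLBACKS: Dict[str, List[str]] = {"de-at": ["de", "en"], "de": ["en"], "en": []}


def resolve_alt_text(
    texts_by_lang: Dict[str, str],
    language: str,
    fallbacks: Dict[str, List[str]] = FALLBACKS,
) -> Optional[Tuple[str, str]]:
    """Return (language actually used, alt text), or None if nothing matches."""
    for code in [language, *fallbacks.get(language, [])]:
        if code in texts_by_lang:
            return code, texts_by_lang[code]
    return None


def resolve_batch(
    store: Dict[str, Dict[str, str]],
    titles: Iterable[str],
    language: str,
) -> Dict[str, Optional[Tuple[str, str]]]:
    """Resolve alt text for many files at once (e.g. all files on a page)."""
    return {title: resolve_alt_text(store.get(title, {}), language) for title in titles}
```

Returning the language that was actually used alongside the text lets the caller set a correct `lang` attribute when the result came from a fallback.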

Event Timeline

Tgr renamed this task from Provide an API for accessing alt text stored with MediaWiki file uploads to Provide an API for accessing alt text (and possibly other structured data) stored with MediaWiki file uploads. Dec 27 2022, 1:11 AM
Tgr updated the task description.

Because of T325949: Allow access to (some) structured image metadata across wikis, there are two orthogonal directions of abstraction here:

  • how the alt text is stored (Wikibase property, or inside wikitext, or something else)
  • whether it is stored on this wiki, or another wiki in the same database cluster, or another wiki only accessible via API.

Because of that, we probably want two layers:

  • The aforementioned hook or service, for getting alt text for a file in the local DB cluster;
  • a File method for getting the data, which would invoke the service for a LocalFile, invoke the web API for a ForeignAPIFile, and handle ForeignDBFile somehow. This one is less clear but there are multiple possible strategies:
    • Just call the API and add a WAN cache layer on top. (Not great, but more or less how it's done for GetExtendedMetadata, except the "API" there is the file description page itself, so it gets purged from edge caches on edit and can be more aggressively cached because of that.)
    • Make the service work with an arbitrary database which might not be the DB for the current wiki (a long-term architectural goal for much of MediaWiki, but hard because configuration differences between the wikis need to be accounted for).
    • Put the data in some dedicated place in the local wiki (page_props?) and do a cross-wiki read for that.
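
A very rough sketch of the two layers, using the first strategy (call the API, cache on top) for remote files. It is in Python for illustration only. The class names mirror MediaWiki's `LocalFile` and `ForeignAPIFile` but are simplified stand-ins, `CachedLookup` is a hypothetical in-process substitute for a WAN cache, and the TTL is an arbitrary assumption. ForeignDBFile is left out because its strategy is still undecided.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical lookup signature: (file_title, language) -> alt text or None.
Lookup = Callable[[str, str], Optional[str]]


class CachedLookup:
    """Wraps a slow lookup (e.g. a remote web API call) with a short-lived cache."""

    def __init__(self, lookup: Lookup, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[Tuple[str, str], Tuple[float, Optional[str]]] = {}

    def __call__(self, title: str, language: str) -> Optional[str]:
        key = (title, language)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, skip the underlying lookup
        value = self._lookup(title, language)
        self._cache[key] = (now, value)
        return value


class File:
    """Second layer: callers ask the file object and never see the storage."""

    def __init__(self, title: str, lookup: Lookup) -> None:
        self.title = title
        self._lookup = lookup

    def get_alt_text(self, language: str) -> Optional[str]:
        return self._lookup(self.title, language)


class LocalFile(File):
    """Would be constructed with the local hook/service as its lookup."""


class ForeignAPIFile(File):
    """Would be constructed with a cached wrapper around the remote web API."""
```

The point of the split is that code rendering an image only ever calls `get_alt_text`, and the choice between local service, remote API, or cross-wiki read stays inside the file class.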