Since 2019, it has been possible to add structured data to files on Wikimedia Commons. The Wikimedia Commons community has added structured data to many millions of files on Commons, using varying data models and conventions: ways to describe specific types of files, such as photos of buildings or artworks, full scans of books, and individual digitized illustrations from books.
Some data modeling conventions are documented at https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling
Batch upload tools for Wikimedia Commons that add structured data, such as OpenRefine, can make uploading files easier by offering users widely adopted templates (schemas). These take the form of a small selection of simple 'forms' with predefined fields for users to fill in, such as the file's source, copyright status and creator.
Before structured data on Commons (SDC) was deployed, files were described only in Wikitext. Since SDC arrived, the same information is very often duplicated in both Wikitext and structured data, with the risk that the two drift out of sync. Over the past year, several Commons community members have developed fully Lua-driven infobox templates that draw all their data from SDC. Batch upload tools also benefit from adopting exactly these fully SDC-driven templates, because doing so simplifies the upload experience for users and avoids duplication. However, these templates still need better documentation and further development.
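To make the 'forms' idea and the SDC-only approach more concrete, here is a minimal Python sketch of what an upload tool might do with a filled-in form: turn each field into a structured data statement, and emit Wikitext that only calls an SDC-driven infobox, so nothing is duplicated. It rests on several assumptions. The field-to-property mapping (P7482 source of file, P6216 copyright status, P275 copyright license, P170 creator) is illustrative and should be checked against the modeling pages; the item IDs in the usage example are placeholders; and the infobox template name is hypothetical. The statements follow the Wikibase JSON statement format, roughly the shape of data that could be passed as the `data` parameter of the Wikibase `wbeditentity` API module.

```python
import json

# ASSUMPTION: an illustrative mapping from simple upload-form fields to
# Wikidata properties used in Commons structured data. Check it against
# Commons:Structured_data/Modeling before relying on it.
FIELD_TO_PROPERTY = {
    "source": "P7482",            # source of file
    "copyright_status": "P6216",  # copyright status
    "license": "P275",            # copyright license
    "creator": "P170",            # creator
}

# HYPOTHETICAL name for a fully SDC-driven (Lua) infobox template.
INFOBOX_TEMPLATE = "Hypothetical SDC infobox"


def item_statement(prop: str, qid: str) -> dict:
    """Build one Wikibase statement whose value is a Wikidata item."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(qid.lstrip("Q")),
                    "id": qid,
                },
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def form_to_statements(form: dict) -> list:
    """Turn filled-in form fields (field name -> item ID) into statements."""
    statements = []
    for field, qid in form.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            raise ValueError(f"no property mapping for form field {field!r}")
        statements.append(item_statement(prop, qid))
    return statements


def minimal_wikitext() -> str:
    """Wikitext that only calls the infobox; all data lives in SDC."""
    return "{{" + INFOBOX_TEMPLATE + "}}\n"


if __name__ == "__main__":
    # Placeholder item IDs, for illustration only.
    form = {"creator": "Q42", "license": "Q1"}
    payload = {"claims": form_to_statements(form)}
    print(json.dumps(payload, indent=2))
    print(minimal_wikitext())
```

The design choice worth noting is that the Wikitext stub carries no field values at all: the structured data is the single source of truth, which is exactly what removes the risk of the two going out of sync.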
Tasks for Wikimedia-Hackathon-2023 (and later) include:
- Clean up https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling, indicating more clearly which broadly adopted templates are suitable for widespread use
- Improve the on-wiki documentation for the fully Lua-driven information templates ("minimal Wikitext templates"?) that are broadly applicable and usable by many Commons users and by batch upload tools
- Create and improve data modeling guidelines for aspects of templates that are not yet well developed (e.g. copyright and licenses), and for types of files that don't yet have a "minimal Wikitext template" (e.g. specimens, creative works without a Wikidata item)
This project was started on the occasion of Wikimedia-Hackathon-2023 and will probably continue after the hackathon.
Etherpad with more information, basic principles and tasks: https://etherpad.wikimedia.org/p/SDC-modeling-hackathon-2023