
Improve Structured Data on Commons data models (and their documentation) as a basis for Lua Infobox templates
Open, Needs Triage, Public

Description

Since 2019, it has been possible to add structured data to files on Wikimedia Commons. The Wikimedia Commons community has added structured data to many millions of files on Commons, using varying data models and conventions (i.e. ways of describing specific types of files, such as photos of buildings, artworks, full scans of books, or individual digitized illustrations from books).

Some data modeling conventions are documented at https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling

Batch upload tools for Wikimedia Commons that add structured data, such as OpenRefine, can make uploading files easier by offering end users widely adopted templates/schemas, presented as a selection of simple 'forms' with predefined fields for the end users to fill in (e.g. the file's source, copyright status, creator).
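As a rough illustration of how such a 'form' can translate into structured data, the Python sketch below turns form answers into statements in the Wikibase JSON statement format that SDC uses. The form itself (`FORM_FIELDS`) and the function names are hypothetical. The property IDs are the Wikidata properties used on Commons for depicts (P180), creator (P170), copyright status (P6216) and copyright license (P275). The sketch assumes every answer is already a Wikidata item ID, which is not the case for, e.g., own-work creators or creative works without a Wikidata item.

```python
def item_statement(prop: str, qid: str) -> dict:
    """Build one Wikibase statement whose value is a Wikidata item such as "Q146"."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not an item ID: {qid!r}")
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


# Hypothetical upload form: field label -> Commons SDC property ID.
FORM_FIELDS = {
    "depicts": "P180",
    "creator": "P170",
    "copyright status": "P6216",
    "copyright license": "P275",
}


def form_to_statements(answers: dict) -> list:
    """Turn filled-in form fields into statements, skipping empty fields."""
    statements = []
    for field, prop in FORM_FIELDS.items():
        qid = (answers.get(field) or "").strip()
        if qid:
            statements.append(item_statement(prop, qid))
    return statements


if __name__ == "__main__":
    # Only "depicts" is filled in, so exactly one P180 statement is produced.
    for st in form_to_statements({"depicts": "Q146", "creator": ""}):
        print(st["mainsnak"]["property"], st["mainsnak"]["datavalue"]["value"]["id"])
```

Keeping the field-to-property table separate from the statement builder means that a community-agreed data model can be swapped in without touching the serialization code.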

Before the deployment of structured data on Commons (SDC), files were described only with plain text (wikitext). Since SDC arrived, the same information is very often duplicated in both wikitext and structured data, with the risk that the two 'buckets' of data drift out of sync. Over the past year, various Commons community members have worked on fully Lua-driven infobox templates for Wikimedia Commons that draw all of their data from SDC. Batch upload tools also benefit from adopting exactly these fully SDC-driven infobox templates, as doing so simplifies the upload experience for the user and avoids duplication. However, more work is needed on both the documentation and the further development of such templates.
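An SDC-driven infobox reads its values from the file's MediaInfo entity rather than from wikitext. In Lua on Commons this goes through the Wikibase Lua library (`mw.wikibase`); the Python sketch below mimics the "best statements" rule of `mw.wikibase.getBestStatements` (preferred rank if any exist, otherwise normal rank, never deprecated) on an entity's JSON so that it can be tested offline. It assumes the MediaInfo JSON shape, with statements under a `statements` key; the sample entity, its `M123` ID and the helper `depicts` are made up for illustration.

```python
def best_item_values(entity: dict, prop: str) -> list:
    """Item IDs of the 'best' statements for `prop` in a MediaInfo entity.

    Mirrors the rule of mw.wikibase.getBestStatements: use preferred-rank
    statements if any exist, otherwise normal-rank ones; never deprecated.
    """
    # MediaInfo JSON keeps statements under "statements"; an entity without
    # statements may serialize this as an empty list, hence the `or {}`.
    statements = (entity.get("statements") or {}).get(prop, [])
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    chosen = preferred or [s for s in statements if s.get("rank") == "normal"]
    ids = []
    for st in chosen:
        snak = st.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks have no datavalue
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def depicts(qid: str, rank: str) -> dict:
    """Hypothetical helper that builds sample P180 statements for the example."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": rank,
    }


# Made-up MediaInfo entity: one normal and one deprecated "depicts" statement.
SAMPLE = {
    "type": "mediainfo",
    "id": "M123",
    "statements": {"P180": [depicts("Q146", "normal"), depicts("Q144", "deprecated")]},
}

if __name__ == "__main__":
    print(best_item_values(SAMPLE, "P180"))  # ['Q146']
```

Because the template derives what it displays from the statements at render time, there is no second copy in wikitext that could go out of sync.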

Tasks for Wikimedia-Hackathon-2023 (and later) include:

  • Clean up https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling, with better indication of those broadly adopted templates that are suitable for widespread use
  • Create better on-wiki documentation for the fully Lua-driven information templates ("minimal Wikitext templates"?) which are broadly applicable / usable by many Commons users and by batch upload tools
  • Create and improve data modeling guidelines for those aspects of templates that are not well developed yet (e.g. copyright and licenses) and for types of files that don't have a "minimal Wikitext template" yet (examples: Specimen, creative works without Wikidata item)

This project was started on the occasion of the Wikimedia-Hackathon-2023 but will probably continue after the hackathon as well.

Etherpad with more info, basic principles, tasks: https://etherpad.wikimedia.org/p/SDC-modeling-hackathon-2023

Event Timeline


I will be joining the hackathon remotely on behalf of the Wikidocumentaries project (T329023).

We are working on an MVP for uploading media from a third-party repository to SDC with Wikidocumentaries (the case study will use finna.fi). At the hackathon, I will look into mapping metadata between the source repository, Wikidocumentaries and SDC, and test tooling for that mapping.
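Metadata mapping between a source repository and SDC can start as a simple field-to-property table. The sketch below is hypothetical throughout: the source field names do not reflect the actual finna.fi API, and the chosen properties are only one plausible mapping. It separates mapped values from unmapped fields so the latter can be reviewed by hand.

```python
# Hypothetical mapping from source-record fields to Commons SDC properties.
# The field names are illustrative, not the actual finna.fi API schema.
FIELD_TO_PROPERTY = {
    "subjects": "P180",     # depicts
    "authors": "P170",      # creator
    "year": "P571",         # inception
    "imageRights": "P275",  # copyright license
}


def map_record(record: dict) -> tuple:
    """Split a source record into (property, raw value) pairs and unmapped fields."""
    mapped, unmapped = [], []
    for field, raw in record.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            unmapped.append(field)  # keep for manual review
            continue
        # Multi-valued fields become one pair per value.
        for value in raw if isinstance(raw, list) else [raw]:
            mapped.append((prop, value))
    return mapped, sorted(unmapped)


if __name__ == "__main__":
    record = {"title": "Harbour", "subjects": ["harbours", "ships"], "year": "1932"}
    mapped, unmapped = map_record(record)
    print(mapped)    # [('P180', 'harbours'), ('P180', 'ships'), ('P571', '1932')]
    print(unmapped)  # ['title']
```

The mapped values are still plain strings: before they can become statements they would need to be reconciled to Wikidata items (as OpenRefine does) or converted to the right datatype (e.g. a date for P571, inception).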

Some specs

  • We will focus on freely licensed images (excluding, e.g., maps, contemporary art, books, data items, moving images and 3D) to keep the task simple.
  • Later in the project, we will try to use keywords assigned with a controlled vocabulary (YSO).
  • We are focusing on the MVP, but are interested in more complex round-tripping scenarios.
  • We would be interested in collaborating on API mapping. I will be testing whether Postman is a suitable tool.
  • We wish to adopt a commonly used wikitext template.
  • Most of the media files are about creative works without a Wikidata item.

Great minds think alike: I was just about to create a Phabricator ticket to figure out data modeling for this use case (no data model for it has solidified yet).

Here it is: T337048: Discuss and agree on an SDC data modeling convention for the case where a Commons file depicts a creative work without Wikidata item

Thanks for participating in the Hackathon! We hope you had a great time.

  • If this task was being worked on and resolved at the Hackathon: Please change the task status to resolved via the Add Action...Change Status dropdown, and make sure that this task has a link to the public codebase.
  • If this task is still valid and should stay open: Please add another active project tag to this task, so others can find this task (as likely nobody in the future will look back at the Hackathon workboard when trying to find something they are interested in).
  • In case there is nothing else to do for this task, or nobody plans to work on this task anymore: Please set the task status to declined.

Thank you,
Phabricator housekeeping service