Develop a reconciliation service for structured data on Wikimedia Commons (SDC / StructuredDataOnCommons) which, just like the Wikidata reconciliation service, conforms to the Reconciliation Service API / protocol. This reconciliation service will be used by OpenRefine and can also be used by other tools.
This will be developed as part of the Structured Data on Commons Functionalities in OpenRefine project. Development work is planned for approximately September to November 2021. See also T289971: [epic] Add Structured data on Wikimedia Commons support to OpenRefine.
Functionalities to include:
- Take a list of file names from Wikimedia Commons and convert these file names to their corresponding entity identifiers (“M numbers” or M-ids - the Wikimedia Commons equivalent of Q-ids)
- Provide a data extension service that fetches:
  - values of requested properties of the media files
  - wikitext of the media files (preferably parsed in a semi-structured way - interestingly, this is NOT structured data, but is necessary to allow further data operations in OpenRefine)
  - categories of the media files (likewise not structured data; obtained via the wikitext)
  - the existing structured data statements (if any exist) of the MediaInfo entity
  - the existing captions of the MediaInfo entity
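
The first and last items above can be sketched roughly as follows. This is only an illustration, not the planned implementation. It assumes that an M-id is the letter "M" followed by the page ID of the file page, that page IDs are looked up with the MediaWiki `action=query` API (using `formatversion=2`), and that captions are stored as the labels of the MediaInfo entity JSON (for example as returned by `wbgetentities`). The function names and the sample page ID are hypothetical, and no network request is made here.

```python
from typing import Dict, List, Optional

# Assumed endpoint; the parameters below would be sent to it by an HTTP client.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_pageid_query(file_names: List[str]) -> Dict[str, str]:
    """Build action=query parameters that resolve file titles to page IDs."""
    titles = [n if n.startswith("File:") else "File:" + n for n in file_names]
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "titles": "|".join(titles),
    }


def mids_from_query_response(response: dict) -> Dict[str, Optional[str]]:
    """Map each returned (normalized) title to its M-id, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for page in response.get("query", {}).get("pages", []):
        # Missing pages have no "pageid" key in the response.
        page_id = page.get("pageid")
        result[page["title"]] = f"M{page_id}" if page_id is not None else None
    return result


def captions_from_entity(entity: dict) -> Dict[str, str]:
    """Return captions keyed by language code (assumed to live in "labels")."""
    # An entity without captions may serialize "labels" as an empty list.
    labels = entity.get("labels") or {}
    return {lang: label["value"] for lang, label in labels.items()}


# Hand-written sample data; the page ID 12345 is made up for illustration.
sample_response = {
    "query": {
        "pages": [
            {"pageid": 12345, "ns": 6, "title": "File:Example.jpg"},
            {"ns": 6, "title": "File:No such file.jpg", "missing": True},
        ]
    }
}
sample_entity = {
    "id": "M12345",
    "labels": {"en": {"language": "en", "value": "An example image"}},
}

print(build_pageid_query(["Example.jpg", "File:No such file.jpg"]))
print(mids_from_query_response(sample_response))
print(captions_from_entity(sample_entity))
```

Keeping request construction separate from response parsing means the parsing logic can be tested without network access. A real service would additionally need to batch titles and map the normalized titles in the response back to the original input.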