In the original grant application for Structured Data on Commons (SDC) support for OpenRefine, we wrote about the SDC reconciliation service:
[the reconciliation service] allows OpenRefine (and tools outside of OpenRefine) to take a list of file names from Wikimedia Commons and to convert these file names to their corresponding entity identifiers (“M numbers” or M-ids - the Wikimedia Commons equivalent of Q-ids). These M-ids are needed to perform further SDC operations.
Ideally, the Commons reconciliation service recognizes and reconciles the most common file name notation formats (with or without File: prefix, with underscores vs spaces in file names) that are produced as exports from the most widely used tools (PetScan, the Wikidata and Wikimedia Commons Query Services, ...).
| done? | what? | filename written as (example) | example export file |
| [x] | PetScan's CSV, TSV and JSON output (query) | Badende_vogel_bij_roze_bloem_Bloemen-_en_vogelschetsen_van_Keinen_(serietitel)_Keinen_kacho_gafu_(serietitel_op_object),_RP-P-2004-508D-9.jpg | |
| [ ] | PetScan's Plain text output (query) | File:Badende vogel bij roze bloem Bloemen- en vogelschetsen van Keinen (serietitel) Keinen kacho gafu (serietitel op object), RP-P-2004-508D-9.jpg | |
| [ ] | The Wikidata Query Service's output for Commons filenames (query) | https://commons.wikimedia.org/wiki/Special:FilePath/Mosaics%20%281953%29%20by%20Nel%20Klaassen%2C%20Peek%20%26%20Cloppenburg%20building%2C%20Hoogstraat%20%2850979423667%29.jpg | |
| [x] | The Wikimedia Commons Query Service output - entity URIs (query) | https://commons.wikimedia.org/entity/M93645431 | |
| [x] | Simple URLs of Commons file pages | https://commons.wikimedia.org/wiki/File:Mosaics_(1953)_by_Nel_Klaassen,_Peek_&_Cloppenburg_building,_Hoogstraat_(50979423667).jpg | |
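To make the formats in the table concrete, here is a minimal Python sketch of how the five notations could be reduced to a single canonical form before lookup. It is an illustration only, not the reconciliation service's actual code: the function name `normalize_commons_input` and the `("mid", ...)` / `("title", ...)` return convention are hypothetical, and it assumes that URLs use the `/entity/`, `/wiki/Special:FilePath/` or `/wiki/` path shapes shown above.

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit
import re

MID_RE = re.compile(r"^M\d+$")


def normalize_commons_input(value: str) -> Tuple[str, str]:
    """Hypothetical helper: map one input cell to a canonical form.

    Returns ("mid", "M123") for entity URIs, otherwise
    ("title", "File:Name with spaces.jpg").
    """
    text = value.strip()

    if text.startswith(("http://", "https://")):
        path = urlsplit(text).path
        # Entity URIs already carry the M-id, so no lookup is needed.
        if path.startswith("/entity/"):
            candidate = path[len("/entity/"):]
            if MID_RE.match(candidate):
                return ("mid", candidate)
            raise ValueError(f"not a MediaInfo entity URI: {value!r}")
        # Check the more specific Special:FilePath prefix first.
        for prefix in ("/wiki/Special:FilePath/", "/wiki/"):
            if path.startswith(prefix):
                # Percent-decode, e.g. %20 -> space, %28 -> "(".
                text = unquote(path[len(prefix):])
                break
        else:
            raise ValueError(f"unrecognized Commons URL: {value!r}")

    # Drop an optional "File:" prefix (case-insensitive).
    if text[:5].lower() == "file:":
        text = text[5:]

    # Treat underscores and spaces as equivalent; collapse repeated whitespace.
    text = " ".join(text.replace("_", " ").split())
    if not text:
        raise ValueError(f"empty file name: {value!r}")
    return ("title", "File:" + text)
```

With this sketch, the PetScan CSV and plain text examples normalize to the same title, as do the Special:FilePath URL and the simple file page URL. Titles would still need to be resolved to M-ids in a separate step; only the entity URI format yields the M-id directly.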
Additionally, let's discuss and decide whether we want to accept only a list of file names as input, or whether we want to offer end users more flexible options. Categories are one example: see T290089: Structured Data on Commons reconciliation service accepts Commons category names as input