In the [[ https://meta.wikimedia.org/wiki/Grants:Project/CS%26S/Structured_Data_on_Wikimedia_Commons_functionalities_in_OpenRefine | original grant application for Structured Data on Commons (SDC) support for OpenRefine ]], we wrote about the SDC reconciliation service:
> [the reconciliation service] allows OpenRefine (and tools outside of OpenRefine) to take a list of file names from Wikimedia Commons and to convert these file names to their corresponding entity identifiers (“M numbers” or M-ids - the Wikimedia Commons equivalent of Q-ids). These M-ids are needed to perform further SDC operations.
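As a rough illustration of the mapping described above: on Commons, the M-id of a file is the letter "M" followed by the page ID of the file page. The sketch below assumes the service has already called the MediaWiki Action API (`action=query` with `formatversion=2`) and only shows how M-ids could be derived from the parsed response. The function name `mids_from_query_response` and the sample page ID are made up for illustration; this is not the actual implementation.

```python
from typing import Dict, Optional


def mids_from_query_response(response: dict) -> Dict[str, Optional[str]]:
    """Map each page title in an action=query response to its M-id.

    Assumes formatversion=2, where "pages" is a list of page objects.
    Titles that do not exist on the wiki map to None.
    """
    mids: Dict[str, Optional[str]] = {}
    for page in response.get("query", {}).get("pages", []):
        if page.get("missing") or "pageid" not in page:
            # No such file page, so there is nothing to reconcile against.
            mids[page["title"]] = None
        else:
            # M-id = "M" + page ID of the file page.
            mids[page["title"]] = f"M{page['pageid']}"
    return mids


# Hypothetical response; the page ID 12345 is invented for this example.
sample_response = {
    "query": {
        "pages": [
            {"pageid": 12345, "ns": 6, "title": "File:Example.jpg"},
            {"ns": 6, "title": "File:Does not exist.jpg", "missing": True},
        ]
    }
}
result = mids_from_query_response(sample_response)
```

With the sample above, `result` maps `File:Example.jpg` to `M12345` and the missing file to `None`.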
Ideally, the Commons reconciliation service recognizes and reconciles the most common file name notations (with or without the File: prefix, with underscores or spaces in file names) that are produced as exports by the most widely used tools (PetScan, PagePile, ...).
| done? | what? | filename written as (example) | example export file |
| [ ] | PetScan's CSV, TSV and JSON output ([[ https://petscan.wmflabs.org/?psid=20435423 | query ]]) | Badende_vogel_bij_roze_bloem_Bloemen-_en_vogelschetsen_van_Keinen_(serietitel)_Keinen_kacho_gafu_(serietitel_op_object),_RP-P-2004-508D-9.jpg | {F34702106} {F34702125} {F34702117} |
| [ ] | PetScan's Plain text output ([[ https://petscan.wmflabs.org/?psid=20435423 | query ]]) | File:Badende vogel bij roze bloem Bloemen- en vogelschetsen van Keinen (serietitel) Keinen kacho gafu (serietitel op object), RP-P-2004-508D-9.jpg | {F34702111} |
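To make the two notations in the table concrete, here is a minimal sketch of how both could be normalized to one canonical title before lookup. The function name `normalize_commons_filename` is hypothetical, and the rules (underscores to spaces, optional `File:` prefix, first letter capitalized) are an assumption about what the service would need, not a description of how it works today.

```python
def normalize_commons_filename(raw: str) -> str:
    """Return a canonical 'File:Name with spaces.ext' title."""
    # Underscores and spaces are interchangeable in MediaWiki titles.
    name = raw.strip().replace("_", " ")
    # Drop a leading "File:" namespace prefix, case-insensitively.
    prefix, sep, rest = name.partition(":")
    if sep and prefix.strip().lower() == "file":
        name = rest
    # Collapse repeated whitespace left over from the steps above.
    name = " ".join(name.split())
    # Commons capitalizes the first letter of every title.
    if name:
        name = name[0].upper() + name[1:]
    return "File:" + name


# The two example notations from the table above.
csv_style = (
    "Badende_vogel_bij_roze_bloem_Bloemen-_en_vogelschetsen_van_Keinen_"
    "(serietitel)_Keinen_kacho_gafu_(serietitel_op_object),_RP-P-2004-508D-9.jpg"
)
plain_text_style = (
    "File:Badende vogel bij roze bloem Bloemen- en vogelschetsen van Keinen "
    "(serietitel) Keinen kacho gafu (serietitel op object), RP-P-2004-508D-9.jpg"
)
same_title = (
    normalize_commons_filename(csv_style)
    == normalize_commons_filename(plain_text_style)
)
```

Both notations end up as the same `File:` title, so a single lookup path can serve either export format.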
Additionally, let's discuss and decide whether we indeed want to allow only a list of file names as input, or whether we want to offer end users more flexible options, such as categories: {T290089}