Create a reconciliation service for (Structured Data on) Wikimedia Commons
Closed, Resolved, Public

Description

Develop a reconciliation service for structured data on Wikimedia Commons (SDC / StructuredDataOnCommons) which, just like the Wikidata reconciliation service, conforms to the Reconciliation Service API / protocol. This reconciliation service will be used by OpenRefine and can also be used by other tools.
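
As a rough illustration of what conforming to the Reconciliation Service API involves on the client side, the sketch below builds a batch of reconciliation queries and reads the candidates back. The `queries` form parameter, the `q0`, `q1`, ... keys, and the `result` / `id` / `score` fields come from the protocol; the function names and the default limit are assumptions made for this example, not part of the service being developed.

```python
import json
from urllib.parse import urlencode


def build_recon_queries(file_names, limit=3):
    """Build the batch object that goes into the `queries` form parameter."""
    return {
        f"q{i}": {"query": name, "limit": limit}
        for i, name in enumerate(file_names)
    }


def encode_recon_request(file_names, limit=3):
    """Return a form-encoded POST body for one reconciliation batch."""
    queries = build_recon_queries(file_names, limit)
    return urlencode({"queries": json.dumps(queries)})


def best_candidates(response):
    """Map each query key to the id of its highest-scoring candidate, or None."""
    best = {}
    for key, payload in response.items():
        results = payload.get("result", [])
        # An empty result list means the service found no candidate.
        best[key] = max(results, key=lambda r: r["score"])["id"] if results else None
    return best
```

A client such as OpenRefine would POST the encoded body to the service endpoint and pass the decoded JSON response to something like `best_candidates`.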

This will be developed as part of the Structured Data on Commons Functionalities in OpenRefine project. Development work is planned for roughly September to November 2021. See also T289971: [epic] Add Structured data on Wikimedia Commons support to OpenRefine.

Functionalities to include:

  • Take a list of file names from Wikimedia Commons and convert these file names to their corresponding entity identifiers (“M numbers” or M-ids - the Wikimedia Commons equivalent of Q-ids)
  • Provide a data extension service that fetches
    • values of requested properties of the media files
    • wikitext of the media files (preferably parsed in a semi-structured way; this is NOT structured data, but it is needed to allow further data operations in OpenRefine)
    • categories of the media files (extracted from the wikitext, so likewise not structured data)
    • the existing structured data statements (if any exist) of the MediaInfo entity
    • the existing captions of the MediaInfo entity
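
A minimal sketch of the first and second functionalities, under stated assumptions: an M-id is the letter "M" followed by the page ID of the file page on Commons, so file names can be mapped to M-ids from a parsed MediaWiki `action=query` response. The second helper builds a data extension request in the shape the Reconciliation Service API defines (`ids` plus `properties`). Both function names are hypothetical, and the property ID used in the usage example is only illustrative.

```python
def mids_from_query_response(response):
    """Map file titles to M-ids using a parsed `action=query` API response."""
    mids = {}
    for page in response.get("query", {}).get("pages", {}).values():
        # Pages that do not exist carry a "missing" key and no page ID.
        if "missing" in page or "pageid" not in page:
            mids[page["title"]] = None
        else:
            mids[page["title"]] = f"M{page['pageid']}"
    return mids


def build_extend_request(mids, property_ids):
    """Build the object sent in the `extend` parameter of a data extension call."""
    return {
        "ids": [m for m in mids if m is not None],
        "properties": [{"id": pid} for pid in property_ids],
    }
```

For example, with a made-up response (the page ID below is invented for illustration):

```python
response = {
    "query": {
        "pages": {
            "12345": {"pageid": 12345, "ns": 6, "title": "File:Example A.jpg"},
            "-1": {"ns": 6, "title": "File:Does not exist.jpg", "missing": ""},
        }
    }
}
mids = mids_from_query_response(response)
extend = build_extend_request(mids.values(), ["P180"])  # P180 = depicts
```

Note that the MediaWiki API normalizes titles (for instance, underscores become spaces), so the keys of the returned mapping may differ slightly from the names that were submitted.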

Event Timeline

Spinster triaged this task as Medium priority. Aug 26 2021, 6:09 PM
Spinster set Due Date to Oct 28 2021, 10:00 PM.
Spinster updated the task description.

If we want to host this project on GitHub, it could be done here: https://github.com/OpenRefine/commons-recon-service
(But we could also use Wikimedia's Gerrit).

In today's team kickoff meeting, we decided we'll host it on Gerrit.

@Spinster / @Eugene233: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

I removed the due date :-) We are quite far along on this task and are still ironing out some bugs.

Should we close this as done?

@Pintoch We still need to do some cleanup, IMO. I should be creating a few tasks for this.

Sounds good! I think @Spinster also intends to write a few follow-up tickets.

Spinster updated the task description.

This main task is totally done, so I will close it! Should have done this months ago already. Like with every other tool, there will always be lingering and new things to fix, but the basics are totally in place 💃