Batch-OCR museum label pictures to generate OpenRefine-ready files
Closed, Declined | Public

Description

Use case/problem: when photographing exhibitions, I take pictures of the works and of their museum labels ("cartels" in French; nothing to do with drug trafficking). When I later import the pictures of the works and create the linked Wikidata items, reading the information off the label pictures and typing it into Wikidata by hand is tedious.

Idea: run an OCR tool on the folder to extract the information from the label pictures and output it in a form that is as ready to import as possible (through OpenRefine, QuickStatements, or another mass-import tool).
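
To make the idea concrete, here is a minimal sketch in Python. It is not the code mentioned in this task; it assumes the pytesseract wrapper and Pillow are installed alongside Tesseract with French language data. The function names (tesseract_ocr, ocr_folder_to_csv), the CSV columns, and the " | " line separator are all hypothetical choices for illustration. It writes one CSV row per label picture, which OpenRefine can open as a project and split into proper fields afterwards.

```python
import csv
from pathlib import Path
from typing import Callable, List

# Assumption: label pictures use one of these common image extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def tesseract_ocr(image_path: Path, lang: str = "fra") -> str:
    """OCR one image with Tesseract through pytesseract.

    The imports are local so the rest of the script can be used (and tested)
    without Tesseract installed. "fra" assumes French labels.
    """
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def clean_lines(text: str) -> List[str]:
    """Strip whitespace and drop the empty lines OCR output usually contains."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def ocr_folder_to_csv(
    folder,
    out_csv,
    ocr: Callable[[Path], str] = tesseract_ocr,
) -> int:
    """OCR every image in `folder` and write one CSV row per picture.

    Returns the number of pictures processed.
    """
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "line_count", "text"])
        for path in images:
            lines = clean_lines(ocr(path))
            # Keep the label's lines in one cell; OpenRefine can split them later.
            writer.writerow([path.name, len(lines), " | ".join(lines)])
    return len(images)
```

A call such as ocr_folder_to_csv("labels/", "labels.csv") would then produce the file to load into OpenRefine. Mapping the recognised lines to specific Wikidata properties is left out on purpose, since label layouts vary from one museum to another and the OCR output needs human review either way.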

Current status: I did some tests with Tesseract; some code is here

Event Timeline

@Sukkoria It would be interesting to see a demo in Arnhem.

@Sukkoria: Thanks for participating in the Hackathon! We hope you had a great time.

  • If this task was being worked on and resolved at the Hackathon: Please change the task status to resolved via the Add Action...Change Status dropdown, and make sure that this task has a link to the public codebase.
  • If this task is still valid and should stay open: Please add another active project tag to this task, so others can find this task (as likely nobody in the future will look back at the Hackathon workboard when trying to find something they are interested in).
  • In case there is nothing else to do for this task, or nobody plans to work on this task anymore: Please set the task status to declined.

Thank you,
Phabricator housekeeping service

No reply; setting task status to declined.