Get metadata from PDF links in citoid
Open, Medium, Public

Description

Currently, citoid cannot retrieve metadata from a link to a PDF.

Zotero does this, but there's no support for the feature in translation-server: https://www.zotero.org/support/retrieve_pdf_metadata

Example request that fails: http://query.nytimes.com/mem/archive-free/pdf?res=FB0610FF3F5912738DDDAC0A94D8415B858CF1D3
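
To reproduce the failure, the link above has to be passed to citoid as a percent-encoded query. Below is a minimal sketch of how such a request URL might be built. The base route (the Wikimedia REST API path for citoid), the `mediawiki` output format, and the helper name `build_citoid_url` are assumptions for illustration and are not taken from this task.

```python
from urllib.parse import quote

# Assumed base route for citoid on the Wikimedia REST API;
# adjust this for the deployment being tested.
CITOID_BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"

# The failing example link from this task.
PDF_LINK = (
    "http://query.nytimes.com/mem/archive-free/pdf"
    "?res=FB0610FF3F5912738DDDAC0A94D8415B858CF1D3"
)


def build_citoid_url(target: str, fmt: str = "mediawiki") -> str:
    """Return a citoid request URL for `target` (hypothetical helper)."""
    # The target is sent as a single path segment, so every reserved
    # character (":", "/", "?", "=") must be percent-encoded.
    return f"{CITOID_BASE}/{fmt}/{quote(target, safe='')}"


if __name__ == "__main__":
    print(build_citoid_url(PDF_LINK))
```

Fetching the printed URL with any HTTP client should show how citoid currently responds to a PDF link.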

CERMINE can also extract metadata from PDFs: https://github.com/CeON/CERMINE. It is also run as a publicly accessible service: http://cermine.ceon.pl/about.html

There is an open bug for this in translation-server here: https://github.com/zotero/translation-server/issues/70

Event Timeline

Mvolz created this task. (Jun 1 2016, 6:27 PM)
Restricted Application added a project: VisualEditor. (Jun 1 2016, 6:27 PM)
Restricted Application added subscribers: Zppix, Aklapper.
Jdforrester-WMF renamed this task from "Get metadata from pdf links in citoid" to "Get metadata from PDF links in citoid". (Jul 19 2016, 7:12 PM)
Jdforrester-WMF triaged this task as Medium priority.
Jdforrester-WMF moved this task from To Triage to TR0: Interrupt on the VisualEditor board.
Mvolz moved this task from Backlog to IO Tasks on the Citoid board. (Jul 29 2016, 3:02 PM)
Mvolz updated the task description. (Jan 12 2017, 9:47 AM)
czar awarded a token. (Mar 12 2017, 10:18 AM)
Mvolz updated the task description. (Dec 17 2019, 11:01 AM)
Mvolz updated the task description. (Dec 18 2019, 9:54 AM)
Restricted Application added a subscriber: RhinosF1. (Dec 18 2019, 9:54 AM)