
Get metadata from PDF links in citoid
Open, High, Public

Description

Currently, citoid cannot retrieve metadata from a link to a PDF.

The Zotero client can do this (https://www.zotero.org/support/retrieve_pdf_metadata), but translation-server does not support the feature.

Example request that fails: http://query.nytimes.com/mem/archive-free/pdf?res=FB0610FF3F5912738DDDAC0A94D8415B858CF1D3
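
To reproduce, the PDF link has to be passed to citoid as a single percent-encoded path segment. The snippet below is a minimal sketch of building that request URL. It assumes the public Wikimedia REST endpoint and the `mediawiki` output format, neither of which is stated in this task; `citoid_request_url` is a hypothetical helper name used only for illustration.

```python
from urllib.parse import quote

# Assumption: the public Wikimedia REST endpoint for citoid. The task does not
# say which citoid instance the failing request was sent to.
CITOID_BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"


def citoid_request_url(target, fmt="mediawiki"):
    """Build the citoid URL that asks for citation metadata about `target`."""
    # The target is one path segment, so every reserved character
    # (including "/", ":" and "?") must be percent-encoded.
    return f"{CITOID_BASE}/{fmt}/{quote(target, safe='')}"


pdf_link = (
    "http://query.nytimes.com/mem/archive-free/pdf"
    "?res=FB0610FF3F5912738DDDAC0A94D8415B858CF1D3"
)
request_url = citoid_request_url(pdf_link)
print(request_url)
```

Sending a GET request to the printed URL should show whatever response citoid currently gives for a PDF link.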

There is an open bug for this in translation-server here: https://github.com/zotero/translation-server/issues/70

Event Timeline

Mvolz created this task. Jun 1 2016, 6:27 PM
Restricted Application added a project: VisualEditor. Jun 1 2016, 6:27 PM
Restricted Application added subscribers: Zppix, Aklapper.
Jdforrester-WMF renamed this task from Get metadata from pdf links in citoid to Get metadata from PDF links in citoid. Jul 19 2016, 7:12 PM
Jdforrester-WMF triaged this task as Medium priority.
Jdforrester-WMF moved this task from To Triage to TR0: Interrupt on the VisualEditor board.
Mvolz moved this task from Backlog to IO Tasks on the Citoid board. Jul 29 2016, 3:02 PM
Mvolz updated the task description. Jan 12 2017, 9:47 AM
czar awarded a token. Mar 12 2017, 10:18 AM
Mvolz updated the task description. Dec 17 2019, 11:01 AM
Mvolz updated the task description. Dec 18 2019, 9:54 AM
Restricted Application added a subscriber: RhinosF1. Dec 18 2019, 9:54 AM

FYI, a notable software library for extracting metadata from PDFs is GROBID: https://github.com/kermitt2/grobid

Mvolz added a comment. Jul 19 2020, 8:54 AM

It looks like Zotero was pretty close to supporting at least some of these, but the PR is still open: https://github.com/zotero/translation-server/pull/59

Mvolz raised the priority of this task from Medium to High. Jul 19 2020, 8:59 AM
Mvolz updated the task description.