Get metadata from PDF links in citoid
Open, Stalled, High · Public

Description

Currently, citoid cannot retrieve metadata from a link to a PDF.

Zotero itself can do this, but translation-server does not support the feature: https://www.zotero.org/support/retrieve_pdf_metadata

Example request that fails: http://query.nytimes.com/mem/archive-free/pdf?res=FB0610FF3F5912738DDDAC0A94D8415B858CF1D3

There is an open bug for this in translation-server: https://github.com/zotero/translation-server/issues/70
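
Any fix would first need to recognise that a requested URL resolves to a PDF rather than an HTML page. A minimal sketch of that check, assuming the caller has already fetched the response's Content-Type header and its first few bytes; `looks_like_pdf` and `PDF_MAGIC` are hypothetical names for illustration and are not part of citoid or translation-server:

```python
# Hypothetical helper; not taken from citoid or translation-server.
# PDF files start with the signature "%PDF-" followed by a version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(content_type, first_bytes):
    """Return True if a response appears to be a PDF.

    content_type: value of the Content-Type header, or None if absent.
    first_bytes:  the first few bytes of the response body.
    """
    # Drop parameters such as "; charset=..." and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        return True
    # Some servers label PDFs with a generic type, so fall back to the signature.
    return first_bytes.startswith(PDF_MAGIC)
```

Checking the signature as well as the header is a design choice for this sketch: it covers servers that return PDFs as `application/octet-stream`. Extracting the metadata itself would still need a separate tool.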

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.
Jdforrester-WMF renamed this task from Get metadata from pdf links in citoid to Get metadata from PDF links in citoid. Jul 19 2016, 7:12 PM
Jdforrester-WMF triaged this task as Medium priority.
Jdforrester-WMF moved this task from To Triage to TR0: Interrupt on the VisualEditor board.

FYI, a notable software library to extract metadata from PDFs is grobid: https://github.com/kermitt2/grobid

It looks like Zotero was pretty close to supporting at least some of these, but the PR is still open: https://github.com/zotero/translation-server/pull/59

Mvolz raised the priority of this task from Medium to High. Jul 19 2020, 8:59 AM
Mvolz updated the task description.
Mvolz changed the task status from Open to Stalled. Oct 24 2023, 1:21 PM