
Attempt to get original source URL from archive.org URL or other metadata
Closed, Duplicate | Public | 8 Estimated Story Points

Description

When given an archive.org URL, it'd be nice if citoid attempted to extract the original source URL and put the archive.org URL in the archive URL parameter.

[Presumably total wishlist/long term.]
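
For illustration only, here is a minimal sketch of the URL-based part of this request. It assumes the common Wayback Machine URL shape `web.archive.org/web/<timestamp>/<original URL>`. The function name and the returned keys (`url`, `archive_url`, `archive_date`) are hypothetical and are not Citoid's actual output fields.

```python
import re
from datetime import datetime

# Assumed Wayback Machine URL shape: /web/<4-14 digit timestamp>[<modifier>_]/<original URL>
_WAYBACK_RE = re.compile(
    r'^https?://web\.archive\.org/web/(\d{4,14})(?:[a-z]{2}_)?/(.+)$'
)


def split_wayback_url(url):
    """Split a Wayback Machine URL into original URL, archive URL, and archive date.

    Returns None when the URL does not match the assumed pattern.
    """
    match = _WAYBACK_RE.match(url)
    if match is None:
        return None
    timestamp, original = match.groups()

    # Only derive a date when the timestamp carries at least YYYYMMDD.
    archive_date = None
    if len(timestamp) >= 8:
        try:
            archive_date = datetime.strptime(timestamp[:8], "%Y%m%d").date().isoformat()
        except ValueError:
            archive_date = None  # digits did not form a valid calendar date

    return {"url": original, "archive_url": url, "archive_date": archive_date}


# Illustrative input; example.com is a placeholder, not a real capture.
result = split_wayback_url(
    "https://web.archive.org/web/20160327061500/http://example.com/page?x=1"
)
```

URLs that do not match (for example, a non-archive URL) return `None`, so a caller could fall back to its normal handling.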

Event Timeline

LuisVilla raised the priority of this task from to Needs Triage.
LuisVilla updated the task description.
LuisVilla added a project: Citoid.
LuisVilla subscribed.

archive.org supports Memento (RFC 7089), so if Citoid can do a HEAD request on the URL, rich metadata can be obtained from the response headers.
I don't know if there is a Memento client in Node.js, but example code can be found in https://github.com/mementoweb/py-memento-client and probably https://github.com/mementoweb/mediawiki.
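
A rough sketch of the header-based approach, in Python rather than Node.js to match the py-memento-client example: under RFC 7089, a memento response carries a `Link` header in which one entry has `rel="original"`. The helper names below are hypothetical, the sample header is made up for illustration, and the parser is a simplified one that handles only the common `<uri>; param="value"` form.

```python
import re
import urllib.request

# One link-value: <URI> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^\s;,"]+))*)')
_PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,"]+))')


def parse_link_header(value):
    """Return a list of (uri, params) tuples parsed from a Link header value."""
    links = []
    for uri, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((uri, params))
    return links


def original_url(link_header):
    """Return the URI whose rel values include 'original', or None."""
    for uri, params in parse_link_header(link_header):
        if "original" in params.get("rel", "").split():
            return uri
    return None


def fetch_link_header(url):
    """Issue a HEAD request and return the Link header (empty string if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Link", "")


# Made-up header in the shape RFC 7089 describes; not captured from a real response.
sample = (
    '<http://example.com/page>; rel="original", '
    '<http://web.archive.org/web/timemap/link/http://example.com/page>; '
    'rel="timemap"; type="application/link-format", '
    '<http://web.archive.org/web/20160327061500/http://example.com/page>; '
    'rel="first memento"; datetime="Sun, 27 Mar 2016 06:15:00 GMT"'
)
source = original_url(sample)
```

In practice the header would come from something like `fetch_link_header(archive_url)`. The regex matches quoted values as a unit, so the commas inside `datetime="..."` do not split entries.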

Thanks @jayvb! If there's no suitable Node library we can build one...
We've done some of that already with different types of metadata (see http://github.com/wikimedia/html-metadata).

Ocaasi added a subscriber: Jdforrester-WMF.
Ocaasi subscribed.
LuisVilla renamed this task from "Attempt to parse archive.org URLs?" to "Attempt to get original source URL from archive.org URL or other metadata". Mar 27 2016, 6:15 AM
Jdforrester-WMF set the point value for this task to 8.
Jdforrester-WMF moved this task from To Triage to Freezer on the VisualEditor board.