
Support remote retrieval of multi-page resources via IIIF
Open, Low, Public, Feature

Description

Feature summary (what you would like to be able to do and where):

I would like to be able to create an Index: page on Wikisource that, by having an IIIF path field, can utilise an externally hosted resource.

IIIF is an API supported by various sites, such as Archive.org, the National Library of Scotland and others, to provide high-quality images of manuscripts and works they have scanned into digital archives.

Creating an Index page would work as it does at present, but with the difference that the requests for Page:s would be made to the external host rather than to a file hosted on Commons.
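To illustrate what a request to the external host could look like, here is a minimal sketch of building an IIIF Image API URL for a single page image. The URL layout (`{base}/{region}/{size}/{rotation}/{quality}.{format}`) comes from the IIIF Image API; the host name, the identifier and the function name `iiif_image_url` are hypothetical and are not part of any existing Wikisource code.

```python
def iiif_image_url(service_base, region="full", size="max", rotation="0",
                   quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from an image service base URI.

    The path layout is {base}/{region}/{size}/{rotation}/{quality}.{format}.
    Note: size "max" is defined in Image API 2.1 and 3.0; a server that only
    implements 2.0 would expect "full" instead.
    """
    return f"{service_base.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical image service base URI for one scanned page.
base = "https://iiif.example.org/image/book123_page0005"

# Request the whole page scaled to a width of 1024 pixels.
print(iiif_image_url(base, size="1024,"))
```

A Page: view would only need the service base URI for each page; the size parameter could then be chosen per request, for example a small width for thumbnails and `max` for close transcription work.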

Benefits (why should this be implemented?):

Reduced file storage and thumbnail concerns on Commons.

Faster access to high quality images on Wikisource.

I am asking for this as a result of a comment made during a discussion about the recent 429 outage in thumbnail images, where it was suggested that uploading individual JPG/TIFF images for a work was a possible alternative to uploading a huge PDF or DjVu. Supporting an IIIF-based Index would essentially be doing this, but with the images remaining on the external provider's site. Also, some GLAMs already have their own digital galleries, and having this functionality on Wikisource would allow them to set up transcription projects in the Wikimedia sphere more quickly.

Event Timeline

Aklapper renamed this task from Support remote retrival of multi-page resources via IIIF to Support remote retrieval of multi-page resources via IIIF. Oct 7 2024, 7:35 AM

The idea is that, rather than storing massive 1 GB scans on Commons, a mid-quality 'readable' PDF/DjVu (of small file size, typically not more than 200 MB) could be kept for archival purposes, along with an IIIF path through which other projects (especially Wikisource) could obtain higher-quality images for transcription purposes. The script I currently use on English Wikisource (implemented in https://en.wikisource.org/wiki/User:Inductiveload/jump_to_file if you want to review one approach) does something like the above, but it uses a file URL (or an IA identifier rather than a direct IIIF resource path) and at present seems to use a Toolforge(?)-hosted script to work out some of what to retrieve. Being able to have the same functionality as a drop-in Gadget that could be more widely used would be advantageous for a number of projects, as would being able to give a suitable IIIF path directly, so that an archive not supported by the current script could be added with minimal changes to the underlying gadget.

As long as we primarily depend on WMF local copies and remote IIIF images are only used as an option, it sounds like a good idea. Having situations where we depend on WMF remote data (i.e., images, etc.) without any local fallback seems problematic.

Some sites that provide IIIF interfaces do not also provide complete all-pages-in-one-file options (e.g., PDF, DjVu, zip archives of JP2, etc.), and I can envision the potential need for a tool to pull such image sets and either upload them to Commons as-is (en masse, and possibly even help create the WS Index pages) or combine them into such multipage formats for upload to Commons. For example, I would like to see something like an IIIF2DjVu and/or IIIF2PDF builder tool (possibly with a Commons uploader option). Alternatively, an IIIF2Commons mass uploader would be good too (possibly with WS Index creation help, e.g., to generate a set of links to be used as an alternative to <pagelist />, etc.).
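As a rough sketch of the first step such a tool would need, the snippet below walks an IIIF Presentation API 2.x manifest and collects one image service base URI per page (canvas), in reading order. The manifest here is a tiny hand-written stand-in rather than real data, the function name `page_image_services` is hypothetical, and Presentation API 3.0 manifests use a different structure (`items` rather than `sequences`/`canvases`) that this sketch does not handle.

```python
def page_image_services(manifest):
    """Return (canvas label, image service base URI) pairs in page order.

    Assumes a Presentation API 2.x manifest already parsed into a dict.
    Only the first sequence is read, and only the first image on each
    canvas that advertises an image service is kept.
    """
    pages = []
    for sequence in manifest.get("sequences", [])[:1]:
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                # Some manifests give a list of services; take the first.
                if isinstance(service, list):
                    service = service[0] if service else {}
                service_id = service.get("@id")
                if service_id:
                    pages.append((canvas.get("label", ""), service_id))
                    break
    return pages


# Hand-written, hypothetical two-page manifest used only for illustration.
sample_manifest = {
    "sequences": [{
        "canvases": [
            {"label": "1r", "images": [{"resource": {
                "service": {"@id": "https://iiif.example.org/image/ms1_001"}}}]},
            {"label": "1v", "images": [{"resource": {
                "service": {"@id": "https://iiif.example.org/image/ms1_002"}}}]},
        ]
    }]
}

for label, service_id in page_image_services(sample_manifest):
    # Full-size JPEG request for each page, per the IIIF Image API layout.
    print(label, f"{service_id}/full/max/0/default.jpg")
```

The resulting list could feed either a downloader that assembles a DjVu/PDF or a batch uploader, and the canvas labels could be used to help generate page links for an Index page.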