
CSP blocks access to iiif.archive.org; breaks script for pulling high-resolution scans from archive.org (for use at Wikisource)
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

"12:40:45.238 Content-Security-Policy: The page's settings blocked the loading of a resource at https://iiif.archive.org/iiif/mentimeadiction06coopgoog$754/info.json ("default-src"). openseadragon.js:2402:24"

12:40:45.238 Content-Security-Policy: The page's settings observed the loading of a resource at https://iiif.archive.org/iiif/mentimeadiction06coopgoog$754/info.json ("default-src"). A CSP report is being sent. openseadragon.js:2402:24

What should have happened instead?:

But for the CSP changes, the script (and the associated Toolforge-hosted utility) would have loaded the relevant hi-res scan from IA directly.

What the script does:

The script uses metadata at Commons to access the high-quality scans hosted on IA (using IIIF). This works around image-quality issues (such as over-compressed PDFs) that have produced "junk" OCR, or that leave the Commons file (and the thumbnails generated from it) too unclear to transcribe or proofread from.

IIIF is a recognised protocol, utilised by a number of GLAM organisations including the Internet Archive. https://iiif.archive.org/iiif/documentation
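To illustrate the kind of URLs involved, here is a rough Python sketch that builds the two common IIIF Image API request forms. The base path is taken from the logged URL above. The default parameter values (full/max/0/default.jpg) assume IIIF Image API 3.0 syntax, and the function names are hypothetical rather than taken from the actual script.

```python
from urllib.parse import quote

IIIF_BASE = "https://iiif.archive.org/iiif"  # base path as seen in the logged URL


def info_json_url(identifier: str) -> str:
    """URL of the image information document (info.json) for one image."""
    # '$' is kept literal, as in the logged URL; '/' inside an identifier is encoded.
    return f"{IIIF_BASE}/{quote(identifier, safe='$')}/info.json"


def image_url(identifier: str, region: str = "full", size: str = "max",
              rotation: str = "0", quality: str = "default", fmt: str = "jpg") -> str:
    """Image request: {identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    ident = quote(identifier, safe="$")
    return f"{IIIF_BASE}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Identifier copied from the log in this report.
print(info_json_url("mentimeadiction06coopgoog$754"))
print(image_url("mentimeadiction06coopgoog$754"))
```

A viewer such as OpenSeadragon would normally request info.json first and then derive the image (tile) requests from it.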

How could this issue be resolved:

This issue can be resolved by whitelisting the relevant IIIF servers (and only those servers) and forms of IIIF based links offered by the site concerned (and only those forms).
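As a sketch of what "only those forms" could mean in practice, the following Python check accepts only HTTPS URLs on iiif.archive.org whose path looks like an info.json or an image request. The patterns are assumptions based on the logged URL and general IIIF Image API syntax, not a definitive list of forms the Internet Archive offers. A check like this would live in the script or a proxy, since CSP source lists cannot express regular expressions.

```python
import re
from urllib.parse import urlsplit

ALLOWED_HOST = "iiif.archive.org"

# Assumed forms: /iiif/{identifier}/info.json and
# /iiif/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
_INFO = re.compile(r"^/iiif/[^/]+/info\.json$")
_IMAGE = re.compile(
    r"^/iiif/[^/]+/[^/]+/[^/]+/!?\d+(\.\d+)?/"
    r"(default|color|gray|bitonal)\.(jpg|png|webp|gif|tif)$"
)


def is_allowed_iiif_url(url: str) -> bool:
    """True only for HTTPS URLs on the IIIF host that match an assumed IIIF form."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname != ALLOWED_HOST:
        return False
    return bool(_INFO.match(parts.path) or _IMAGE.match(parts.path))


print(is_allowed_iiif_url(
    "https://iiif.archive.org/iiif/mentimeadiction06coopgoog$754/info.json"))  # True
print(is_allowed_iiif_url("https://archive.org/details/example"))  # False
```

Comparing the parsed hostname, rather than searching the URL string for the host name, avoids accepting look-alike hosts such as iiif.archive.org.example.com.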

Event Timeline

Aklapper renamed this task from Content Secuirty Policy changes break script for pulling hi-res cans from archive.org (for use at Wikisource_ to CSP changes break script for pulling high-resolution scans from archive.org (for use at Wikisource). Mar 6 2026, 1:03 PM
Aklapper renamed this task from CSP changes break script for pulling high-resolution scans from archive.org (for use at Wikisource) to CSP blocks access to iiif.archive.org; breaks script for pulling high-resolution scans from archive.org (for use at Wikisource).

Please note: any whitelisting MUST only be for the very specific URL forms noted in the documentation. It is not proposed that there be any general whitelisting for archive.org as a whole.

@Jdforrester-WMF : Thanks for grouping.

As stated, IIIF is a recognised protocol, and a whitelisting could be designed so that only specific JSON manifests or image data are permitted.

The other hope here is that this would enable other tools like IA-upload to be updated to use only the IIIF endpoints, rather than arbitrary ones. :)

We plan to re-enable this via the allowlist, thanks for flagging.

Can I make a polite request here that, if the 'endpoint' concerned is whitelisted, some kind of media-type limitation is considered (such as limiting it to JSON and relevant media types)? I am checking to see if IA-Upload is still working.

(Aside: ideally, IA-upload should be re-evaluated at some point; the current version is relatively old, and thus might need hardening.)

We are now supporting iiif.archive.org within the enforcing CSP, which hopefully unblocks this issue. (relevant config patch)

Please note: any whitelisting MUST only be for the very specific URL forms noted in the documentation. It is not proposed that there be any general whitelisting for archive.org as a whole.

We are allow-listing iiif.archive.org in general (the whole host) at this time. We should definitely plan to limit this further, if feasible, in the near future.

Can I make a polite request here that, if the 'endpoint' concerned is whitelisted, some kind of media-type limitation is considered (such as limiting it to JSON and relevant media types)? I am checking to see if IA-Upload is still working.

We can and should do this in the near future. Per-URL restrictions are potentially difficult to accommodate if it's a large list. But various media types and other CSP directives can and should be further limited.
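One way the narrowing described above might look, as a hedged Python sketch: rather than adding the host to default-src, it is listed only under the fetch directives that the viewer needs (connect-src for info.json, img-src for tiles), with a path prefix. The directive values are hypothetical and are not the actual Wikimedia configuration. Note also that CSP scopes by request type and URL, not by MIME type, so "media-type" limits are approximated here by the choice of directive.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Serialise {directive: [sources]} into a Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Hypothetical policy. A source path ending in '/' matches everything below it;
# browsers ignore the path part once a request has been redirected.
IIIF_SOURCE = "https://iiif.archive.org/iiif/"

policy = build_csp({
    "default-src": ["'self'"],
    "connect-src": ["'self'", IIIF_SOURCE],  # XHR/fetch, e.g. info.json
    "img-src": ["'self'", IIIF_SOURCE],      # image tiles
})
print(policy)
```

With a policy of this shape, scripts, frames, and other resource types from iiif.archive.org would still be blocked by default-src.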

This comment was removed by ShakespeareFan00.