Implement International Image Interoperability Framework (IIIF) prototype service on Wikimedia labs
Closed, Resolved, Public

Description

James Heald posted a message about IIIF on the wikimaps list, see https://lists.wikimedia.org/pipermail/wikimaps/2015-February/000010.html . Created this task to keep track of it. The standard looks pretty basic so probably not that hard to offer.

Base info at http://iiif.io/ . API info at http://iiif.io/api/image/2.0/ . Daniel already did a (partial?) implementation (example: Chicago.jpg, see ZoomViewer links). The URI schema is "{scheme}://{server}{/prefix}/{identifier}". It would make sense to create an iiif project so we can serve at "http(s)://tools.wmflabs.org/iiif/<filename on Commons>".
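To make the URI schema concrete, here is a minimal Python sketch of how IIIF Image API 2.0 request URLs are assembled from a base URI and an identifier. The base URL is the one proposed in this task and is treated as hypothetical; the function names are made up for illustration.

```python
from urllib.parse import quote

# Proposed in this task; assumed here, not a confirmed endpoint.
BASE = "https://tools.wmflabs.org/iiif"


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    ident = quote(identifier, safe="")  # percent-encode the identifier
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


print(iiif_image_url(BASE, "Chicago.jpg", region="pct:4,80,28,20"))
print(iiif_info_url(BASE, "Chicago.jpg"))
```

The defaults request the full image, unscaled and unrotated, so only the parts that differ need to be passed.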

If we like the way this works we could also set it up as a production service in the future to be used by other tools (mediawiki api extension?).

Event Timeline

I've updated the IIP in ZoomViewer and it now supports IIIF manifests (or whatever those are called). Check it out here

https://tools.wmflabs.org/zoomviewer/iipsrv.fcgi/?iiif=cache/000234cbb30ff063cee3402df80340b8.tif/info.json

I will write a proxy script that allows users to specify a Commons file name rather than the cryptic cache file name. This should make the zoomviewer an IIIF provider.

Cool @dschwen. Thanks for making this happen after our conversation yesterday!

There is a small issue with the CORS headers for the IIIF info.json request, which prevents external tools from loading the image metadata via AJAX.

I have made a Pull Request on GitHub solving the problem: https://github.com/Toollabs/zoomviewer/pull/1
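For readers unfamiliar with the CORS issue: a browser only hands a cross-origin AJAX response to the calling page when the server sends a matching Access-Control-Allow-Origin header. Below is a simplified Python sketch of that rule. The function name is hypothetical, it ignores credentials and preflight requests, and it does not reproduce what the pull request itself changes.

```python
def allows_cross_origin(headers, origin):
    """Return True if the response headers permit a simple cross-origin GET.

    `headers` is a mapping of response header names to values and
    `origin` is the origin of the requesting page.
    """
    # Header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


viewer_origin = "https://klokantech.github.io"
print(allows_cross_origin({"Content-Type": "application/json"}, viewer_origin))
print(allows_cross_origin({"Access-Control-Allow-Origin": "*"}, viewer_origin))
```

Without the header the first call reports that the viewer would be blocked; with a wildcard value the second call reports that it would be allowed.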

Once applied this example should run:
https://klokantech.github.io/iiifviewer/#https://tools.wmflabs.org/zoomviewer/iipsrv.fcgi/?iiif=cache/0b438d4d138cfa877a9dfb7e96331b02.tif/info.json

Cheers

Petr

Great! that's super! Thank you very much!

Now IIIF runs!

BTW, we have recently released an open-source service for scalable IIIF for a very large number of images (it starts with over 350,000 images from Europeana). See http://embedr.eu/. The source code could be reused directly and installed on the Wikimedia Commons hardware infrastructure (does Wikimedia have an OpenStack or Amazon-compatible key-value store like S3? Or how is the cache made?).

Everything in the embedr.eu project is documented and open-source at: https://github.com/klokantech/embedr . I would love to see it used!
We also have experience with other production deployments of raster and map delivery systems on clusters of computers from different projects...

In all cases I would love to be involved in developing a scalable IIIF infrastructure for Wikimedia...

Yea! Kittens!!!

http://tools.wmflabs.org/zoomviewer/iipsrv.fcgi/?iiif=cache/a0d692e6815880f28543b54e960610aa.tif/pct:4,80,28,20/full/0/default.jpg

That is absolutely fantastic Daniel -- thank you so much!

But... would it be possible to use the filename in the endpoint, without having to know the cache name?

eg something like:
http://tools.wmflabs.org/zoomviewer/iipsrv.fcgi/?iiif=Jean-Baptiste_Perronneau_-_Madmoiselle_Huquier_-_WGA17215.jpg/pct:4,80,28,20/full/0/default.jpg

Then it would be possible to construct URLs for details of paintings just from the filename and the pct: region specification, both of which could be easily accessible from Wikidata.

Jheald, I can of course add a proxy script for all these requests, but the general idea is that you fetch the JSON info first, like:

https://tools.wmflabs.org/zoomviewer/iiif.php?f=Chicago.jpg

And use the data therein to query the image tiles. That JSON file tells you how large the image and the tiles are, which zoom levels are available, and it will also contain the proper cache file path! Check it out!
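As a rough illustration of what a client does with that JSON, the Python sketch below reads the image size and tile specification from an info.json document in the IIIF Image API 2.0 layout and works out how many tiles each zoom level has. The sample document is invented for illustration (its URL and dimensions are assumptions, not real ZoomViewer output).

```python
import math

# Hypothetical info.json content; the values are made up.
sample_info = {
    "@context": "http://iiif.io/api/image/2/context.json",
    "@id": "https://example.org/iiif/some-image",
    "protocol": "http://iiif.io/api/image",
    "width": 6000,
    "height": 4000,
    "tiles": [{"width": 256, "scaleFactors": [1, 2, 4, 8, 16]}],
}


def tile_grid(info):
    """Return {scale_factor: (columns, rows)} for the first tile spec."""
    spec = info["tiles"][0]
    tile_w = spec["width"]
    tile_h = spec.get("height", tile_w)  # height defaults to width
    grid = {}
    for factor in spec["scaleFactors"]:
        # At this zoom level one tile covers tile_w * factor source pixels.
        cols = math.ceil(info["width"] / (tile_w * factor))
        rows = math.ceil(info["height"] / (tile_h * factor))
        grid[factor] = (cols, rows)
    return grid


for factor, (cols, rows) in tile_grid(sample_info).items():
    print(f"scale factor {factor}: {cols} x {rows} tiles")
```

The "@id" field holds the base URI to use for subsequent tile requests, which is presumably where the cache file path mentioned above shows up.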

iiif.php now just redirects to proxy.php (with the correct parameters). Your example URL should be

http://tools.wmflabs.org/zoomviewer/proxy.php?iiif=Jean-Baptiste_Perronneau_-_Madmoiselle_Huquier_-_WGA17215.jpg/pct:4,80,28,20/full/0/default.jpg

And if that is supposed to show the cat, then it works :-)

That's fantastic! I am so excited, this is going to open up so much!

One tiny problem remaining -- accents and diacritics in URLs can still throw it, eg

http://tools.wmflabs.org/zoomviewer/iipsrv.fcgi/?iiif=Jean_Siméon_Chardin_-_The_Laundress_-_WGA04761.jpg/pct:82,72,12,16/full/0/default.jpg

But this is so nearly there now, I can't thank you enough.

thanks Daniel, this looks really nice.

Thanks for testing this so thoroughly jheald. I have a suspicion why your examples don't work. I'll investigate this further today.

Btw. Try to URL encode the filenames with accents etc. That should work.
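For anyone generating these links from a script, a small Python sketch of the encoding step follows. It only shows what percent-encoding does to the filename; the helper name is hypothetical.

```python
from urllib.parse import quote


def encode_commons_filename(filename):
    """Percent-encode a filename (as UTF-8) for use inside a URL."""
    # safe="" also encodes "/" so the filename stays one path segment.
    return quote(filename, safe="")


name = "Jean_Siméon_Chardin_-_The_Laundress_-_WGA04761.jpg"
print(encode_commons_filename(name))
# prints Jean_Sim%C3%A9on_Chardin_-_The_Laundress_-_WGA04761.jpg
```

Letters, digits, underscores, hyphens and dots are left alone; only the accented character is rewritten.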

It seems that the links above (the ones without the diacritics) work if the image has first been viewed in zoomviewer, which presumably is when the tiles get made and cached.

So having first requested
https://tools.wmflabs.org/zoomviewer/?f=Godward_Idleness_1900.jpg

then
https://tools.wmflabs.org/zoomviewer/proxy.php?iiif=Godward_Idleness_1900.jpg/pct:65,81,35,15/full/0/default.jpg
becomes accessible -- but not until that first URL has first been requested.

Daniel: how easy would it be to get that second call to trigger the tile-building process, if tiles are not yet available?

I'm very keen if this could be moved forward, so that the process can then be got started on Wikidata for a new property "relative position within image" to be approved by the community, to qualify image details with a value like "pct:65,81,35,15".
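To show what such a value means in practice, here is a Python sketch that converts between a pixel box and an IIIF "pct:x,y,w,h" region string. The function names and the image dimensions in the example are assumptions for illustration only.

```python
def pixels_to_pct(x, y, w, h, img_w, img_h, digits=1):
    """Express a pixel box as an IIIF percentage region string."""
    values = (100 * x / img_w, 100 * y / img_h,
              100 * w / img_w, 100 * h / img_h)
    # The "g" format drops a trailing ".0", so 65.0 is written as 65.
    return "pct:" + ",".join(f"{round(v, digits):g}" for v in values)


def pct_to_pixels(region, img_w, img_h):
    """Convert a 'pct:x,y,w,h' region back to a pixel box (x, y, w, h)."""
    if not region.startswith("pct:"):
        raise ValueError(f"not a percentage region: {region!r}")
    x, y, w, h = (float(part) for part in region[4:].split(","))
    return (round(x * img_w / 100), round(y * img_h / 100),
            round(w * img_w / 100), round(h * img_h / 100))


# Hypothetical 2000 x 1000 pixel image.
print(pct_to_pixels("pct:65,81,35,15", 2000, 1000))     # (1300, 810, 700, 150)
print(pixels_to_pct(1300, 810, 700, 150, 2000, 1000))   # pct:65,81,35,15
```

Because the region is relative, the same value keeps pointing at the same detail if a higher-resolution version of the file is uploaded later.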


One other thing: would we expect the present capacity for tile storage to scale up reasonably well ?

There are currently something like 50,000 paintings with items on Wikidata and images on Commons. If detail positions (relying on IIIF availability, and therefore tile storage) started to be specified for more and more of them, is that an amount of image data that the present implementation on tool-labs would be able to cope with?

At what stage would the tile storage start to become too much for tool-labs, and need to be moved to more dedicated hosting (eg taken on as part of the Commons infrastructure ?)

Jheald, the code is already in proxy.php to create the cached tile representation (multiresolution TIFF pyramid). Apparently it has a bug that prevents it from working :o)

I found and fixed that bug.

By the way, URLs with diacritics work without any issues for me (note that your original link is wrong because it uses iipsrv.fcgi directly rather than the proxy.php script!):

http://tools.wmflabs.org/zoomviewer/proxy.php?iiif=Jean_Siméon_Chardin_-_The_Laundress_-_WGA04761.jpg/pct:82,72,12,16/full/0/default.jpg

Details are extracted on the fly and do not require any further storage.

Daniel, thank you *so* much for fixing this -- so now it should be all systems go!

It wasn't so much storage specifically for the details I was worried about, rather storage for all the multiresolution TIFF pyramids.

If requests get made for lots of the 50,000 "Sum of all pictures" images that have Wikidata items, would that sort of number create a problem, or just be a drop in the ocean ?

Jheald, the ZoomViewer has currently 86574 cached images, I would guess that quite a few of your 50000 artworks are already among them. Either way, it is not a mere drop, but not a flood either ;-)

You can start using this although I'm not quite done with all this. I still have to trigger re-downloading and rebuilding data files for pictures that have been re-uploaded on commons. This has been bugging me for a while. I'm sure I'll find a few minutes soon to fix this.

Note I'm doing some further research & poking at IIIF, will see if we can get some closer integration & expand to new media formats. Will split off detail bugs later!

Annotations in the presentation API also look quite interesting, might be a useful way to export image tags/hotspots.

Do you have a link/example?

There are some examples of the annotation system currently used on Commons at https://commons.wikimedia.org/wiki/Commons:Image_annotations#Examples_of_informative_notes:

It's done with some kind of template & JavaScript magic today, via https://commons.wikimedia.org/wiki/Help:Gadget-ImageAnnotator

It seems that the IIIF service from zoomviewer has not worked since 2016-04-13 (it fails with a timeout plus 502 Bad Gateway or 504 Gateway Timeout).
Do you know if there is another demo service, or any plan to fix it?

I don't know what the F$^&* is wrong with labs. It is very frustrating that the level of stability it provides is rather low. It ends up creating a burden that increases with the number of projects one has. So apparently this has been broken for 4 days and only now I hear about it (might be time to invest in some better monitoring). Service is out for no good reason.

Ah, right libtiff.so.5: cannot open shared object file: No such file or directory, so where the frack did that go now?

I see some work already going on this task. Outreachy-13 is here, is this task or a part of it a good candidate for a 3 month internship project?

For being a good candidate, it should ideally take 2-3 weeks for an experienced developer to complete.

Something seems broken since today.
With the property P2677 (relative position within image) in Wikidata we can generate URLs connecting to this tool. It's very useful for getting details of elements depicted in an image. Here is an example of use (and a cropper tool to produce the data). All images of details are linked to this IIIF server. It's really great to have such an opportunity for image annotation based on Wikidata. Thank you so much for it. Since today, for an unknown reason, images don't always display, or display scrambled. Maybe it could be fixed. Best regards

Hi Shonagon

The four examples on the original property proposal page all still seem to be working; as were a couple of examples I looked at with the Commons zoom viewer, which I believe uses the same IIP backend.

From your Crotos example page above (which is - or was - amazing, btw), it does seem that some detail views are still working; but most evidently are not, and for some the underlying tiles seem to have got scrambled, for example in

while for others the link isn't returning anything at all.

It's a real shame that this seems to have got broken, because it would be really nice to roll out to a wider public. (The page before it got broken was a really awesome demonstration).

Yeah, this is a useful ability and will be needed for panoramic stuff. I'd like to create an IIIF image api & tiling endpoint built into MediaWiki instead of as a separate layer, but we haven't got to it yet..

Yes JHeald, the issue is in the IIP backend; sadly, it no longer works for some files in Zoomviewer on Commons either (example: https://tools.wmflabs.org/zoomviewer/index.php?f=Frederick%2C%20Prince%20of%20Wales%2C%20and%20his%20sisters%20by%20Philip%20Mercier.jpg&flash=no ). Yes, such features are already really great and have major potential for Wikimedia projects. For example, it would be much simpler and more practical to use IIIF image fragments for details in Wikipedia instead of separate image files, which can currently be very tedious to make. The issue is very mysterious to me and I hope that it can be fixed. Best regards

Trying to copy-paste the image links for some of the broken detail images on your
Virgin amongst the Virgins page

eg links like this

give error messages like:
tiff open failed for: /data/project/zoomviewer/public_html/cache/aee31ade17cbf5d86936edfd5722066e.tif

Does this suggest perhaps some kind of hardware or file-system corruption in the storage over at Labs ?

I'm not sure if this is the best place to ask this - but do the people interested in this bug know about the 2017 IIIF conference happening in June at the Vatican? http://iiif.io/event/2017/vatican/#iiif-conference---the-vatican---2017 Calls for proposals close on 23 February (10 days from when I'm writing this comment).
If I understand correctly, IIIF would be something that could be highly relevant to the Structured Data On Commons project, and therefore potentially worth pursuing under that frame of reference?

@Jdforrester-WMF do you know who we should ping re: the structured data project plans and if we can/should get someone to the IIIF conference in June that Liam mentioned above? I'm in on the IIIF's A/V API working group but may not be the best point person for other data-sharing kinds of things: there's kind of middling support for structured metadata in the protocols so far (just a few vaguely-specified fields for attribution, licensing etc) which might want to be expanded, and that'd be something I'd rather leave to folks planning to work in the thick of the metadata. :)

In the meantime I'm happy to keep poking at the IIIF image API for tiling/etc (we'll need it internally for making a stable 2d and spherical panoramic viewing) and there's some potential for the presentation API to do some interactive diagrams, especially with the upcoming A/V extensions.

At this point the lead person is Toby until he hires people to actually lead it. :-) Will ping him.

If we can get the IIIF endpoint up and running again (or/and create an IIIF image-serving functionality more centrally to the main MediaWiki code), something that might be good would be the possibility for a user to download an IIIF manifest pretty much whenever the MediaViewer is used (maybe even specify the carousel of images for MediaViewer together with its metadata as an IIIF manifest).

An IIIF manifest is (as I understand it) the key document that presents a bundle of images, and how they should be presented together. It's also the key thing that IIIF editor programs eat -- given various IIIF manifests as input, the IIIF editor allows the user to create a new document combining different aspects of them, which is then passed around as a new manifest. So it makes our images more usable if we can (i) serve them as an IIIF service, but (ii) also serve information about them as an IIIF manifest, that can then be passed around and reused.
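For a sense of what such a document looks like, here is a Python sketch that builds a minimal single-image manifest in the IIIF Presentation API 2.x layout and serialises it to JSON. All example.org URLs, the label, the dimensions and the compliance level in "profile" are placeholders, and the function name is made up; this is not something the zoomviewer currently serves.

```python
import json


def minimal_manifest(manifest_id, label, image_service, width, height):
    """Build a one-canvas manifest pointing at an IIIF image service."""
    canvas_id = f"{manifest_id}/canvas/1"
    image = {
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "on": canvas_id,  # the canvas this image is painted onto
        "resource": {
            "@id": f"{image_service}/full/full/0/default.jpg",
            "@type": "dctypes:Image",
            "format": "image/jpeg",
            "width": width,
            "height": height,
            "service": {
                "@context": "http://iiif.io/api/image/2/context.json",
                "@id": image_service,
                # Compliance level is an assumption for this sketch.
                "profile": "http://iiif.io/api/image/2/level1.json",
            },
        },
    }
    canvas = {"@id": canvas_id, "@type": "sc:Canvas", "label": label,
              "width": width, "height": height, "images": [image]}
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": [canvas]}],
    }


manifest = minimal_manifest("https://example.org/manifest/1", "Example painting",
                            "https://example.org/iiif/example", 6000, 4000)
print(json.dumps(manifest, indent=2))
```

A carousel of several images would simply add more canvases to the sequence.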

Hey all! The _corruption_ i.e. missing tifs is probably a result of a requested cache purge that I performed a while ago. I'll take a look.

Yeah, the pruning confused the system. I'll try to fix this.

There are two distinct bits -- the IIIF "image API" does the low-level tiling/zooming etc, and the IIIF "presentation API" can present multiple images linked together, with annotations and text and whatnot. This entry is tracking the image API, which'll be useful for "deep zoom" and panoramic-style images, and we'll want to make use of that in things that integrate into the media viewer, indirectly through zoomable viewer plugins.

The presentation API is also really interesting for creating meta-documents out of multiple images (or in the future audio/video -- I'm at a meeting of the working group working on the spec plans for A/V right now) and we'll probably want to do some stuff there, but should probably break it out to a separate task.

@dschwen Any idea why the IIIF-based detail viewer is working for the images on this page of Shonagon's:

but not this one:

Opening up the html source for the latter and trying to click on one of the images seems to be giving the error
tiff open failed for: /data/project/zoomviewer/public_html/cache/45cc213b042bc33c4103bc6c13b4a275.tif

I have deleted the cache file. Seems to work now.

That looks so great now. Thank you! (And @Shonagon ).

Any idea whether this is a problem that would be likely to recur, if the image-detail syntax was put into more high-volume use?

(ie can image detail use be treated as stable and reliable (or made so), without needing manual intervention?)

Oh thank you !
I made some tests and had success. Just one was broken, and the problem is the same with Zoomviewer: https://tools.wmflabs.org/zoomviewer/index.php?f=Jacques-Louis%20David%2C%20The%20Coronation%20of%20Napoleon%20edit.jpg&flash=no
Thanks to this IIIF image API we can very easily do great things with the values of property P2677 in Wikidata. For example, a SPARQL query to search for representations of Jesus Christ in different artworks, with image fragments: http://tinyurl.com/yb8cjula . So yes, this service is very useful and almost magic for those involved in iconographic indexing and art history.
Best regards

@Shonagon, thanks for the heads up on the Napoleon image. Let me see if I can identify broken images and purge them automatically. I should probably use the method I developed for tiled 360 degree panoramics (image processing on the grid infrastructure) for the Zoomviewer, too. I think that would make it more robust.

I have written a script that checks the tif file integrity (using imagemagick). It has already weeded out dozens of broken files (including the Napoleon). I will put that into the crontab to run weekly.
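The script itself is not shown in this thread. As a rough idea of the approach, here is a much simpler Python stand-in that only checks whether each cache file starts with a valid TIFF header. It is not the ImageMagick-based check described above and would miss files that are truncated or corrupted further in; the function names are hypothetical.

```python
from pathlib import Path

# Magic bytes for little/big-endian classic TIFF and BigTIFF.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(path):
    """Return True if the file starts with a recognised TIFF header."""
    try:
        with open(path, "rb") as handle:
            return handle.read(4) in TIFF_MAGIC
    except OSError:
        return False  # unreadable files count as broken


def find_broken(cache_dir):
    """Return the sorted list of *.tif files that fail the header check."""
    return sorted(p for p in Path(cache_dir).glob("*.tif")
                  if not looks_like_tiff(p))
```

A purge step would then delete the returned paths so the pyramids get rebuilt on the next request.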

Hello @dschwen. That's really great.
Usually the service works fine. As you know better than I do, sometimes it doesn't, and the display shows scrambled or grey images. This sometimes happened for images that had worked before, and in those cases the service kept displaying the same thing until the cache was deleted.
If I understand correctly, the new script will detect those cases and purge them. What a comfort! I must admit that when I was using property P2677 on Wikidata, I was always afraid that it would not work well (maybe there is a process to avoid that?) and could stay like that for a while. Now it's completely different, and it is very reassuring to know that this can be repaired.
Thank you so much
Best regards

The Swiss National Library got a request from research platform providers to support the IIIF standard and raised the issue at the last meeting of the Swiss GLAM-Wiki Contact Group. IIIF support could be an extra argument for institutions with public domain or freely licensed material to make their content available through Wikimedia Commons. Smaller institutions in particular are not expected to support the standard on their own platforms in the near future.

The original goal of this task was to implement a prototype. I think @dschwen did this (THANKS!) and this task should be closed. All sorts of other conversation happened in this task. More specific focused tasks should probably be made to handle the things discussed here.

Unless someone objects, I'll close this task as resolved in the next week.

None so far. :)

Multichill claimed this task.

IIIF image extracts seem to be broken again: see eg the 'Virgin amongst the virgins' test page at Crotos

pinging @dschwen

Yikes, that looks very messed up. I can rebuild the stack on labs and see if that fixes it.

Looks like it was just cache corruption. This is weird; I don't know how it could happen short of actual filesystem corruption. I deleted the cache files and regenerated them. Looks fine now. I could add a "force purge" option, but I'm a little worried that could be abused as a DOS attack vector.

Hello,
It seems that there are many corrupted files in the cache :
http://zone47.com/crotos/lab/cropper/get.php?q=14619165
http://zone47.com/crotos/lab/cropper/get.php?q=Q1231009
http://zone47.com/crotos/lab/cropper/get.php?q=Q19939091
Actually almost all images I tested (the one that works http://zone47.com/crotos/lab/cropper/get.php?q=21013224 )
By the way, I gave a public presentation in March about IIIF on Wikimedia projects at the IIIF Biblissima meeting in Paris: http://www.biblissima-condorcet.fr/en/indexation-iconographique-sur-projets-wikimedia-iiif . Luckily the service worked well at that time. Many would be very happy to do more, but the instability of the service is today unfortunately a major obstacle to the development of IIIF in Wikimedia.
I'm sorry to say that this recurrent issue is very annoying, especially since there is no way for users to fix it. Although my technical skills are limited, I would be happy to help if I can do something.
Best regards

@Shonagon please don't comment on old closed bugs with new comments. Please open a new bug. I did this for you at T194956