New URL scheme for service-generated thumbnails
Closed, Invalid (Public)

Description

To enable the use of an open source service like Thumbor (see T110858), the current Wikimedia thumbnail URL scheme needs to change.

I suggest the following:

https://upload.wikimedia.org/wikipedia/<wiki>/thumb/<sha1_of_original>/<width_of_original>/<height_of_original>/<thumbnail_mime_type>/<thumbnail_width>/<thumbnail_height>/<title>

An example would be:

https://upload.wikimedia.org/wikipedia/commons/thumb/6a6d34da5123a23243dab9787af39eaba1120d06/3738/2912/image/jpeg/500/390/Louis_Armstrong_restored.jpg
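To make the proposed layout concrete, here is a minimal Python sketch of how such a URL could be split into its fields and rebuilt. The `ThumbRequest` class and both function names are hypothetical, made up for illustration; the sketch also assumes the title contains no `/`, which real file titles may not guarantee.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote, urlsplit

PREFIX = "/wikipedia/"


@dataclass(frozen=True)
class ThumbRequest:
    """Hypothetical container for the fields of the proposed URL scheme."""
    wiki: str
    sha1: str
    orig_width: int
    orig_height: int
    mime_type: str  # e.g. "image/jpeg"; occupies two path segments
    thumb_width: int
    thumb_height: int
    title: str


def parse_thumb_url(url: str) -> ThumbRequest:
    """Split a proposed-scheme thumbnail URL into its components."""
    path = urlsplit(url).path
    if not path.startswith(PREFIX):
        raise ValueError("not a /wikipedia/ upload path")
    parts = path[len(PREFIX):].split("/")
    # Expected segments: wiki, "thumb", sha1, orig w, orig h,
    # mime major, mime minor, thumb w, thumb h, title
    if len(parts) != 10 or parts[1] != "thumb":
        raise ValueError("unexpected thumbnail path layout")
    wiki, _, sha1, ow, oh, major, minor, tw, th, title = parts
    return ThumbRequest(wiki, sha1, int(ow), int(oh), f"{major}/{minor}",
                        int(tw), int(th), unquote(title))


def build_thumb_url(req: ThumbRequest) -> str:
    """Rebuild the URL from its components (inverse of parse_thumb_url)."""
    return (
        "https://upload.wikimedia.org/wikipedia/"
        f"{req.wiki}/thumb/{req.sha1}/{req.orig_width}/{req.orig_height}/"
        f"{req.mime_type}/{req.thumb_width}/{req.thumb_height}/{quote(req.title)}"
    )
```

Because the MIME type itself contains a slash, it takes up two path segments, which is why the parser expects ten segments rather than nine.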

Original file dimensions are needed because Varnish needs to implement the JPG sharpening logic, which depends on the original/thumbnail size ratio, in order to hit Thumbor directly without relying on MediaWiki.
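As a rough illustration of why both sets of dimensions have to be in the URL, the decision could look like the Python sketch below. The function name, the exact ratio formula (sum of thumbnail sides over sum of original sides) and the 0.85 threshold are assumptions for the sake of the example; this task only states that the logic depends on the original/thumbnail size ratio.

```python
def needs_sharpening(mime_type: str,
                     orig_width: int, orig_height: int,
                     thumb_width: int, thumb_height: int,
                     threshold: float = 0.85) -> bool:
    """Decide whether a thumbnail should be sharpened.

    Assumed rule: only JPEGs are sharpened, and only when the thumbnail
    is sufficiently smaller than the original. The threshold value is a
    placeholder, not something taken from this task.
    """
    if mime_type != "image/jpeg":
        return False
    if orig_width + orig_height <= 0:
        return False  # guard against malformed dimensions
    ratio = (thumb_width + thumb_height) / (orig_width + orig_height)
    return ratio < threshold
```

For the example URL, the ratio would be (500 + 390) / (3738 + 2912), roughly 0.13, so under these assumptions the thumbnail would be sharpened. Everything needed for the decision comes from the URL path, with no call to MediaWiki.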

Title and revision are provided only for human readability. The downside is that renaming a title will needlessly bust viewers' caches, but this is a rare occurrence and shouldn't matter much.

Note that I am not getting into multipage thumbnails or the wp-zero quality parameter yet. The initial goal is to get simple JPG and PNG thumbnailing working (91% of our files), and Thumbor doesn't support multipage documents anyway. I think these options, which are variations on the default thumbnail, should be handled by optional GET parameters.

For example: https://upload.wikimedia.org/wikipedia/commons/thumb/6a6d34da5123a23243dab9787af39eaba1120d06/3738/2912/image/jpeg/500/390/Louis_Armstrong_restored.jpg?qlow
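A minimal Python sketch of how such optional parameters could be read is shown below. Only `qlow` comes from the example above; the `page` parameter and the `thumb_options` function are hypothetical, included only to show how a valued option would sit next to a bare flag.

```python
from urllib.parse import parse_qs, urlsplit


def thumb_options(url: str) -> dict:
    """Extract optional thumbnail variations from the query string."""
    query = urlsplit(url).query
    # keep_blank_values=True is needed so that a bare flag such as
    # "?qlow" (no "=value") is not silently dropped by parse_qs.
    params = parse_qs(query, keep_blank_values=True)
    options = {"low_quality": "qlow" in params}
    # "page" is a hypothetical parameter name for multipage documents.
    if "page" in params:
        options["page"] = int(params["page"][0])
    return options
```

Keeping the variations in the query string means the path still identifies the default thumbnail, and a request without parameters needs no special handling.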

The biggest advantage of this new URL scheme is that it lets us serve thumbnails through the old Swift/MediaWiki architecture and through Thumbor at the same time.

Without this new URL scheme, using Thumbor would require always going through MediaWiki (like in my patch here) just to calculate the original/thumbnail size ratio, which is very wasteful.

Event Timeline

Gilles claimed this task.
Gilles raised the priority of this task from to Medium.
Gilles updated the task description.
Gilles added subscribers: Gilles, Bawolff.

I've just realized that this will require a tracking mechanism to purge articles when the thumbnails they contain are purged. Otherwise, article caching would leave articles pointing to stale thumbnails for up to 30 days after a file is updated or deleted. That's not a new idea; maybe there's already a task for this?
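To sketch what such a tracking mechanism might amount to, here is a toy in-memory reverse index in Python. The class and method names are hypothetical, and a real implementation would need persistent storage and a hook into the cache purging system; this only shows the lookup direction (from file to the articles that embed it).

```python
from collections import defaultdict


class ThumbUsageIndex:
    """Hypothetical reverse index: file sha1 -> articles embedding it."""

    def __init__(self) -> None:
        self._articles_by_file = defaultdict(set)

    def record_usage(self, article: str, file_sha1: str) -> None:
        """Remember that an article's cached HTML references this file."""
        self._articles_by_file[file_sha1].add(article)

    def articles_to_purge(self, file_sha1: str) -> list:
        """Return the articles to purge when the file's thumbnails are purged."""
        return sorted(self._articles_by_file.get(file_sha1, set()))
```

Since the sha1 in the URL changes whenever the file content changes, the cached article HTML is what goes stale, which is why the index points from files to articles rather than the other way around.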

Come to think of it, it could be a great opportunity to comply with http://iiif.io/, making ourselves compatible with a growing corpus of open source image viewing tools. I'd have to think about the performance implications of allowing people to request tiles, though, as they're more likely to be infrequently accessed than the thumbnails themselves.

First I'll check whether thumbor has all the features IIIF needs. Fundamentally, thumbor's URL scheme is different from IIIF's, but if all the features are there, it's just a matter of URL rewrites.

iiif.io seems nice! re: the original url scheme I'm not sure about mime type in url, what was the rationale?

After re-reading the IIIF spec, it seems way too large a standard to support. It requires supporting many formats and filters that thumbor doesn't. I'm not sure that there would be much value in supporting a small subset of IIIF.

As for what I wrote in the description, now that I've been playing with thumbor on vagrant more, I can see that there's a less invasive alternative approach of exposing the extra information we need in the existing thumb URL scheme, starting with the sha1. I'm closing this task, and I'll create more targeted ones.