Investigate thumbor
Closed, Resolved, Public

Description

Following our meeting about thumbnailing architecture changes, it seems to me that an open source service might be able to solve most of our problems, at least for the most common formats (JPEG, PNG). Thumbor is the most promising candidate.

I would like to verify that it can:

  • be as fast as our current approach of invoking ImageMagick (IM) directly
  • host thumbnails with URLs based on the image content (hash)
  • easily and efficiently purge the various sizes of a given image
  • support LRU caching, automatically getting rid of thumbnails that haven't been accessed in a while
  • be extended easily to support more input formats
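To make the LRU and purge requirements concrete, here is a minimal in-memory Python sketch keyed by (content hash, width). The class and method names are hypothetical and this is not thumbor's storage; it only illustrates the behaviour we are looking for: the least recently accessed thumbnails are evicted first, and purging an original drops all of its sizes in one call.

```python
from collections import OrderedDict
from typing import Optional


class ThumbnailLRUCache:
    """Hypothetical cache holding at most `capacity` thumbnails."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Keys are (sha1 of the original, thumbnail width); order = recency.
        self._entries: "OrderedDict[tuple, bytes]" = OrderedDict()

    def get(self, sha1: str, width: int) -> Optional[bytes]:
        key = (sha1, width)
        if key not in self._entries:
            return None
        # A hit makes this entry the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, sha1: str, width: int, data: bytes) -> None:
        key = (sha1, width)
        self._entries[key] = data
        self._entries.move_to_end(key)
        # Evict from the least recently used end until within capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def purge(self, sha1: str) -> int:
        """Drop every size of one original; return how many were removed."""
        doomed = [key for key in self._entries if key[0] == sha1]
        for key in doomed:
            del self._entries[key]
        return len(doomed)
```

With a capacity of 2, storing two sizes of one image, reading the first, then storing a third thumbnail evicts the size that was not read.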

My strategy will be to hack it into a vagrant VM and hook it up to mediawiki in a crude way.

Event Timeline

Gilles claimed this task.
Gilles raised the priority of this task to Medium.
Gilles updated the task description.
Gilles added subscribers: Steinsplitter, aaron, Krinkle and 8 others.

First important findings:

  • Thumbor doesn't help with the URL scheme. Its URLs are based on the URL of the original, so the solution to this problem would have to be external to thumbor.
  • Thumbor doesn't directly help with purges, although thumbnails do expire in thumbor. If we moved to a SHA-1-based URL scheme, we would simply need to store a database of deleted image hashes and not forward requests for those hashes to thumbor. The copy thumbor holds will then expire on its own.
  • Thumbor is surprisingly fast on my VM, roughly 8 times faster than MediaWiki's IM resizing. I don't know why yet, as I haven't dug into thumbor's code to see which image processing commands or library calls it makes.
  • There are several options for storing copies of originals on the thumbor machine, none of them really relevant to what we do. However, there is only one option for storing the generated thumbnails in an LRU cache: the local filesystem.
  • I'm not sure what expiry mechanism thumbor uses for its file-based thumbnail storage. We should investigate it to make sure it won't bite us once it handles large numbers of expiring thumbnails.
  • Pointing varnish directly at thumbor for JPEGs, bypassing MediaWiki, is difficult as long as we keep the current JPEG sharpening logic based on resize ratio. To let varnish talk directly to thumbor, we need to either expose the file dimensions in the original's URL and reproduce MediaWiki's logic in varnish, or always sharpen regardless of ratio (no longer sharpening at all is likely to cause a riot on Commons).
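The SHA-1-based URL scheme and deleted-hash check from the findings above could look roughly like the following Python sketch. Thumbor provides none of this itself; the path layout in `thumb_path` and all function names are hypothetical, chosen only to show that purging reduces to a set lookup in front of thumbor.

```python
import hashlib
from typing import Optional, Set


def content_hash(original: bytes) -> str:
    """SHA-1 of the original file's bytes, used as its stable identifier."""
    return hashlib.sha1(original).hexdigest()


def thumb_path(sha1: str, width: int) -> str:
    # Hypothetical layout: shard on the leading hex digits, then the size.
    return f"/thumb/{sha1[0]}/{sha1[:2]}/{sha1}/{width}px"


def route_request(sha1: str, width: int, deleted: Set[str]) -> Optional[str]:
    """Return the path to forward to thumbor, or None to answer 404.

    Deleted originals are never forwarded, so whatever copy thumbor
    still holds is simply left to expire on its own.
    """
    if sha1 in deleted:
        return None
    return thumb_path(sha1, width)
```

Because the URL is derived from the content, a re-upload of a file produces a new hash and therefore new thumbnail URLs, so stale sizes never need to be hunted down.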

Change 235018 had a related patch set uploaded (by Gilles):
Thumbor role

https://gerrit.wikimedia.org/r/235018

Change 235020 had a related patch set uploaded (by Gilles):
Support for a thumbnailing service

https://gerrit.wikimedia.org/r/235020

Change 235709 had a related patch set uploaded (by Gilles):
Have apache talk directly to thumbor

https://gerrit.wikimedia.org/r/235709
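The patch above is an Apache configuration change; the rewrite it needs can be illustrated with a Python sketch mapping a MediaWiki-style thumbnail path onto a thumbor URL. Thumbor's unsigned `/unsafe/<width>x<height>/<image>` form is real (a height of 0 keeps the aspect ratio), but the host names and path layout below are assumptions for illustration, not taken from the change.

```python
import re
from typing import Optional

# Matches MediaWiki-style thumbnail paths such as
#   /wikipedia/commons/thumb/a/ab/Example.jpg/120px-Example.jpg
THUMB_RE = re.compile(
    r"^(?P<prefix>/.+?)/thumb/(?P<hash>[0-9a-f]/[0-9a-f]{2})/"
    r"(?P<name>[^/]+)/(?P<width>\d+)px-(?P=name)$"
)

# Hypothetical hosts; a real setup would read these from configuration.
THUMBOR_BASE = "http://localhost:8888"
ORIGINALS_HOST = "upload.example.org"


def to_thumbor_url(path: str) -> Optional[str]:
    """Translate a thumbnail path into a thumbor URL, or None if no match."""
    m = THUMB_RE.match(path)
    if m is None:
        return None
    original = f"{ORIGINALS_HOST}{m['prefix']}/{m['hash']}/{m['name']}"
    # "unsafe" means an unsigned URL; height 0 preserves the aspect ratio.
    return f"{THUMBOR_BASE}/unsafe/{m['width']}x0/{original}"
```

Note that this mapping has no access to the original's dimensions, which is exactly why plain Apache cannot reproduce size-dependent sharpening.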

Change 235020 abandoned by Gilles:
Support for a thumbnailing service

Reason:
I6877328a09cf2a3be7121f837bbad6cb9d48908e is a much cleaner approach. While plain Apache doesn't allow us to do size-dependent sharpening, Varnish should be able to.

https://gerrit.wikimedia.org/r/235020
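Size-dependent sharpening needs the source dimensions, which is why whatever sits in front of thumbor must get them from the URL or a lookup. Below is a Python sketch of the decision, assuming it mirrors MediaWiki's rule of sharpening JPEGs only when the summed target dimensions fall below a threshold fraction of the summed source dimensions. The 0.85 value is assumed to match `$wgSharpenReductionThreshold` and should be checked; in production this would be VCL, not Python.

```python
# Assumption: mirrors MediaWiki's $wgSharpenReductionThreshold default.
SHARPEN_REDUCTION_THRESHOLD = 0.85


def should_sharpen(mime: str, src_w: int, src_h: int,
                   dst_w: int, dst_h: int,
                   threshold: float = SHARPEN_REDUCTION_THRESHOLD) -> bool:
    """Sharpen only JPEGs that are being shrunk by a large enough factor."""
    if mime != "image/jpeg":
        return False
    ratio = (dst_w + dst_h) / (src_w + src_h)
    return ratio < threshold
```

A 4000x3000 JPEG scaled to 120x90 gets sharpened, while a mild reduction such as 1000x800 to 900x720 (ratio 0.9) does not.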

Change 235716 had a related patch set uploaded (by Gilles):
Enable statsd in thumbor role

https://gerrit.wikimedia.org/r/235716

Change 235748 had a related patch set uploaded (by Gilles):
Send Thumbor errors to Sentry

https://gerrit.wikimedia.org/r/235748

Change 235709 merged by jenkins-bot:
Have apache talk directly to thumbor

https://gerrit.wikimedia.org/r/235709

Change 235716 merged by jenkins-bot:
Enable statsd in thumbor role

https://gerrit.wikimedia.org/r/235716

Change 235748 merged by jenkins-bot:
Send Thumbor errors to Sentry

https://gerrit.wikimedia.org/r/235748