
djvu failure for very high page number
Closed, Resolved · Public

Description

Sep 14 08:10:58 thumbor1001 thumbor@8831[105390]: tornado.access:ERROR 504 GET /wikipedia/commons/thumb/6/61/The_Film_Daily%2C_Jan-Jun_1929.djvu/page1566-3056px-The_Film_Daily%2C_Jan-Jun_1929.djvu.jpg (127.0.0.1) 185.15ms
Sep 14 08:11:00 thumbor1001 thumbor@8809[13711]: tornado.access:ERROR 504 GET /wikipedia/commons/thumb/6/61/The_Film_Daily%2C_Jan-Jun_1929.djvu/page1567-455px-The_Film_Daily%2C_Jan-Jun_1929.djvu.jpg (127.0.0.1) 90.89ms
Sep 14 08:11:00 thumbor1001 thumbor@8821[3356]: tornado.access:ERROR 504 GET /wikipedia/commons/thumb/6/61/The_Film_Daily%2C_Jan-Jun_1929.djvu/page1568-180px-The_Film_Daily%2C_Jan-Jun_1929.djvu.jpg (127.0.0.1) 118.47ms

Production MediaWiki handles that file fine.

Event Timeline

This is really odd. When I run the test directly with the following command, it passes:

nosetests tests/integration/test_types.py:WikimediaTest.test_huge_djvu

But if I run it as part of the full test suite with "make test", it fails:

thumbor: ERROR: ERROR: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/thumbor/handlers/__init__.py", line 111, in get_image
    self.context.request.image_url
  File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/usr/local/lib/python2.7/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1024, in run
    yielded = self.gen.send(value)
  File "/usr/local/lib/python2.7/site-packages/thumbor/handlers/__init__.py", line 525, in _fetch
    self.context.request.engine.load(fetch_result.buffer, extension)
  File "/Users/gdubuc/Dropbox/Mediawiki/thumbor-plugins/wikimedia_thumbor/engine/proxy/proxy.py", line 118, in load
    self.lcl[enginename].load(buffer, extension)
  File "/usr/local/lib/python2.7/site-packages/thumbor/engines/__init__.py", line 162, in load
    image_or_frames = self.create_image(buffer)
  File "/Users/gdubuc/Dropbox/Mediawiki/thumbor-plugins/wikimedia_thumbor/engine/djvu/djvu.py", line 64, in create_image
    width, height = page_job.size
  File "djvu/decode.pyx", line 1994, in djvu.decode.PageJob.size.__get__ (build/temp.macosx-10.11-x86_64-2.7/src/decode.c:22097)
NotAvailable

thumbor: ERROR: [BaseHandler] get_image failed for url `The_Film_Daily.djvu`. error: ``
tornado.access: ERROR: 500 GET /unsafe/400x/filters:page(1568)/The_Film_Daily.djvu (127.0.0.1) 707.55ms

It seems to fail only if another djvu has already been processed in the same run. For example, I can isolate the failure by running just these two tests:

def test_djvu(self):
    self.run_and_check_ssim_and_size(
        'unsafe/400x/filters:page(2)/Il_cavallarizzo.djvu',
        'page2-400px-Il_cavallarizzo.djvu.jpg',
        # Mediawiki generates incorrect dimensions in this test case
        # resulting in soft djvu thumbs
        0.88,
        1.0
    )

def test_huge_djvu(self):
    self.run_and_check_ssim_and_size(
        'unsafe/400x/filters:page(1568)/The_Film_Daily.djvu',
        'page1568-400px-The_Film_Daily.djvu.jpg',
        # Mediawiki generates incorrect dimensions in this test case
        # resulting in soft djvu thumbs
        0.88,
        1.2
    )

This looks like an issue with the djvu Python library, which possibly doesn't clean up its internal state properly after processing a file. That would explain why MediaWiki handles the file fine, since it processes each request as a one-off.
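
For reference, here's a minimal sketch of the access pattern involved, assuming the python-djvulibre API that appears in the traceback (djvu.decode); the path and page number are illustrative, not our actual engine code:

import djvu.decode

def page_size(path, page_number):
    # Hypothetical workaround: create a fresh Context per document instead
    # of reusing a long-lived one, in case decoder state leaks between files.
    context = djvu.decode.Context()
    document = context.new_document(djvu.decode.FileUri(path))
    document.decoding_job.wait()
    page_job = document.pages[page_number - 1].decode(wait=True)
    # This is the property access that raises NotAvailable in the failing run.
    return page_job.size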

I'll switch back to using the command-line ddjvu tool. Reloading the djvu Python library with reload() was insufficient to work around the problem.
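
Roughly, the replacement amounts to shelling out once per request, along these lines (a sketch assuming the standard djvulibre CLI flags -format and -page; the wrapper function is made up):

import subprocess

def render_page(path, page_number):
    cmd = [
        'ddjvu',
        '-format=pnm',             # uncompressed output the next engine can read
        '-page=%d' % page_number,  # 1-based page selector
        path,
        '-',                       # '-' sends the rendered page to stdout
    ]
    return subprocess.check_output(cmd)

Since the process exits after every render, no decoder state can survive from one file to the next.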

While the djvu issue is real, I've just discovered that the 504 errors above are a different issue that happens before Thumbor even tries to process the image. It seems that with our current configuration and a recent version of Tornado, Thumbor is by default unable to load source files larger than 100 MB:

2016-09-15 13:08:01 tornado.general:INFO Malformed HTTP message from None: Content-Length too long
2016-09-15 13:08:01 thumbor:DEBUG METRICS: inc: original_image.status.599:1
2016-09-15 13:08:01 thumbor:WARNING ERROR retrieving image https://dl.dropboxusercontent.com/u/109867/The_Film_Daily.djvu: HTTP 599: Connection closed
2016-09-15 13:08:01 tornado.access:ERROR 504 GET /unsafe/400x/filters:page(1568)/dl.dropboxusercontent.com/u/109867/The_Film_Daily.djvu (127.0.0.1) 1412.19ms

This is explained here: https://github.com/saltstack/salt/issues/24048
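
In principle the cap can be raised on Tornado's side; a minimal sketch, assuming Tornado >= 4.2 where the simple HTTP client accepts max_body_size (the 500 MB figure is an arbitrary example, not our production value):

from tornado.httpclient import AsyncHTTPClient

AsyncHTTPClient.configure(None, max_body_size=500 * 1024 * 1024)

But that still buffers the whole file in memory, so the loader below looks like the better fix.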

Thumbor supports a cURL-based async loader that doesn't keep the whole download in memory as it happens, enabled via the HTTP_LOADER_CURL_ASYNC_HTTP_CLIENT option.
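
thumbor.conf is plain Python, so enabling it is a one-line change (a sketch; it requires pycurl to be installed):

HTTP_LOADER_CURL_ASYNC_HTTP_CLIENT = True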

The unit tests aren't affected because they don't use the HTTP loader; they load files from disk. The video loader uses its own download code, which is why this didn't come up in the video loader tests either (the video loader only downloads the frame it needs anyway).