
IA Upload unable to convert a JPEG2000 to JPEG
Open, Medium, Public

Description

The IA uploader appears to be hanging with the status "in progress".
Here is a link to the log:
https://tools.wmflabs.org/ia-upload/log/Httpsdl.wdl.org11960service11960.pdf

Event Timeline

Log for convenience:

[2017-04-08 03:45:11] LOG.INFO: Creating DjVu for Httpsdl.wdl.org11960service11960.pdf from Jp2 [] []
[2017-04-08 03:45:11] LOG.INFO: Saving IA metadata to /mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/metadata.json [] []
[2017-04-08 03:45:11] LOG.INFO: Downloading Httpsdl.wdl.org11960service11960.pdf/11960_djvu.xml [] []
[2017-04-08 03:45:13] LOG.INFO: Downloading Httpsdl.wdl.org11960service11960.pdf/11960_jp2.zip [] []
[2017-04-08 03:45:26] LOG.INFO: Unzipping /mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/11960_jp2.zip [] []
[2017-04-08 03:45:31] LOG.DEBUG: Zip file extracted to /mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/11960_jp2 [] []
[2017-04-08 03:45:31] LOG.INFO: Processing JP2 files [] []
[2017-04-08 03:45:31] LOG.INFO: Converting 50 individual JP2s to DjVus [] []
[2017-04-08 03:45:31] LOG.DEBUG: Converting /mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/11960_jp2/11960_0000.jp2... [] []
[2017-04-08 03:45:31] LOG.DEBUG: ...to /mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/build/Httpsdl.wdl.org11960service11960.pdf_p0.jpg [] []
[2017-04-08 03:45:36] LOG.CRITICAL: Command "convert -resize 1500x1500 "/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/11960_jp2/11960_0000.jp2" "/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/build/Httpsdl.wdl.org11960service11960.pdf_p0.jpg" 2>&1" exited with code 1: jpc_dec_decodecblks failed error: cannot decode code stream convert.im6: unable to decode image file `/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/11960_jp2/11960_0000.jp2' @ error/jp2.c/ReadJP2Image/402. convert.im6: no images defined `/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/Httpsdl.wdl.org11960service11960.pdf/build/Httpsdl.wdl.org11960service11960.pdf_p0.jpg' @ error/convert.c/ConvertImageCommand/3044. [] []
Cyberpower678 removed a project: InternetArchiveBot.
Cyberpower678 added a subscriber: Cyberpower678.

What is this. This doesn't apply to me.

Tpt added a subscriber: Samwilson.

Thank you for the report. I have subscribed Sam Wilson, who has worked on this part of the tool.

There appears to be something weird going on with the JP2 files in https://archive.org/download/Httpsdl.wdl.org11960service11960.pdf/11960_jp2.zip

Will try to replicate locally.

Samwilson renamed this task from IA uploader hanging to IA Upload unable to convert a JPEG2000 to JPEG. May 11 2017, 7:24 AM
Samwilson triaged this task as Medium priority.

This no longer appears to be failing for the same reason (since we switched to GraphicsMagick). It looks like it's still hanging though! :-(

Hmm. I grabbed the JP2s and ran gm on them, and the issue seems to be that these files, by virtue of being a pretty ridiculous resolution (6851x10000), exceed a hard-coded limit in the library that gm uses for JPEG-2000 support (Jasper, in my case). I believe gm can be linked against different libraries for this, and the particular problem is a Jasper thing, which probably explains some variability in symptoms: different distributions will have linked different underlying libraries, and that may change with OS updates. I think this specific scan is best considered a pathological case.

handbookofnature00coms_0 seems to be a different case (but possibly with the same symptoms).

But I am not sure why ia-upload would hang on this. Both ImageMagick and GraphicsMagick return a non-zero exit status which Command::exec() should detect and throw an exception for. On the other hand, I don't see runCommand() trying to handle any such exceptions (e.g. by terminating gracefully) so that may explain the user-observable behaviour.
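To illustrate the suspected gap, here is a minimal sketch in Python (not the tool's actual PHP code) of a runner that raises on a non-zero exit status and a caller that catches the error and marks the job as failed instead of leaving it "in progress". The names `CommandError`, `run_command`, `process_job`, and the job dictionary are hypothetical and only stand in for Command::exec() and runCommand().

```python
import subprocess
import sys


class CommandError(Exception):
    """Raised when an external command exits with a non-zero status."""


def run_command(argv):
    # Capture stdout and stderr together, like the "2>&1" seen in the log.
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    if result.returncode != 0:
        raise CommandError(
            f"{argv[0]} exited with code {result.returncode}: {result.stdout.strip()}"
        )
    return result.stdout


def process_job(job, argv):
    # Hypothetical job record: a plain dict with a "status" key.
    try:
        run_command(argv)
    except (CommandError, OSError) as exc:
        # OSError also covers a missing executable (FileNotFoundError).
        job["status"] = "failed"
        job["error"] = str(exc)
    else:
        job["status"] = "done"
    return job


if __name__ == "__main__":
    # Demo with a command that always fails; the job ends up "failed", not stuck.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(process_job({"status": "in progress"}, failing))
```

The point of the sketch is only that the failure is recorded on the job itself, so the user-visible status changes even when the conversion cannot succeed.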

Yes, I just retried handbookofnature00coms_0 and that is an instance of T215647:

[2020-02-12 16:56:37] LOG.CRITICAL: Command not found: "djvm -c "/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/handbookofnature00coms_0/handbookofnature00coms_0.djvu" "/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/handbookofnature00coms_0/build/handbookofnature00coms_0_p0.djvu" …

It is also a ~1k-page book, which certainly supports the theory that it's bombing due to exceeding ARG_MAX or one of the related limits.
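One of the related limits can be checked with simple arithmetic. The sketch below is in Python and assumes the command is handed to a shell as a single string, which is an assumption based on the "2>&1" in the earlier log and not something confirmed here. On Linux a single argument string is capped at 32 pages (MAX_ARG_STRLEN), which is 128 KiB with 4 KiB pages. The helper names are hypothetical, and whether this particular limit is the one being hit is not established in this task.

```python
import os

# Linux per-argument cap (MAX_ARG_STRLEN): 32 pages, assuming 4 KiB pages.
MAX_ARG_STRLEN = 32 * 4096


def shell_command_length(prefix, paths):
    # Bytes in one shell string: the prefix followed by each path in double quotes.
    quoted = " ".join(f'"{p}"' for p in paths)
    return len(os.fsencode(f"{prefix} {quoted}"))


def exceeds_single_arg_limit(prefix, paths, limit=MAX_ARG_STRLEN):
    # +1 for the terminating NUL byte that is counted along with the string.
    return shell_command_length(prefix, paths) + 1 > limit
```

With per-page paths as long as the ones in the log above, a command listing on the order of a thousand pages lands in the neighbourhood of this cap, which would be consistent with the theory.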

Ah, yes. I just ran across another instance of this. The problem is with extremely high-resolution (~4k x ~10k) JPEG-2000 files (particularly when combined with pathological codec settings during encoding) that exceed a hard limit in the Jasper library that GraphicsMagick uses. ImageMagick uses the openjpeg library, so it successfully deals with them, or you can use the utilities in the openjpeg library directly to convert these files.
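For reference, a rough sketch of what a fallback could look like, written in Python and not taken from ia-upload. It tries GraphicsMagick first and, if that fails, decodes with the openjpeg `opj_decompress` utility to an intermediate PNG before resizing. The function names, the intermediate PNG step, and the injectable `run` parameter are assumptions made for illustration; the 1500x1500 size is the one from the log above.

```python
import subprocess
from pathlib import Path


def jp2_to_jpeg_commands(jp2_path, jpeg_path, size="1500x1500"):
    # Hypothetical helper: returns (direct, fallback) plans as lists of commands.
    jp2, jpeg = Path(jp2_path), Path(jpeg_path)
    png = jpeg.with_suffix(".png")  # intermediate file used only by the fallback
    direct = [["gm", "convert", "-resize", size, str(jp2), str(jpeg)]]
    fallback = [
        ["opj_decompress", "-i", str(jp2), "-o", str(png)],
        ["gm", "convert", "-resize", size, str(png), str(jpeg)],
    ]
    return direct, fallback


def convert_with_fallback(jp2_path, jpeg_path, run=subprocess.run):
    # Try each plan in order; a plan succeeds only if every command exits with 0.
    direct, fallback = jp2_to_jpeg_commands(jp2_path, jpeg_path)
    for plan in (direct, fallback):
        if all(run(cmd).returncode == 0 for cmd in plan):
            return plan
    raise RuntimeError(f"could not convert {jp2_path}")
```

The `run` parameter exists so the logic can be exercised without either tool installed; in real use it would default to `subprocess.run`, and the intermediate PNG would need cleaning up afterwards.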

Since these cases seem to be rare, it's probably not worth doing anything in ia-upload unless switching back to ImageMagick is warranted for other reasons. I can manually fix such files as needed (hit me up on my enWS user talk page).