
Corrupt JPG previews of multi-layered PDF book (not all pages, not all sizes)
Closed, Resolved, Public

Description

Corrupt PDF Previews

Only some sizes of some pages of the PDF previews are corrupt. For example, https://commons.wikimedia.org/w/index.php?title=File%3ATrolley-Trips-Through-New-England.pdf&page=17 shows only gray lines in the default preview, while other sizes render OK. Also see the images behind the "next page" and "previous page" links: some of them seem to render only the first layer. When I run the pdfimages tool on the original PDF, the page in question separates into three layers: a PBM (monochrome) image and a couple of PNM images. The PDF displays fine directly in Chrome, Firefox, and Xpdf, albeit slowly, so this may be a thumbnailer or PDF renderer issue.
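To see which pages carry more than one image layer without extracting anything, poppler's pdfimages has a -list mode that prints one row per image. Below is a minimal Python sketch that parses that listing, assuming poppler's usual table layout (two header lines, then rows whose first column is the page number) and that pdfimages is on PATH. The function names are hypothetical.

```python
import subprocess
from collections import Counter


def images_per_page(list_output: str) -> Counter:
    """Count images per page from `pdfimages -list` text output.

    Assumes poppler's tabular format: a column-name line and a dashed rule,
    followed by one row per image whose first column is the page number.
    """
    counts = Counter()
    for line in list_output.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # ignore blank or unexpected lines
        counts[int(fields[0])] += 1
    return counts


def layered_pages(pdf_path: str, min_layers: int = 2) -> list[int]:
    """Run `pdfimages -list` and return pages with at least min_layers images."""
    result = subprocess.run(
        ["pdfimages", "-list", pdf_path],
        capture_output=True, text=True, check=True,
    )
    counts = images_per_page(result.stdout)
    return sorted(page for page, n in counts.items() if n >= min_layers)
```

For example, layered_pages("Trolley-Trips-Through-New-England.pdf") would list the pages that, like page 17 here, are built from several stacked images.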

Purging the page cache did not solve this.

[Attached screenshots: corrupt preview; good preview]

Event Timeline

Not sure whether this is related, but I'm seeing errors for https://upload.wikimedia.org/wikipedia/commons/thumb/f/fb/Trolley-Trips-Through-New-England.pdf/page21-155px-Trolley-Trips-Through-New-England.pdf.jpg and https://upload.wikimedia.org/wikipedia/commons/thumb/f/fb/Trolley-Trips-Through-New-England.pdf/page21-635px-Trolley-Trips-Through-New-England.pdf.jpg:

Our servers are currently under maintenance or experiencing a technical problem. Please try again in a few minutes. See the error message at the bottom of this page for more information…

Request from xxx.xxx.xxx.xxx via cp1072 cp1072, Varnish XID 85497695
Error: 429, Too Many Requests at Sun, 03 Jun 2018 03:27:19 GMT
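The thumbnail URLs above follow a recognizable pattern: two directory levels taken from the MD5 hash of the file name (MediaWiki's hashed upload layout), then the file name, then page{N}-{W}px-{file name}.jpg. A minimal Python sketch that builds such URLs and reports the HTTP status of each is below, so several pages and widths can be checked before filing a report. It assumes that layout and the Commons upload path seen above; the function names and the User-Agent string are hypothetical. A 429 (as in the error above) or a 5xx status only says that the thumbnail was not served, not why.

```python
import hashlib
import urllib.error
import urllib.parse
import urllib.request

# Assumed base path for Commons thumbnails, taken from the URLs above.
THUMB_BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb"


def thumb_url(file_name: str, page: int, width: int) -> str:
    """Build a Commons thumbnail URL for one page of a multi-page file."""
    name = file_name.replace(" ", "_")  # MediaWiki stores titles with underscores
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    quoted = urllib.parse.quote(name)
    return (f"{THUMB_BASE}/{digest[0]}/{digest[:2]}/{quoted}/"
            f"page{page}-{width}px-{quoted}.jpg")


def thumb_status(url: str, timeout: float = 70.0) -> int:
    """Fetch a thumbnail and return the HTTP status code."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "thumb-check-sketch/0.1 (hypothetical)"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Usage: thumb_status(thumb_url("Trolley-Trips-Through-New-England.pdf", 21, 155)) requests the first URL reported above. Probing many sizes in a loop adds load to the thumbnailers, so a handful of requests is kinder than a full sweep.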

I can't see any of the corrupted images; they might have been generated by the old thumbnailing stack. All thumbnails for that file currently fail to render for me, and the error seen in Logstash is consistent: the render is timing out. The time limit for rendering a thumbnail is 59 seconds in production, and Ghostscript hits that limit when trying to render a thumbnail for that file.

I was able to reproduce the issue manually on the command line on a thumbnailing server:

gilles@thumbor1001:~$ /usr/bin/firejail --profile=/etc/firejail/thumbor.profile timeout --foreground 59 gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f/var/Trolley-Trips-Through-New-England.pdf 
Reading profile /etc/firejail/thumbor.profile
Parent pid 40153, child pid 40154
Child process initialized

Parent is shutting down, bye...
gilles@thumbor1001:~$ echo $?
124
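Exit status 124 is what GNU timeout returns when it has to stop the command, so the transcript shows Ghostscript being cut off at the 59-second limit. A minimal Python sketch of the same check is below, assuming gs is installed. It uses subprocess's own timeout in place of firejail and timeout, and reuses 124 as the timed-out status for consistency with the transcript. It passes -dJPEGQ, Ghostscript's documented quality switch for the jpeg device. The function names are hypothetical.

```python
import subprocess

# GNU coreutils `timeout` exits with this status when it stops the command.
TIMED_OUT = 124


def gs_jpeg_cmd(pdf_path: str, page: int, out_path: str,
                resolution: int = 150, quality: int = 90) -> list[str]:
    """Ghostscript arguments to rasterize one PDF page to a JPEG file."""
    return [
        "gs", "-sDEVICE=jpeg", f"-dJPEGQ={quality}",
        f"-sOutputFile={out_path}",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-r{resolution}",
        "-dBATCH", "-dNOPAUSE", "-dSAFER", "-q",
        pdf_path,
    ]


def run_with_limit(cmd: list[str], limit_s: float = 59.0) -> int:
    """Run cmd; return its exit status, or TIMED_OUT if it exceeds limit_s."""
    try:
        completed = subprocess.run(cmd, capture_output=True, timeout=limit_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process at this point.
        return TIMED_OUT
    return completed.returncode
```

For example, run_with_limit(gs_jpeg_cmd("Trolley-Trips-Through-New-England.pdf", 1, "page1.jpg")) returning 124 would match the transcript.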

We have to draw the line somewhere: using more than a minute of server time to render a single thumbnail is unreasonable. It must be something about how that particular file is encoded, because other PDFs of that size on Commons render just fine without taking that long. I would advise re-uploading it as a PDF with different encoding/compression settings and seeing whether that fixes the problem.

Thanks for the insight. If it's a timeout issue, is there any way for a user (me) to find out without filing a report like this in the future? I imagine simply re-encoding anything that isn't showing correctly is a good test, but I hate to unnecessarily modify someone's original upload and clutter the changelogs.

I'll look into re-encoding this now anyway. It seems to come from a book-scanning project, perhaps one that captured multiple wavelengths of light and composited all of them; from my cursory glance, it's not your ordinary PDF book file.

Running gs -sDEVICE=pdfwrite -sOutputFile=Trolley-Trips-Through-New-England.pdf -dNOPAUSE -dSAFER Trolley-Trips-Through-New-England_ORIGINAL.pdf solved this, but the file has grown from 12M to 46M for reasons unknown to me.
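The pdfwrite command above omits -dBATCH, so Ghostscript drops into its interactive prompt after converting; adding it makes gs exit when done. On the size growth: pdfwrite writes a new PDF rather than copying the original streams, and it may recompress images with different settings. Its -dPDFSETTINGS presets (/screen, /ebook, /printer, /prepress, /default) control image downsampling and compression, so trying one might bring the size down, though its effect on this particular file is untested. A hedged Python wrapper is below, assuming gs is on PATH; the function names are hypothetical.

```python
import os
import subprocess
from typing import Optional

# Ghostscript's documented -dPDFSETTINGS presets.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def pdfwrite_cmd(src: str, dst: str, preset: Optional[str] = None) -> list[str]:
    """Ghostscript arguments to rewrite src as dst with the pdfwrite device."""
    cmd = ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dSAFER", "-q",
           f"-sOutputFile={dst}"]
    if preset is not None:
        if preset not in PRESETS:
            raise ValueError(f"unknown PDFSETTINGS preset: {preset!r}")
        cmd.append(f"-dPDFSETTINGS=/{preset}")
    cmd.append(src)  # input file goes last, after all switches
    return cmd


def reencode(src: str, dst: str, preset: Optional[str] = None) -> float:
    """Re-encode src into dst and return the size ratio dst/src."""
    subprocess.run(pdfwrite_cmd(src, dst, preset), check=True)
    return os.path.getsize(dst) / os.path.getsize(src)
```

A ratio above 1 means the output grew; the 12M to 46M result above corresponds to roughly 3.8.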

ASiplas claimed this task.
Vvjjkkii renamed this task from Corrupt JPG previews of multi-layered PDF book (not all pages, not all sizes) to 0rbaaaaaaa. (Jul 1 2018, 1:06 AM)
Vvjjkkii reopened this task as Open.
Vvjjkkii removed ASiplas as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 0rbaaaaaaa to Corrupt JPG previews of multi-layered PDF book (not all pages, not all sizes). (Jul 2 2018, 12:52 AM)
CommunityTechBot closed this task as Resolved.
CommunityTechBot claimed this task.
CommunityTechBot reassigned this task from CommunityTechBot to ASiplas.
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
CommunityTechBot subscribed.