
First page of a specific PDF files on Commons does not render a preview
Closed, Resolved (Public)

Description

The first page of this PDF does not render, either on the file page or in the thumbnail preview. All other pages do render, and the PDF works if you click on "original file" -> https://commons.wikimedia.org/w/index.php?title=File:Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf&page=1

Event Timeline

What's especially strange about this is that https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf/page1-800px-Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf.jpg persistently returns a 429. Refreshing it just returns a 429 from a different Varnish server. If you replace page1 with page2, you do get a response, but if you change 800px to some other file size (for page1), you still get a 429.

Is Varnish caching 429s or something? I don't really get what's happening here.

Hmm now it's 500ing instead...

I'm seeing related-looking Thumbor errors in Logstash for this file and others, but all they tell me is that the gs command failed, not why.

The 500 error is logged simply as:

500 GET /wikipedia/commons/thumb/3/37/Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf/page1-800px-Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf.jpg (10.192.16.190) 363.28ms

Then there's a CommandError that logs a call stack but no error message:
{P7985}

Aklapper renamed this task from PDF preview not rendering to First page of a specific PDF files on Commons does not render a preview. Jan 15 2019, 5:15 AM

Failing thumbnails tend to be costly to reattempt, which means repeated requests to those get rate-limited (429). Generally speaking, short of a software upgrade, a failing thumbnail isn't going to work the next time it's requested, hence the use of poolcounter or memcache-based throttling.
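To make the throttling idea concrete, here is a minimal sketch of a failure counter that turns repeated failing renders into 429s. It is not Thumbor's actual implementation: the class name, the limits, the cache key, and the in-memory dict standing in for memcached are all assumptions made for illustration.

```python
import time


class FailureThrottle:
    """Hypothetical failure counter; a plain dict stands in for memcached."""

    def __init__(self, max_failures=4, window_seconds=3600, clock=time.monotonic):
        self.max_failures = max_failures      # assumed limit, not a production value
        self.window_seconds = window_seconds  # assumed expiry, not a production value
        self.clock = clock
        self._failures = {}  # key -> (failure count, expiry timestamp)

    def _current_count(self, key):
        count, expiry = self._failures.get(key, (0, 0.0))
        # An expired entry counts as zero, like a cache key that has timed out.
        return 0 if self.clock() >= expiry else count

    def is_throttled(self, key):
        return self._current_count(key) >= self.max_failures

    def record_failure(self, key):
        count = self._current_count(key)
        if count == 0:
            # First failure in a new window: start the expiry timer.
            expiry = self.clock() + self.window_seconds
        else:
            expiry = self._failures[key][1]
        self._failures[key] = (count + 1, expiry)


def handle_request(throttle, key, render):
    """Return an HTTP-style status code for one thumbnail request."""
    if throttle.is_throttled(key):
        return 429  # refuse to reattempt a render that keeps failing
    try:
        render(key)
    except Exception:
        throttle.record_failure(key)
        return 500
    return 200
```

Under this model a request only reaches the renderer again once the failure window has expired, which is one plausible way for a URL to alternate between 429 and 500 over time.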

The CommandError has an exit code of -11 on ghostscript, which indicates a segfault.

Running that gs command directly on a production Thumbor host confirms this:

gilles@thumbor2001:~$ /usr/bin/gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f/home/gilles/Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf
Segmentation fault
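The -11 exit code mentioned above follows the convention of Python's subprocess module on POSIX, where a negative return code -N means the child was terminated by signal N. The sketch below decodes such a code; it assumes the command is launched through subprocess, and the helper names are hypothetical.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


def run_and_describe(argv):
    """Run a command, discard its output, and describe how it ended."""
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return describe_exit(result.returncode)
```

Passing the gs argument list from the session above to `run_and_describe` would be expected to report SIGSEGV on an affected host; that has not been verified here.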

Running it under strace shows that it crashes in the middle of reading the PDF file:

read(4, "\10\341}\253\350\360\330\32r\245\254\264\353\375z]\352\237\305\271\347U\304J2\321\177_\360\376kc"..., 4096) = 4096
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x1052000} ---
+++ killed by SIGSEGV +++
Segmentation fault

On Beta, where we run newer packages for Thumbor by virtue of running Stretch rather than Jessie, the file converts fine.

This will get fixed when Thumbor production servers are migrated to Stretch.

jijiki triaged this task as Medium priority. Jan 15 2019, 10:04 PM
jijiki added projects: User-jijiki, serviceops.

So far there's a pretty clear pattern of most PDF rendering issues being fixed by the Stretch upgrade. I have yet to encounter one where we can actually do anything else to fix it in the current setup. I think we have enough to justify the upgrade. Once that has happened, we can revisit all PDF rendering bugs to see what's left.

(Such bugs never really go away, do they. We never really knew what actually fixed T72734 - I sure hope y'all can figure out what's going on here.)

Gilles claimed this task.

Seems to work now, probably thanks to the Ghostscript update.