The first page of this PDF does not render, neither on the file page nor in the thumbnail preview. All other pages DO render. And the PDF works if you click on "original file" -> https://commons.wikimedia.org/w/index.php?title=File:Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf&page=1
Description
Status | Assigned | Task
---|---|---
Resolved | jijiki | T170817 Upgrade Thumbor servers to Stretch
Resolved | Gilles | T213771 First page of a specific PDF files on Commons does not render a preview
Event Timeline
What's especially strange about this is that https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf/page1-800px-Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf.jpg persistently returns a 429. Refreshing it just returns a 429 from a different Varnish server. If you replace page1 with page2, you do get a response, but if you keep page1 and change 800px to some other width, you still get a 429.
Is Varnish caching 429s or something? I don't really get what's happening here.
I'm seeing related-looking Thumbor errors in Logstash for this file and others, but all they tell me is that the gs command failed, not why.
The 500 error is logged simply as:
500 GET /wikipedia/commons/thumb/3/37/Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf/page1-800px-Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf.jpg (10.192.16.190) 363.28ms
Then there's a CommandError that logs a call stack but no error message:
{P7985}
Failing thumbnails tend to be costly to reattempt, which is why repeated requests for them get rate-limited (429). Generally speaking, short of a software upgrade, a thumbnail that failed isn't going to work the next time it's requested either, hence the use of PoolCounter or memcache-based throttling.
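To illustrate the throttling idea, here is a minimal sketch of a failure counter, assuming a simple "N failures per key within a time window" policy. The class and function names, the limits, and the in-process dict are all hypothetical stand-ins; in production the counter would live in a shared store such as memcache, and this is not Thumbor's actual code.

```python
import time


class FailureThrottle:
    """Hypothetical in-process stand-in for a memcache-based failure counter.

    After `max_failures` failed renders of the same thumbnail key within
    `window` seconds, further attempts are refused until the window expires.
    """

    def __init__(self, max_failures=4, window=3600.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        self._failures = {}  # key -> (failure count, timestamp of first failure)

    def _current(self, key):
        entry = self._failures.get(key)
        if entry is None:
            return 0
        count, started = entry
        if self.clock() - started >= self.window:
            # Behaves like a memcache TTL expiring: old failures are forgotten.
            del self._failures[key]
            return 0
        return count

    def should_attempt(self, key):
        return self._current(key) < self.max_failures

    def record_failure(self, key):
        count = self._current(key)
        started = self._failures[key][1] if count else self.clock()
        self._failures[key] = (count + 1, started)


def serve_thumbnail(throttle, key, render):
    """Return an HTTP-like status: 429 if throttled, 500 on failure, 200 on success."""
    if not throttle.should_attempt(key):
        return 429  # don't even try the expensive render again
    try:
        render()
    except Exception:
        throttle.record_failure(key)
        return 500
    return 200
```

With `max_failures=2`, the first two requests for a broken page1 thumbnail return 500 and every later one returns 429 until the window passes, while page2 keeps rendering normally, which matches the pattern described above.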
The CommandError shows Ghostscript exiting with code -11, which indicates it was killed by signal 11 (SIGSEGV), i.e. a segfault.
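For reference, Python's `subprocess` module reports a child killed by signal N as return code -N on POSIX systems, and SIGSEGV is signal 11 on Linux. Assuming the -11 logged here follows the same convention, a small helper makes the mapping explicit; `describe_exit` is a hypothetical name, and the demo crashes a Python child on purpose rather than running gs.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Explain a subprocess return code using the POSIX convention."""
    if returncode < 0:
        # subprocess uses -N when the child was terminated by signal N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Simulate a crashing child: a Python process that sends itself SIGSEGV.
    child = subprocess.run(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    )
    print(child.returncode, describe_exit(child.returncode))
```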
Calling that gs command directly on a production Thumbor host confirms this:
gilles@thumbor2001:~$ /usr/bin/gs -sDEVICE=jpeg -dJPEG=90 -sOutputFile=%stdout -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -dSAFER -q -f/home/gilles/Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf
Segmentation fault
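To rerun this check on other hosts or files, the same invocation can be scripted. This is a sketch assuming `gs` is on the PATH; the helper names are hypothetical, and the flags are copied from the command above rather than taken from Thumbor's source.

```python
import shutil
import subprocess
from typing import List, Tuple


def ghostscript_page_command(pdf_path: str, page: int, dpi: int = 150) -> List[str]:
    """Build the argument list used above: render one PDF page as JPEG to stdout."""
    return [
        "gs",
        "-sDEVICE=jpeg",
        # Copied verbatim from the command above; Ghostscript documents JPEG
        # quality as -dJPEGQ, so this flag may have no effect.
        "-dJPEG=90",
        "-sOutputFile=%stdout",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi}",
        "-dBATCH",
        "-dNOPAUSE",
        "-dSAFER",
        "-q",
        f"-f{pdf_path}",
    ]


def render_page(pdf_path: str, page: int, dpi: int = 150) -> Tuple[int, bytes]:
    """Run gs and return (returncode, JPEG bytes); -11 means it segfaulted."""
    gs = shutil.which("gs")
    if gs is None:
        raise FileNotFoundError("Ghostscript (gs) is not installed")
    cmd = ghostscript_page_command(pdf_path, page, dpi)
    cmd[0] = gs  # use the resolved binary path
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode, result.stdout
```

On an affected host, `render_page("Global_Wikipedia_and_Wikimedia_Brand_Research_Report.pdf", 1)` would be expected to return -11, while page 2 should return 0 along with the image data.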
stracing shows that it crashes in the middle of reading the PDF file:
read(4, "\10\341}\253\350\360\330\32r\245\254\264\353\375z]\352\237\305\271\347U\304J2\321\177_\360\376kc"..., 4096) = 4096
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x1052000} ---
+++ killed by SIGSEGV +++
Segmentation fault
On Beta, where we run newer packages for Thumbor by virtue of running Stretch rather than Jessie, the file converts fine.
This will get fixed when Thumbor production servers are migrated to Stretch.
Oh, see also T198061: Thumbnails missing for PDF file (HTTP 429 error) then.
I also found these, which might be the same bug (or a different one):
So far there's a pretty clear pattern of most PDF rendering issues being fixed by the Stretch upgrade. I have yet to encounter one that we can fix any other way in the current setup. I think we have enough to justify the upgrade; we can revisit all PDF rendering bugs once that has happened to see what's left.
(Such bugs never really go away, do they? We never really knew what actually fixed T72734; I sure hope y'all can figure out what's going on here.)