Slow performance with image (PDF?) thumbnailer.
Closed, Invalid, Public

Description

Recently, while proofreading on English Wikisource, I noticed that certain pages took over 1 minute to load initially, but that once loaded they load instantly on later visits.

The work I was trying to proofread was https://en.wikisource.org/wiki/Index:The_Innocents_Abroad_(1869).pdf

I asked about this on the operations IRC channel and was advised to raise the concern here.

The pertinent portion of the log follows:

"
Did some checking in logs - mw1297.eqiad.wmnet is where it thinks the scanned images are from
ShakespeareFan00 I can't ping that server though
marostegui ShakespeareFan00: You cannot ping that server, it doesn't have a public iface
ShakespeareFan00 Fair enough
ShakespeareFan00 but that's where the lag seems to be occurring
marostegui let me see if there is something wrong with it
ShakespeareFan00 It seems to be intermittent..
ShakespeareFan00 As once a scan image is cached, it loads very quickly
ShakespeareFan00 marostegui: I don't know if that server (or related) IS the cause of the slow performance, but thought I better mention that's where the lag seemed to be
marostegui ShakespeareFan00: Sure. I am checking en.wikisource.org events and everything seems to be normal error-wise
ShakespeareFan00 I see some references to deprecated ResourceLoader stuff this end though
ShakespeareFan00 marostegui: It does seem to be the image that causes the lag
ShakespeareFan00 Does it take a while for the backend to render up a thumbnail from PDF is there's not one existing?
ShakespeareFan00 *if
ShakespeareFan00 I've had this slow performance on "new" thumbnails from djvu/pdf before..
ShakespeareFan00 But it shouldn't take over 1 min to render a scan should it?
ShakespeareFan00 marostegui: Thanks
marostegui ShakespeareFan00: I cannot really find anything wrong there
ShakespeareFan00 Then I am puzzled
ShakespeareFan00 Because I should not be waiting over 1 min for an image to load
ShakespeareFan00 When I've switched out anything that could be delaying
marostegui Could it be your connection? I mean, 3 of us didn't have issues loading that page, could it be your end? Don't know
marostegui Let me ask someone else to see if they have issues too
ShakespeareFan00 I don't see how it could be the connection this end when the main site loads instantly
ShakespeareFan00 It's only the images that lag
ShakespeareFan00 "intermittently"
marostegui I have asked two more people one in Italy and another one in England, and loaded fine too
ShakespeareFan00 Then I remain puzzled
marostegui Me too
ShakespeareFan00 There shouldn't be anything locally causing an over 1 min lag
marostegui Let me try with a proxy
ShakespeareFan00 The only possible thing I can think of is that being in the UK, WMF sites are being "filtered" for adult content by an ISP
ShakespeareFan00 But I have no control over that
marostegui ShakespeareFan00: One of the ones I asked is based in England (no idea about his ISP)
marostegui ShakespeareFan00: Trying now with a proxy over Norway and Germany. Let me try a UK one
ShakespeareFan00 Thanks
ShakespeareFan00 It only seems to occur on first access..
ShakespeareFan00 Once an image is in the cache (either locally or at WMF) it loads instantly
jynus lol
tto I can see the issue ShakespeareFan00 is referring to. The PDF thumbnailer seems extremely slow
ShakespeareFan00 tto: I figured it may be something related to that
tto It's not a particularly hi-res PDF... maybe it's just the overall file size (61 MB) that is making it choke
ShakespeareFan00 For wikisource documents 61MB is not that large
ShakespeareFan00 So ...
marostegui To me it keeps working fine (just tried a UK proxy)
ShakespeareFan00 And bear in mind the scans were being asked for page by page
ShakespeareFan00 marostegui: Bear in mind as I said , once an image is thumbnailed/generated it loads very quickly
tto marostegui, try https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/The_Innocents_Abroad_%281869%29.pdf/page704-788px-The_Innocents_Abroad_%281869%29.pdf.jpg - but replace 788 by a new number to force a new image to be generated
tto ProofreadPage doesn't use normal thumbnail sizes, so the images are not pre-generated and must be created from scratch the first time a book is proofread
tto ShakespeareFan00, I'd suggest filing a Phabricator task
marostegui tto: yep, that is slow
marostegui ShakespeareFan00: yes, please file a Phabricator task with the details so we can track it and act on it
ShakespeareFan00 I'll certainly consider doing that if it persists..
ShakespeareFan00 Do you mind if I quote you?
marostegui You can quote me
ShakespeareFan00 tto:?
tto Fine
"
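The test suggested in the log (change the width in the thumbnail URL so that a fresh image has to be rendered) could be scripted roughly as follows. This is only a sketch: the helper names `build_thumb_url` and `time_thumbnail` are made up for illustration, the `c/c8` directory is copied from the URL quoted above rather than computed, and the User-Agent string is a placeholder.

```python
import time
import urllib.error
import urllib.request
from typing import Tuple
from urllib.parse import quote

# Base path copied from the thumbnail URL quoted in the log.
UPLOAD_BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb"


def build_thumb_url(file_name: str, hash_dir: str, page: int, width: int) -> str:
    """Build a PDF page thumbnail URL in the same shape as the one in the log.

    hash_dir (e.g. "c/c8") is taken as given here, not derived from the name.
    """
    encoded = quote(file_name, safe="")  # "(" and ")" become %28 and %29
    return f"{UPLOAD_BASE}/{hash_dir}/{encoded}/page{page}-{width}px-{encoded}.jpg"


def time_thumbnail(url: str, timeout: float = 120.0) -> Tuple[int, float]:
    """Fetch the URL once and return (HTTP status, elapsed seconds)."""
    request = urllib.request.Request(
        url,
        # Placeholder User-Agent; replace with something that identifies you.
        headers={"User-Agent": "thumb-timing-sketch/0.1 (placeholder)"},
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # include the download in the measurement
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. a 500 from the scaler
    return status, time.monotonic() - start
```

Calling `time_thumbnail(build_thumb_url("The_Innocents_Abroad_(1869).pdf", "c/c8", 704, 789))` twice in a row should show the difference between a first render and a cached copy, assuming no one has already requested that width.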

Event Timeline

@fgiunchedi do you have any idea what could be causing this?

I caught a 500 on a thumbnail of that file; it looks like the imagescalers are having a hard time with some of those thumbnails:

:path:/wikipedia/commons/thumb/c/c8/The_Innocents_Abroad_%281869%29.pdf/page15-180px-The_Innocents_Abroad_%281869%29.pdf.jpg
server:mw1295.eqiad.wmnet

I suspect it might be that the PDF is big, or that particular pages in this PDF are problematic.

Yet more problems with a PDF this morning - https://en.wikisource.org/wiki/Index:Baron_Trump%27s_marvellous_underground_journey.pdf

On the first attempt to render the thumbnail/page scan it gums up and no image is displayed. But after 'forcing' the image to load directly, the image loads without issue on refresh.

Until someone figures out why the image scalers do not like some PDF files, perhaps it would be useful to add a "Please wait whilst this scan image is generated..." notice to the ProofreadPage code, so that the issue is at least more transparent to users, and to force a refresh of the relevant element once the page scan has eventually loaded.
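The "please wait, then refresh" idea boils down to retrying the image request until it answers 200. A minimal sketch of that logic follows. It is not ProofreadPage code: `wait_for_thumbnail`, its parameters, and the injected `fetch_status` callable are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_for_thumbnail(
    fetch_status: Callable[[str], int],
    url: str,
    max_attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "ready" once the URL answers 200, else "failed" after max_attempts.

    fetch_status is any callable that requests the URL and returns the HTTP
    status code; injecting it keeps this sketch independent of a real client.
    """
    for attempt in range(1, max_attempts + 1):
        if fetch_status(url) == 200:
            return "ready"  # the caller can now swap the placeholder for the image
        # Non-200 answer: keep the placeholder up and pause before retrying.
        if attempt < max_attempts:
            sleep(delay_seconds)
    return "failed"  # the caller can show an error instead of waiting forever
```

A front end would show the notice while this runs and replace it with the scan on "ready", or with an error message on "failed".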

Maybe T175812 is also related if this is a performance problem.

mw1297.eqiad.wmnet was one of the old MediaWiki imagescalers. That system's now been entirely replaced by Thumbor on Wikimedia wikis, which is faster, so the original issue is pretty much resolved.

The thumbnails for Baron_Trump%27s_marvellous_underground_journey.pdf should all have been generated by Thumbor. Generating a new thumbnail for that file isn't exactly blazing fast at 5-10 seconds, but that's not outside expectations for a new thumbnail for a PDF. I wasn't able to reproduce behavior where the thumbnail did not load after a few seconds, making me think that either aggressive client-side timeouts were involved or the mitigation steps had no greater effect than just waiting.

I'm not sure that a "please wait" placeholder would improve user experience here, other than indicating when a non-200 response is received for an image. That would be best broken out into a different task anyway.

I don't see anything actionable in this task now, as the entire rendering stack has changed since 2017, hence closing as invalid.
URLs like https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/The_Innocents_Abroad_(1869).pdf/page704-793px-The_Innocents_Abroad_(1869).pdf.jpg load fast for me.