
[Suggestion] Pregenerate next/previous thumb when reading a pdf (or other multi-page file types)
Open, Needs Triage, Public

Description

Use case: An end user, as a media reader, wants to read PDFs quickly. The typical way to do this is to go to the media page, for example:

https://commons.wikimedia.org/wiki/File:Didatticaduepuntozero.pdf

Then I click next ->, pausing on each page to read it. Every time I click next, I want the new page to appear instantly, but there seems to be an annoying delay (1.5-2.4 seconds, measured on my remote but high-speed home connection).

For comparison, check the user experience on dedicated sites like Slideshare (warning, external site): https://www.slideshare.net/jynus/mysql-at-wikipedia-how-we-do-relational-data-at-the-wikimedia-foundation

I believe (though I am not sure) that this is because the screenshot is being generated for the first time, since the rest of the page elements load with no delay. Once the images are in my browser cache, going to the next and previous pages is instant.

Proposal: When loading multi-page media, automatically and asynchronously preload into cache (both our edge and the browser) the current large thumb size for the next page. This proposal may rest on wrong assumptions; those should be checked first.

This is related to T54881 but much smaller in scope, so it could be either a small project (yes, I am being optimistic) or something a volunteer could implement.

Event Timeline

For context, we have in the past received complaints about slowness from volunteers who handle very heavy PDFs; this could help mitigate those, as the rendering "slowness" would be better hidden.

It's certainly possible to make a client-side request in the background for the prev/next large thumbnails.
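As a rough illustration of that idea, here is a minimal TypeScript sketch. It assumes (this is not confirmed anywhere in this task) that the current thumbnail URL contains a `/page<N>-<width>px-` segment, so that the neighbouring pages' URLs can be derived by swapping the page number. The names `neighborThumbUrls` and `preloadNeighbors`, and the injected `load` callback, are hypothetical and not part of any existing MediaWiki code.

```typescript
// Assumption: thumb URLs look like ".../page3-1024px-Name.pdf.jpg".
const PAGE_SEGMENT = /\/page(\d+)-(\d+px-)/;

// Derive the thumbnail URLs for the next and previous pages (next first,
// since that is the more likely click). Returns [] if the URL does not
// match the assumed pattern.
function neighborThumbUrls(currentThumbUrl: string, totalPages: number): string[] {
  const match = PAGE_SEGMENT.exec(currentThumbUrl);
  if (match === null) {
    return [];
  }
  const current = Number(match[1]);
  const sizePart = String(match[2]);
  return [current + 1, current - 1]
    .filter((page) => page >= 1 && page <= totalPages)
    .map((page) => currentThumbUrl.replace(PAGE_SEGMENT, `/page${page}-${sizePart}`));
}

// Hand each neighbouring URL to a loader. In a browser the loader could be
// something like: (url) => { new Image().src = url; }
// which warms both the browser cache and, on a miss, the edge cache.
function preloadNeighbors(
  currentThumbUrl: string,
  totalPages: number,
  load: (url: string) => void
): void {
  for (const url of neighborThumbUrls(currentThumbUrl, totalPages)) {
    load(url);
  }
}
```

The loader is passed in rather than hard-coded so the URL logic can be checked outside a browser; whether preloading should run immediately or only once the current page has finished loading is left open here.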

We could also consider simply pre-generating all pages' standard thumbnail size at upload time. The current pre-rendering mechanism probably only works for the cover.

I think pre-generating all pages could be a problem (multi-GB PDFs), and at most I would make it opt-in for volunteers (e.g. for someone who works on PDFs very often). Just a JS call asynchronously preloading the next page would be a huge improvement.

There seems to be some overlap between this suggestion and T286356 and/or T230689. Referencing those in case someone from the other tickets considers this a duplicate of one of them.

One logical route would be T77145: Show PDF Slides in Media Viewer - MediaViewer already handles preloading and is a much nicer viewing experience, but doesn't support PDFs currently. I think support would boil down to T59298: The viewer box should not depend on the current page.

(I'm curious though, why do people try to use the current tiny PDF paging interface instead of just opening the PDF in the browser?)

why do people try to use the current tiny PDF paging interface instead of just opening the PDF in the browser?

Personally, because if the functionality is there, I want to use it, and I want it to be as fast as possible. If I just want to read one file I will download it, but sometimes I am browsing multiple files to search for a specific thing in them. I think a Slideshare-like functionality (e.g. presenting directly from the browser) would be useful in some contexts, and the only blocker would be the speed of changing pages. MediaViewer would be a fantastic fit for this for PDFs.

The original reason why I filed this, however, was because there was some kind of functionality (OCR? Translation?) which people complained was slow on wiki.