
srcset for image on proofreading edit page differs by 1px
Open, Needs Triage · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

Pages semi-randomly get a different set of image sizes:
[ "page72-871px", "page72-1306px", "page72-1742px" ]
[ "page74-871px", "page74-1307px", "page74-1742px" ]

Note the 1px difference in the 1.5x image. This is problematic because the preload mechanism (T230689) depends on the next page's image having the same size as the current one.
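To illustrate why that assumption matters, here is a minimal sketch (not any particular gadget's code) of how a preloader typically guesses the next page's 1.5x thumb URL by bumping the page number in the current one:

// Minimal sketch, not any particular gadget's code: guess the next page's
// 1.5x thumb URL by bumping the page number in the current one.
var current = '.../page73-1306px-Ibanez_-_...djvu.jpg'; // current 1.5x thumb (URL elided)
var guessed = current.replace( /page(\d+)-/, function ( m, n ) {
    return 'page' + ( Number( n ) + 1 ) + '-';
} );
// guessed ends with "page74-1306px-...", but the edit page for page 74
// actually requests "page74-1307px-...", so the preloaded thumb is never reused.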

I think the code responsible for this is here (this generates images):
https://doc.wikimedia.org/mediawiki-core/master/php/Linker_8php_source.html#l00791

But I think it might be resolved by making sure the base width is even. So it could be resolved in PageDisplayHandler::getImageTransform(), I think. Honestly, that is a bit of a shot in the dark; one would probably need to debug the getImageTransform function (in a real-ish environment).

What should have happened instead?:

I would assume all images in one Index should have the same pixel width at each scale.

Software version (skip for WMF-hosted wikis like Wikipedia): -

Other information (browser name/version, screenshots, etc.):
Firefox 105.0.3

Event Timeline

Oh, if you want a quick test snippet:

var imgEl = document.querySelector('.prp-page-image img');
// collect the srcset URLs, dropping the density descriptors (e.g. " 1.5x")
var imgSet = imgEl.srcset.split(',').map(v=>v.trim().replace(/ [0-9.]+x/, ''));
var imgs = [imgEl.src, ...imgSet];
// log the page/size part of each URL (e.g. "page72-871px")
console.log(imgs.map(v=>v.replace(/.+\/(page[0-9-_px]+)-.+/, '$1')))

The preload mechanism is not really coupled with having the same-sized image across multiple pages, so this probably doesn't matter from that perspective. However, this output definitely looks interesting and probably should be looked into (especially since the default appears not to be 1024px, which is the expected default size).

Depends on a preload mechanism...

  1. The built-in mechanism actually uses prefetch (not preload), which just doesn't work. Prefetch only tries to load the base image (1x), which is not enough (the edit page typically uses 1.5x). And prefetch is only a suggestion: Firefox will not load those prefetched images on a cable connection. So the mechanism doesn't work at all.
  2. This gadget from en.wikisource works but assumes each page has the same sizes: https://en.wikisource.org/wiki/MediaWiki:Gadget-Preload_Page_Images.js (I also have a similar script with similar assumptions).

Depends on a preload mechanism...

  1. The built-in mechanism actually uses prefetch (not preload), which just doesn't work. Prefetch only tries to load the base image (1x), which is not enough (the edit page typically uses 1.5x). And prefetch is only a suggestion: Firefox will not load those prefetched images on a cable connection. So the mechanism doesn't work at all.

The mechanism definitely works; the point here is not to "preload" anything, but rather to get Thumbor (and the related MediaWiki backend) to generate all the image sizes by default (which imo is the real bottleneck, and can take a couple of minutes). This is achieved by prefetching the page (since the backend will have to generate the srcset attribute and thus perform the necessary computation to render the images).

  1. This gadget from en.wikisource works but assumes each page has the same sizes: https://en.wikisource.org/wiki/MediaWiki:Gadget-Preload_Page_Images.js (I also have a similar script with similar assumptions).

That script is definitely broken from a prefetching POV (at least from my experience having to work around Thumbor). Fetching the URL doesn't really do anything to get Thumbor to render the image; it will return a 404 (which the browser will happily pre-cache/pre-load) until you forcibly (via some API) ask for the image to be generated. [1]

Additionally, I'm working on a coherent API for image preloading at gerrit:838918, which should probably be used once it is deployed.

[1] ProofreadPage implemented something similar for PagelistWidget using a fixed-size image; it might be interesting to look at how it is implemented in modules/index.pagelist/PagelistInputWidget.DialogModel.js:53-98

The mechanism definitely works; the point here is not to "preload" anything, but rather to get Thumbor (and the related MediaWiki backend) to generate all the image sizes by default (which imo is the real bottleneck, and can take a couple of minutes). This is achieved by prefetching the page (since the backend will have to generate the srcset attribute and thus perform the necessary computation to render the images).

Not sure if we are talking about the same thing. If there is some server-side mechanism then I'm not aware of it. But from the user's perspective it doesn't seem to be working (image loading is slow).

If we are talking about the link-prefetch that is in the HTML, e.g. for Strona:Ibanez - Czterech Jeźdźców Apokalipsy 01.djvu/072 (the only occurrence of "page73" in the source HTML and JS):

<link rel="prefetch" as="image" href="//upload.wikimedia.org/wikipedia/commons/thumb/1/10/Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu/page73-871px-Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu.jpg" title="prp-next-image"/>

And this doesn't work. As I said, it could only work for 1x (not 1.5x), and prefetch doesn't work the way you might think. Prefetch (contrary to preload) is only a suggestion for the browser. See:

preload is a declarative fetch, allowing you to force the browser to make a request for a resource without blocking the document’s onload event.
Prefetch is a hint to the browser that a resource might be needed, but delegates deciding whether and when loading it is a good idea or not to the browser.

Source: Addy Osmani. Preload, Prefetch And Priorities in Chrome. Medium, 2017.

As for DialogModel.prototype.generateImageLink, if I may add some suggestions (a rough sketch follows the list):

  1. imageSize = 1024; // arbitrary number, same as used at PageDisplayHandler.php // this will break sooner or later. It should be exposed via mw.config.
  2. response.query.pages[ response.query.pageids[ 0 ] ].imageinfo[ 0 ] // you should replace this with a variable.
  3. You should also check whether variables/properties exist, to avoid breaking the whole JS module/bundle when the API changes... Well, this is in a Promise, so not that bad, but it would still be cleaner and faster to check once rather than catch.
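
Roughly, suggestions 2 and 3 could look like this (not the actual ProofreadPage code; the property path is copied from point 2 above):

// Rough sketch, not the actual ProofreadPage code.
var query = response.query || {};
var page = query.pages && query.pageids && query.pages[ query.pageids[ 0 ] ];
var imageInfo = page && page.imageinfo && page.imageinfo[ 0 ];
if ( !imageInfo ) {
    // bail out early instead of letting a TypeError break the whole module
    throw new Error( 'Unexpected API response: missing imageinfo' );
}
// ...continue working with imageInfo instead of repeating the long expression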

One more thing...

That script is definitely broken from a prefetching POV (at least from my experience having to work around Thumbor). Fetching the URL doesn't really do anything to get Thumbor to render the image; it will return a 404 (which the browser will happily pre-cache/pre-load) until you forcibly (via some API) ask for the image to be generated. [1]

Actually, Firefox will only cache requests with a 410 status (Gone), not a 404 status (Not Found). I had this problem in one of my projects 🙂 (where I did want to cache missing-resource requests).

Not sure about that script from enwiki, or what happens with the Promises there. But preloading images old-style (new Image()) definitely works. Images are cached and everything loads fast (until a wrong-px image is encountered). This is the script with Image preloading; I tested it thoroughly recently:
https://pl.wikisource.org/wiki/Wikiskryba:Nux/PreloadScans.js
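
For the record, the old-style preloading boils down to something like this (a simplified sketch, not the actual PreloadScans.js code; nextPageUrls is a hypothetical array of thumb URLs):

// Simplified sketch, not the actual PreloadScans.js code.
function preloadImage( url ) {
    var img = new Image();
    img.src = url; // the browser fetches the image and keeps it in its cache
    return img;
}
// nextPageUrls: hypothetical array of thumb URLs for the next page
nextPageUrls.forEach( preloadImage );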

The mechanism definitely works; the point here is not to "preload" anything, but rather to get Thumbor (and the related MediaWiki backend) to generate all the image sizes by default (which imo is the real bottleneck, and can take a couple of minutes). This is achieved by prefetching the page (since the backend will have to generate the srcset attribute and thus perform the necessary computation to render the images).

Not sure if we are talking about the same thing. If there is some server-side mechanism then I'm not aware of it. But from the user's perspective it doesn't seem to be working (image loading is slow).

Whenever we request an image on Wikisource, we ask the Varnish cache for the image URL. If the image is not present, Varnish forwards the request to Thumbor, asking it to render the image from the given PDF/DjVu and page data. Once that is done, or if it takes too long, Varnish returns the image or a 404 respectively. This is our major bottleneck, since the rendering can take upwards of a few minutes.

In cases where Thumbor is not present (i.e. in a non-Wikimedia setting), this process will return a 404 unless the image has been explicitly requested from the MediaWiki backend.

If we are talking about the link-prefetch that is in the HTML, e.g. for Strona:Ibanez - Czterech Jeźdźców Apokalipsy 01.djvu/072 (the only occurrence of "page73" in the source HTML and JS):

<link rel="prefetch" as="image" href="//upload.wikimedia.org/wikipedia/commons/thumb/1/10/Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu/page73-871px-Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu.jpg" title="prp-next-image"/>

And this doesn't work. As I said, it could only work for 1x (not 1.5x), and prefetch doesn't work the way you might think. Prefetch (contrary to preload) is only a suggestion for the browser. See:

preload is a declarative fetch, allowing you to force the browser to make a request for a resource without blocking the document’s onload event.
Prefetch is a hint to the browser that a resource might be needed, but delegates deciding whether and when loading it is a good idea or not to the browser.

Source: Addy Osmani. Preload, Prefetch And Priorities in Chrome. Medium, 2017.

The spec in this case differs a lot from the actual implementation (and the data has changed significantly since 2017). In my personal experience, prefetch is near-instantaneous once the page has been fully loaded, whereas preload can be pretty hit-and-miss and, in the case of Chrome, non-existent.

In our case, to kick-start the pre-rendering process, we use the following directive:

<link rel="prefetch" href="/wiki/Strona:Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu/073">

which will force MediaWiki to perform the computation required to generate the image (on Wikimedia that should be handled by Thumbor), since it has to generate the srcset attribute.
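
The same idea can be expressed in script form, just for illustration (nextPageTitle is a hypothetical variable; this is not what ProofreadPage ships):

// Illustration only; nextPageTitle is a hypothetical page title.
// Requesting the page HTML makes MediaWiki build the srcset for that page,
// which in turn asks Thumbor to render the missing thumbnails.
fetch( mw.util.getUrl( nextPageTitle ) );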

As for DialogModel.prototype.generateImageLink, if I may add some suggestions:

  1. imageSize = 1024; // arbitrary number, same as used at PageDisplayHandler.php // this will break sooner or later. It should be exposed via mw.config.
  2. response.query.pages[ response.query.pageids[ 0 ] ].imageinfo[ 0 ] // you should replace this with a variable.
  3. You should also check whether variables/properties exist, to avoid breaking the whole JS module/bundle when the API changes... Well, this is in a Promise, so not that bad, but it would still be cleaner and faster to check once rather than catch.

This code predates the mw.config changes and is stuck in ES5 land. I plan to make some updates once I get some time and ES6 is allowed in MediaWiki extensions (it will require some server-side changes as well, since mw.config is a bit flaky on the Index: page).

One more thing...

That script is definitely broken from a prefetching POV (at least from my experience having to work around Thumbor). Fetching the URL doesn't really do anything to get Thumbor to render the image; it will return a 404 (which the browser will happily pre-cache/pre-load) until you forcibly (via some API) ask for the image to be generated. [1]

Actually, Firefox will only cache requests with a 410 status (Gone), not a 404 status (Not Found). I had this problem in one of my projects 🙂 (where I did want to cache missing-resource requests).

Not sure about that script from enwiki, or what happens with the Promises there. But preloading images old-style (new Image()) definitely works. Images are cached and everything loads fast (until a wrong-px image is encountered). This is the script with Image preloading; I tested it thoroughly recently:
https://pl.wikisource.org/wiki/Wikiskryba:Nux/PreloadScans.js

I'm unable to get the script running (see my edits on pl.wikisource; I'm not sure how you're using ES6 in a user-script context).

My concerns still stand, as this seems like it is bypassing the pre-rendering step completely, which is probably going to cause more latency on first load. I personally would have implemented this by first sending an image-info request (to be replaced by prppifp once I finish work on that API 😄 ) and then loading the individual images. That being said, the Image()-based preloading does make more sense than the en.wikisource script's implementation, provided it works with OpenSeadragon.

https://pl.wikisource.org/wiki/Wikiskryba:Nux/PreloadScans.js

I'm unable to get the script running (see my edits on pl.wikisource; I'm not sure how you're using ES6 in a user-script context).

You can load ES6 with the loader. It is only forbidden in gadgets (for the moment at least).
https://pl.wikisource.org/w/index.php?title=Wikiskryba:Nux/common.js&oldid=3197848#L-30
I find it faster to prototype things in ES6+, but I might get it to work with ES5... Once this task is fixed 🙂🤞

My concerns still stand, as this seems like it is bypassing the pre-rendering step completely, which is probably going to cause more latency on first load.

From 20+ years in programming I've learned that it is not always a good thing to rely on a hunch 😉. Follow your hunch for simple things, but test it when someone suggests you are wrong. I've tested this. It works. Images are served from cache or loaded very fast.

(Screenshot attached: obraz.png, 632 KB)

Note that my script has an intentional delay before preloading. Images are loaded while you are actually reading the page (so with a 5-12 second delay).

@Nux The imagesforpage API query module should be live now on all Wikisources; feel free to use it to fetch images for the next and previous pages :)

@Soda @Nux Your discussion is very interesting, but please remember that the goal is for the native preload mechanism (T230689) to function properly and not get stuck with "1px difference". I don't think we want to create local gadgets if so much effort and time has gone into developing the native proofread preload mechanism.

@Soda The API seems to work fine :-)

Still don't know why there is a 1px difference. Seems to me like it should still be investigated (by someone with debug rights on some WS test server, if there is one with a config similar to the actual WS).
page 72

{
"responsiveimages": {
"2":	"//upload.wikimedia.org/wikipedia/commons/thumb/1/10/Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu/page72-1742px-Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu.jpg",
"1.5":	"//upload.wikimedia.org/wikipedia/commons/thumb/1/10/Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu/page72-1306px-Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu.jpg"
}}

page 74

{
"responsiveimages": {
"2":	"//upload.wikimedia.org/wikipedia/commons/thumb/1/10/Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu/page74-1742px-Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu.jpg",
"1.5":	"//upload.wikimedia.org/wikipedia/commons/thumb/1/10/Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu/page74-1307px-Ibanez_-_Czterech_Je%C5%BAd%C5%BAc%C3%B3w_Apokalipsy_01.djvu.jpg"
}}
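
Picking a concrete scale out of a response fragment like the ones above could look roughly like this (a sketch only; it assumes the fragment has already been extracted from the full API response):

// Sketch only: works on a fragment shaped like the dumps above.
function getResponsiveUrl( fragment, density ) {
    var url = fragment.responsiveimages[ String( density ) ];
    // the URLs above are protocol-relative, so prefix the current protocol
    return url ? window.location.protocol + url : null;
}
// getResponsiveUrl( page72Fragment, 1.5 ) -> "https://upload.../page72-1306px-....jpg"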

@Zdzislaw The native preload mechanism from T230689 doesn't suffer from this bug.

@Nux The file has different dimensions for each page (see: https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=imageinfo&titles=File%3AIbanez%20-%20Czterech%20Jeźdźców%20Apokalipsy%2001%20OCR.djvu&formatversion=2&iiprop=metadata). I don't find it too surprising that they end up with slightly different dimensions when resized.

@matmarex If that were used (I think it is not), then images on different pages would differ greatly. However, this is not the case: they only ever differ by 1px. So this is a rather weird phenomenon :)

Draco flavus checked some of the pages:

page: 72;  real: 761 x 1127;  ws: 871/1306/1742
page: 73;  real: 784 x 1127;  ws: 871/1306/1742
page: 74;  real: 766 x 1123;  ws: 871/1307/1742
page: 75;  real: 771 x 1123;  ws: 871/1307/1742
page: 76;  real: 768 x 1121;  ws: 871/1306/1741
page: 77;  real: 771 x 1121;  ws: 871/1306/1741
page: 78;  real: 768 x 1127;  ws: 871/1306/1742
page: 79;  real: 775 x 1127;  ws: 871/1307/1742

The final dimensions don't seem to depend on either the width or the height. See e.g. pages 76 vs 78 sharing a width of 768, or pages 78 and 79 sharing the same height. And in all cases the base image is always 871px.

This seems to be the code where it is calculated:

$hp15 = $hp;
$hp15['width'] = round( $hp['width'] * 1.5 );
$hp20 = $hp;
$hp20['width'] = $hp['width'] * 2;

Since $hp['width'] is the base width and it is always 871px, how can the 1.5x and 2x results differ? Maybe $hp['width'] is not an integer?
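
A quick numeric check of that hypothesis (the fractional widths below are made up, purely to show the effect; PHP's round() and JS's Math.round() agree for these values):

// Made-up fractional base widths, purely to show how a non-integer width
// could round to the same value at 1x and 2x while differing by 1px at 1.5x.
[ 870.8, 871.2 ].forEach( function ( w ) {
    console.log( Math.round( w ), Math.round( w * 1.5 ), Math.round( w * 2 ) );
} );
// 871 1306 1742   <- matches page 72
// 871 1307 1742   <- matches page 74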

But... Having said all that... if you think that this doesn't cause any real issues, then the issue can be closed.