Page MenuHomePhabricator

ParsoidBatchAPI: height/width param for some images seem different from non-batching API (which matches PHP parser output)
Closed, ResolvedPublic

Description

After cherry-pick of https://gerrit.wikimedia.org/r/#/c/238082, we ran rt-testing again y'day. It is looking good overall now. No errors in logs (another 30K tests to finish -- the clients needed restarting this morning -- but I don't expect to see any). So, this is almost ready to be enabled in production.

But, since rt-testing doesn't deal with rendering diffs, I used https://github.com/subbuss/parsoid_visual_diffs/blob/master/bin/examples/batching.diffsettings.js and https://github.com/subbuss/parsoid_visual_diffs/blob/master/bin/gendiffs.sh to generate visual diffs on a sample of about 100 random enwiki titles from the rt-testing set. That test yielded one significant visual diff on enwiki:Tijuana_bible.

The height/width parameter on one of the images seems to be different. The bug can be reproduced on this following wikitext from that page

$ echo "[[File:Tjbfullerbrush1.jpg|thumb|380px|right|The first 8-page installment of ''The Adventures of a Fuller Brush Man'', published circa 1936.]]" | node parse --useBatchAPI true
...
<img resource="./File:Tjbfullerbrush1.jpg" src="//upload.wikimedia.org/wikipedia/en/8/8c/Tjbfullerbrush1.jpg" data-file-width="225" data-file-height="167" data-file-type="bitmap" height="282" width="380" data-parsoid='{"a":{"resource":"./File:Tjbfullerbrush1.jpg","height":"282","width":"380"},"sa":{"resource":"File:Tjbfullerbrush1.jpg"}}'/>
...

$ echo "[[File:Tjbfullerbrush1.jpg|thumb|380px|right|The first 8-page installment of ''The Adventures of a Fuller Brush Man'', published circa 1936.]]" | node parse --useBatchAPI false
...
<img resource="./File:Tjbfullerbrush1.jpg" src="//upload.wikimedia.org/wikipedia/en/8/8c/Tjbfullerbrush1.jpg" data-file-width="225" data-file-height="167" data-file-type="bitmap" height="167" width="225" data-parsoid='{"a":{"resource":"./File:Tjbfullerbrush1.jpg","height":"167","width":"225"},"sa":{"resource":"File:Tjbfullerbrush1.jpg"}}'/>
...

Given the 1 in 100 occurrence and given that this is a regression, I think we should fix this before enabling this in production.

Event Timeline

ssastry assigned this task to tstarling.
ssastry raised the priority of this task from to High.
ssastry updated the task description. (Show Details)
ssastry subscribed.

Tijuana_bible.diff.png (4×1 px, 2 MB)
is the visual diff on that page with / without batching API.

ssastry set Security to None.

I picked a different sample of 250 pages and atleast 3 pages seemed to have this issue.

This wikitext "[[File:Karmichael Hunt small.JPG|thumb|left|Hunt in action against for the Broncos]]" from enwiki:Karmichael_Hunt has a similar problem.

ssastry renamed this task from Batching API: height/width param for some images seem different from non-batching API (which matches PHP parser output) to ParsoidBatchAPI: height/width param for some images seem different from non-batching API (which matches PHP parser output).Sep 15 2015, 8:37 PM

This is the "we don't upscale bitmaps (but we do upscale vector art) clause in the image sizer.

The logic in Parsoid is around here: https://github.com/wikimedia/parsoid/blob/master/lib/ext.core.LinkHandler.js#L1062

...but I seem to recall some quantization done on height-based scaling as well, which I don't see here.

Change 238657 had a related patch set uploaded (by Tim Starling):
Fix denial of client-side upscaling in thumb and frameless format

https://gerrit.wikimedia.org/r/238657

Change 238658 had a related patch set uploaded (by Tim Starling):
Add imageinfo property "mustRender"

https://gerrit.wikimedia.org/r/238658

Change 238658 merged by jenkins-bot:
Add imageinfo property "mustRender"

https://gerrit.wikimedia.org/r/238658

Change 238657 merged by jenkins-bot:
Fix denial of client-side upscaling in thumb and frameless format

https://gerrit.wikimedia.org/r/238657

ssastry removed a project: Patch-For-Review.
ssastry removed a subscriber: gerritbot.

Rerunning visual diffing identified output diffs on [[Image: Flag of the British Army.svg|centre|180px]] with / without the batching api.

[subbu@earth api] echo '[[Image: Flag of the British Army.svg|centre|180px]]' | node parse --useBatchAPI
...
<img resource="./File:_Flag_of_the_British_Army.svg" src="./Special:FilePath/_Flag_of_the_British_Army.svg" height="180" width="180" data-parsoid='{"a":{"resource":"./File:_Flag_of_the_British_Army.svg","height":"180","width":"180"},"sa":{"resource":"Image: Flag of the British Army.svg"}}'/> 
...
[subbu@earth api] echo '[[Image: Flag of the British Army.svg|centre|180px]]' | node parse 
...
<img resource="./File:_Flag_of_the_British_Army.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/2/27/Flag_of_the_British_Army.svg/180px-Flag_of_the_British_Army.svg.png" data-file-width="675" data-file-height="450" data-file-type="drawing" height="120" width="180" data-parsoid='{"a":{"resource":"./File:_Flag_of_the_British_Army.svg","height":"120","width":"180"},"sa":{"resource":"Image: Flag of the British Army.svg"}} 
...

The rendering from non-batching API output matches the rendering from the output of the PHP parser on https://en.wikipedia.org/wiki/Royal_Engineers

Also '[[File:Panama National Anthem.ogg]]' on enwiki:Panama and '[[File: Arizona Fleming Elem FBISD.JPG|thumb|left|upright=1.7|alt=Photo of a school building with the lettering "Arizona Fleming Elementary".|Arizona Fleming Elementary School in Fort Bend ISD]]' on enwiki:Arizona_Fleming