
Parsoid does not emit different HTML when the page=# property is set on paged media (PDFs/DjVus/TIFFs)
Closed, Resolved, Public

Description

See a demo of the problem here: https://www.mediawiki.org/wiki/User:Halfak_(WMF)/Page_link_demo

On the sandbox wikipage, you can see that in the first image link, the first slide of the PDF is rendered as a thumbnail. In the second link, page 33 of the PDF is rendered.

On a Flow page, however, if you paste [[File:....pdf|thumb|right|300px|page=33]], the link is changed to [[File:....pdf|thumb|right|300px|]] on preview or save. The |page=33 option is removed and the first page of the PDF is rendered instead.
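For anyone trying to reproduce this, here is a rough sketch of how the pasted and saved links quoted above could be compared to spot dropped options. The helper names (file_link_options, dropped_options) are made up for illustration, and the naive split on "|" is an assumption that ignores nested links or templates in captions. The "File:....pdf" title is left elided exactly as in the report.

```python
def file_link_options(wikitext):
    """Return the '|'-separated options of a [[File:...]] link (naive split)."""
    inner = wikitext.strip()
    if not (inner.startswith("[[") and inner.endswith("]]")):
        raise ValueError("not a [[...]] link")
    parts = inner[2:-2].split("|")
    # parts[0] is the file title; everything after it is an option
    return parts[1:]


def dropped_options(before, after):
    """List the options present in `before` that are missing from `after`."""
    kept = set(file_link_options(after))
    return [opt for opt in file_link_options(before) if opt not in kept]


# The two links as quoted in the description (title elided in the report)
pasted = "[[File:....pdf|thumb|right|300px|page=33]]"
saved = "[[File:....pdf|thumb|right|300px|]]"

print(dropped_options(pasted, saved))
```

With the two links from the description, the only option reported as dropped is page=33; the trailing empty option in the saved link is ignored because it was never in the pasted one.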

Event Timeline

Output below:

$ echo '[[File:Deploying and maintaining AI in a socio-technical system -- Research Showcase (August 2016).pdf|thumb|right|300px|page=33|xyz]]' | parse.js --useBatchAPI --fetchConfig false
...
<figure class="mw-halign-right" typeof="mw:Error mw:Image/Thumb" data-parsoid='{"optList":[{"ck":"thumbnail","ak":"thumb"},{"ck":"right","ak":"right"},{"ck":"width","ak":"300px"},{"ck":"page","ak":"page=33"},{"ck":"caption","ak":"xyz"}],"dsr":[0,134,2,2]}' data-mw='{"errors":[{"key":"api-error","message":{}}]}'><a href="./File:Deploying_and_maintaining_AI_in_a_socio-technical_system_--_Research_Showcase_(August_2016).pdf" data-parsoid='{"a":{"href":"./File:Deploying_and_maintaining_AI_in_a_socio-technical_system_--_Research_Showcase_(August_2016).pdf"},"sa":{},"dsr":[2,null,null,null]}'><img resource="./File:Deploying_and_maintaining_AI_in_a_socio-technical_system_--_Research_Showcase_(August_2016).pdf" src="./Special:FilePath/Deploying_and_maintaining_AI_in_a_socio-technical_system_--_Research_Showcase_(August_2016).pdf" height="300" width="300" data-parsoid='{"a":{"resource":"./File:Deploying_and_maintaining_AI_in_a_socio-technical_system_--_Research_Showcase_(August_2016).pdf","height":"300","width":"300"},"sa":{"resource":"File:Deploying and maintaining AI in a socio-technical system -- Research Showcase (August 2016).pdf"}}'/></a><figcaption data-parsoid='{"dsr":[null,132,null,null]}'>xyz</figcaption></figure>
...

So Parsoid is clearly recording info about the page parameter (see the "page" entry in optList), but the parameter is being lost in a VE edit. So this is probably an HTML -> wikitext issue.
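To make the point about the recorded option concrete, here is a minimal sketch that reads the optList from the data-parsoid attribute shown in the output above and checks for the page option. This is not Parsoid's serializer; has_option and rebuild_file_link are hypothetical helpers, and joining the "ak" values with "|" is only an assumption about what a lossless round trip would look like.

```python
import json

# data-parsoid value of the <figure> element, copied from the output above
DATA_PARSOID = (
    '{"optList":[{"ck":"thumbnail","ak":"thumb"},{"ck":"right","ak":"right"},'
    '{"ck":"width","ak":"300px"},{"ck":"page","ak":"page=33"},'
    '{"ck":"caption","ak":"xyz"}],"dsr":[0,134,2,2]}'
)


def has_option(raw, canonical_key):
    """True if optList contains an entry whose canonical key ("ck") matches."""
    return any(opt["ck"] == canonical_key for opt in json.loads(raw)["optList"])


def rebuild_file_link(title, raw):
    """Rejoin the original option text ("ak") after the file title.

    Illustration only: assumes every recorded option is emitted unchanged.
    """
    options = [opt["ak"] for opt in json.loads(raw)["optList"]]
    return "[[" + "|".join([title] + options) + "]]"


print(has_option(DATA_PARSOID, "page"))
# File:Example.pdf is a placeholder title, not the file from this task
print(rebuild_file_link("File:Example.pdf", DATA_PARSOID))
```

If the page entry is present here but missing from the saved wikitext, the loss happens somewhere after the wikitext -> HTML step, which is consistent with the comment above.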

ssastry triaged this task as Medium priority. Jan 6 2017, 10:43 PM

Change 342156 had a related patch set uploaded (by Arlolra):
[mediawiki/services/parsoid] T154709: Honour the "page" option for files

https://gerrit.wikimedia.org/r/342156

Jdforrester-WMF renamed this task from "page=# flag does not work for PDFs in Flow messages" to "Parsoid does not emit different HTML when the page=# property is set on paged media (PDFs/DjVus/TIFFs)". Mar 25 2017, 1:24 AM
Jdforrester-WMF added a subscriber: Schnark.

Change 342156 merged by jenkins-bot:
[mediawiki/services/parsoid@master] T154709: Honour the "page" option for files

https://gerrit.wikimedia.org/r/342156

Arlolra claimed this task.