
Differences in Parsoid HTML compared to PHP parser output break JavaScript that is tailored to the PHP parser output
Open, Medium, Public

Description

If I compare the HTML from
https://fr.wikipedia.org/wiki/Lyon
and
https://fr.wikipedia.org/w/api.php?action=visualeditor&format=json&paction=parse&page=Lyon&oldid=136092841
there are differences.
For example, look for the file Metropole_of_Lyon_map-locator-blank-2015.svg and its surrounding HTML.

The difference in my example makes GeoBox_Init() from common.js fail in the version from the API.

Event Timeline

To add a bit of context: we need this in order to get an offline version of the page in which the JavaScript works too.

ssastry renamed this task from differences in html served by wikipedia API to Parsoid doesn't add JS modules to <head> in its output. Apr 6 2017, 2:11 PM
ssastry triaged this task as Medium priority.
ssastry added a subscriber: ssastry.
This comment was removed by ssastry.

About the new title, I would like to point out that my issue is that this span (for example):

<span typeof="mw:Image" data-mw='{"caption":"Voir sur la carte administrative de la&lt;span typeof=\"mw:Entity\" data-parsoid=&#39;{\"src\":\"&amp;amp;nbsp;\",\"srcContent\":\" \",\"dsr\":[5765,5771,null,null]}&#39;> &lt;/span>Métropole de Lyon"}'>

is returned by the API but is not in the source of https://fr.wikipedia.org/wiki/Lyon

ssastry renamed this task from Parsoid doesn't add JS modules to <head> in its output to Differences in Parsoid HTML compared to PHP parser output break JavaScript that is tailored to the PHP parser output. Apr 6 2017, 7:26 PM

Oops .. sorry, I was hasty there and moving too fast. I renamed the title to reflect what I understand now.

But this is a known difference between Parsoid and PHP parser output. We do plan to adapt the PHP parser's image output to be similar to Parsoid's output. For now, I don't have an immediate solution for you besides changing the JS code to handle Parsoid image markup properly.

@Skylsmoi Could you please tell us what exactly breaks (or degrades) the JavaScript running on the French article about "Lyon"? I heard from you that the problem was a missing "alt" attribute, but here it seems you are talking about a superfluous "span" node. Could you confirm this here and maybe share the problematic DOM node (both the PHP parser output and the Parsoid output)?

Here are the two HTML versions of the maps in the "location" part of the top-right .infobox_v2, first from the regular page (PHP parser), then from the API (Parsoid):

<div style="position:relative;;">
    <a href="/wiki/Fichier:Metropole_of_Lyon_map-locator-blank-2015.svg" class="image" title="Voir sur la carte administrative de la&#160;Métropole de Lyon">
        <img alt="Voir sur la carte administrative de la&#160;Métropole de Lyon" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/68/Metropole_of_Lyon_map-locator-blank-2015.svg/280px-Metropole_of_Lyon_map-locator-blank-2015.svg.png" width="280" height="371" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/68/Metropole_of_Lyon_map-locator-blank-2015.svg/420px-Metropole_of_Lyon_map-locator-blank-2015.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/68/Metropole_of_Lyon_map-locator-blank-2015.svg/560px-Metropole_of_Lyon_map-locator-blank-2015.svg.png 2x" data-file-width="966" data-file-height="1280" />
    </a>
    <div style="position:absolute;top:47.437844611528%;left:41.486805555555%;width:0px;height:0px;margin:0;padding:0;line-height:0px;background-color:transparent;">
        <div style="position:relative;top:-8px;left:-8px;width:16px;height:16px;background-color:transparent;">
            <a href="/wiki/Fichier:City_locator_14.svg" class="image">
                <img alt="City locator 14.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/5/5d/City_locator_14.svg/16px-City_locator_14.svg.png" width="16" height="16" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/5/5d/City_locator_14.svg/24px-City_locator_14.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/5/5d/City_locator_14.svg/32px-City_locator_14.svg.png 2x" data-file-width="16" data-file-height="16" />
            </a>
        </div>
        <div style="position:relative;top:-16px;">
            <div style="font-size:90%;position:relative;top:-0.65em;left:-12.6em;text-align:right;width:12em;line-height:1.2em;">
                <span class="toponyme">Lyon</span>
            </div>
        </div>
    </div>
</div>

Here, the <img> tag contains an 'alt' attribute, which the JS uses to generate the links for switching maps.

<div style="position:relative;;">
    <span typeof="mw:Image" data-mw='{"caption":"Voir sur la carte administrative de la&lt;span typeof=\"mw:Entity\" data-parsoid=&#39;{\"src\":\"&amp;amp;nbsp;\",\"srcContent\":\"\u00a0\",\"dsr\":[5765,5771,null,null]}&#39;>\u00a0&lt;/span>M\u00e9tropole de Lyon"}'>
        <a href="./Fichier:Metropole_of_Lyon_map-locator-blank-2015.svg">
            <img resource="./Fichier:Metropole_of_Lyon_map-locator-blank-2015.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/68/Metropole_of_Lyon_map-locator-blank-2015.svg/280px-Metropole_of_Lyon_map-locator-blank-2015.svg.png" data-file-width="966" data-file-height="1280" data-file-type="drawing" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/68/Metropole_of_Lyon_map-locator-blank-2015.svg/560px-Metropole_of_Lyon_map-locator-blank-2015.svg.png 2x, //upload.wikimedia.org/wikipedia/commons/thumb/6/68/Metropole_of_Lyon_map-locator-blank-2015.svg/420px-Metropole_of_Lyon_map-locator-blank-2015.svg.png 1.5x" height="371" width="280"/>
        </a>
    </span>
    <div style="position:absolute;top:47.437844611528%;left:41.486805555555%;width:0px;height:0px;margin:0;padding:0;line-height:0px;background-color:transparent;">
        <div style="position:relative;top:-8px;left:-8px;width:16px;height:16px;background-color:transparent;">
            <span class="mw-default-size" typeof="mw:Image">
                <a href="./Fichier:City_locator_14.svg">
                    <img resource="./Fichier:City_locator_14.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/5/5d/City_locator_14.svg/16px-City_locator_14.svg.png" data-file-width="16" data-file-height="16" data-file-type="drawing" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/5/5d/City_locator_14.svg/32px-City_locator_14.svg.png 2x, //upload.wikimedia.org/wikipedia/commons/thumb/5/5d/City_locator_14.svg/24px-City_locator_14.svg.png 1.5x" height="16" width="16"/>
                </a>
            </span>
        </div>
        <div style="position:relative;top:-16px;">
            <div style="font-size:90%;position:relative;top:-0.65em;left:-12.6em;text-align:right;width:12em;line-height:1.2em;">
                <span class="toponyme">Lyon</span>
            </div>
        </div>
    </div>
</div>

As you can see, the <a> tags are wrapped in a <span> tag containing data, but the <img> tag does not contain the 'alt' attribute.
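For illustration only, here is a minimal sketch of how client-side JS might cope with both markups: use the 'alt' attribute when it exists, and otherwise fall back to the caption stored in the wrapper's data-mw attribute. The function names (decodeBasicEntities, captionTextFromDataMw, labelForImage) are hypothetical and are not taken from common.js or from Parsoid. The sketch assumes the wrapper carries typeof="mw:Image" as in the snippet above, and it strips tags with a naive regular expression, which is enough for this example caption but is not a general HTML parser.

```javascript
// Hypothetical helpers, not part of common.js or Parsoid.

// Decode the handful of HTML entities that can plausibly remain in a caption.
// "&amp;" is handled last so that nothing is decoded twice.
function decodeBasicEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&nbsp;|&#160;/g, "\u00a0")
    .replace(/&amp;/g, "&");
}

// Take the value of a data-mw attribute (as returned by getAttribute, so
// already HTML-decoded) and return the caption as plain text, or null when
// there is no usable caption.
function captionTextFromDataMw(dataMw) {
  let parsed;
  try {
    parsed = JSON.parse(dataMw);
  } catch (e) {
    return null; // not valid JSON
  }
  if (!parsed || typeof parsed.caption !== "string") {
    return null;
  }
  // Naive tag stripping: assumes no ">" occurs inside attribute values.
  const withoutTags = parsed.caption.replace(/<[^>]*>/g, "");
  return decodeBasicEntities(withoutTags);
}

// Return a text label for an <img>: its alt attribute when present
// (PHP parser output), otherwise the caption of the enclosing
// typeof="mw:Image" wrapper (Parsoid output), otherwise null.
function labelForImage(img) {
  const alt = img.getAttribute("alt");
  if (alt) {
    return alt;
  }
  const wrapper = img.closest('[typeof~="mw:Image"]');
  if (!wrapper) {
    return null;
  }
  const dataMw = wrapper.getAttribute("data-mw");
  return dataMw ? captionTextFromDataMw(dataMw) : null;
}
```

In a browser, a script could call labelForImage(img) wherever it currently reads the 'alt' attribute directly. The second image in the example (City_locator_14.svg) has neither an 'alt' attribute nor a data-mw caption in the Parsoid output, so the sketch returns null for it and the caller would still need its own fallback.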

I mentioned this in the other ticket, but check out the Android app for JS that works with Parsoid.

Change 802643 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Use caption as alt on imgs when not present and caption isn't visible

https://gerrit.wikimedia.org/r/802643

Change 803606 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Set alt in galleries, despite caption being visible

https://gerrit.wikimedia.org/r/803606

Change 803606 merged by jenkins-bot:

[mediawiki/core@master] Set alt in galleries, despite caption being visible

https://gerrit.wikimedia.org/r/803606

Change 804404 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Add forward compatibility to avoid serializing alt from caption

https://gerrit.wikimedia.org/r/804404

Change 804404 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add forward compatibility to avoid serializing alt from caption

https://gerrit.wikimedia.org/r/804404

Change 805225 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a12

https://gerrit.wikimedia.org/r/805225

Change 805225 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a12

https://gerrit.wikimedia.org/r/805225

Change 802643 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Use caption as alt on imgs when not present and caption isn't visible

https://gerrit.wikimedia.org/r/802643

Change 808051 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a14

https://gerrit.wikimedia.org/r/808051

Change 808051 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a14

https://gerrit.wikimedia.org/r/808051