Port HTML5 default mode change (core 97caae596) to Parsoid
Closed, ResolvedPublic

Description

This is my interpretation of this bad rendering:
http://parsoid.wmflabs.org/frwikisource/Auteur:Abb%C3%A9%20Pierre

versus the good one:
http://fr.wikisource.org/wiki/Auteur:Abb%C3%A9_Pierre


Version: unspecified
Severity: normal

Details

Reference
bz54438

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 2:09 AM)
bzimport set Reference to bz54438.

The parsoid.wmflabs.org link just times out; can you describe it?

It seems that <time> tags are somehow not recognized as HTML tags and are instead parsed as text (escaped with HTML entities).

Here is the interesting part of the output for the example above (with Parsoid-specific attributes stripped, sorry for that):

Henri Grouès, dit l’abbé Pierre, était un prêtre catholique français, résistant puis député, fondateur du Mouvement Emmaüs</span> (&lt;time class,&lt;span&gt;,=,&lt;/span&gt;,"bday"='' datetime,&lt;span&gt;,=,&lt;/span&gt;,"1912"=''&gt;1912&lt;/time&gt;<link href=./Catégorie:Naissance_en_1912><link href=./Catégorie:Auteurs_du_XXe_siècle>– &lt;time class,&lt;span&gt;,=,&lt;/span&gt;,"dday"='' datetime,&lt;span&gt;,=,&lt;/span&gt;,"2007"=''&gt;2007&lt;/time&gt;<link href=./Catégorie:Décès_en_2007><link href=./Catégorie:Auteurs_du_XXIe_siècle>)

Can be verified on a simple test case:

[subbu@earth lib] echo "<time>foo</time>" | node parse --fetchConfig false
<body data-parsoid='{"dsr":[0,17,0,0]}'><p data-parsoid='{"dsr":[0,16,0,0]}'>&lt;time&gt;foo&lt;/time&gt;</p>
</body>

The sanitizer in Parsoid is the culprit: it accepts only a whitelist of HTML tags in wikitext, and <time> is not on that list. Either our port of the PHP sanitizer has a bug, or the port needs to be updated to match the current PHP code. To be investigated.
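
To illustrate the mechanism described above, here is a minimal sketch of a whitelist-based tag filter. It is not Parsoid's actual sanitizer code: the names `ALLOWED_TAGS`, `escapeTag`, `sanitizeTags` and `html5Tags` are hypothetical, and the whitelist is a small made-up subset. It only shows why a tag missing from the list ends up entity-escaped, and how adding time/data/mark to the list lets it through.

```javascript
// Hypothetical whitelist for illustration only; not Parsoid's real list.
const ALLOWED_TAGS = new Set(['b', 'i', 'span', 'p']);

// Escape the angle brackets of a tag so it is rendered as literal text.
function escapeTag(tagSource) {
  return tagSource.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Keep whitelisted tags as-is and entity-escape every other tag.
function sanitizeTags(wikitext, allowed) {
  // Matches opening and closing tags such as <time class="x"> or </time>.
  return wikitext.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)[^<>]*>/g, (match, name) =>
    allowed.has(name.toLowerCase()) ? match : escapeTag(match));
}

// <time> is not whitelisted, so it comes out escaped.
console.log(sanitizeTags('<time>foo</time>', ALLOWED_TAGS));

// Assumed fix: extend the whitelist with the HTML5 elements named in the patch.
const html5Tags = new Set([...ALLOWED_TAGS, 'time', 'data', 'mark']);
console.log(sanitizeTags('<time>foo</time>', html5Tags));
```

With the first whitelist the tag is escaped, which matches the symptom in the test case above; with the extended whitelist it is passed through unchanged.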

Change 99011 had a related patch set uploaded by GWicke:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/99011

Leaving this bug open until other parts of 97caae596 are ported too.

Change 99011 merged by jenkins-bot:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/99011

I am not sure whether this is related to this bug fix, but it seems to me that Parsoid generates an additional, unnecessary " " character after the closing time tag.

Examples:

(In reply to comment #7)

I am not sure whether this is related to this bug fix, but it seems to me that Parsoid generates an additional, unnecessary " " character after the closing time tag.

Examples:

This seems to work fine when testing with master:

echo '<time>1900</time>foo' | node parse
<body data-parsoid='{"dsr":[0,21,0,0]}'><p data-parsoid='{"dsr":[0,20,0,0]}'><time data-parsoid='{"stx":"html","dsr":[0,17,6,7]}'>1900</time>foo</p>
</body>

Can you try to find a minimal test case at http://parsoid.wmflabs.org/_wikitext/ ?
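
The manual check above could also be scripted. The following is a rough sketch, assuming the Parsoid HTML output is already available as a string; `hasWhitespaceAfterTime` is a hypothetical helper, not part of Parsoid, and the sample strings are simplified versions of the output shown above with the data-parsoid attributes removed.

```javascript
// Returns true if any closing </time> tag is directly followed by whitespace.
function hasWhitespaceAfterTime(html) {
  return /<\/time>\s/.test(html);
}

// Simplified form of the output obtained with master (no extra space).
const masterOutput = '<p><time>1900</time>foo</p>';
console.log(hasWhitespaceAfterTime(masterOutput));

// Hypothetical output showing the reported symptom.
const suspectOutput = '<p><time>1900</time> foo</p>';
console.log(hasWhitespaceAfterTime(suspectOutput));
```

Such a check only detects the symptom; it says nothing about whether the space comes from the sanitizer or from template expansion.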

This patch was also deployed on Wednesday (see https://www.mediawiki.org/wiki/Parsoid/Deployments#Wednesday.2C_December_4.2C_13:00-14:00_PST_Y_Deployed_0ac82a28), so these tags are now supported in production.

The smallest example I was able to find that shows a difference is:

http://parsoid-lb.eqiad.wikimedia.org/frwikisource/Utilisateur%3AKelson%2Ftest

Your test probably proves that this "problem" has nothing to do with the original bug. Should I open a new ticket?

(In reply to comment #9)

The smallest example I was able to find that shows a difference is:

http://parsoid-lb.eqiad.wikimedia.org/frwikisource/Utilisateur%3AKelson%2Ftest

Your test probably proves that this "problem" has nothing to do with the original bug. Should I open a new ticket?

Yes, that would be great. This looks more like a template whitespace folding issue.

Here it is:
https://bugzilla.wikimedia.org/show_bug.cgi?id=58289
Certainly not the most exciting bug to investigate...

I am closing the ticket, as this bug seems to be fixed now. Thank you very much.

Reopened this bug, as there are still HTML5-by-default changes from 97caae596 to port.

Change 101277 had a related patch set uploaded by GWicke:
Merge "Bug 54438: First part of core change 97caae596: support time/data/mark elements"

https://gerrit.wikimedia.org/r/101277

Change 101329 had a related patch set uploaded by GWicke:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/101329

Change 101277 merged by GWicke:
Merge "Bug 54438: First part of core change 97caae596: support time/data/mark elements"

https://gerrit.wikimedia.org/r/101277

Change 101329 merged by GWicke:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/101329