Parsoid blindly uses srcTagName even if node name has changed
Open, Needs Triage, Public

Description

(previously discussed with @ssastry who figured out that it was caused by srcTagName)

If you have wikitext like <FONT color="blue">foo bar</FONT>, Parsoid will store "srcTagName":"FONT" in the data-parsoid so that it roundtrips correctly.

However, if you modify the tag name in the Parsoid HTML, e.g. to <span style="...">foo bar</span>, but leave the data-parsoid alone, then when converting it back to wikitext, Parsoid will use the value in srcTagName and overwrite the new tag name with the old one.

This is a bit of an edge case, but it is one my bot is running into when fixing obsolete-tag lint errors. I think the simple fix would be to check that the value in srcTagName still matches the node name before using it.
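
To make the proposed check concrete, here is a minimal Python sketch. It is not Parsoid's actual code: the function name `serialized_tag_name`, the dict representation of data-parsoid, and the lowercase fallback are all assumptions made for illustration. The only idea taken from this report is that srcTagName should be used only when it still names the same element as the node.

```python
def serialized_tag_name(node_name: str, data_parsoid: dict) -> str:
    """Pick the tag name to emit when serializing a node back to wikitext.

    Hypothetical helper: `data_parsoid` stands in for the parsed
    data-parsoid attribute of the node.
    """
    src = data_parsoid.get("srcTagName")
    # Reuse the original spelling (e.g. "FONT") only if it still refers
    # to the same element; HTML tag names compare case-insensitively.
    if isinstance(src, str) and src.lower() == node_name.lower():
        return src
    # Otherwise the node was renamed (or nothing was recorded), so the
    # stale value is ignored. Lowercasing here is an assumption.
    return node_name.lower()


# Unmodified node: the original casing roundtrips.
print(serialized_tag_name("font", {"srcTagName": "FONT"}))
# Node renamed to <span> with data-parsoid left alone: the new name wins.
print(serialized_tag_name("span", {"srcTagName": "FONT"}))
```

With a check like this, the unmodified case still roundtrips as FONT, while the edited case serializes as span instead of being overwritten with the old tag name.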

Event Timeline

Change #1230117 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] html2wt: Don't use data-parsoid.srcTagName blindly

https://gerrit.wikimedia.org/r/1230117

Change #1230117 merged by jenkins-bot:

[mediawiki/services/parsoid@master] html2wt: Don't use data-parsoid.srcTagName blindly

https://gerrit.wikimedia.org/r/1230117

Change #1233217 had a related patch set uploaded (by OSleger; author: OSleger):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a13

https://gerrit.wikimedia.org/r/1233217

Change #1233217 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a13

https://gerrit.wikimedia.org/r/1233217