Improve detection of missing quotes in HTML tag attributes
Open, Low, Public, 0 Estimated Story Points

Description

This edit removed some div tags and broke the rendering, because one of the <div>s has an unclosed quote in its style attribute: <div style="clear:both; class="NavFrame">.
A user at de.wiki argues that these unintended changes shouldn't happen.

Event Timeline

Parsoid does what it can to match the PHP parser's tokenizing and to handle different error scenarios. However, Parsoid will never be able to catch all possible erroneous markup and recover from it. The error recovery we add depends on how common those errors are on wikis and how much complexity it adds to the codebase. For low-profile errors, I think it is reasonable to expect the markup to be fixed instead.

ssastry renamed this task from Removal of tags in VEdit to Improve detection of missing quotes in HTML tag attributes. Dec 14 2016, 3:58 PM
ssastry triaged this task as Low priority.

Change 361916 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] More permissive attribute name parsing

https://gerrit.wikimedia.org/r/361916

Change 361916 merged by jenkins-bot:
[mediawiki/services/parsoid@master] More permissive attribute name parsing

https://gerrit.wikimedia.org/r/361916

Arlolra subscribed.

The Parsoid bug that prevented this markup from being tokenized as an HTML element is fixed, but I'm leaving the task open because a linter category for these kinds of attribute syntax errors might be useful.
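
For illustration only, here is a rough sketch of what such a check could look like. It is not Parsoid or Linter code. The function name `find_suspect_attributes` and both regular expressions are hypothetical, and the heuristic is an assumption: a quoted attribute value that ends in whitespace, a name, and `=` has probably swallowed the start of the next attribute because its closing quote is missing.

```python
import re
from typing import List

# Hypothetical pattern: an attribute name, "=", and a quoted value that runs
# (lazily) up to the next matching quote character.
QUOTED_VALUE = re.compile(r'''([A-Za-z_:][\w:.-]*)\s*=\s*(["'])(.*?)\2''', re.S)

# Hypothetical heuristic: the value ends with "<whitespace>name=", which
# suggests the next attribute was absorbed into this value.
SWALLOWED_ATTR = re.compile(r'\s[A-Za-z_:][\w:.-]*\s*=\s*$')


def find_suspect_attributes(start_tag: str) -> List[str]:
    """Return names of attributes whose value looks like it lost its closing quote."""
    suspects = []
    for match in QUOTED_VALUE.finditer(start_tag):
        name, _quote, value = match.groups()
        if SWALLOWED_ATTR.search(value):
            suspects.append(name)
    return suspects


# The tag from the task description: the style value runs on into class=.
broken = '<div style="clear:both; class="NavFrame">'
# The same tag with the closing quote restored.
fixed = '<div style="clear:both;" class="NavFrame">'

print(find_suspect_attributes(broken))
print(find_suspect_attributes(fixed))
```

On the tag from the description this flags `style`, because the quoted value it sees is `clear:both; class=`. A real linter category would have to work on the tokenizer's own view of the attributes rather than on regular expressions, and would need to avoid false positives on values that legitimately end in `name=`.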