Parser interprets <bXY> as <b> if XY begins with a non-ASCII character when $wgUseTidy=true
Closed, Resolved (Public)

Description

<bXY> is parsed as <b> if XY begins with a non-ASCII character.
Examples (also included in the URL):
<b→> doesn't work! </b>
<bä> doesn't work! </b>
<boo> works fine </b>
URL: http://de.wikipedia.org/w/index.php?title=Benutzer:Church_of_emacs/Testseite&oldid=57126730&uselang=en


The parser misinterprets

*bar <s.baz@bar.com>

as

<li>bar <s></li>

Tested at https://test.wikipedia.org/w/index.php?title=Page479&diff=prev&oldid=220383 and originally seen at https://en.wikipedia.org/wiki/Special:Version/Credits/BetaFeatures, where Siebrand's email address is causing the issue.

@Legoktm noted: To reproduce, you need to set $wgUseTidy = true; but the parser doesn't actually have to call MWTidy::tidy().
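The misparse described above can be illustrated with a small sketch. This is not MediaWiki's Sanitizer or Tidy code; the two patterns, the short tag list (b, s, sub), and the helper matched_tag are hypothetical and exist only to show the difference between ending a tag name at any non-word character and requiring whitespace, "/" or ">" after it. The re.ASCII flag is an assumption used to mimic byte-oriented matching, where non-ASCII characters do not count as word characters.

```python
import re

# Hypothetical patterns for illustration; not taken from MediaWiki.
# Lenient: the tag name ends at a word boundary, so "." or a non-ASCII
# character right after the name is enough to terminate it.
BUGGY_TAG = re.compile(r"<(b|s|sub)\b[^>]*>", re.ASCII)

# Stricter: the tag name must be followed by whitespace, "/" or ">".
STRICT_TAG = re.compile(r"<(b|s|sub)(?=[\s/>])[^>]*>", re.ASCII)


def matched_tag(pattern, text):
    """Return the tag name the pattern recognizes in text, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    samples = [
        "<b→> doesn't work! </b>",
        "<bä> doesn't work! </b>",
        "<boo> works fine </b>",
        "*bar <s.baz@bar.com>",
    ]
    for text in samples:
        print(repr(text),
              "lenient:", matched_tag(BUGGY_TAG, text),
              "strict:", matched_tag(STRICT_TAG, text))
```

With the lenient pattern the first, second and fourth samples are recognized as <b>, <b> and <s>; with the stricter pattern none of the four samples is treated as a tag, while a plain <b> still is.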


See Also: T54022: <Sub-ID#1> is recognised as <sub> tag in <code> area

Details

Reference
bz17663

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:31 PM
bzimport set Reference to bz17663.
Tobias created this task. Feb 25 2009, 2:07 PM

EN.WP.ST47 wrote:

This is easy enough, but remind me exactly why <bXY> should be parsed as <b>?

cscott added a comment. Aug 6 2013, 3:22 PM
  • Bug 52022 has been marked as a duplicate of this bug.
cscott added a comment. Aug 6 2013, 3:23 PM
  • Bug 40670 has been marked as a duplicate of this bug.

Change 77907 had a related patch set uploaded by Cscott:
Non-word characters don't terminate tag names.

https://gerrit.wikimedia.org/r/77907

cscott added a comment. Aug 6 2013, 3:31 PM
  • Bug 52022 has been marked as a duplicate of this bug.
cscott added a comment. Aug 6 2013, 3:36 PM
  • Bug 40670 has been marked as a duplicate of this bug.

Change 77907 merged by jenkins-bot:
Non-word characters don't terminate tag names.

https://gerrit.wikimedia.org/r/77907

Patch merged. Closing as FIXED.

I was hoping to verify the fix on the deployed wiki. This patch hasn't been deployed yet. (Although it should happen today.)

Fixed in the sanitizer, but html-tidy appears to still have a bug.

See bug 52899 for a better way to document behavior which varies when tidy is being used. The bug has been reopened. Still need to fix tidy to ensure these tags aren't swallowed.

  • Bug 68127 has been marked as a duplicate of this bug.

We're hitting this issue at https://en.wikipedia.org/wiki/Special:Version/Credits/BetaFeatures.

I wonder if a work-around would be to use the HTML entity (&lt;) instead of "<". But ugh, fucking Tidy.
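A minimal sketch of that work-around idea, assuming the text is preprocessed before it reaches the parser. The helper name escape_lt is hypothetical, and whether this actually avoids the Tidy behavior was not confirmed in this thread.

```python
def escape_lt(text):
    """Replace each literal "<" with the HTML entity "&lt;"."""
    return text.replace("<", "&lt;")


if __name__ == "__main__":
    # Example input from the task description.
    print(escape_lt("*bar <s.baz@bar.com>"))
```

This only makes sense for text that is not meant to contain markup, such as the email address in the credits list; applying it to a whole page would also disable intended tags.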

Quiddity updated the task description. Jan 8 2015, 10:16 PM
Quiddity set Security to None.

I think @Arlolra has fixed this, possibly?

Arlolra closed this task as Resolved. Mar 14 2015, 12:40 AM

Yup, this should be fixed with https://gerrit.wikimedia.org/r/#/c/183318/