When Tidy is enabled, wikitext like [[Link|<div>Text</div>]] will not actually produce a clickable link
Open, Low, Public

Description

When Tidy is enabled, wikitext like [[Link|<div>Text</div>]] will not actually produce a clickable link. That's because Tidy "unwraps" the <div/> outside of the <a/> tag.

I believe this is pretty much common knowledge, but it seems we have no bug filed for it and there clearly aren't enough Tidy-related bugs filed if no one has killed it yet.


Version: unspecified
Severity: normal

Details

Reference
bz71962

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:48 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz71962.
bzimport added a subscriber: Unknown Object (MLST).

The workaround is to use [[Link|<span style="display: block;">Text</span>]], as stupid as that looks.
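
For pages that already contain the affected pattern, the workaround above could be applied mechanically. This is only a rough sketch: `apply_span_workaround` is a made-up helper name, and it assumes the simplest case of a bare `<div>` with no attributes and no nested markup inside the link label.

```python
import re

# Matches [[Target|<div>Label</div>]] where the <div> has no attributes.
# Assumption: the label contains no nested links or further <div>s.
BLOCK_LABEL = re.compile(r"\[\[([^|\]]+)\|<div>(.*?)</div>\]\]", re.DOTALL)

def apply_span_workaround(wikitext):
    """Rewrite a block-level link label into the display:block <span> form."""
    return BLOCK_LABEL.sub(
        r'[[\1|<span style="display: block;">\2</span>]]', wikitext
    )

print(apply_span_workaround("[[Link|<div>Text</div>]]"))
```

Anything more complicated (a `<div>` that carries its own style or class attributes, for instance) would need the attributes merged by hand.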

Tidy's first rule is not to allow block-level elements inside an inline-level element. This bug describes what Tidy is supposed to do, albeit not what is desired. Not going to be fixed, because Tidy isn't HTML5-aware and hasn't been maintained since 2008.

The only solution is to switch to Tidy-html5 [1] (an experimental fork), or make its functionality part of Parsoid. Either way, bug 54617 deals with updating/upgrading Tidy.

[1] https://github.com/w3c/tidy-html5

<a/> is not an inline element, it's a transparent element [1]. It was an inline element according to the HTML 4 spec [2], but as far as I know no browser ever implemented it as such (even IE 6 happily allows block content in links). This is indeed an aspect of Tidy's incompatibility with HTML 5.

I don't think this deserves a WONTFIX any more than the other dozens of blockers to bug 2542 do.

[1] http://www.w3.org/TR/html5/text-level-semantics.html#the-a-element
[2] http://www.w3.org/TR/html401/struct/links.html#h-12.2
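
To make the inline-versus-transparent distinction concrete, here is a deliberately tiny model of the two content rules. It is a sketch under heavy assumptions: only `a`, `span`, `p` and `div` are modelled, the function names are invented for illustration, and other constraints (such as no links inside links) are ignored.

```python
PHRASING = {"a", "span"}        # elements usable where only inline/phrasing content is allowed
PHRASING_ONLY_PARENTS = {"p", "span"}  # parents whose content must be inline/phrasing

def html4_allows(parent, child):
    """HTML 4 view: <a> is itself inline, so it may only contain inline content."""
    if parent in PHRASING or parent == "p":
        return child in PHRASING
    return True  # <div> accepts both block and inline content

def html5_allows(ancestors, child):
    """HTML5 view: <a> is transparent, so the nearest non-<a> ancestor decides."""
    for parent in reversed(ancestors):
        if parent != "a":
            return child in PHRASING or parent not in PHRASING_ONLY_PARENTS
    return True  # only <a> ancestors: nothing restricts the content

print(html4_allows("a", "div"))           # False: the rule Tidy enforces
print(html5_allows(["div", "a"], "div"))  # True: <div> inside <a> inside <div>
print(html5_allows(["p", "a"], "div"))    # False: the <p> still forbids a <div>
```

The last case matters for MediaWiki: a block inside a link is fine in HTML5 by itself, but not when the link sits inside a paragraph.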

Wontfix, cantfix... the point is, there is no fix, save replacing the legacy Tidy. So how about marking all Tidy bugs dependent/duplicate of bug 54617?

Many of these could probably be worked around in MediaWiki if someone put in the necessary effort.

That would be an inordinate amount of effort... Let's be realistic and just upgrade to Tidy-html5.

matmarex set Security to None.
Aklapper subscribed.
Izno subscribed.

This is still kind of a problem under Remex and friends.

The output HTML is funny--it outputs a duplicate, unfilled <a> outside and then a filled <a> inside, e.g. for input text [[Example|<div style="height: 300px; width: 400px; background-color: grey;">Example</div>]] we get:

<p>
  <a href="/wiki/Example" class="mw-disambig" title="Example"></a>
</p>
<div style="height: 300px; width: 400px; background-color: grey;">
  <a href="/wiki/Example" class="mw-disambig" title="Example">Example</a>
</div>

The inner a does function as expected.

I didn't expect that, but it actually makes sense. Assuming this wikitext:

[[Link|<div>Text</div>]]

Parser turns it into:

<p><a href="/wiki/Link"><div>Text</div></a></p>

Yeah, unfortunately, we have to remember about the <p> wrapping. And nesting <div> inside <p> is not allowed, so Remex splits the paragraph, and turns it into:

<p><a href="/wiki/Link"></a></p><div><a href="/wiki/Link">Text</a></div>

It's a bit awkward, but you can avoid the issue by wrapping the entire thing in another <div> to prevent the automatic paragraph wrapping:

<div>[[Link|<div>Text</div>]]</div>

Parser turns it into:

<div><a href="/wiki/Link"><div>Text</div></a></div>

And Remex doesn't have to apply any fixes.
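
The two cases can be reproduced with a toy tree builder. To be clear, this is not Remex's code: it is a minimal sketch of two rules from the HTML5 parsing algorithm (a block start tag closes an open `<p>`, and formatting elements that were closed that way are reopened before the next text), restricted to `p`, `div` and `a`. All names here are invented for illustration.

```python
import re

BLOCK = {"p", "div"}   # tiny subset of the start tags that close an open <p>
FORMATTING = {"a"}     # tiny subset of the "formatting" elements

class Node:
    def __init__(self, name, attrs=""):
        self.name, self.attrs, self.children = name, attrs, []

def serialize(node):
    inner = "".join(c if isinstance(c, str) else serialize(c) for c in node.children)
    return inner if node.name == "#root" else f"<{node.name}{node.attrs}>{inner}</{node.name}>"

def fix_nesting(html):
    root = Node("#root")
    stack = [root]   # stack of open elements
    active = []      # formatting elements the author has not closed yet

    def insert(name, attrs):
        node = Node(name, attrs)
        stack[-1].children.append(node)
        stack.append(node)
        return node

    def in_stack(name):
        return any(n.name == name for n in stack)

    def pop_until(name):
        while stack.pop().name != name:
            pass

    def reconstruct():
        # Reopen any formatting element that was popped by a block boundary.
        for i, fmt in enumerate(active):
            if not any(n is fmt for n in stack):
                active[i] = insert(fmt.name, fmt.attrs)

    for m in re.finditer(r"<(/?)(\w+)([^>]*)>|([^<]+)", html):
        closing, name, attrs, text = m.groups()
        if text is not None:
            reconstruct()
            stack[-1].children.append(text)
        elif not closing:
            if name in BLOCK:
                if in_stack("p"):
                    pop_until("p")   # a block start tag ends the open paragraph
                insert(name, attrs)
            else:
                reconstruct()
                node = insert(name, attrs)
                if name in FORMATTING:
                    active.append(node)
        elif name in FORMATTING:
            for fmt in reversed(active):
                if fmt.name == name:
                    active.remove(fmt)
                    if any(n is fmt for n in stack):
                        while stack.pop() is not fmt:
                            pass
                    break
        elif in_stack(name):
            pop_until(name)
        # Simplification: stray end tags are ignored. The real algorithm
        # would insert an empty <p> for an unmatched </p>.
    return serialize(root)

print(fix_nesting('<p><a href="/wiki/Link"><div>Text</div></a></p>'))
print(fix_nesting('<div><a href="/wiki/Link"><div>Text</div></a></div>'))
```

The first call yields an empty `<a>` left behind in the `<p>` and a second, filled `<a>` inside the `<div>`, which is the same shape as the output reported earlier in this task. The second call returns its input unchanged, since there is no `<p>` to split.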


If we want plain [[Link|<div>Text</div>]] to work, then this becomes a parser bug again: we'd have to change the p-wrapping logic to avoid the wrap here. But I think that code is pretty brittle and best not touched :(

I was wondering if perhaps Parsoid handles this correctly if it's the p-wrap logic at fault.

I was wondering if perhaps Parsoid handles this correctly if it's the p-wrap logic at fault.

[subbu@earth:~/work/wmf/parsoid] echo "x [[Foo|<div>x</div>]] y" | php bin/parse.php
<p data-parsoid='{"dsr":[0,2,0,0]}'>x </p><a rel="mw:WikiLink" href="./Foo" title="Foo" class="mw-redirect" data-parsoid='{"stx":"piped","a":{"href":"./Foo"},"sa":{"href":"Foo"},"dsr":[2,22,6,2]}'><div data-parsoid='{"stx":"html","dsr":[8,20,5,6]}'>x</div></a><p data-parsoid='{"dsr":[22,24,0,0]}'> y</p>

So, as long as editors don't have the unrealistic expectation that the 'x', the link, and the 'y' all end up in the same paragraph, Parsoid seems to handle this properly.

I anticipate most people who wanted it would want it for the less paragraphy ^<a><div/></a>$ (to borrow regex).