Unclosed "Enlarge" <a> tag from thumbnail is leaking on Special:BookSources
Closed, Resolved, Public

Description

On https://en.wikipedia.org/wiki/Special:BookSources/0312898959, something is going wrong somewhere between SpecialBookSources, OutputPage, Parser, Tidy or Linker::thumbnail.

It contains the following HTML:

<div class="hatnote">For assistance, see <a href="/wiki/Help:ISBN" title="Help:ISBN">Help:ISBN</a>.</div>
<div class="thumb tleft"><div class="thumbinner" style="width:402px;">
  <div class="noresize">
    <map name="ImageMap_1_1744687168"><area href="#South_America" shape="poly" coords="90,79,70,91,96,160,115,162,135,100" alt="South America" title="South America"/><area href="#Africa" shape="poly" coords="176,49,159,53,141,68,156,95,187,143,222,143,230,83,200,56" alt="Africa" title="Africa"/><area href="#Europe" shape="poly" coords="179,4,153,36,158,49,201,55,208,42,203,14" alt="Europe" title="Europe"/><area href="#Europe" shape="poly" coords="203,14,239,3,333,15,307,36,285,32,263,34,251,34,225,33,208,35" alt="Russia" title="Russia"/><area href="#Asia" shape="poly" coords="207,34,198,55,222,77,254,88,284,104,306,108,314,94,333,18,301,34" alt="Asia" title="Asia"/><area href="#Australasia" shape="poly" coords="314,91,307,105,282,120,284,139,328,156,355,140,366,115" alt="Australasia" title="Australasia"/><area href="#United_States" shape="poly" coords="58,15,39,11,11,31,47,25" alt="United States" title="United States"/><area href="#United_States" shape="poly" coords="49,38,78,38,92,43,99,42,102,40,103,40,104,43,87,66,65,65,57,59,45,58,40,49" alt="United States" title="United States"/><area href="#Canada" shape="poly" coords="58,15,46,26,48,37,79,37,93,43,102,40,105,45,121,40,120,18,122,6,130,3,124,1" alt="Canada" title="Canada"/><area href="#Central_America" shape="poly" coords="45,57,57,58,65,66,75,71,74,74,67,79,47,70" alt="Central America" title="Central America"/><area href="#Greenland" shape="poly" coords="132,2,122,6,127,23,134,25,154,15,164,3" alt="Greenland" title="Greenland"/></map>
    <img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/BlankMap-World.png/400px-BlankMap-World.png" width="400" height="197" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/BlankMap-World.png/600px-BlankMap-World.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/c3/BlankMap-World.png/800px-BlankMap-World.png 2x" data-file-width="1500" data-file-height="740" usemap="#ImageMap_1_1744687168"/>
  </div>
  <div class="thumbcaption"><div class="magnify"><a href="/wiki/File:BlankMap-World.png" class="internal" title="Enlarge"/></div>Select your region from the map above</div>
</div></div>
<div class="toc" style="float:right">
<div class="toctitle"><b>Brief Table of Contents</b></div>
<ol><li> <a href="#Notes">Notes</a></li>
<li> <a href="#Online_text">Online text</a></li>
<li> <a href="#Online_databases">Online databases</a></

The <a> for the Enlarge link in the div.magnify element is serialized as if it were a void element (<a ... />). Since <a> is not a void element in HTML, the trailing slash is ignored when the page is parsed in Chrome, and the element stays open.

This is causing:

  • The entire caption to become part of the link.
  • The next heading to become part of the link.
  • The whitespace before the first item in the table of contents to become a link.
  • The first TOC item to be offset 1em to the right because of the extra link (see screenshot).

The same wikitext is also rendered on Wikipedia:Book_sources. There, however, it renders fine, with both an opening and a closing tag:

<div class="magnify"><a href="/wiki/File:BlankMap-World.png" class="internal" title="Enlarge"></a></div>

Screenshot of the breakage on Special:BookSources:

special-broken.png (534×1 px, 208 KB)

Screenshot of Wikipedia:Book_sources by comparison, which is unaffected (possibly thanks to Tidy or some other post-processing).

wikipedia-fine.png (498×1 px, 198 KB)

Screenshot of the live DOM, showing that the unclosed <a> gets re-created in every block-level element until the parser finds a way to close it. This causes whitespace that would otherwise be insignificant to become significant and render, pushing away other content (like the first list item).

Screen Shot 2015-06-07 at 18.07.08.png (756×1 px, 214 KB)

The relevant code in Linker::makeThumbLink2:

				$zoomIcon = Html::rawElement( 'div', array( 'class' => 'magnify' ),
					Html::rawElement( 'a', array(
						'href' => $url,
						'class' => 'internal',
						'title' => wfMessage( 'thumbnail-more' )->text() ),
						"" ) );
			}
		}
		$s .= '  <div class="thumbcaption">' . $zoomIcon . $fp['caption'] . "</div></div></div>";

I tried many different ways, but Html::rawElement( 'a', array(), "" ); always produces <a></a>. Having said that, SpecialBookSources does do some weird magic by including the raw wikitext directly in the page to be rendered, thus bypassing some processes. What's causing this?

Event Timeline

Krinkle raised the priority of this task to Needs Triage.
Krinkle updated the task description.
Krinkle subscribed.
Krinkle triaged this task as Medium priority. Jun 7 2015, 5:11 PM
Krinkle set Security to None.

Here is a very simple way to reproduce it: paste the following into any page. $wgTidyConfig is at its default of null and $wgUseTidy is at its default of false. I have not figured out a fix yet.

<imagemap>
File:Test.png|150|thumb|alt=Alt text|This should not? be a link.

default [[Main Page]]
</imagemap>

This definitely should not be a link.

It's caused by running the HTML through [[ http://php.net/manual/en/class.domdocument.php | DOMDocument ]] and getting it back out with [[ http://php.net/manual/en/domdocument.savexml.php | saveXML ]], rather than [[ http://php.net/manual/en/domdocument.savehtml.php | saveHTML ]].

It can be fixed either by not worrying about the supposed XHTML compliance and using saveHTML, or by specifying the [[ http://php.net/manual/en/libxml.constants.php#constant.libxml-noemptytag | LIBXML_NOEMPTYTAG ]] option on saveXML. The latter, however, causes other issues with valid void tags, such as <img> becoming <img></img>. But at least that doesn't break the page...
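
To make the difference between the three serializations concrete, here is a minimal standalone sketch using plain PHP DOMDocument. It is not the actual extension code: the helper serializeElement() and the sample markup are hypothetical and exist only for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): parse an HTML fragment,
 * find the first element with the given tag name, and serialize it
 * using one of the strategies discussed above.
 */
function serializeElement( string $html, string $tag, string $mode ): string {
	// Collect libxml parser warnings internally instead of printing them.
	$previous = libxml_use_internal_errors( true );
	$doc = new DOMDocument();
	$doc->loadHTML( $html );
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	$node = $doc->getElementsByTagName( $tag )->item( 0 );

	switch ( $mode ) {
		case 'xml':
			// XML rules: an element without children is collapsed to <tag/>.
			return $doc->saveXML( $node );
		case 'xml-noemptytag':
			// XML rules, but every element gets an explicit end tag,
			// including void elements such as <img>.
			return $doc->saveXML( $node, LIBXML_NOEMPTYTAG );
		default:
			// HTML rules: non-void elements keep their end tag.
			return $doc->saveHTML( $node );
	}
}

$anchor = '<div class="magnify"><a href="/wiki/File:Test.png" title="Enlarge"></a></div>';

echo serializeElement( $anchor, 'a', 'xml' ), "\n";
echo serializeElement( $anchor, 'a', 'html' ), "\n";
echo serializeElement( '<img src="x.png" alt="">', 'img', 'xml-noemptytag' ), "\n";
```

The first call should show the self-closed anchor that browsers leave open, the second the explicit end tag, and the third the <img></img> side effect of LIBXML_NOEMPTYTAG.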

This seems to be fixed, possibly since we switched to RemexHtml.

Screenshot 2019-02-04 at 13.44.14.png (940×1 px, 186 KB)

The HTML/XML confusion noted by @Majr and @Krinkle is resolved for good by https://gerrit.wikimedia.org/r/850313 (patch for T113791); the issue is not reproducible at the moment, so either Remex masked it as @TheDJ says or something changed in the template to avoid it.