A File: link to a non-existing image can inject html
Closed, ResolvedPublic

Description

When a [[File:]] tag to an non-existent image is rendered, the comment is written into the dom without html escaping.

Reported by Writ Keeper:

Hi all,

I'm not sure if this is the right place to report this, but I figured
better safe than sorry. I marked a page for deletion today, called '
Atelje "Meander" ', that had an embedded Youtube video in it (not
through File Upload or Commons), and I didn't think that this was
supposed to be possible. It did this by putting an iframe tag in the
argument to a nonexistent file (File:Nikola Novaković na kontrabasu
Miše Blama), so that it read [[File:Nikola Novaković na kontrabasu
Miše Blama|<iframe src=(url) ...</iframe>]]. The page has been
deleted, but I reproduced the behavior (using an <img> tag instead of
an iframe) on my test account's sandbox, located here:
http://en.wikipedia.org/wiki/User:WK-test/sandbox .

Thanks,
Writ Keeper


Version: 1.19.1
Severity: major

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz39700.
csteipp created this task.Via LegacyAug 27 2012, 8:54 PM
tstarling added a comment.Via ConduitAug 27 2012, 10:30 PM

I've confirmed that the bug is present at least as far back as 1.17. Sanitizer::stripAllTags() is documented as returning unsafe HTML since MW 1.10, due to calling self::decodeCharReferences(). It is called by Parser::stripAltText(), and the result is passed through to Linker::makeBrokenImageLinkObj() without further processing. Before r82843, the main case would not have been vulnerable, but the linkKnown() case and the <!--ERROR--> case would have been vulnerable before that time.

csteipp added a comment.Via ConduitAug 28 2012, 1:27 AM

So looking at the best fix for this, the comments for Linker::makeBrokenImageLinkObj say that they expect the $html string to be htmlescaped. That's the closest place to the output where that expectation seems to be documented, so probably should escape $fp['title'] just before passing it in makeImageLink?

The comments in Parser::stripAltText seems (to me) to indicate the author thought Sanitizer::stripAllTags would strip out html tags, so it would be good to find some way of making that more obvious.

brion added a comment.Via ConduitAug 28 2012, 3:28 AM

Sanitizer::stripAllTags does strip out HTML tags -- but it returns *PLAINTEXT* which may include the characters "<" and ">", which might be around things like "script" or "iframe".

Every time you treat plaintext as HTML, you create an HTML injection vector... Escape your output, kids. :)

tstarling added a comment.Via ConduitAug 28 2012, 7:03 AM

(In reply to comment #2)

So looking at the best fix for this, the comments for
Linker::makeBrokenImageLinkObj say that they expect the $html string to be
htmlescaped. That's the closest place to the output where that expectation
seems to be documented, so probably should escape $fp['title'] just before
passing it in makeImageLink?

Something like that. My main concern is to avoid reopening bug 27679 and to avoid breaking the test cases fixed in r68491. Most link texts are wikitext, and in fact we have a parser test enforcing that interpretation:

[[File:Denys Savchenko ''Pentecoste''.jpg]]

Goes to:

<p><a href="/index.php?title=Special:Upload&amp;wpDestFile=Denys_Savchenko_%27%27Pentecoste%27%27.jpg" class="new" title="File:Denys Savchenko &#39;&#39;Pentecoste&#39;&#39;.jpg">File:Denys Savchenko <i>Pentecoste</i>.jpg</a>

Note the actual italics in the link text. So it may not be as simple as just adding more escaping.

(In reply to comment #3)

Every time you treat plaintext as HTML, you create an HTML injection vector...
Escape your output, kids. :)

Thanks for those words of wisdom, old man.

csteipp added a comment.Via ConduitAug 28 2012, 10:37 PM

As Tim mentioned, it looks like r82843 (which closed bug 27679) may have contributed to the problem. It prevented running htmlspecialchars twice when $html == '' (Line 884), but doesn't run htmlspecialchars on the $html variable if it is passed into the function (and htmlspecialchars wasn't run in 2 other cases as well). I made one patch to run htmlspecialchars exactly once on $html for each of the code paths.

I also tried running tooltext in Parser::stripAltText through Sanitizer::removeHTMLtags, since the comment for makeBrokenImageLinkObj says that $html should already be escaped, and I'm not entirely sure if there was a reason for it. Both fixes produced the same output for all of my test input.

Both fixes also now allows a link like [[File:AB2.gif | A<B ]] to render as <a href="/wiki/index.php?title=Special:Upload&amp;wpDestFile=AB2.gif" class="new" title="File:AB2.gif">A&lt;B</a> instead of <a href="/wiki/Wikipedia:Upload?wpDestFile=AB2.gif" class="new" title="File:AB2.gif">A</a>, which seemed to me like a bug.

Is there a preference for running htmlspecialchars vs. removeHTMLtags?

csteipp added a comment.Via ConduitAug 28 2012, 10:38 PM

Created attachment 11025
run htmlspecialchars exactly once on $html in makeBrokenImageLinkObj

attachment htmlspecialchars.patch ignored as obsolete

tstarling added a comment.Via ConduitAug 29 2012, 12:44 AM

(In reply to comment #5)

As Tim mentioned, it looks like r82843 (which closed bug 27679) may have
contributed to the problem. It prevented running htmlspecialchars twice when
$html == '' (Line 884), but doesn't run htmlspecialchars on the $html variable
if it is passed into the function (and htmlspecialchars wasn't run in 2 other
cases as well). I made one patch to run htmlspecialchars exactly once on $html
for each of the code paths.

Thanks for that. It appears to work, but I still want to look at it more closely.

I also tried running tooltext in Parser::stripAltText through
Sanitizer::removeHTMLtags, since the comment for makeBrokenImageLinkObj says
that $html should already be escaped, and I'm not entirely sure if there was a
reason for it. Both fixes produced the same output for all of my test input.

The comment on makeBrokenImageLinkObj() was added by someone who didn't understand the code. The only reason for that comment is that escaping happened to be missing at the time the comment was added, since the escaping was removed shortly before by another author to fix a double-escaping bug. It doesn't say anything about what callers expect or whether escaping is a good idea.

Is there a preference for running htmlspecialchars vs. removeHTMLtags?

Sanitizer::removeHTMLtags() is a parser helper that removes certain HTML tags and leaves others for further processing by subsequent parts of the parser. It's not usually called for other purposes.

Sanitizer::stripAllTags() converts HTML (or HTML-like half-transformed wikitext) to plain text for use in an attribute. It's assumed that whatever constructs the HTML will escape the attribute correctly. Linker::makeBrokenLinkObj() is an unusual case since text which was originally intended for the alt attribute is redirected into non-attribute HTML.

There's no point in running Sanitizer::removeHTMLtags() on the input to Parser::stripAltText(), since it has already been run. And it would be incorrect to run it on the output of Parser::stripAltText(), since that would prevent users from including special characters in alt tags, regardless of what escaping method they attempted to use:

print Sanitizer::stripAllTags('&#60;script/&#62;');

<script/>

tstarling added a comment.Via ConduitAug 29 2012, 4:01 AM

Created attachment 11032
Chris's patch plus some cleanup and tests

Chris's patch looked fine. I fixed the documentation regarding $html and the variable name. I also removed $prefix and $trail, since all callers have set them to empty strings since the Age of Legends, and leaving them in was unnecessarily confusing. I also added a parser test.

In case anyone cares: link trails were originally implemented in the link functions. Now we have link trail handling in LinkHolderArray::makeHolder() and link prefix handling in Parser::replaceInternalLinks2(), but Linker::splitTrail() was left where it was, and many Linker functions still take prefix/trail arguments.

Attached: bug39700-v3.patch

tstarling added a comment.Via ConduitAug 29 2012, 9:27 AM
  • Bug 39762 has been marked as a duplicate of this bug. ***
tstarling added a comment.Via ConduitAug 29 2012, 10:28 AM

Niklas blamed r61816 (MW 1.16), which seems plausible. At the time, only the linkKnown() case would have been vulnerable, so it would have only affected wikis with $wgEnableUploads = $wgUploadNavigationUrl = false. I've confirmed the vulnerability in 1.16 with those conditions.

r82843 (MW 1.18) would have extended the vulnerability to wikis with upload enabled. The commit message indicates that the author was misled by the vulnerability from r61816 into thinking that there was no need to escape $text, and his test cases didn't fully explore that assumption.

The error case was probably never reachable.

MW 1.18 is still within our 12 month support window, so we will need releases of both 1.18 and 1.19.

csteipp added a comment.Via ConduitAug 29 2012, 9:57 PM

Created attachment 11038
Backport to 1.19, which didn't call wfCgiToArray

Attached: encLink-1.19.patch

csteipp added a comment.Via ConduitAug 29 2012, 10:13 PM

Created attachment 11039
Backport to 1.18

Attached: encLink-1.18.patch

csteipp added a project: Security.Via WebMar 26 2015, 8:39 PM

Add Comment