Html::expandAttributes can be tricked into omitting necessary quotes
Closed, ResolvedPublic

Description

When $wgWellFormedXml is false (default is true), Html::expandAttributes uses a regex ($badChars) having the /u modifier to check for characters that must be quoted as part of an attribute value.

Unfortunately, PCRE 8.32 (part of PHP 5.3.24 - 5.3.27, 5.4.14 - 5.5.4), will fail to match anything (returning false) if any noncharacter is anywhere in the input. Unicode has 66 noncharacters, though MediaWiki's UtfNormal library only checks for U+FFFE and U+FFFF. (Reading the changelog, I think PCRE 8.33 might have corrected this, though I haven't tested to be sure.)

http://3v4l.org/IRaf2
http://www.unicode.org/faq/private_use.html#nonchar4
http://www.pcre.org/changelog.txt

This can be used to carry out a cross-site scripting attack by injecting arbitrary code into the HTML tag; only &, ", \n, \r, and \t are escaped.

Steps to reproduce:

  1. Compile and install one of the listed PHP versions (or compile PHP against PCRE 8.32)
  2. Set $wgWellFormedXml = false; in LocalSettings.php
  3. Go to index.php?title=Special:BlockList&wpTarget=%EF%B7%90><script>alert(1)</script>

Version: unspecified
Severity: normal

Details

Reference
bz55548
bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz55548.
bzimport changed Security from none to Software security bug.
Restricted Application changed the visibility from "Public (No Login Required)" to "Security (Project)". · View Herald TranscriptNov 22 2014, 2:22 AM
Restricted Application changed the edit policy from "All Users" to "Security (Project)". · View Herald Transcript

Created attachment 13472
minimal patch

Patch fixes usage of preg_match() in the specific location I identified (which I don't believe affects the WMF cluster). Of course I haven't audited/fixed every use of the preg_* functions in MediaWiki.

It is, however, slightly concerning that U+FDD0 becomes a plain question mark (U+003F) in at least IE 6.

Attached:

brion added a comment.Oct 11 2013, 6:47 PM

Why on earth would we leave quotes out in the first place????

That option should simply be removed; we should always output quotes.

I agree with Brion. The stated reason that, "If set to false, output will be a few bytes shorter, and the HTML will arguably be more readable" but I would rather we err on the side of caution and always quote them, which we do by default since Tim's patch in 2010.

We can either add Kevin's patch as the security fix, and then do a public change to always quote, or we could make that a security fix.

Tim, do you have a preference?

Added Krinkle too, in case he knows the history, or sees any potential issues with always quoting.

In addition to fixing the quoting, should all 66 noncharacters be blacklisted in
UtfNormal, despite http://www.unicode.org/L2/L2013/13006-nonchar-clarif.pdf,
which says "It is good practice to support all characters, including private use and noncharacter code points, except when those are used with a different
meaning internally"?

Created attachment 13811
patch to block noncharacters in UtfNormal

While this patch doesn't fix the actual quoting bug, it would have prevented me from exploiting it in the manner described (by filtering the problematic character out of the input).

To move this bug along a bit, I'm going to create and upload a patch to make Html unconditionally quote attribute values. Merely using a *blacklist* of characters that require quotes was not a good idea from a security point of view.

Attached:

Created attachment 13814
patch to always quote attributes

Attached:

Restricted Application changed the visibility from "Security (Project)" to "Custom Policy". · View Herald TranscriptNov 24 2014, 9:28 PM
Restricted Application changed the edit policy from "Security (Project)" to "Custom Policy". · View Herald Transcript
Bawolff added a subscriber: Bawolff.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 2 2015, 10:48 PM

Created attachment 13811
patch to block noncharacters in UtfNormal

While this patch doesn't fix the actual quoting bug, it would have prevented me from exploiting it in the manner described (by filtering the problematic character out of the input).

Attached:

While this is probably a good idea just generally, from a security perspective I don't think its sufficient, since the output of lua modules don't go through utf-8 normalization.

Lets kill $wgWellFormedXml=false entirely

If any of the parts of $wgWellFormedXml = false turn out to be useful (Perhaps no / on self-closing elements - T110616), I would suggest that people could submit patches later to change it for all of MW, and those patches can be argued on their merits. Having two versions of how we serialize html is nuts.

To get things moving, I'd like to state as a (possibly lofty) goal, to have this feature removed in time for 1.27

@Nikerabbit: translatewiki.net seems to be the main user of this feature - Do you think they would have any objections to its removal from MediaWiki

Parsing-Team: Given that parsing team is the closest team to having responsibility for this area of code, do you guys have any objections to removing this feature?

Assuming that there's no objection from translatewiki.net or Parsing-Team, what I'd propose doing next:

  1. Send an email to wikitech-l proposing the feature be removed, citing basically the reasons above but without referencing any specific security issue
  2. If there's no objections, get F3900760 merged as a normal gerrit patch, so its on master, and anyone with an interest can object
  3. When T133070: MediaWiki 1.27.1 security release comes around, have it backported in a security release, and at that point release info about this security issue

I think that would serve as a compromise between giving people an opportunity to object if they really want to keep this feature, and maintaining confidentiality about the security issue. I wouldn't propose this course of action of the security bug had wide applicability, but this bug only applies to really old versions of php and MW in an unusual configuration.

Also, here's the proposed patch:


Its like @PleaseStand's but goes further (and is based on current master). One would also probably have to modify the parser tests for SyntaxHighlight_GeSHi extension too.

Thoughts?

Note that https://gerrit.wikimedia.org/r/#/c/286495/ which removes $wgWellFormedXml has been merged into master. (Bug still open because we also still have supported branches)

Bawolff moved this task from Backlog to Pending release on the Security board.Jul 10 2016, 5:29 PM
demon closed this task as Resolved.Aug 22 2016, 8:50 PM
demon claimed this task.
demon added a subscriber: demon.

Note that https://gerrit.wikimedia.org/r/#/c/286495/ which removes $wgWellFormedXml has been merged into master. (Bug still open because we also still have supported branches)

Resolved because it's been backported everywhere.

Can this be made public now?

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".Apr 30 2018, 12:55 PM
Bawolff changed the edit policy from "Custom Policy" to "All Users".

Can this be made public now?

Whoops.

Public now. Better years late than never I suppose.