xml_parse doesn't expand internal entities sometimes
Closed, ResolvedPublic

Description

Libxml was changed a while back to not expand xml entities by default (preventing many XXE and DoS vulnerabilities by default). It seems like that either caused, or influenced another change, so that xml_parse in the zend engine doesn't expand entities either.

In our SVG filtering, we assume the xml is expanded while filtering, since Adobe seems to frequently output svg's with entities (so we can't prevent them categorically). There doesn't seem to be any options to force xml_parse to do the expansion.

Tim heard a recommendation to move to XMLReader in general, over xml_parse. We can pretty easily convert XmlTypeCheck to use XMLReader behind the existing interface.


Patch:

  • 1.24:
  • 1.23:
  • 1.19:

Affected Versions: (needed)
Type: xss
CVE: CVE-2015-2934

Related Objects

csteipp created this task.Feb 2 2015, 7:08 PM
csteipp updated the task description. (Show Details)
csteipp raised the priority of this task from to High.
csteipp claimed this task.
csteipp changed the visibility from "Public (No Login Required)" to "Custom Policy".
csteipp changed the edit policy from "All Users" to "Custom Policy".
csteipp changed Security from None to Software security bug.
csteipp added subscribers: tstarling, Aklapper, csteipp.

This mostly duplicates how XmlTypeCheck interacted with the filter callbacks, but uses XMLReader instead of xml_parse.

The differences between this and XmlTypeCheck are

  1. Internal xml entities always expanded (to fix this bug)
  2. When an external entity is encountered, XmlReaderTypeCheck will stop parsing (since libxml_disable_entity_loader is set, so the parser throws an exception) and set wellFormed = false, where xml_parse passed the unexpanded entity to the filtering callback.
  3. Minor whitespace differences in TEXT nodes, but our filtering doesn't rely on the whitespace being exact

@tstarling, does this look like the approach you had in mind? I could also replace XmlTypeCheck entirely, but this approach is about 35% slower (when run across all of my svg's), so I left the old version for cases where we just want to know if the xml is well formed, and don't care about filtering exact values.

Reported by Noam from beyondsecurity this morning. It looks like the same issue, and with the XmlReaderTypeCheck patch, the script tag is correctly interpreted, sent to the filter, and upload is prevented.

tstarling added a comment.EditedFeb 6 2015, 4:20 AM

Spelling error getAttrubtesArray -> getAttributesArray

Is wfSuppressWarnings() really needed around the whole of XmlReaderTypeCheck::validate()? Maybe we could just have an error suppressing wrapper around XMLReader::read().

Or better still, let's capture errors from XMLReader::read() using set_error_handler() and mark the input invalid if anything is captured. XMLReader::read() calls xmlTextReaderRead:

/**
 * xmlTextReaderRead:
 * @reader:  the xmlTextReaderPtr used
 *
 *  Moves the position of the current instance to the next node in
 *  the stream, exposing its properties.
 *
 *  Returns 1 if the node was read successfully, 0 if there is no more
 *          nodes to read, or -1 in case of error
 */

It returns -1 for all sorts of reasons, and HHVM's XMLReader::read() responds to a return value of -1 by just raising a warning and returning false. It concerns me because if the user can find some way to make libxml2 give an error halfway through the document, say due to limit exhaustion, the read loop in XmlReaderTypeCheck::validate() will mistake the error for the end of the document, and the second half of the document will be passed without validation.

I guess the stack depth check on line 231 is meant to prevent partially validated documents from getting through the filter, but do we really know how clients will respond to XML documents with multiple document elements? Or PIs after the end of the document element? libxml2 has its own well-formedness flag, which is not directly exposed by HHVM, but it looks like xmlTextReaderRead() will return -1 in response to it being false:

    if (reader->ctxt->wellFormed == 0) {
	reader->mode = XML_TEXTREADER_MODE_EOF;
        return(-1);
    }

    return(0);

So a check for a warning should be sufficient.

Otherwise looks fine.

Update to catch warnings.

csteipp added a subscriber: Anomie.Feb 10 2015, 7:52 PM

Re-ran this through all my svg's, and doesn't seem to create any unexpected issues with the filtering. Performance isn't great-- run times are 4-10x using XMLReader vs xml_parse (using 5.4/zend).

@Anomie, would you be able to sanity check this patch?

Added Beyond Security's report to the unit tests, and a test that external entities shouldn't be allowed.

Short answer: A few bits of funny whitespace ("$errno , $errstr", "set_error_handler ( "), but otherwise looks sane.

Other comments:

You're missing some optimization if no $filterCallback was passed to the constructor, but the only use does pass a callback so that could be left for later.

You have a comment in the default case "One of COMMENT, DOC, DOC_TYPE, ENTITY, END_ENTITY, NOTATION, or XML_DECLARATION", but you actually handle COMMENT earlier in the switch. There doesn't seem to be any particular reason to not let COMMENT use the default case.

I wonder if moving the set_error_handler()/restore_error_handler() back up to validateFromInput() would be a worthwhile speedup, and if so whether the possibility of unrelated warnings from callbacks causing false failures is worth worrying about.

If the slowdown is that bad and HHVM's xml_parse works right, it might be worth detecting whether xml_parse is going to recursively expand entities and choosing XmlReaderTypeCheck vs XmlTypeCheck based on that. Maybe something like this:

$xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><!DOCTYPE foo [<!ENTITY test "expanded"><!ENTITY test2 "&test;">]><foo>&test2;</foo>';
$parser = xml_parser_create_ns( 'UTF-8' );
xml_set_character_data_handler( $parser, 'someFunction' ); // Unfortunately HHVM doesn't like a Closure here
xml_parse( $parser, $xml, true );
xml_parser_free( $parser );
// Look at what 'someFunction' got called with
// * "expanded" = Entities recursively expanded (HHVM)
// * "&test;"   = Only one level of expansion (Zend, apparently)
// * "&test2;"  = No expansion
// * multiple calls, or something else = WTF?

Thanks for the comments @Anomie, fixed most of them in

. I kept the COMMENT block separate just to keep the comment about not passing it explicit.

I reran these on hhvm, and performance for XMLReader is actually 25% faster for small svgs, and 7-40% slower for some sets of larger files, so I'm much less worried about it in production.

csteipp closed this task as Resolved.Feb 17 2015, 10:43 PM
bd808 moved this task from Done to Archive on the MediaWiki-Core-Team board.Feb 23 2015, 11:58 PM
csteipp updated the task description. (Show Details)Mar 19 2015, 8:25 PM

Started backporting

, and in light of the outcome of T85848, I think it would make more sense to just convert XmlTypeCheck instead of having both XmlTypeCheck and XmlReaderTypeCheck in parallel. There's more risk of breaking our 3rd party installs, but having 2 implementations is going to be a headache too, and on hhvm, XmlTypeCheck isn't safe to use at all.

- @Anomie, does that seem more sane?

Looks ok to me. Haven't tested.

csteipp updated the task description. (Show Details)Mar 27 2015, 7:01 PM

csteipp updated the task description. (Show Details)Mar 28 2015, 5:13 AM

Deployed this afternoon as part of,
21:10 csteipp: redeploy security patches to wmf22
21:02 csteipp: redeploy security patches to wmf23

csteipp changed the visibility from "Custom Policy" to "Custom Policy".Mar 31 2015, 12:36 PM
csteipp changed the visibility from "Custom Policy" to "Public (No Login Required)".Mar 31 2015, 9:14 PM
csteipp changed the edit policy from "Custom Policy" to "All Users".
csteipp changed Security from Software security bug to None.

Change 201017 had a related patch set uploaded (by CSteipp):
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201017

Change 201028 had a related patch set uploaded (by CSteipp):
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201028

Change 201038 had a related patch set uploaded (by CSteipp):
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201038

Change 201017 merged by jenkins-bot:
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201017

Change 201028 merged by jenkins-bot:
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201028

Change 201038 merged by jenkins-bot:
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201038

Change 201223 had a related patch set uploaded (by CSteipp):
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201223

Change 201223 merged by jenkins-bot:
SECURITY: Always expand xml entities when checking SVG's

https://gerrit.wikimedia.org/r/201223

csteipp updated the task description. (Show Details)Apr 9 2015, 10:49 PM