
Preprocessor_DOM::newPartNodeArray returns invalid PPNode_DOM when given invalid UTF-8
Closed, Resolved (Public)

Description

User PerfektesChaos from dewiki has written an example Lua module that triggers a fatal error in the PHP preprocessor.

The module is:

  • Cause a server crash.
  • includes/parser/Preprocessor_DOM.php line 1692:
  • "Call to a member function item() on a non-object"
  • Crashes any 1.24wmf1 wiki, not only beta.wmflabs.org

local p = {}

function p.f( frame )
    local story = "ö"                -- non-ASCII character (two bytes in UTF-8)
    local sub   = story:sub( 1, 1 )  -- byte-wise substring: keeps only the first
                                     -- byte of the UTF-8 sequence;
                                     -- should have been mw.ustring.sub()
    return frame:callParserFunction( "#tag:nowiki", { sub } )
end -- p.f

return p

The error is:
includes/parser/Preprocessor_DOM.php line 1692: "Call to a member function item() on a non-object"

I ran it with warnings enabled on my dev machine, and the output is:
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0x3C 0x2F 0x76 in Entity, line: 1 in \includes\parser\Preprocessor_DOM.php on line 85

Notice: Trying to get property of non-object in \includes\parser\Preprocessor_DOM.php on line 88

Fatal error: Call to a member function item() on a non-object in \includes\parser\Preprocessor_DOM.php on line 1692

See http://de.wikipedia.beta.wmflabs.org/wiki/MakeTheServerCrash for more information and a link to the example page.

It looks like the truncated UTF-8 sequence (the lone first byte 0xC3) is embedded into the XML that the preprocessor builds, which then makes that XML invalid.
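
The failure can be reproduced outside MediaWiki with a few lines of PHP. This is only a minimal sketch: the `<root><value>` wrapper is a hypothetical stand-in for the markup the preprocessor really builds, chosen because the warning above shows the stray 0xC3 byte followed directly by a closing tag.

```php
<?php
// "ö" is the two-byte UTF-8 sequence C3 B6; a byte-wise cut keeps only C3.
$half = substr( "\xC3\xB6", 0, 1 );

// Hypothetical wrapper elements; the real preprocessor markup is more involved.
$xml = "<root><value>$half</value></root>";

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors( true );

$dom = new DOMDocument();
$ok = $dom->loadXML( $xml );

var_dump( bin2hex( $half ) ); // string(2) "c3"
var_dump( $ok );              // false: the document could not be parsed

// Print whatever libxml reported for the failed parse.
foreach ( libxml_get_errors() as $error ) {
    echo trim( $error->message ), "\n";
}
libxml_clear_errors();
```

Because `loadXML()` fails, `$dom->documentElement` is not a usable element, which matches the notice and the fatal error quoted above.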


Version: 1.24rc
Severity: normal

Details

Reference
bz65081

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:12 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz65081.
Anomie added a comment. May 9 2014, 8:21 PM

You can reproduce it more easily with just frame:callParserFunction( "#tag:nowiki", { "\128" } ). It doesn't happen with PHP >= 5.4.0, since the default for htmlspecialchars changed such that it ignores the invalid characters.
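
As a rough illustration of how `htmlspecialchars()` treats an invalid byte under UTF-8, the sketch below passes the flags and charset explicitly so the result does not depend on version defaults. Whether MediaWiki's call path used these exact arguments is not stated in this report, so treat that as an assumption.

```php
<?php
// "\x80" is a lone continuation byte and is never valid UTF-8 on its own.
$invalid = "\x80";

// Without ENT_IGNORE or ENT_SUBSTITUTE, an invalid sequence makes the
// whole result an empty string.
$strict = htmlspecialchars( $invalid, ENT_COMPAT, 'UTF-8' );

// ENT_SUBSTITUTE (PHP >= 5.4) replaces the invalid byte with U+FFFD instead.
$substituted = htmlspecialchars( $invalid, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8' );

var_dump( $strict === '' );          // bool(true)
var_dump( bin2hex( $substituted ) ); // string(6) "efbfbd"
```

Either way, the invalid byte never reaches the generated XML, which would explain why the fatal error does not appear there.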

I think this is a bug in core, though, in that it should throw an exception rather than dying with a fatal error if given invalid input. I'm going to reassign and fix it accordingly.
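
The idea of failing with an exception rather than a fatal error could look roughly like the sketch below. It is not the merged patch: `loadXmlOrThrow` and the choice of `UnexpectedValueException` are hypothetical, picked only to show the check on the return value of `loadXML()`.

```php
<?php
/**
 * Hypothetical helper, not the merged patch: parse an XML string and
 * throw instead of handing back an unusable document.
 */
function loadXmlOrThrow( $xml ) {
    // Silence libxml warnings for this call, then restore the old setting.
    $previous = libxml_use_internal_errors( true );
    $dom = new DOMDocument();
    $ok = $dom->loadXML( $xml );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    if ( !$ok || $dom->documentElement === null ) {
        throw new UnexpectedValueException(
            'Input is not well-formed XML (invalid UTF-8?)'
        );
    }
    return $dom;
}

try {
    // Same invalid byte as in the "\128" reproduction above.
    loadXmlOrThrow( "<root><value>\x80</value></root>" );
} catch ( UnexpectedValueException $e ) {
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

A caller can then catch the exception and report a normal error instead of the request dying on a method call on a non-object.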

Change 132503 had a related patch set uploaded by Anomie:
Preprocessor_DOM::newPartNodeArray should check that loadXML succeeded

https://gerrit.wikimedia.org/r/132503

*** Bug 65097 has been marked as a duplicate of this bug. ***

Change 132503 merged by jenkins-bot:
Preprocessor_DOM::newPartNodeArray should check that loadXML succeeded

https://gerrit.wikimedia.org/r/132503

was successfully merged