Import: XMLReader::open(): Unable to open source data
Closed, Resolved (Public)

Description

When importing via transwiki or by uploading a file, I get the following warnings:

Warning: XMLReader::open(): Unable to open source data in \includes\Import.php on line 65
Warning: XMLReader::read(): Load Data before trying to read in \includes\Import.php on line 494

Special:Import shows: "Import failed: Expected <mediawiki> tag, got "

WikiImporter does not check the return value of XMLReader::open (false in this case), so it goes on to attempt a first read of the XML file, which also returns false. After that first read there is a check that expects a <mediawiki> tag, but since nothing was read, no tag is found. The error handling could be improved here.
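To illustrate the missing check, here is a minimal sketch of what testing the return value could look like. It is not MediaWiki's actual WikiImporter code: the function names openImportReader and readRootElementName are hypothetical, and the scalar type hints assume PHP 7 or later rather than the 5.5 setup reported above.

```php
<?php
// Hypothetical helper, for illustration only (not the WikiImporter code).
// Opens an XML source and fails loudly if XMLReader::open() returns false.
function openImportReader( string $uri ): XMLReader {
	$reader = new XMLReader();
	// The warning is suppressed because the return value is checked instead.
	if ( !@$reader->open( $uri ) ) {
		throw new RuntimeException( "Unable to open import source: $uri" );
	}
	return $reader;
}

// Hypothetical helper: returns the name of the first element in the
// document, or null if nothing could be read.
function readRootElementName( XMLReader $reader ): ?string {
	while ( @$reader->read() ) {
		if ( $reader->nodeType === XMLReader::ELEMENT ) {
			return $reader->name;
		}
	}
	return null;
}
```

With a check like this, a failed open is reported as such, instead of surfacing later as a confusing "Expected <mediawiki> tag, got " message.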

The question is why opening the source data fails. Special:Version shows "5.5.9 (apache2handler)" on a Windows machine. I found some old imports on my wiki, but they were all done under PHP 5.4 or 5.3.

I have added some debug output to verify that the HTTP request against dewiki succeeds and returns XML.
There is a tmpfile() call; maybe some settings are needed for it, but then MediaWiki should give better messages.

Event Timeline

Umherirrender raised the priority of this task to Needs Triage.
Umherirrender updated the task description.
Umherirrender added a subscriber: Umherirrender.
Aklapper triaged this task as Low priority. Jan 7 2015, 9:09 PM
TTO added a subscriber: TTO. Jan 12 2015, 5:17 AM

Ah yes, this old chestnut.

I suspect this happens because a prior import attempt failed halfway through (e.g. with a fatal error). At the beginning of each import, libxml_disable_entity_loader is called, but because the request didn't complete, the entity loader never gets re-enabled. When using the Apache SAPI, the entity loader setting seems to persist across requests.

Similar issue to T58439, although there is probably a nicer way to fix this - we should just call libxml_disable_entity_loader( false ) before attempting to call XMLReader::open. Not sure of the security implications of this, though.

As a workaround, restart Apache to make importing work again.

Restarting Apache works, and calling libxml_disable_entity_loader( false ) before XMLReader::open also works.
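For reference, the approach described above could be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: the function name openWithEntityLoader is hypothetical, and the security implications of enabling the loader are, as noted above, not settled. libxml_disable_entity_loader() returns the previous setting, which lets the sketch restore it afterwards. The function is deprecated as of PHP 8.0, so the sketch only calls it on older versions.

```php
<?php
// Hypothetical helper, for illustration only.
// Temporarily enables the libxml entity loader around XMLReader::open(),
// then restores whatever setting was in effect before.
function openWithEntityLoader( string $uri ): XMLReader {
	// libxml_disable_entity_loader() is deprecated as of PHP 8.0,
	// so it is only touched on older versions.
	$legacy = PHP_VERSION_ID < 80000
		&& function_exists( 'libxml_disable_entity_loader' );
	// The call returns the previous "disabled" state.
	$previous = $legacy ? libxml_disable_entity_loader( false ) : false;
	try {
		$reader = new XMLReader();
		// The warning is suppressed because the return value is checked.
		if ( !@$reader->open( $uri ) ) {
			throw new RuntimeException( "Unable to open source data: $uri" );
		}
		return $reader;
	} finally {
		// Restore the previous state even if open() failed.
		if ( $legacy ) {
			libxml_disable_entity_loader( $previous );
		}
	}
}
```

Restoring the previous value in a finally block means a failure inside this helper does not leave the setting changed, which matters if the setting really does persist across requests under the Apache SAPI.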

Change 184554 had a related patch set uploaded (by TTO):
Enable entity loader and handle errors nicely in WikiImporter constructor

https://gerrit.wikimedia.org/r/184554

daniel added a subscriber: daniel. Feb 11 2015, 5:53 PM

Possibly relevant patch fixing the "Unable to open source data" error for some cases: I31c014df39aa11c11ded700

In T86036#969663, @TTO wrote:

Ah yes, this old chestnut.
I suspect this happens because a prior import attempt failed halfway through (e.g. with a fatal error). At the beginning of each import, libxml_disable_entity_loader is called, but because the request didn't complete, the entity loader never gets re-enabled. When using the Apache SAPI, the entity loader setting seems to persist across requests.
Similar issue to T58439, although there is probably a nicer way to fix this - we should just call libxml_disable_entity_loader( false ) before attempting to call XMLReader::open. Not sure of the security implications of this, though.

This is pretty much not an option from a security perspective. It potentially gives any uploader arbitrary command execution on the server, in addition to the ability to read /etc/passwd or any other file that Apache has access to.

Can someone explain why remote entities are desirable in imports?

TTO added a comment. Feb 11 2015, 11:35 PM

@csteipp I think you're missing the point somewhat. See my comment at the patch. (Sorry for fragmenting the discussion, I was working through my e-mail queue in oldest to newest order!)

Change 184554 merged by jenkins-bot:
Enable entity loader and handle errors nicely in WikiImporter constructor

https://gerrit.wikimedia.org/r/184554

TTO closed this task as Resolved. Apr 11 2015, 11:24 PM
TTO claimed this task.