importDump.php fails on beta with "Extra content at the end of the document"
Closed, Invalid
Public

Description

tgr@deployment-mwmaint01:~$ mwscript importDump.php enwiki /home/tgr/dump-test.xml --no-local-users --username-prefix enwiki --report 50 --debug --dry-run
Wikimedia\NormalizedException\NormalizedException from line 711 of /srv/mediawiki/php-master/includes/import/WikiImporter.php: XML error at line 1: Extra content at the end of the document

#0 /srv/mediawiki/php-master/maintenance/importDump.php(354): WikiImporter->doImport()
#1 /srv/mediawiki/php-master/maintenance/importDump.php(286): BackupReader->importFromHandle(false)
#2 /srv/mediawiki/php-master/maintenance/importDump.php(130): BackupReader->importFromFile('/home/tgr/dump-...')
#3 /srv/mediawiki/php-master/maintenance/doMaintenance.php(108): BackupReader->execute()
#4 /srv/mediawiki/php-master/maintenance/importDump.php(359): require_once('/srv/mediawiki/...')
#5 /srv/mediawiki/multiversion/MWScript.php(116): require_once('/srv/mediawiki/...')
#6 {main}

(This is with https://gerrit.wikimedia.org/r/c/mediawiki/core/+/731261 applied; without it I get the even more nondescript "Import failed Expected <mediawiki> tag, got".) The XML file passes validation and works on my local (Vagrant) wiki, so this seems to be somehow specific to Beta. Walking through the main steps with XMLReader in shell.php works as expected.

Event Timeline

Here's a simple test file:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.11/ http://www.mediawiki.org/xml/export-0.11.xsd" version="0.11" xml:lang="en">
</mediawiki>

This is a successful no-op run locally, but generates the error on Beta.

Possibly a duplicate of T259527: Special:Import failed at Wikispecies: "Import failed: Expected <mediawiki> tag, got". There are a bunch of Support Desk entries suggesting this can be caused by a character encoding issue, but the test file is pure ASCII.
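For anyone who wants to rule out the encoding theory on their own dump, here is a minimal sketch of a byte-level check. It is not part of MediaWiki; is_plain_ascii is a hypothetical helper name, and it assumes a POSIX shell with a grep that supports bracket character classes.

```shell
# Hypothetical helper, not part of MediaWiki.
# Returns 0 if the file holds only printable ASCII and whitespace bytes,
# 1 if it holds any other byte, 2 if the file cannot be read at all.
is_plain_ascii() {
    if [ ! -r "$1" ]; then
        echo "error: cannot read $1" >&2
        return 2
    fi
    # In the C locale, [:print:] covers bytes 0x20-0x7E and [:space:] covers
    # tab, newline and similar; a match means some other byte is present.
    if LC_ALL=C grep -q '[^[:print:][:space:]]' "$1"; then
        return 1
    fi
    return 0
}
```

Calling is_plain_ascii /home/tgr/dump-test.xml and inspecting $? separates "contains non-ASCII bytes" from "could not be read", which are easy to confuse when the importer only reports a generic XML error.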

Using stdin works:

tgr@deployment-mwmaint01:~$ cat /home/tgr/test-dump.xml | mwscript importDump.php enwiki --no-local-users --username-prefix enwiki --report 50 --debug --dry-run
Done!

Internally, the only difference is using fopen(<filename>) vs. fopen('php://stdin') to initialize ImportStreamSource.

D'oh. It turned out to be a file permission issue, which does result in the appropriate warning locally; apparently those warnings are somehow suppressed on Beta? Nevertheless, the script should have proper error handling.
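Until the script itself reports the failure, a pre-flight check in the shell can surface this kind of problem. Note that frame #1 of the stack trace shows importFromHandle(false), which is consistent with fopen() having returned false. The following is a minimal sketch, not part of MediaWiki: check_dump_readable is a hypothetical helper name, and the result is only meaningful when the check runs as the same user the maintenance script runs as, which may differ from your login account.

```shell
# Hypothetical pre-flight helper, not part of MediaWiki.
# Returns 0 if the path is a regular file readable by the current user,
# 1 otherwise, printing the reason to stderr.
check_dump_readable() {
    dump=$1
    if [ ! -e "$dump" ]; then
        echo "error: $dump does not exist" >&2
        return 1
    fi
    if [ ! -f "$dump" ]; then
        echo "error: $dump is not a regular file" >&2
        return 1
    fi
    if [ ! -r "$dump" ]; then
        echo "error: $dump is not readable by $(id -un)" >&2
        return 1
    fi
    return 0
}

# Example: only start the import if the check passes.
# check_dump_readable /home/tgr/dump-test.xml &&
#     mwscript importDump.php enwiki /home/tgr/dump-test.xml --dry-run
```

Piping the file through stdin, as in the earlier comment, sidesteps the problem for a different reason: cat opens the file under your own account, so the script never has to open the path itself.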

Change 731309 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] importDump.php: handle fopen error

https://gerrit.wikimedia.org/r/731309

Change 731309 merged by jenkins-bot:

[mediawiki/core@master] importDump.php: handle fopen error

https://gerrit.wikimedia.org/r/731309