
PHP Notice: Undefined offset: 1 in ./includes/title/NamespaceAwareForeignTitleFactory.php on line 127
Closed, Resolved, Public

Description

When importing an XML dump with importDump.php (only the --wiki and file arguments specified; no --no-updates, etc.), I sometimes get undefined offset errors:

2015-09-29 14:38:16 mw1 buswiki: [2352d243] [no req]   ErrorException from line 127 of /srv/mediawiki/w/includes/title/NamespaceAwareForeignTitleFactory.php: PHP Notice: Undefined offset: 1
#0 /srv/mediawiki/w/includes/title/NamespaceAwareForeignTitleFactory.php(127): MWExceptionHandler::handleError(8, 'Undefined offse...', '/srv/mediawiki/...', 127, Array)
#1 /srv/mediawiki/w/includes/title/NamespaceAwareForeignTitleFactory.php(78): NamespaceAwareForeignTitleFactory->parseTitleWithNs('Documentation', 828)
#2 /srv/mediawiki/w/includes/Import.php(979): NamespaceAwareForeignTitleFactory->createForeignTitle('Documentation', 828)
#3 /srv/mediawiki/w/includes/Import.php(718): WikiImporter->processTitle('Documentation', '828')
#4 /srv/mediawiki/w/includes/Import.php(555): WikiImporter->handlePage()
#5 /srv/mediawiki/w/maintenance/importDump.php(299): WikiImporter->doImport()
#6 /srv/mediawiki/w/maintenance/importDump.php(257): BackupReader->importFromHandle(Resource id #168)
#7 /srv/mediawiki/w/maintenance/importDump.php(102): BackupReader->importFromFile('/home/southpark...')
#8 /srv/mediawiki/w/maintenance/doMaintenance.php(103): BackupReader->execute()
#9 /srv/mediawiki/w/maintenance/importDump.php(304): require_once('/srv/mediawiki/...')
#10 {main}

There's no namespace key for namespace 828 (Module:) in the XML file. I guess that's why this error is thrown?

Event Timeline

Southparkfan raised the priority of this task to Needs Triage.
Southparkfan updated the task description.
Southparkfan subscribed.

When importing an XML dump [...], I sometimes get undefined offset errors

Is there a reliable test case (link to XML dump)?

Yes, I believe more information is needed before we can progress here:

  • Which XML dump exactly you are using, so we can investigate from there
  • What command you are running, so we can try to reproduce the problem
  • Any special extensions that you have enabled but that aren't enabled on the wiki you are getting the dump from

Also, which version of MediaWiki are you running?

Using Special:Export, I do see

<namespace key="828" case="first-letter">Module</namespace>
<namespace key="829" case="first-letter">Module talk</namespace>

Looks like you have an invalid dump. Notice that https://en.wikipedia.org/wiki/Special:Export/Module:Documentation has

<page>
    <title>Module:Documentation</title>
    <ns>828</ns>

Judging by the stack trace you posted in the description, the dump you are importing appears to have

<page>
    <title>Documentation</title>
    <ns>828</ns>

The import process expects pages with a nonzero namespace to contain a namespace prefix in their title.
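A minimal sketch of that expectation, written in Python rather than MediaWiki's PHP: the helper `split_prefixed_title` is hypothetical and only splits a `<title>` on its first colon. It shows why a title such as `Documentation` paired with `<ns>828</ns>` leaves nothing at index 1 after the split, which appears consistent with the "Undefined offset: 1" notice, though the actual code at line 127 is not reproduced here.

```python
def split_prefixed_title(title: str, ns_id: int) -> tuple[str, str]:
    """Return (namespace prefix, title text) for a title taken from <title>.

    Hypothetical helper: pages in namespace 0 carry no prefix; pages in any
    other namespace are expected to look like "Prefix:Text".
    """
    if ns_id == 0:
        return "", title
    pieces = title.split(":", 1)  # split on the first colon only
    if len(pieces) < 2:
        # Reading pieces[1] unconditionally here would raise IndexError,
        # the Python counterpart of PHP's "Undefined offset: 1" notice.
        raise ValueError(f"title {title!r} in namespace {ns_id} lacks a prefix")
    return pieces[0], pieces[1]


print(split_prefixed_title("Module:Documentation", 828))  # ('Module', 'Documentation')
```

Raising a descriptive error (or falling back to some default) is the kind of explicit handling the import path lacks when it indexes the split result directly.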

@Southparkfan: Where did you get this dump from? Did you concoct it yourself? If you exported it from Special:Export, which version of MediaWiki is the source wiki running?

Error handling during import is basically non-existent: invalid dumps are almost guaranteed to produce obscure exceptions or errors like this. We ought to do better.

Aklapper changed the task status from Open to Stalled. Oct 9 2015, 8:15 AM

No replies for a week; setting task status to Stalled. Please reset it to Open once you have answered.

Southparkfan changed the task status from Stalled to Open. Oct 9 2015, 10:06 PM

This is not a bug that occurs with just one random XML dump, and I'm simply running "php importDump.php --wiki wiki /path/to/dump.xml".

Sometimes a dump contains Scribunto modules, widgets (Extension:Widgets), and other pages that use custom namespaces. Like all other pages, they have '<ns>' tags with the namespace ID (e.g. 828 for Scribunto). For an import to work properly, the new wiki (where the dump will be imported) needs to have those namespaces enabled (by installing Widgets, for example), and the XML file needs to contain the corresponding "<namespace key" entries.

I wanted to convert an SQL dump of a wiki into an XML dump, so I imported the SQL file (which contained Scribunto modules), but there were no such namespace keys in the XML file at all because I hadn't enabled the Scribunto extension (human error). The Scribunto extension was enabled on the wiki where I was going to import the XML dump, but because of T114115#1714262 and the missing keys (I wonder whether the latter actually matters in this case), those PHP errors were thrown. Some validation checks before the import would be great, I guess.
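The kind of validation check suggested above could look roughly like this hypothetical Python sketch (`validate_dump` is not part of MediaWiki or importDump.php). It assumes the dump follows the usual export layout, with `<namespace key=...>` entries under `<siteinfo>` appearing before the `<page>` elements, and it reports pages whose `<ns>` is undeclared or whose title lacks the declared prefix. The prefix comparison is an exact string match; real MediaWiki titles also involve namespace aliases and case rules, which this ignores.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' part ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def validate_dump(xml_text: str) -> list[str]:
    """Return human-readable problems found in an XML export (hypothetical check)."""
    root = ET.fromstring(xml_text)
    declared: dict[int, str] = {}  # namespace ID -> name, from <siteinfo>
    problems: list[str] = []
    for elem in root.iter():  # document order: <siteinfo> precedes <page>
        name = local_name(elem.tag)
        if name == "namespace":
            declared[int(elem.get("key"))] = elem.text or ""
        elif name == "page":
            fields = {local_name(child.tag): (child.text or "") for child in elem}
            title, ns_id = fields.get("title", ""), int(fields.get("ns", "0"))
            if ns_id not in declared:
                problems.append(f"{title!r}: namespace {ns_id} not declared in <siteinfo>")
            elif ns_id != 0 and not title.startswith(declared[ns_id] + ":"):
                problems.append(f"{title!r}: expected prefix {declared[ns_id]!r}")
    return problems


# Illustrative dump resembling the one in this task (the xmlns version is illustrative).
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo>
    <namespaces>
      <namespace key="0" case="first-letter" />
    </namespaces>
  </siteinfo>
  <page>
    <title>Documentation</title>
    <ns>828</ns>
  </page>
</mediawiki>"""
print(validate_dump(sample))  # ["'Documentation': namespace 828 not declared in <siteinfo>"]
```

For very large dumps, `xml.etree.ElementTree.iterparse` would avoid loading the whole file into memory.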

I'm not sure what to do with this task, though; I'll leave that to you and Aklapper.

@Southparkfan: Yes, but where did this "one random XML dump" come from? (Sorry, I misread your comment.) That was my question to you in the previous comment. It appears that whatever is generating these dumps is generating them in a way that is incompatible with modern versions of MediaWiki.

If possible, could you upload a problematic dump here, along with an explanation of where it came from?

I highly doubt that the missing keys have anything to do with the actual bug, but they do lend credence to the idea that the dumps weren't generated by MediaWiki itself, or were possibly produced by an old version (it would be nice to know which one).


Here you are:

Everything was dumped (dumpBackup.php --wiki buswiki --full --logs) with MediaWiki 1.25. The first XML file was dumped before I ran update.php (to convert the SQL database from 1.24 to 1.25), and it is also the file I imported into the new wiki (which also runs MediaWiki 1.25). The second file was dumped after I ran update.php, and the third file was dumped with Scribunto enabled on the exporting wiki.

The SQL dumps were from the Orain wiki farm, which ran MediaWiki 1.24 on their wikis when I made the SQL backup of the wiki(s).

I have full access to the SQL/XML dumps etc., so it's no problem to test something if you need me to.

Yeah, the problem is that you didn't have Scribunto enabled when exporting the dump. Without it, MediaWiki has no way of knowing what namespace 828 is supposed to be called. I don't think there is anything we can do about this on the export side of things.
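To illustrate the export side, here is a simplified Python model (the function `prefixed_title` and the namespace tables are hypothetical, not MediaWiki code). It assumes the exporter writes the namespace name as a prefix only when it knows one, which matches what the dumps in this task show: with Scribunto disabled there is no name for 828, so the bare text is written while `<ns>` still says 828.

```python
def prefixed_title(ns_id: int, text: str, known_namespaces: dict[int, str]) -> str:
    """Build the <title> text an exporter would write for a page (simplified model)."""
    ns_name = known_namespaces.get(ns_id, "")  # "" for main or unknown namespaces
    return f"{ns_name}:{text}" if ns_name else text


# Hypothetical namespace tables for the exporting wiki.
core_only = {0: "", 10: "Template"}
with_scribunto = {**core_only, 828: "Module", 829: "Module talk"}

print(prefixed_title(828, "Documentation", core_only))       # Documentation
print(prefixed_title(828, "Documentation", with_scribunto))  # Module:Documentation
```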

There is something to be fixed here on the import side though: this type of invalid dump should not cause this kind of error. I'll work on a fix to improve the handling of this type of error.

What I am thinking is, if an XML fragment like

<page>
    <title>Documentation</title>
    <ns>828</ns>

is encountered during the import process, the page is put in namespace 828 if that namespace exists on the target wiki; otherwise, the page is just dumped in the main namespace (or skipped altogether?).

the page is put in namespace 828 if that namespace exists on the target wiki

This is actually problematic. We decided that namespaces with IDs 100 and above would be compared by name only. This is because many of those namespaces will be wiki-specific custom namespaces, where comparing namespace IDs doesn't make sense [1]. But in this case, we don't have a namespace name, only an ID. So it looks like we will just have to dump all these unknown pages in the main namespace :(

[1] It makes sense to compare namespace IDs for extension-defined namespaces, but there seems to be no accurate way of telling whether a particular namespace is extension-defined...
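The matching rule described above can be sketched in Python as follows. This is an assumption-laden simplification, not the code from the Gerrit change: `resolve_namespace` is a hypothetical name, IDs below 100 are assumed to be matched by ID (the comment only states the rule for 100 and above), and names are compared as exact strings.

```python
from typing import Optional


def resolve_namespace(source_ns: int, source_ns_name: Optional[str],
                      target_namespaces: dict[int, str]) -> int:
    """Pick the target-wiki namespace ID for an imported page (hypothetical sketch)."""
    if source_ns < 100:
        # Assumed: core namespace IDs are matched by ID; fall back to main if absent.
        return source_ns if source_ns in target_namespaces else 0
    if source_ns_name:
        # IDs 100 and above are matched by name only.
        for target_id, target_name in target_namespaces.items():
            if target_name == source_ns_name:
                return target_id
    # No usable name (e.g. a title without prefix): dump into the main namespace.
    return 0


target = {0: "", 1: "Talk", 10: "Template", 828: "Module", 829: "Module talk"}
print(resolve_namespace(828, "Module", target))  # 828, matched by name
print(resolve_namespace(828, None, target))      # 0, no name so main namespace
print(resolve_namespace(10, None, target))       # 10, matched by ID
```

Matching by name means a custom namespace renumbered on the target wiki still lines up, at the cost of losing pages whose name is unknown, which is the trade-off described in the comment above.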

Change 246815 had a related patch set uploaded (by TTO):
Handle missing namespace prefix in XML dumps more gracefully

https://gerrit.wikimedia.org/r/246815

Change 246815 merged by jenkins-bot:
[mediawiki/core] Handle missing namespace prefix in XML dumps more gracefully

https://gerrit.wikimedia.org/r/246815