
'redirect' XML tag is not correctly parsed during XML import
Closed, Resolved (Public)

Description

Author: sebastian.brueckner

Description:
Consider the following snippet of an XML dump created using Special:Export:

<mediawiki ...>
...

<page>
  <title>Abcde</title>
  <ns>0</ns>
  <id>27</id>
  <redirect title="Fghij"/>
  <revision>
    <id>111</id>
    <timestamp>2014-05-14T10:27:10Z</timestamp>

...

During import, the XML is parsed in WikiImporter::handlePage(). For every tag directly inside <page> (title, ns, id, ...), the value stored in the $pageInfo array is the node content ("Abcde", "0", "27" for the tags above). However, since <redirect> is an empty tag, its value in $pageInfo is always an empty string (""). The actual information is stored in the tag's title attribute instead.

As a result, when the $pageInfo array is accessed in hooks (e.g. ImportHandlePageXMLTag), the redirect target is not available, because only the (empty) node content was read and the title attribute was ignored.
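
To illustrate the difference between node content and attribute value, here is a minimal Python sketch. It is not MediaWiki's PHP code and does not reproduce the Gerrit change; the function name collect_page_info and its read_redirect_attribute flag are hypothetical, chosen only to mirror the behaviour described above.

```python
import xml.etree.ElementTree as ET

# Shortened version of the <page> element from the dump above.
PAGE_XML = """
<page>
  <title>Abcde</title>
  <ns>0</ns>
  <id>27</id>
  <redirect title="Fghij"/>
</page>
""".strip()


def collect_page_info(page_xml, read_redirect_attribute):
    """Build a dict of child tag -> value, loosely analogous to $pageInfo.

    Hypothetical helper: with read_redirect_attribute=False every child is
    reduced to its node content (the behaviour reported in this bug); with
    True, the 'redirect' tag is read from its 'title' attribute instead.
    """
    page = ET.fromstring(page_xml)
    page_info = {}
    for child in page:
        if child.tag == "redirect" and read_redirect_attribute:
            # The redirect target lives in the attribute, not in the content.
            page_info[child.tag] = child.get("title", "")
        else:
            # An empty element has no text; normalise None to "".
            page_info[child.tag] = child.text or ""
    return page_info


buggy = collect_page_info(PAGE_XML, read_redirect_attribute=False)
fixed = collect_page_info(PAGE_XML, read_redirect_attribute=True)

print(repr(buggy["redirect"]))
print(repr(fixed["redirect"]))
```

Reading only the node content leaves the redirect entry empty, while reading the title attribute recovers the target page name.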

I will submit a fix on Gerrit and post the link here.


Version: unspecified
Severity: normal

Details

Reference
bz65481

Event Timeline

bzimport raised the priority of this task to Normal. (Nov 22 2014, 3:22 AM)
bzimport set Reference to bz65481.
bzimport added a subscriber: Unknown Object (MLST).

sebastian.brueckner wrote:

Here's my proposed fix: https://gerrit.wikimedia.org/r/134079

Change 134079 had a related patch set uploaded by TTO:
Correctly parse 'redirect' XML tag during Special:Import.

https://gerrit.wikimedia.org/r/134079

Change 134079 merged by jenkins-bot:
Correctly parse 'redirect' XML tag during Special:Import.

https://gerrit.wikimedia.org/r/134079