Underlying SimpleXMLElement::xpath fails for docs with default namespace xmlns
Closed, Resolved | Public | BUG REPORT

Description

Steps to Reproduce:

Parse any XML document that declares a default namespace (xmlns without a prefix), e.g.:

{{#get_web_data:url=https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/3028194/XML?heading=GHS+Classification|format=xml|use xpath|data=res=/Record/RecordType}}
res: {{#for_external_table:{{{res}}} }}

Actual Results:

Results are always empty, example:

res:

Expected Results:

Non-empty results, example:

res: CID

As a workaround, xmlns could be replaced with ns in the XML string in EDParserXMLwithXPath.php before it is parsed, so that the declaration becomes an ordinary attribute (see patch):

$text = str_replace('xmlns=', 'ns=', $text);

(see also comments on https://www.php.net/manual/de/simplexmlelement.xpath.php)
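
To make the failure and the workaround easier to reproduce outside the wiki, here is a minimal standalone sketch. The sample document and its namespace URI are made up for illustration (they only imitate the shape of the PubChem response), and the variable names are not taken from the extension.

```php
<?php
// Hypothetical sample document; the namespace URI is invented for this sketch.
$text = '<Record xmlns="http://example.org/ns"><RecordType>CID</RecordType></Record>';

// Unprefixed XPath against a document with a default namespace: nothing matches.
$xml = simplexml_load_string($text);
$before = $xml->xpath('/Record/RecordType');

// Workaround from this report: rename the declaration so it becomes a plain attribute.
$stripped = simplexml_load_string(str_replace('xmlns=', 'ns=', $text));
$after = $stripped->xpath('/Record/RecordType');

echo count($before), "\n";      // 0
echo (string)$after[0], "\n";   // CID
```

Note that the plain str_replace also rewrites any literal `xmlns=` that happens to occur in text content or attribute values, which is one reason it is only a workaround.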

Event Timeline

alex-mashin triaged this task as Unbreak Now! priority.
alex-mashin added a subscriber: Yaron_Koren.

Change 675350 had a related patch set uploaded (by Alex Mashin; author: mashin):
[mediawiki/extensions/ExternalData@master] Ignore xmlns in XML to be indexed with XPath

https://gerrit.wikimedia.org/r/675350

On Stack Overflow there are some suggestions for a more robust solution to this limitation of XPath 1.0,
e.g. using a regex, or iterating over all nodes and checking for default namespace declarations.
https://stackoverflow.com/questions/1245902/remove-namespace-from-xml-using-php

Change 675350 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Add the parameter 'default xmlns prefix' when querying XML with XPath

https://gerrit.wikimedia.org/r/675350

The parameter 'default xmlns prefix' has been added. Its value is used as the prefix for the default namespace, i.e. for an xmlns declaration that has no prefix in the XML.

It is the wiki user's responsibility to use this namespace prefix in their XPath queries.

A working example of 'default xmlns prefix':

{{#get_web_data:
   url=https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/3028194/XML?heading=GHS+Classification
 | format=xml
 | use xpath
 | data=res=/ns:Record/ns:RecordType
 | default xmlns prefix=ns
}}res: {{#for_external_table:{{{res}}} }}
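
For readers who want to try the same idea in plain PHP: binding a prefix to the default namespace can be sketched with SimpleXMLElement::getDocNamespaces() and SimpleXMLElement::registerXPathNamespace(). This is an illustration only, not the code merged in the change above; the sample document, its namespace URI and the `$prefix` variable are assumptions.

```php
<?php
// Hypothetical sample document; the namespace URI is invented for this sketch.
$text = '<Record xmlns="http://example.org/ns"><RecordType>CID</RecordType></Record>';
$prefix = 'ns'; // plays the role of 'default xmlns prefix'

$xml = simplexml_load_string($text);

// getDocNamespaces() maps prefixes to URIs; the default namespace has the key ''.
$namespaces = $xml->getDocNamespaces();
if (isset($namespaces[''])) {
    // Bind the chosen prefix to the default namespace for XPath queries.
    $xml->registerXPathNamespace($prefix, $namespaces['']);
}

// The query has to use the prefix on every element in the default namespace.
$result = $xml->xpath("/{$prefix}:Record/{$prefix}:RecordType");

echo (string)$result[0], "\n";  // CID
```

Unlike the str_replace workaround, this leaves the XML text untouched, at the cost of requiring the prefix in every step of the query.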