
XML validation of generated RDF fails at the !ENTITY declaration because of incorrect URL encoding in the localized name of "Special"
Closed, Resolved (Public)

Description

Author: baratiza

Description:
MediaWiki 1.15.4
PHP 5.2.6-1+lenny8 (apache2handler)
MySQL 5.0.51a-24

Semantic MediaWiki (version: 1.5.0_0)

Steps to reproduce:

  1. Install MediaWiki.
  2. Set Hungarian as the language of the wiki (this is the important step).
  3. Install the Semantic MediaWiki extension.
  4. Export the RDF of the main page (I do it with the "Semantic Radar" Firefox extension).

<link rel="alternate" type="application/rdf+xml" title="Speciális:Névjegy" href="/Z/index.php?title=Speci%C3%A1lis:ExportRDF/Speci%C3%A1lis:N%C3%A9vjegy&amp;xmlmime=rdf" />

result:
XML validation of generated RDF fails

details:

-Fatal Error Messages

FatalError: The parameter entity reference "%C3;" must end with the ';' delimiter.[Line = 7, Column = 56]

-The original RDF/XML document

1: <?xml version="1.0" encoding="UTF-8"?>
2: <!DOCTYPE rdf:RDF[
3: <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
4: <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
5: <!ENTITY owl 'http://www.w3.org/2002/07/owl#'>
6: <!ENTITY swivt 'http://semantic-mediawiki.org/swivt/1.0#'>
7: <!ENTITY wiki 'http://zolta.homelinux.org/Z/w/Speci%C3%A1lis:URIResolver/'>
8: <!ENTITY property 'http://zolta.homelinux.org/Z/w/Speci%C3%A1lis:URIResolver/Property-3A'>
9: <!ENTITY wikiurl 'http://zolta.homelinux.org/Z/w/'>
10: ]>
11:
12: <rdf:RDF
13: xmlns:rdf="&rdf;"
14: xmlns:rdfs="&rdfs;"
15: xmlns:owl ="&owl;"
16: xmlns:swivt="&swivt;"
17: xmlns:wiki="&wiki;"
18: xmlns:property="&property;">
19: <!-- Ontology header -->
20: <owl:Ontology rdf:about="&wikiurl;Speci%C3%A1lis:ExportRDF/KezdQlap">
21: <swivt:creationDate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-07-02T16:18:23+02:00</swivt:creationDate>
22: <owl:imports rdf:resource="http://semantic-mediawiki.org/swivt/1.0" />
23: </owl:Ontology>
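
The failure can be reproduced outside the wiki with a few lines of PHP. This is a minimal sketch, assuming a PHP build with the DOM extension (libxml2); the helper names `isWellFormedXml` and `documentWithEntity` and the host `example.org` are placeholders, not part of SMW. It parses one document whose entity value contains raw percent signs and one where each percent sign is written as the character reference `&#37;`.

```php
<?php
// Hypothetical helper: returns true if libxml2 accepts the string as well-formed XML.
function isWellFormedXml(string $xml): bool {
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = (bool) $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Hypothetical helper: builds a tiny document whose internal DTD subset
// declares a single entity with the given literal value.
function documentWithEntity(string $entityValue): string {
    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
         . "<!DOCTYPE root [\n"
         . "<!ENTITY wiki '" . $entityValue . "'>\n"
         . "]>\n"
         . "<root/>\n";
}

// Raw percent signs, as in lines 7 and 8 of the export above.
$raw = documentWithEntity('http://example.org/w/Speci%C3%A1lis:URIResolver/');
// The same URL with every "%" written as the character reference "&#37;".
$escaped = documentWithEntity('http://example.org/w/Speci&#37;C3&#37;A1lis:URIResolver/');

var_dump(isWellFormedXml($raw));
var_dump(isWellFormedXml($escaped));
```

With libxml2 the first document should be rejected, because "%" inside an entity value is read as the start of a parameter entity reference, while the second should parse.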


Version: unspecified
Severity: normal

Details

Reference
bz24234

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:11 PM
bzimport set Reference to bz24234.

baratiza wrote:

EXPECTED

URLs in ENTITY declarations should be encoded with XML character references: á should be encoded as &#225;.

Lines 7 and 8 should be:

7: <!ENTITY wiki 'http://zolta.homelinux.org/Z/w/Speci&#225;lis:URIResolver/'>
8: <!ENTITY property 'http://zolta.homelinux.org/Z/w/Speci&#225;lis:URIResolver/Property-3A'>

Questions

  1. Where in the code jungle is the line we need to change?
  2. What PHP function does this encoding?

baratiza wrote:

I guess the problem is in

$IP/extensions/SemanticMediaWiki/includes/export/SMW_Exporter.php

maybe in the line:
SMWExporter::encodeURI(urlencode(str_replace(' ', '_', $wgContLang->getNsText(SMW_NS_PROPERTY) . ':')));
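
To see where the percent signs come from, here is a small sketch of what `urlencode` does to the Hungarian name of the "Special" namespace. The variable names and the host `example.org` are illustrative only; the real export code path in SMW is not reproduced here.

```php
<?php
// "\xC3\xA1" is the UTF-8 byte sequence for the letter "á" (U+00E1).
$localizedSpecial = "Speci\xC3\xA1lis";

// urlencode() percent-encodes every non-ASCII byte, so one "á" becomes "%C3%A1".
$encoded = urlencode($localizedSpecial);
echo $encoded, "\n"; // Speci%C3%A1lis

// Placed unescaped into an ENTITY declaration, the "%" characters are what
// the XML parser complains about.
$unescaped = "<!ENTITY wiki 'http://example.org/w/" . $encoded . ":URIResolver/'>";
echo $unescaped, "\n";
```

The percent-encoded URL itself is correct; the problem only appears once it is embedded in the DTD without further escaping.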

karima.rafes wrote:

You can check your RDF export with this form:

http://www.w3.org/RDF/Validator/

karima.rafes wrote:

I found a quick fix.
I wrapped the URLs in str_replace('%','&#37;',URL) in the function printHeader in the file includes/export/SMW_OWLExport.php.

protected function printHeader( $ontologyuri = '' ) {

		global $wgContLang;

		$this->pre_ns_buffer .=
			"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" .
			"<!DOCTYPE rdf:RDF[\n" .
			"\t<!ENTITY rdf '"   . SMWExporter::expandURI( '&rdf;' )   .  "'>\n" .
			"\t<!ENTITY rdfs '"  . SMWExporter::expandURI( '&rdfs;' )  .  "'>\n" .
			"\t<!ENTITY owl '"   . SMWExporter::expandURI( '&owl;' )   .  "'>\n" .
			"\t<!ENTITY swivt '" . SMWExporter::expandURI( '&swivt;' ) .  "'>\n" .
			// A note on "wiki": this namespace is crucial as a fallback when it would be illegal to start e.g. with a number. In this case, one can always use wiki:... followed by "_" and possibly some namespace, since _ is legal as a first character.
			"\t<!ENTITY wiki '"  . str_replace('%','&#37;',SMWExporter::expandURI( '&wiki;' )) .  "'>\n" .
			"\t<!ENTITY property '" . str_replace('%','&#37;',SMWExporter::expandURI( '&property;' )) .  "'>\n" .
			"\t<!ENTITY wikiurl '" . str_replace('%','&#37;',SMWExporter::expandURI( '&wikiurl;' )) .  "'>\n" .
			"]>\n\n" .
			"<rdf:RDF\n" .
			"\txmlns:rdf=\"&rdf;\"\n" .
			"\txmlns:rdfs=\"&rdfs;\"\n" .
			"\txmlns:owl =\"&owl;\"\n" .
			"\txmlns:swivt=\"&swivt;\"\n" .
			"\txmlns:wiki=\"&wiki;\"\n" .
			"\txmlns:property=\"&property;\"";
		$this->global_namespaces = array( 'rdf' => true, 'rdfs' => true, 'owl' => true, 'swivt' => true, 'wiki' => true, 'property' => true );

		$this->post_ns_buffer .=
			">\n\t<!-- Ontology header -->\n" .
			"\t<owl:Ontology rdf:about=\"$ontologyuri\">\n" .
			"\t\t<swivt:creationDate rdf:datatype=\"http://www.w3.org/2001/XMLSchema#dateTime\">" . date( DATE_W3C ) . "</swivt:creationDate>\n" .
			"\t\t<owl:imports rdf:resource=\"http://semantic-mediawiki.org/swivt/1.0\" />\n" .
			"\t</owl:Ontology>\n" .
			"\t<!-- exported page data -->\n";

}
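
The three repeated `str_replace` calls above could also be factored into one place. The following is only a sketch of that idea, not the code that was committed to SMW; `escapePercentForEntityValue` and `entityDeclaration` are hypothetical names, and `example.org` stands in for the wiki host.

```php
<?php
// Hypothetical helper: escapes "%" so that a URL can be used as the literal
// value of an <!ENTITY ...> declaration. Other characters are left untouched,
// mirroring the quick fix above.
function escapePercentForEntityValue(string $uri): string {
    return str_replace('%', '&#37;', $uri);
}

// Hypothetical helper: formats one declaration line in the same layout
// that printHeader uses (tab, declaration, newline).
function entityDeclaration(string $name, string $uri): string {
    return "\t<!ENTITY " . $name . " '" . escapePercentForEntityValue($uri) . "'>\n";
}

echo entityDeclaration('wiki', 'http://example.org/w/Speci%C3%A1lis:URIResolver/');
echo entityDeclaration('swivt', 'http://semantic-mediawiki.org/swivt/1.0#');
```

URLs without percent signs, such as the swivt namespace, pass through unchanged, so the helper could be applied to every declaration uniformly.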

I have now implemented the above "quick fix", which I think is the proper way to solve the problem. The % character, which is correctly used in the problematic URLs, has a special meaning in XML ENTITY declarations (it introduces a parameter entity reference) and must be escaped as &#37;. The fix will be released with SMW 1.5.2.