dcatap.rdf in dumps contains invalid data
Closed, Resolved, Public

Description

After downloading dcatap.rdf from https://dumps.wikimedia.org/wikidatawiki/entities/, I tried to load it into an RDF store and got this error:

java.util.concurrent.ExecutionException: org.openrdf.rio.RDFParseException: The prefix "xmlns" cannot be bound to any namespace explicitly; neither can the namespace for "xmlns" be bound to any prefix explicitly. [line 2, column 289]
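The rule behind this error comes from Namespaces in XML 1.0: the prefix xmlns must never be declared, and a declaration for a prefix must not have an empty value (unbinding a prefix is only allowed in Namespaces in XML 1.1). Below is a minimal sketch of a pre-load check using only Python's standard-library expat binding. The function name and the trimmed sample headers are illustrative and are not part of the dumps tooling.

```python
import xml.parsers.expat

# Namespace names reserved by the Namespaces in XML recommendation.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"


def find_bad_ns_declarations(data: bytes) -> list[tuple[int, str]]:
    """Return (line, message) pairs for namespace declarations that
    Namespaces in XML 1.0 forbids.

    Namespace processing is deliberately left off, so expat hands
    xmlns:* declarations to the handler as ordinary attributes instead
    of interpreting (or rejecting) them itself.
    """
    problems: list[tuple[int, str]] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        line = parser.CurrentLineNumber  # 1-based line of this start tag
        for attr, value in attrs.items():
            if not attr.startswith("xmlns:"):
                continue  # ordinary attributes; default declarations (xmlns=...) are not checked
            prefix = attr[len("xmlns:"):]
            if prefix == "xmlns":
                problems.append((line, f'<{name}>: prefix "xmlns" must not be declared'))
            elif prefix == "xml" and value != XML_NS:
                problems.append((line, f'<{name}>: prefix "xml" may only be bound to {XML_NS}'))
            elif value == "":
                # Unbinding a prefix is only allowed in Namespaces in XML 1.1.
                problems.append((line, f'<{name}>: empty declaration for prefix "{prefix}"'))
            elif value in (XML_NS, XMLNS_NS) and prefix != "xml":
                problems.append((line, f'<{name}>: reserved namespace bound to "{prefix}"'))

    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return problems


# Trimmed versions of the old and new headers quoted in this report.
OLD = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
       b'  <rdf:Description rdf:nodeID="_n42"/>\n'
       b'</rdf:RDF>\n')
NEW = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xmlns="">\n'
       b'  <rdf:Description rdf:nodeID="_n42" xmlns:rdf=""/>\n'
       b'</rdf:RDF>\n')

for label, doc in (("old", OLD), ("new", NEW)):
    print(label, find_bad_ns_declarations(doc) or "no problems found")
```

With namespace processing off, expat only checks well-formedness. That is why the new header parses at all and the two offending declarations can be reported individually instead of failing on the first one.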

Comparing it with an older file that loaded fine, I see that the old file had:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:adms="http://www.w3.org/ns/adms#" xmlns:vcard="http://www.w3.org/2006/vcard/ns#">
    <rdf:Description rdf:nodeID="_n42">

while the new one has:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:adms="http://www.w3.org/ns/adms#" xmlns:vcard="http://www.w3.org/2006/vcard/ns#" xmlns:xmlns="">
    <rdf:Description rdf:nodeID="_n42" xmlns:rdf="">

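I'm not sure what the extra empty attributes (xmlns:xmlns="" on rdf:RDF and xmlns:rdf="" on rdf:Description) are supposed to do, but they don't look valid. The error position (line 2, column 289) appears to match the end of the rdf:RDF start tag that carries xmlns:xmlns="".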
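As a stopgap while the generator is being fixed, a local copy could be patched before loading. This is a minimal sketch that assumes the only damage is the empty prefixed declarations shown above. Deleting them restores the bindings inherited from rdf:RDF, which makes the header identical to the older file's. Both function names are hypothetical.

```python
import re
from pathlib import Path

# A prefixed namespace declaration with an empty value, e.g. xmlns:xmlns=""
# or xmlns:rdf="" (both illegal in Namespaces in XML 1.0), plus the
# whitespace before it. Default-namespace undeclarations (xmlns="") have
# no colon, are legal, and are not matched.
EMPTY_PREFIXED_DECL = re.compile(rb"""\s+xmlns:[A-Za-z_][\w.-]*\s*=\s*(?:""|'')""")


def strip_empty_prefixed_declarations(data: bytes) -> bytes:
    """Remove empty prefixed namespace declarations from raw RDF/XML bytes.

    This is a textual patch, not an XML-aware rewrite: it assumes the
    pattern never occurs inside literal text or CDATA in the file.
    """
    return EMPTY_PREFIXED_DECL.sub(b"", data)


def clean_file(src: str, dst: str) -> None:
    # Hypothetical helper: writes a patched copy rather than editing the dump in place.
    Path(dst).write_bytes(strip_empty_prefixed_declarations(Path(src).read_bytes()))
```

For example, clean_file("dcatap.rdf", "dcatap.fixed.rdf") would produce a copy to load instead. Regenerating the dump with a working runtime is still the real fix.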

Event Timeline

@ArielGlenn @hoo
Is this the same issue as T117534: DCAT-AP: XML produces invalid output with HHVM?

DCAT was recently moved over to HHVM, but the issue may not actually have been fixed (the task has been reopened).

It should be back on php5 now. I went through all the misc dumps crons and made sure they all use the php setting from the dumps config, which is php5. See https://gerrit.wikimedia.org/r/#/c/400692/

hoo claimed this task.

Why would you revert this without even talking to me? Did you even check if DCAT.php was updated on the server after my change was merged?

AFAIR I deployed the change and it ran fine for a short while. But since the snapshot hosts are no longer going to be converted to HHVM, everything was switched to the default PHP (which is Zend 5.5 here… and soon even 7).