
wikibase/wdqs:0.3.40 - loadData.sh script cannot parse BNode with unlabeled white space
Closed, Declined, Public

Description

When loading a Turtle (.ttl) file via the loadData.sh script, I get the following error for a blank node written with white space between the brackets:

Could not load: url=file:///wdqs/wikidump-000000001.ttl.gz, cause=org.openrdf.rio.RDFParseException: Expected an RDF value here, found ']'

The loadData.sh script works correctly when I remove all white space between the brackets. Here is the Turtle that triggers the error:

wd:Q100301425 rdfs:label "Large Single-Handled Jar, Yale University Art Gallery, inv. 1930.655"@en ;
    wdt:P571 [ ] ;

According to the Turtle docs:

The [162s] ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.

Is there any way to prevent this parse exception? I would consider it a bug.

Event Timeline


This looks like a bug in the Rio parser (3rd party dependency). We'll need to dig a bit more into the code to confirm, but if that's the case, we're probably not going to fix it.

This is the only case we know of where a blank node is written with white space between the brackets. While this is valid Turtle, it is rarely used. If you can modify the dump format, removing that white space would be an easy workaround. This is definitely a bug in the parser library, and you might want to report it upstream.
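
For anyone who cannot change how the dump is generated, the workaround can also be applied as a preprocessing step before running loadData.sh. The following is a minimal sketch in Python, not part of wdqs or Rio: the names `collapse_empty_bnodes` and `rewrite_gz` are hypothetical, and the tokenizing is deliberately simplified. It rewrites `[ ]` (brackets containing only white space) to `[]` while leaving string literals, IRIs, and comments untouched. It does not handle comments between the brackets or escaped `#` characters in local names, and it reads each file fully into memory.

```python
import gzip
import re

# Tokens that must be copied through unchanged. Order matters: long
# (triple-quoted) strings are tried before short ones.
_SKIP = "|".join([
    r'"""(?:[^"\\]|\\.|"(?!""))*"""',   # long string, double quotes
    r"'''(?:[^'\\]|\\.|'(?!''))*'''",   # long string, single quotes
    r'"(?:[^"\\\n\r]|\\.)*"',           # short string, double quotes
    r"'(?:[^'\\\n\r]|\\.)*'",           # short string, single quotes
    r'<[^<>\s]*>',                      # IRI reference (simplified)
    r'#[^\n]*',                         # comment to end of line
])

# Either a token to skip, or brackets holding only white space.
_PATTERN = re.compile(
    rf'(?P<skip>{_SKIP})|(?P<anon>\[[ \t\r\n]+\])', re.DOTALL
)


def collapse_empty_bnodes(turtle: str) -> str:
    """Rewrite '[ ]' (white space only) as '[]' outside strings, IRIs, comments."""
    def repl(match: re.Match) -> str:
        # 'anon' is None when the match was a skipped token.
        return "[]" if match.group("anon") else match.group(0)
    return _PATTERN.sub(repl, turtle)


def rewrite_gz(src: str, dst: str) -> None:
    """Hypothetical helper: normalize one gzipped Turtle file into another."""
    with gzip.open(src, "rt", encoding="utf-8") as fin:
        data = fin.read()  # assumes the chunk fits in memory
    with gzip.open(dst, "wt", encoding="utf-8") as fout:
        fout.write(collapse_empty_bnodes(data))
```

Applied to the statement from the description, `collapse_empty_bnodes('wdt:P571 [ ] ;')` returns `'wdt:P571 [] ;'`, which is the form the loader accepts. A label such as `"a [ ] b"@en` is left as-is because it sits inside a string literal.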