
wikibase:GlobecoordinateValue decimal representation not in lexical form in WDQS.
Closed, Resolved · Public

Description

It seems that serializing decimal coordinates in Turtle shorthand, rather than in full lexical form with an explicit datatype, breaks (XSD schema) validation of the munged/split Wikibase Turtle dumps. Example:

wdv:d0a7604c8ae9777857887ac4f1807286 a wikibase:GlobecoordinateValue ;
	wikibase:geoLatitude 30.12684 ;
	wikibase:geoLongitude 120.25657 ;
	a wikibase:GeoAutoPrecision ;
	wikibase:geoPrecision 0.00027777777777778 ;
	wikibase:geoGlobe wd:Q2 .

This is a problem when loading the data into Virtuoso, and possibly other triple stores. However, the geodata decimals are serialized in lexical form when requested directly from Wikibase.

Event Timeline

Restricted Application added a subscriber: Aklapper.

The PRETTY_PRINT setting of the TurtleWriter is set to "true" by default. This causes the writer to only write the literal "label" without the datatype. This affects boolean, decimal, integer and double literals.
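
To make the difference concrete, here is a minimal, self-contained sketch of the two output forms described above. It is not the TurtleWriter's actual code: the class name TurtleLiteralForms and the method render are hypothetical, the datatype is written as a full IRI rather than a prefixed name, and label escaping is ignored.

```java
import java.util.Set;

/** Hypothetical illustration only; this is not the TurtleWriter implementation. */
final class TurtleLiteralForms {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // The four datatypes named above as affected by pretty printing.
    static final Set<String> ABBREVIATED = Set.of(
        XSD + "boolean", XSD + "decimal", XSD + "integer", XSD + "double");

    /**
     * Renders a literal as described above: with prettyPrint enabled, the
     * affected datatypes are written as the bare label; otherwise the literal
     * is written as a quoted label followed by its datatype IRI.
     * Assumption: the label contains no characters that need escaping.
     */
    static String render(String label, String datatypeIri, boolean prettyPrint) {
        if (prettyPrint && ABBREVIATED.contains(datatypeIri)) {
            return label; // shorthand: datatype left implicit
        }
        return "\"" + label + "\"^^<" + datatypeIri + ">";
    }

    public static void main(String[] args) {
        String decimal = XSD + "decimal";
        // Shorthand form, as seen in the dump excerpt in the description.
        System.out.println("wikibase:geoLatitude " + render("30.12684", decimal, true) + " ;");
        // Typed form, which is what disabling pretty printing is meant to produce.
        System.out.println("wikibase:geoLatitude " + render("30.12684", decimal, false) + " ;");
    }
}
```

Running main prints the latitude triple once with the bare 30.12684 and once with the quoted label followed by ^^ and the xsd:decimal IRI. Literals of other datatypes, such as strings, are rendered in the typed form regardless of the flag in this sketch.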

To fix this, make the following change (starting at line 623) in Munge.java:

final RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, lastWriter);
final WriterConfig config = writer.getWriterConfig();
config.set(BasicWriterSettings.PRETTY_PRINT, false);
handler = new PrefixRecordingRdfHandler(writer, prefixes);

Other default config settings are:

config.set(BasicWriterSettings.RDF_LANGSTRING_TO_LANG_LITERAL, true);
config.set(BasicWriterSettings.XSD_STRING_TO_PLAIN_LITERAL, true);

Change 284372 had a related patch set uploaded (by Smalyshev):
Set pretty printing to false for RDF writer

https://gerrit.wikimedia.org/r/284372

Change 284372 merged by jenkins-bot:
Set pretty printing to false for RDF writer

https://gerrit.wikimedia.org/r/284372

Smalyshev triaged this task as Medium priority.