
N-Triples encoding for RDF output uses invalid escaping
Closed, Resolved, Public

Description

Example: the Russian (ru) label of Q42, "Дуглас Адамс"

Expected escaping:
"\u0414\u0443\u0433\u043b\u0430\u0441 \u0410\u0434\u0430\u043c\u0441"

Actual escaping produced by https://www.wikidata.org/entity/Q42.nt:
"\u00D0\u0094\u00D1\u0083\u00D0\u00B3\u00D0\u00BB\u00D0\u00B0\u00D1\u0081 \u00D0\u0090\u00D0\u00B4\u00D0\u00B0\u00D0\u00BC\u00D1\u0081"

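It seems that each byte of the UTF-8 encoding is escaped as if it were a separate Unicode code point, a kind of double encoding. For example, Д (U+0414) is the UTF-8 byte pair D0 94, which comes out as \u00D0\u0094 instead of \u0414. This is likely a problem in EasyRdf, but it might have been fixed upstream already.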
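
The suspected mechanism can be reproduced with a minimal Python sketch. This is not EasyRdf's code; the helper name `escape_code_points` is hypothetical, and the bug is only simulated by decoding the UTF-8 bytes as Latin-1 so that each byte becomes one code point before escaping. The helper is a simplified escaper (it writes every character outside printable ASCII, plus the quote and backslash, as \uXXXX or \UXXXXXXXX with uppercase hex digits), not a complete N-Triples serializer.

```python
def escape_code_points(text: str) -> str:
    """Escape each code point outside printable ASCII as \\uXXXX or \\UXXXXXXXX.

    Simplified for illustration: quotes and backslashes are also written
    as numeric escapes instead of the short forms \\" and \\\\.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E and ch not in '\\"':
            out.append(ch)                # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # code point in the BMP
        else:
            out.append(f"\\U{cp:08X}")    # code point beyond the BMP
    return "".join(out)


label = "Дуглас Адамс"

# Correct: escape the Unicode code points of the decoded string.
correct = escape_code_points(label)

# Simulated bug (an assumption about the cause): the UTF-8 bytes are
# misread as one code point per byte, here via a Latin-1 decode.
buggy = escape_code_points(label.encode("utf-8").decode("latin-1"))

print(correct)
print(buggy)
```

Under these assumptions, `correct` matches the expected escaping above (apart from the case of the hex digits) and `buggy` matches the output reported for Q42.nt, which is consistent with the byte-per-character explanation.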

The discussion at https://github.com/njh/easyrdf/issues/175 seems related.
The relevant spec is http://www.w3.org/2001/sw/RDFCore/ntriples/#sec-issues.

Event Timeline

daniel created this task. Dec 5 2014, 12:32 PM
daniel raised the priority of this task from to Normal.
daniel updated the task description.
daniel changed Security from none to None.
daniel added a subscriber: daniel.
Smalyshev raised the priority of this task from Normal to High. Mar 12 2015, 7:16 PM
Tobi_WMDE_SW closed this task as Resolved. Mar 17 2015, 10:29 AM
Tobi_WMDE_SW claimed this task.
Tobi_WMDE_SW moved this task from Review to Done on the Wikidata-Sprint-2015-03-11 board.