N-Triples encoding for RDF output uses invalid escaping
Closed, Resolved; Public

Description

Example for Q42 (ru: "Дуглас Адамс")

Expected escaping:
"\u0414\u0443\u0433\u043b\u0430\u0441 \u0410\u0434\u0430\u043c\u0441"

Actual escaping produced by https://www.wikidata.org/entity/Q42.nt:
"\u00D0\u0094\u00D1\u0083\u00D0\u00B3\u00D0\u00BB\u00D0\u00B0\u00D1\u0081 \u00D0\u0090\u00D0\u00B4\u00D0\u00B0\u00D0\u00BC\u00D1\u0081"

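It seems that each UTF-8 byte is escaped as if it were a separate Unicode code point, a kind of double encoding. For example, Д (U+0414) is the UTF-8 byte pair D0 94, which shows up as \u00D0\u0094 instead of \u0414. This is likely a problem in EasyRdf, though it may already have been fixed upstream.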
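
The suspected mechanism can be reproduced outside of EasyRdf. The following Python sketch is not the EasyRdf code; `escape_ntriples` is a hypothetical helper written only for illustration, and it assumes the classic N-Triples rule that non-ASCII characters are written as \uXXXX or \UXXXXXXXX. It escapes the label once per code point (correct) and once per UTF-8 byte (the behaviour observed above):

```python
def escape_ntriples(text: str) -> str:
    """Hypothetical helper: escape a string for a classic N-Triples literal."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # code points in the BMP
        else:
            out.append(f"\\U{cp:08X}")    # code points beyond the BMP
    return "".join(out)


label = "Дуглас Адамс"

# Correct: one escape per Unicode code point.
correct = escape_ntriples(label)

# Reproduces the bug: every UTF-8 byte is treated as its own code point
# (decoding the bytes as Latin-1 maps byte N to code point N).
buggy = escape_ntriples(label.encode("utf-8").decode("latin-1"))

print(correct)  # \u0414\u0443\u0433\u043B\u0430\u0441 \u0410\u0434\u0430\u043C\u0441
print(buggy)    # \u00D0\u0094\u00D1\u0083\u00D0\u00B3... (matches the actual output)
```

If this is indeed what happens, the fix would be to decode the UTF-8 input into code points before escaping rather than iterating over raw bytes.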

The discussion at https://github.com/njh/easyrdf/issues/175 seems related.
The relevant spec is http://www.w3.org/2001/sw/RDFCore/ntriples/#sec-issues.

Event Timeline

daniel raised the priority of this task from to Medium.
daniel updated the task description.
daniel changed Security from none to None.
daniel subscribed.
Smalyshev raised the priority of this task from Medium to High. (Mar 12 2015, 7:16 PM)
Tobi_WMDE_SW claimed this task.
Tobi_WMDE_SW moved this task from Review to Done on the Wikidata-Sprint-2015-03-11 board.