
When Strings that end in "\n" get entered into the string interface Wikidata should simply strip the "\n" away instead of throwing an error
Closed, Duplicate · Public

Description

At the moment I'm using Google Chrome on Windows 10 to fill in FMA data on https://www.wikidata.org/wiki/Q27050939. I take the FMA ID from http://xiphoid.biostr.washington.edu/fma/fmabrowser-hierarchy.html?search=Anterior%20ramus%20of%20spinal%20nerve&entryPoint=none&extendHierarchy=false

I copy paste and get:

Could not save due to an error.
Malformed input: 8733

If I first paste the number into another box and then copy and paste it again from there, I can successfully enter it. I guess some invisible formatting is copied along with the number and prevents the input from being accepted on the first attempt.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 29 2016, 11:13 PM

I think it might be due to the copied string ending in \n or something like that. It shouldn't be hard for Wikidata to accept the copy-pasted string.

ChristianKl renamed this task from Malformed input error to When Strings that end in "\n" get entered into the string interface Wikidata should simply strip the "\n" away instead of throwing an error. · Sep 30 2016, 1:10 PM
Esc3300 added a subscriber: Esc3300. · Edited · Oct 2 2016, 1:19 PM

I'm not sure about this specific copy-paste, but I think it's a frequent problem when strings start or end with odd characters or even include CR in the middle.

It happens a lot when trying to create statements for article titles. Oddly, I think the same text for labels tends to work.

Instead of rejecting it completely, it would be helpful if a cleaned version was offered for saving instead.
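To illustrate the suggestion above, here is a minimal sketch of how a cleaned version could be computed and offered. The function name `suggest_cleaned` and the cleaning rules (replace control characters with spaces, collapse whitespace runs, trim the ends) are assumptions made for this example, not Wikibase's actual behavior.

```python
import unicodedata


def suggest_cleaned(raw: str) -> tuple[str, bool]:
    """Return (cleaned, changed) for a pasted string.

    Hypothetical helper: the rules below are one possible choice,
    not what Wikidata actually applies.
    """
    # Replace control characters (Unicode category Cc, e.g. CR, LF, tab)
    # with a plain space so that words on either side stay separated.
    no_controls = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in raw
    )
    # Collapse runs of whitespace and trim both ends.
    cleaned = " ".join(no_controls.split())
    return cleaned, cleaned != raw


if __name__ == "__main__":
    for pasted in ["8733\n", "A title\r\nwith a break", "8733"]:
        cleaned, changed = suggest_cleaned(pasted)
        if changed:
            print(f"Offer {cleaned!r} instead of {pasted!r}")
        else:
            print(f"Accept {pasted!r} as entered")
```

The `changed` flag lets an interface decide whether to save directly or to show the cleaned string for confirmation first.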

daniel added a comment. · Oct 2 2016, 4:35 PM

Please distinguish between

a) normalization applied by the API when receiving a JSON structure representing a string value
b) normalization applied by the API when parsing user input, and returning a string value
c) normalization applied by the JavaScript widget before sending user input to the API

I believe we should do (b), but not (a). We can also do (c), but with (b) in place, this seems redundant.

The reason I believe we should not do (a) is that in that case we are not dealing with user input, but with a DataValue object. DataValue objects are either valid or they are not; we should not guess intent. Clients that supply DataValues should take care to apply any normalization they desire.
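The difference between (a) and (b) can be sketched roughly as follows. All names here (`MalformedInput`, `is_valid_string_value`, `parse_user_input`, `accept_data_value`) and the validity rule are hypothetical and chosen only for illustration; they are not the actual Wikibase API.

```python
class MalformedInput(ValueError):
    """Raised when a string value does not satisfy the validity rule."""


def is_valid_string_value(value: str) -> bool:
    # Assumed rule for this sketch: non-empty, no leading or trailing
    # whitespace, and no CR, LF, or tab anywhere in the value.
    return (
        value != ""
        and value == value.strip()
        and not any(ch in "\r\n\t" for ch in value)
    )


def parse_user_input(raw: str) -> str:
    """Case (b): raw user input is normalized first, then validated."""
    normalized = raw.strip()
    if not is_valid_string_value(normalized):
        raise MalformedInput(repr(raw))
    return normalized


def accept_data_value(value: str) -> str:
    """Case (a): a ready-made value is validated as-is, never normalized."""
    if not is_valid_string_value(value):
        raise MalformedInput(repr(value))
    return value
```

Under these assumptions, `parse_user_input("8733\n")` returns `"8733"`, while `accept_data_value("8733\n")` raises `MalformedInput`, leaving normalization to the client that built the value.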

I agree that (b) is likely the way to go.