
Investigate adding and removing space when string is reported as "Malformed input", then allowing save
Closed, Resolved, Public

Description

To understand why the case described in T261071 happens, we should do a little investigation.
This will allow us to properly understand the current state and also figure out where we want to end up.
The description of that ticket includes the full steps to reproduce.
The outcome of this investigation should be a clear description of what happens in Wikibase and the UI that results in the edit eventually being allowed.
This could be compared with editing another value type which has a different set of API calls and steps as part of the edit process.

Timebox: 2h

Event Timeline

Hidden whitespace characters seem to be causing the problem:

image.png (382×712 px, 130 KB)

Will try trimming the input in the front-end as a possible solution.

This is the file that needs to be modified:
https://gerrit.wikimedia.org/g/data-values/value-view/+/98597119b08d89f91c4f6021d6dae6ac25d8e8e4/src/experts/StringValue.js#66

This looks like it has been investigated fully. Moving to done.

Moving this back to verification because we didn't actually sit down as discussed to go over what the overall desired end state should be.

I think I know the answer to one question: why does this issue only affect the data type string, not e.g. external ID or URL? Because string is the only data type we parse in JavaScript.

For every other data type, the JavaScript code parses the value by sending it to the wbparsevalue API, which calls the PHP parser. The PHP parser for strings is fairly simple, but one thing it does is remove leading and trailing whitespace, such as the \r\n that is causing the problem. (Another thing it does is Unicode normalization.) The JS parser doesn’t do this, so when the Wikibase frontend sends a string value that was parsed in JS (and therefore still has the \r\n) to the backend, the backend reports an error because the whitespace should have been removed. My proposed solution is to stop using the JS StringParser and parse strings via PHP too, like all other data values.
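To make the difference concrete, here is a minimal sketch of the two behaviours described above. Both function names are hypothetical and neither is the real StringParser or the PHP parser. The sketch assumes the backend's normalization is NFC, and that JavaScript's `trim()` is a close enough stand-in for the backend's whitespace trimming.

```javascript
// Hypothetical stand-in for the JS-side string parsing described above:
// the raw input is passed through unchanged.
function parseStringInJs(raw) {
  return raw;
}

// Hypothetical approximation of what the PHP parser is described as doing:
// strip leading/trailing whitespace, then apply Unicode normalization.
// NFC is an assumption; the thread only says "Unicode normalization".
function parseStringLikeBackend(raw) {
  return raw.trim().normalize("NFC");
}

const pasted = "some value\r\n";
console.log(JSON.stringify(parseStringInJs(pasted)));        // "some value\r\n"
console.log(JSON.stringify(parseStringLikeBackend(pasted))); // "some value"
```

If the two results differ for a given input, that input is the kind that would presumably be rejected on save, since the frontend value still carries whitespace the backend expects to be gone.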

I have some idea about another question: why does this issue go away if you add and remove a space elsewhere in the string? Because ValueView already removes line breaks from the input. It subscribes to the eachchange event and replaces all line breaks (DOS or Unix) with the empty string. Then why does the bug exist at all? That’s the part I don’t understand… at the HTML level, the \r\n is gone immediately after paste, but I guess it must be persisted somewhere in JS until the next time the input changes.
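The guess in the previous paragraph can be modelled with a small sketch. This is not the real ValueView code: `ModelInput` and its fields are hypothetical, and the stale `pending` value is only the suspected mechanism, not a confirmed one. Only the line-break replacement itself is taken from the description above.

```javascript
// Removes DOS (\r\n) and Unix (\n) line breaks, as described above.
function stripLineBreaks(text) {
  return text.replace(/\r?\n/g, "");
}

// Hypothetical model of the guessed behaviour: the displayed text is cleaned
// right after a paste, but the value held for saving is only refreshed on
// the next change to the input.
class ModelInput {
  constructor() {
    this.displayed = ""; // what the HTML input shows
    this.pending = "";   // what would be sent on save
  }

  paste(text) {
    this.pending = text;                    // raw paste retained (the suspected bug)
    this.displayed = stripLineBreaks(text); // display is cleaned immediately
  }

  edit(newDisplayedText) {
    this.displayed = stripLineBreaks(newDisplayedText);
    this.pending = this.displayed;          // any later change resyncs the value
  }
}

const input = new ModelInput();
input.paste("abc\r\n");
console.log(JSON.stringify(input.pending)); // "abc\r\n"

input.edit("abc "); // add a space
input.edit("abc");  // remove it again
console.log(JSON.stringify(input.pending)); // "abc"
```

Under this model, adding and removing a space makes the save succeed because any subsequent change replaces the stale value with the cleaned one.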

Investigation is done, and the parent ticket is also done and closed.