String values with leading/trailing whitespace are (inconsistently) reported as invalid
Open, Low, Public

Description

In Wikidata, adding text to Property:P742 (pseudonym) produces an error when the value has a leading space, instead of the space simply being trimmed and ignored. Trimming is already the behaviour for many other free-text fields.

Event Timeline

Billinghurst renamed this task from Adding property field P742 (pseudonym) needs to ignore leading spaces to Adding property field Property:P742 (pseudonym) needs to ignore leading spaces. Mar 3 2018, 1:10 AM
Billinghurst renamed this task from Adding property field Property:P742 (pseudonym) needs to ignore leading spaces to Adding property field Property:(P)742 (pseudonym) needs to ignore leading spaces.
Billinghurst renamed this task from Adding property field Property:(P)742 (pseudonym) needs to ignore leading spaces to Adding property field Property:P742 (pseudonym) needs to ignore leading spaces.
Billinghurst updated the task description.

Hmm, this is weird. This is not something we configure per property. It might depend on the datatype. Can you provide a property where it works as you think it should?

It definitely works that way for description and aliases. Otherwise, I will have to dig through the free text fields to see where it happens as I come across them.

thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.
thiemowmde subscribed.

Labels, descriptions, aliases, as well as monolingual text values are trimmed. For monolingual text values this happens in the backend via MonolingualTextParser.

For string values this is not done. Instead a validator blocks strings with leading or trailing spaces, tabs, newlines, or any other vertical whitespace. The code for this can be seen in ValidatorBuilders::getCommonStringValidators.
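The difference between the two behaviours can be sketched roughly as follows. This is a minimal Python illustration, not the Wikibase code (which is PHP). The function names `parse_monolingual_text` and `validate_string_value` are hypothetical, and the set of characters treated as whitespace is assumed to be whatever Python's `str.strip()` and `str.isspace()` cover, which may differ from the real validator.

```python
from typing import List


def parse_monolingual_text(text: str) -> str:
    """Hypothetical stand-in for the trimming path: surrounding
    whitespace is silently dropped before the value is stored."""
    return text.strip()


def validate_string_value(value: str) -> List[str]:
    """Hypothetical stand-in for the validating path: the value is
    left untouched and an error is reported instead of trimming."""
    errors = []
    if value and (value[0].isspace() or value[-1].isspace()):
        errors.append("leading or trailing whitespace is not allowed")
    return errors
```

Under these assumptions, `" Mark Twain"` is accepted as a monolingual text (and stored as `"Mark Twain"`), while the same input as a string value is rejected and the user has to remove the space by hand.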

We did not want to enforce trimming on string values because whitespace can be significant and meaningful. Think of https://www.wikidata.org/wiki/Property:P487, the string property holding a Unicode character. Shouldn't it be possible to add a statement to https://www.wikidata.org/wiki/Q380933 (the Item describing the space character) that holds the space character? Because of the validator this is currently not possible. I believe having this validator in place was a good thing. But nowadays we have plenty of alternatives, most notably constraint checks. Maybe it's time to lift the hard-coded limitation?
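To illustrate what replacing the hard validator with a softer check could look like, here is a small Python sketch. It is only an assumption about the general idea and does not reflect how Wikibase constraint checks are actually implemented. `SaveResult`, `save_with_soft_check` and the `whitespace_is_meaningful` flag are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SaveResult:
    stored_value: str
    warnings: List[str] = field(default_factory=list)


def save_with_soft_check(value: str, whitespace_is_meaningful: bool = False) -> SaveResult:
    """Always store the value unchanged; report a warning instead of
    blocking the edit when it has leading or trailing whitespace."""
    result = SaveResult(stored_value=value)
    has_outer_whitespace = value != value.strip()
    if has_outer_whitespace and not whitespace_is_meaningful:
        result.warnings.append("value has leading or trailing whitespace")
    return result
```

With such a check, a single space could be saved without a warning on a property flagged as whitespace-sensitive, while `" Mark Twain"` on an ordinary string property would still be saved but flagged for review.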

thiemowmde renamed this task from Adding property field Property:P742 (pseudonym) needs to ignore leading spaces to String values with leading/trailing whitespace are (inconsistently) reported as invalid. Mar 5 2018, 2:08 PM

Is this really a good first task? Some of the problems around labels and so on have their own contexts on local projects.

"Maybe it's time to lift the hard-coded limitation?"

4.5 years later: Oh yes, please do it! It is really annoying to get an error message every time I enter monolingual text with a leading/trailing space. This does not make sense at all. I have to remove the spaces by hand to save the edit. Either accept spaces or trim them.