Page MenuHomePhabricator

[Bug] QuantityParser must pass-through valid unit representations
Open, MediumPublic

Description

QuantityParser does have code to accept a unit in two ways: in the input string and via an unit option. Most of this code exists since forever. Support for the option was added in https://github.com/DataValues/Number/pull/21 (T77981) and refined in https://github.com/DataValues/Number/pull/23 to properly handle conflicts between the input string and the option. If there is a conflict, the parser refuses to accept the input.

All this is there and just works.

It was always possible to enter "5 foo" and/or set the unit option to "foo" and the parser successfully turned this into { "amount": "+5", "unit": "foo", ... } and passed it to the formatters and validators. Before https://gerrit.wikimedia.org/r/235512 (T111171) we never cared how these unit strings looked. They were just saved.

This changed when we decided that units can not be arbitrary strings but must be URIs (with "1" being the only exception). But even at this point it was still not the job of the parser to decide if a string can be accepted as a unit or not. That's the job of the validator, which finally existed at this point and did not accept anything but wikidata.org concept URIs. But the parser was not updated accordingly and still accepted arbitrary strings but no URIs. This caused a series of actual issues:

  1. The user could still type "5 foo" in the UI and the save button would be enabled, but when he hit save the unit was rejected by the validator because it now only accepts URIs. This was patched in https://github.com/DataValues/Number/pull/46 by simply removing that part of the parser's regex.
  2. But when attempting to enter a valid URI the validator is fine with, the users input can not even make it that far. It's blocked by the parser because of the outdated regex.
  3. At the same time the expert extender already allows copy-pasting concept URIs. So the user already can paste "5" and "http://www.wikidata.org/entity/Q1" in two input fields but not "5 http://www.wikidata.org/entity/Q1" in one input field.

That's a bug as simple as it could be: QuantityParser must pass-through valid unit representations. There are literally no preconditions to allow this. The fact that units are URIs will never change. The fact that URIs start with http:// or https:// will most probably never change. Everything beyond that (most notably support for resolving "meter" and "m" into concept URIs) is unrelated and can be build on top of this without challenging the decision that the parser passes URIs through.

Asking if we, as a team, want to allow users to work with concept URIs is like asking if a browser vendor wants to allow users to copy-paste URLs into the address bar. I refuse to accept this as an argument.

Patches for review:

Event Timeline

thiemowmde raised the priority of this task from to Medium.
thiemowmde updated the task description. (Show Details)

I think the following assertions made in the description need discussion:

The fact that units are URIs will never change. The fact that URIs start with http:// or https:// will most probably never change.

This is true for wikibase, but Quantity and QuantityParser are not defined by wikibase - they are part of the datavalues/number component. Do we want to limit units to URIs for all users of the component?

Changing QuantityParser to also accept URLs as units in the input isn't that problematic, though I find it a bit strange. There is no need for the syntax accepted by the default parser to cover every valid Quantity.

Asking if we, as a team, want to allow users to work with concept URIs is like asking if a browser vendor wants to allow users to copy-paste URLs into the address bar.

I think discussing whether we want to support/encourage input of the form "123 https://www.wikidata.org/wiki/Q828224" would be prudent.

Overall, accepting additional forms of input isn't a problem, as long as it doesn't introduce ambiguities. But I think it's worth to at least talk about. If we make this part of our public interface, it's one (small) bit more to maintain compatibility with.

Do we want to limit units to URIs for all users of the component?

This sounds like the opposite of what this ticket is proposing. It suggests to open, not to limit.

ambiguities

I do not understand. What could be less ambiguous than a URI?

Do we want to limit units to URIs for all users of the component?

This sounds like the opposite of what this ticket is proposing. It suggests to open, not to limit.

That was not clear from the proposal, since you said you assume that units would always be uris. That is true internally, but in the input, we would like to be able to use "km" for kilometer, etc.

ambiguities

I do not understand. What could be less ambiguous than a URI?

The ambiguity is what part of the input string is detected as the url. What regex would we use to match units (both, unit names and uris)? Where and how will that regex be defined?

This touches on the question if and how we will support localizable names for units - these would also need to be covered by the regex.

I'm not strongly opposing the idea of parsing unit URLs from the input. I'm reluctant, because it strikes me as a strange and confusing user experience, and i don't see a need for it. My point is: It's a change in behavior which we should have a good reason for. It's not a bug fix.

(Patch was closed years ago; removing corresponding project tags)