
[RFC] Create a "number" datatype for exact values
Open, MediumPublic

Description

There seems to be agreement that a "number" datatype would be useful for properties with an exact count, like the number of electrons in an element, the number of seats in a parliament, etc. The number datatype would (probably) use QuantityValues for storage. It should have the following properties:

  • integers only
  • not negative
  • no unit
  • default to +/- 0 (open: should other precisions be allowed?)

If the precision is *always* +/- 0, we could use the simpler DecimalValue or even NumberValue class for storage, instead of QuantityValue. But that would also mean we can't use this type for some use cases (e.g. population figures).
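To make the proposed constraints concrete, here is a minimal sketch of a check against a quantity-shaped value. It is not Wikibase code: the function name `number_violations`, the dict layout (string `amount`, `lowerBound`, `upperBound`, and a `unit` field) and the use of "1" to mean "no unit" are assumptions made for illustration only.

```python
from decimal import Decimal, InvalidOperation

# Assumption: "1" marks "no unit" in the serialized quantity.
UNITLESS = "1"


def number_violations(value: dict) -> list:
    """Return the constraints from the proposal that `value` violates.

    Hypothetical helper; `value` is assumed to be a dict with string
    fields "amount", "lowerBound", "upperBound" and "unit".
    """
    try:
        amount = Decimal(value["amount"])
    except (KeyError, TypeError, InvalidOperation):
        return ["amount is missing or not a decimal string"]
    if not amount.is_finite():
        return ["amount is not a finite number"]

    problems = []
    if amount != amount.to_integral_value():
        problems.append("not an integer")
    if amount < 0:
        problems.append("negative")
    if value.get("unit", UNITLESS) != UNITLESS:
        problems.append("has a unit")

    # Missing bounds are treated as +/- 0, matching the proposed default.
    lower = Decimal(value.get("lowerBound", value["amount"]))
    upper = Decimal(value.get("upperBound", value["amount"]))
    if lower != amount or upper != amount:
        problems.append("precision is not +/- 0")
    return problems
```

Under the open question above, the last check would be dropped if precisions other than +/- 0 are allowed (e.g. for population figures).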

Datatype on Wikidata Query Service should be xsd:integer.

On Wikidata: https://www.wikidata.org/wiki/Help:Data_type#Integer_datatype

Event Timeline

daniel raised the priority of this task from to High.
daniel updated the task description.
daniel added subscribers: Aklapper, Jc3s5h, Denny and 19 others.

Exact numbers can occur not only through counting, but also through definition. For example, as explained by the US National Geodetic Survey, one yard is defined as exactly 0.914 4 meter. One could imagine properties in Wikidata where most of the instances are measured, and have an associated uncertainty, but a few instances are defined, and so are exact. So unless it can be proven that a certain property is always uncertain, or always exact, the property should be designed to accept either the proposed number type, or QuantityValue.

If it isn't possible or practical to make properties take either kind of number, I don't see any choice but to use QuantityValue with amount = lowerBound = upperBound.
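As a rough illustration of "amount = lowerBound = upperBound", the sketch below builds an exact quantity and tests for exactness. The helper names, the dict layout and the "metre" unit string are hypothetical placeholders, not the actual QuantityValue API.

```python
from decimal import Decimal


def signed(d: Decimal) -> str:
    """Render a decimal with an explicit sign, e.g. +0.9144."""
    return str(d) if d < 0 else "+" + str(d)


def exact_quantity(amount: str, unit: str) -> dict:
    """Build a quantity whose bounds coincide with the amount (+/- 0).

    Hypothetical helper; the field names are assumed for illustration.
    """
    text = signed(Decimal(amount))
    return {"amount": text, "lowerBound": text, "upperBound": text, "unit": unit}


def is_exact(quantity: dict) -> bool:
    """True when amount = lowerBound = upperBound."""
    amount = Decimal(quantity["amount"])
    return (Decimal(quantity["lowerBound"]) == amount
            and Decimal(quantity["upperBound"]) == amount)


# One yard, defined as exactly 0.9144 metre ("metre" is a placeholder unit).
yard = exact_quantity("0.9144", "metre")
```

A measured value would carry bounds that differ from the amount, so `is_exact` would return False for it.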

@Jc3s5h Quantity values would continue to work as they do now: they would allow exact as well as uncertain values. The new Number type would be more restrictive. Whether Number values (counts) should always be considered exact, or only be considered exact by default, remains to be decided. But for the use case you mention (values that derive from definitions, like yards in meters or miles in kilometers, but also the speed of light in m/s, which is exact per the definition of the meter) we need non-integer values, and would use Quantity values with +/-0.

I wonder what the complexity of this is -- without being able to read the code.

If it's a simplification of existing code, it might be an interesting volunteer project.

If you announce it now as an upcoming change, it might even be possible to implement it in a fairly short period of time.

Lydia_Pintscher lowered the priority of this task from High to Medium. Nov 17 2016, 11:39 AM

With the changes we've made just now for precision handling I wonder if this is really still needed.

Query Service has xsd:integer for these and converting just impacts performance.

With the new behavior of the Quantity type, the differences between it and Number would still be:

  • positive integers only
  • no unit (no "dog" as unit for a dog team size)
  • default (or force?) to +/- 0 (don't use +/-0 with population numbers!)

As Esc3300 correctly noted, this would allow optimization for the RDF representation:

  • we can use xsd:integer instead of xsd:decimal for the literal type
  • if we do not allow any precision other than +/-0, we need no "complex value" nodes to represent the value in RDF. A single literal would be sufficient.
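The two optimizations can be sketched roughly as follows. This only illustrates the difference in shape (one typed literal versus a value node with several triples); the `ex:` predicate names and the blank node label are placeholders, not the vocabulary actually used in the RDF export.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def integer_literal(amount: int) -> str:
    """Turtle literal for an exact, unitless count."""
    if amount < 0:
        raise ValueError("counts must not be negative")
    return f'"{amount}"^^<{XSD}integer>'


def decimal_literal(amount: str) -> str:
    """Turtle literal for a general quantity amount."""
    return f'"{amount}"^^<{XSD}decimal>'


def count_triples(subject: str, predicate: str, amount: int) -> list:
    """With precision fixed at +/-0, a single literal carries the whole value."""
    return [f"{subject} {predicate} {integer_literal(amount)} ."]


def quantity_triples(subject, predicate, amount, lower, upper, unit) -> list:
    """A full quantity needs a value node; ex:* predicates are placeholders."""
    node = "_:v1"  # placeholder blank node for the complex value
    return [
        f"{subject} {predicate} {node} .",
        f"{node} ex:amount {decimal_literal(amount)} .",
        f"{node} ex:lowerBound {decimal_literal(lower)} .",
        f"{node} ex:upperBound {decimal_literal(upper)} .",
        f"{node} ex:unit <{unit}> .",
    ]
```

In this sketch a count costs one triple while a full quantity costs five, which is the saving the second bullet refers to.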

Note however that these optimizations are only possible if we introduced a new value type for numbers (different structure), not just a new data type (different interpretation). Changing the structure in JSON and RDF would be a breaking change, and transitioning existing values present in the database would be challenging.

This comment was removed by Izno.