
[Story] When a Quantity is entered with no uncertainty/bounds given, do not guess uncertainty/bounds until needed.
Closed, ResolvedPublic

Description

After quite a bit of discussion, both at T105623: [Task] Investigate quantification of quantity precision (+/- 1 or +/- 0.5) and within the Wikidata team, it appears best not to determine uncertainty automatically upon input, but to do so on the fly if needed. The rationale is two-fold:

  • There is no definitive rule for determining the significant digits from a decimal string (e.g. if the input was 1700, it's unclear whether there are two, three or four significant digits). Furthermore, there are no clear rules for determining the uncertainty interval from the number of significant digits (there is disagreement about whether 1.2 should be interpreted as having an uncertainty of +/-0.1 or +/-0.05). Which convention is most desirable may even depend on the use case, and should not be pre-determined.
  • Confusion arises when the input of "17" results in the output "17+/-1", causing users to erroneously enter an uncertainty of +/-0, which is currently not shown in the output.
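
To make the ambiguity concrete, here is a minimal Python sketch. It is not the Wikibase implementation; the function names `last_digit_unit` and `candidate_uncertainties` are made up for illustration. It derives the place value of the last digit written in a decimal string and lists the two competing uncertainty conventions mentioned above.

```python
from decimal import Decimal


def last_digit_unit(text: str) -> Decimal:
    """Return the place value of the last digit written in a decimal string."""
    exponent = Decimal(text).as_tuple().exponent
    return Decimal(1).scaleb(exponent)


def candidate_uncertainties(text: str) -> dict:
    """List both conventions; neither is authoritative (hypothetical helper)."""
    unit = last_digit_unit(text)
    return {"plus_minus_one": unit, "plus_minus_half": unit / 2}


print(candidate_uncertainties("1.2"))   # the two readings: 0.1 or 0.05
print(last_digit_unit("1700"))          # last written digit is the ones place
print(last_digit_unit("17e2"))          # last written digit is the hundreds place
```

Note that a purely syntactic rule like this treats "1700" as having four significant digits, which is only one of the possible readings; writing "17e2" is one way for the user to state the intent explicitly.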

When a quantity was entered without uncertainty, we should:

  • show a fly-out during input telling the user how the value will be interpreted and normalized.
    • we may want to add an extra warning for odd edge cases like 1700, telling the user how they can make the uncertainty explicit (+/-100, 17e2, etc).
  • display the quantity with no uncertainty (as a corollary, an explicit uncertainty of +/-0 should always be shown)
  • store the quantity without bounds
  • represent the quantity without bounds in JSON and RDF output (for derived/normalized values, see below)
  • determine the uncertainty before doing arithmetic (e.g. for unit conversion)
    • converted units should be rounded using the implicit uncertainty, but should show the uncertainty interval only if it was explicit in the original value.
  • determine the uncertainty for use in the normalized value in JSON and RDF (for indexing/querying as intervals)
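
The behaviour listed above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the `Quantity` class, its `uncertainty` field, `effective_uncertainty` and `convert` are hypothetical names, and the implicit uncertainty is taken to be half a unit in the last digit, which is only one of the conventions under discussion.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    amount: Decimal
    # None means "no uncertainty given"; this is distinct from an explicit 0.
    uncertainty: Optional[Decimal] = None

    def display(self) -> str:
        """Show the uncertainty only if it was given explicitly (including 0)."""
        if self.uncertainty is None:
            return str(self.amount)
        return f"{self.amount}±{self.uncertainty}"

    def effective_uncertainty(self) -> Decimal:
        """Explicit value if present, else derived on the fly (assumed: ±0.5 of last digit)."""
        if self.uncertainty is not None:
            return self.uncertainty
        return Decimal(1).scaleb(self.amount.as_tuple().exponent) / 2


def convert(q: Quantity, factor: Decimal) -> Quantity:
    """Scale by a conversion factor, rounding to the leading digit of the scaled uncertainty."""
    raw = q.amount * factor
    scaled = q.effective_uncertainty() * factor
    if scaled == 0:
        # Exact value (explicit ±0): nothing to round against.
        return Quantity(raw, q.uncertainty)
    step = Decimal(1).scaleb(scaled.adjusted())
    explicit = None if q.uncertainty is None else scaled.quantize(step)
    return Quantity(raw.quantize(step), explicit)


metres_to_feet = Decimal("3.28084")
print(Quantity(Decimal("17")).display())                                # 17
print(Quantity(Decimal("17"), Decimal("0")).display())                  # 17±0
print(convert(Quantity(Decimal("17")), metres_to_feet).display())       # 56
print(convert(Quantity(Decimal("17"), Decimal("1")), metres_to_feet).display())  # 56±3
```

The point of the sketch is that the guessed uncertainty never leaves `effective_uncertainty`: it influences rounding, but it is neither stored nor displayed unless the user supplied it.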

To migrate our existing knowledge base to the convention outlined above, we should do the following:

  • Convert values for properties that represent (exact) counts to the new number datatype (see T112247).
    • For Numbers, omit +/-0 for display.
  • Remove all(?) bounds that follow the old default of +/-1 (can be done by bot)
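
As a rough illustration of what such a bot pass might check, the Python sketch below drops the bounds from a quantity value when they sit exactly one unit of the last digit above and below the amount. The dictionary layout (`amount`, `unit`, `upperBound`, `lowerBound` as signed decimal strings) and the reading of "old default" as ±1 in the last digit are assumptions made for this example, and `strip_default_bounds` is a hypothetical name, not an existing bot function.

```python
from decimal import Decimal


def strip_default_bounds(value: dict) -> dict:
    """Return a copy without bounds if they match the assumed old ±1 default."""
    if "upperBound" not in value or "lowerBound" not in value:
        return value
    amount = Decimal(value["amount"])
    # One unit in the last written digit of the amount, e.g. 1 for "+17", 0.1 for "+1.2".
    unit = Decimal(1).scaleb(amount.as_tuple().exponent)
    upper_gap = Decimal(value["upperBound"]) - amount
    lower_gap = amount - Decimal(value["lowerBound"])
    if upper_gap == unit and lower_gap == unit:
        return {k: v for k, v in value.items() if k not in ("upperBound", "lowerBound")}
    return value


print(strip_default_bounds(
    {"amount": "+17", "unit": "1", "upperBound": "+18", "lowerBound": "+16"}
))
```

A check like this cannot tell whether a matching ±1 was entered deliberately, which is why the "all(?)" above carries a question mark.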

Event Timeline

daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel subscribed.
Restricted Application added a subscriber: Aklapper.

Sounds like a good plan! On:

Remove all(?) bounds that follow the old default of +/-1 (can be done by bot)

is there any way to differentiate between those that were set by the default, and those that were set deliberately by the user? (I suspect that 99.9% are the former rather than the latter, but it might be worth checking if possible.)

daniel set Security to None.

is there any way to differentiate between those that were set by the default, and those that were set deliberately by the user?

No. The fact that we can't tell "implicit" from "explicit" uncertainty is exactly the problem we are trying to fix here.

We've discussed it extensively and will move forward in the new year.

Hopefully this can be implemented soon? It's a bit awkward seeing https://en.wikipedia.org/wiki/Square_Kilometre_Array described as "1±1 square kilometre"!

Change 302248 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Update DataValues Number to 0.8.0

https://gerrit.wikimedia.org/r/302248

Number of quantity values where upper and lower bounds are the same: 139988.

SELECT
#?item ?property ?statement
?upperBound ?lowerBound
WHERE {
  ?dataValueId wikibase:quantityUpperBound ?upperBound .
  ?dataValueId wikibase:quantityLowerBound ?lowerBound .
  FILTER (?upperBound = ?lowerBound) .
  #?statement ?valueRef ?dataValueId .
  #?item ?property ?statement
  #SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
#LIMIT 10

Typical examples can be seen here (including an odd "0.0" edge case): https://www.wikidata.org/wiki/Q18844537.

thiemowmde moved this task from Review to Done on the Wikidata-Sprint-2016-08-16 board.

There is nothing we can do in the current sprint about this ticket. Review is done. The patch just can't be merged before the subtask T142086: [Task] announce quantity changes is done.

Change 302248 merged by jenkins-bot:
Update DataValues Number to 0.8.1

https://gerrit.wikimedia.org/r/302248