[RFC] Wikibase should specify the significance level of lowerBound and upperBound
Closed, Invalid · Public

Description

The documentation at:
https://www.mediawiki.org/wiki/Wikibase/DataModel#Quantities
says that:
"The exact interpretation of the uncertainty interval provided with lowerBound and upperBound is unspecified. Depending on context, it may represent hard limits on the value, or the interval may just describe the 66 or 95 percentile interval of a normal distribution."

This is problematic, as it depends on an implied context, rather than actually giving the uncertainty level where it is available. It would be better to record the uncertainty level / significance level / number of standard deviations that the bounds represent, where that is available. Most scientific publications state this, so the information should be widely available, and it should be possible to represent it in Wikidata's entries.
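
To illustrate why the level matters, here is a minimal sketch of how the same pair of bounds implies a different standard deviation depending on the coverage it is taken to represent. It assumes a normal distribution and a symmetric interval; the function name `implied_sigma` and the example bounds are made up for this illustration.

```python
from statistics import NormalDist


def implied_sigma(lower_bound, upper_bound, coverage):
    """Standard deviation implied by a symmetric interval that is assumed
    to cover `coverage` (e.g. 0.95) of a normal distribution."""
    half_width = (upper_bound - lower_bound) / 2
    # Number of standard deviations spanned by a central interval of this coverage
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return half_width / z


# Same bounds, two readings of what they mean
sigma_if_95 = implied_sigma(9.0, 11.0, 0.95)
sigma_if_66 = implied_sigma(9.0, 11.0, 0.66)
```

Under these assumptions the two readings differ by roughly a factor of two, and treating the bounds as hard limits would be a third reading again.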

Event Timeline

Mike_Peel raised the priority of this task to Needs Triage.
Mike_Peel updated the task description.
Mike_Peel added a project: Wikibase-DataModel.
Restricted Application added a subscriber: Aklapper.
thiemowmde renamed this task from "Wikibase should specify the significance level of lowerBound and upperBound" to "[RFC] Wikibase should specify the significance level of lowerBound and upperBound". (Sep 2 2015, 9:24 AM)
thiemowmde set Security to None.

The idea is that such additional information would be provided using the qualifier mechanism, not as part of the value itself. This is meant as a compromise between complexity and expressiveness of the data model. Being able to represent the different kinds of significance/precision/accuracy/etc in the data value itself would require a very complex data model. Shifting this to the Statement level allows us to use the flexible and powerful qualifier mechanism to cover this need.

The upper and lower bounds are part of the data model, though, not expressed through a qualifier. Why is that?

My worry with using a qualifier rather than building this into the data model is that it could imply that the significance level is for the main number/amount, rather than relating to the upper and lower bounds.

Consider an example, the mass of an electron:

9.109 383 56 x 10⁻³¹ kg ± 0.000 000 11 x 10⁻³¹ kg, where the uncertainty is ± one standard deviation. If we pretend that units had been implemented already, how would this be expressed in the database?

The example comes from http://physics.nist.gov/cgi-bin/cuu/Category?view=html&Frequently+used+constants.x=66&Frequently+used+constants.y=33

@Jc3s5h Your example could be expressed as:

Statement about mass:

  • Main Snak value: 9.109 383 56 x 10⁻³¹ ± 0.000 000 11 x 10⁻³¹ kg
  • Qualifier for property "uncertainty": item "standard deviation"
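
As a rough sketch, the statement above could be assembled as follows. The field names `amount`, `lowerBound`, `upperBound` and `unit` follow the quantity description in the data model; everything else is an assumption made for illustration: the property, item and unit identifiers are hypothetical placeholders, the statement layout is simplified rather than the actual serialization, and amounts are assumed to be stored as signed plain decimal strings.

```python
from decimal import Decimal

# All identifiers below are hypothetical placeholders, not real Wikidata IDs.
MASS_PROPERTY = "P-mass"
UNCERTAINTY_PROPERTY = "P-uncertainty"
STANDARD_DEVIATION_ITEM = "Q-standard-deviation"
KILOGRAM_UNIT = "Q-kilogram"


def signed_decimal(number):
    """Render a Decimal as a signed plain decimal string without an exponent."""
    return format(number, "+f")


def quantity(amount, uncertainty, unit):
    """Build a quantity value whose bounds are amount -/+ uncertainty."""
    return {
        "amount": signed_decimal(amount),
        "lowerBound": signed_decimal(amount - uncertainty),
        "upperBound": signed_decimal(amount + uncertainty),
        "unit": unit,
    }


electron_mass = quantity(
    amount=Decimal("9.10938356E-31"),
    uncertainty=Decimal("0.00000011E-31"),
    unit=KILOGRAM_UNIT,
)

# Simplified statement layout: the qualifier says how to read the bounds.
statement = {
    "property": MASS_PROPERTY,
    "mainsnak": {"type": "quantity", "value": electron_mass},
    "qualifiers": {UNCERTAINTY_PROPERTY: STANDARD_DEVIATION_ITEM},
}
```

The bounds stay inside the value, while the meaning of the bounds (here, one standard deviation) lives in the qualifier on the statement.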

In any case: the interpretation of the uncertainty relies on context, but not on implied context. It's possible to provide the context explicitly using qualifiers, or statements on properties. I'm closing this as invalid for now, since the implementation matches the spec in behavior and intent. If you feel there is a need to discuss this further, it's probably best to do that on the mailing list. If it's useful, we can then re-open this ticket to track the discussion.

daniel claimed this task.