Monitor the temporal falsifiability of people in Wikidata
Open, Low, Public

Description

Temporal falsifiability is a value between 0 and 1 that reflects how useful the contemporary constraint is on a given set of entities; it is also an indicator of data quality in its own right. It's defined as 1-(c/(n^2)), where c is the number of potentially contemporary pairs of entities (including reflexive pairs) in a set of n entities. The temporal falsifiability of Wikidata's data shouldn't decrease, since a decrease would mean that average data quality is declining. The temporal falsifiability can rise when any of these good things happen:

  • More data are added to existing entities.
  • We improve the precision of dates.
  • We create entities that are more complete and accurate than average.
  • Temporal biases are reduced (i.e., we get more data about the past and not so many about the present).
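As a small illustration of the definition above, the following sketch computes 1-(c/(n^2)) exactly for a toy set of four people. It is only a sketch under assumptions: the `Person` tuples (earliest possible birth year, latest possible death year, with `None` for a missing date), the helper names and the sample years are all hypothetical and are not taken from Wikidata.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Hypothetical simplification: a person is (earliest possible birth year,
# latest possible death year); None means the date is unknown.
Person = Tuple[Optional[int], Optional[int]]


def can_be_contemporary(a: Person, b: Person) -> bool:
    """Return True unless the data rule out that a and b lived at the same time."""
    birth_a, death_a = a
    birth_b, death_b = b
    # Each half of the rule is met directly when one of its dates is missing.
    a_born_in_time = birth_a is None or death_b is None or birth_a <= death_b
    b_born_in_time = birth_b is None or death_a is None or birth_b <= death_a
    return a_born_in_time and b_born_in_time


def temporal_falsifiability(people: Sequence[Person]) -> float:
    """Compute 1 - c/n^2 over all ordered pairs, reflexive pairs included."""
    n = len(people)
    c = sum(can_be_contemporary(a, b) for a, b in product(people, repeat=2))
    return 1 - c / n**2


# Made-up sample data: two overlapping pairs and one unknown date of death.
people = [(1564, 1616), (1600, 1680), (1900, 1980), (1950, None)]
print(temporal_falsifiability(people))  # 8 of the 16 ordered pairs can overlap -> 0.5
```

A set in which no dates are known at all yields 0, because no pair can be ruled out as contemporary.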

The temporal falsifiability of people in Wikidata, which should ideally be above 0.30 and can never reach 0.55, can be estimated by following these steps:

  1. Choose an n. For a very accurate estimation, this number can be the total number of instances of Q5; however, if using that n is too expensive, a lower n can be chosen.
  2. Initialize c := 0.
  3. Repeat n times:
    1. Pick two instances of Q5 randomly (see T194884), e1 and e2.
    2. If the date of birth (P569) of e1 can be earlier than or equal to the date of death (P570) of e2, and the date of birth (P569) of e2 can be earlier than or equal to the date of death (P570) of e1, then add 1 to c (c := c + 1); otherwise, do nothing. Here "can be" means that, when there are several values or values with low precision, a single possible combination that meets the rule is enough. The rule is also met when data are missing; for instance, when checking whether "the date of birth (P569) of e1 can be earlier than or equal to the date of death (P570) of e2", this half of the rule is met directly if no date of death is defined for e2.
  4. Calculate (n-1)*(n-c)/(n^2) to get the value of temporal falsifiability.
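The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the Q5 instances have already been loaded into an in-memory list of hypothetical `(earliest birth year, latest death year)` tuples (with `None` for a missing date), it uses Python's `random.sample` in place of the random selection discussed in T194884, and it assumes e1 and e2 are always two distinct entities, which the task does not state explicitly.

```python
import random
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (earliest possible birth year,
# latest possible death year); None means the date is unknown.
Person = Tuple[Optional[int], Optional[int]]


def can_be_contemporary(e1: Person, e2: Person) -> bool:
    """Step 3.2: the rule is met unless the known dates exclude an overlap."""
    birth1, death1 = e1
    birth2, death2 = e2
    first_half = birth1 is None or death2 is None or birth1 <= death2
    second_half = birth2 is None or death1 is None or birth2 <= death1
    return first_half and second_half


def estimate_temporal_falsifiability(
    population: Sequence[Person], n: int, seed: Optional[int] = None
) -> float:
    """Estimate temporal falsifiability from n randomly drawn pairs."""
    rng = random.Random(seed)  # seed is only here to make runs repeatable
    c = 0  # step 2
    for _ in range(n):  # step 3
        # Step 3.1: assumption, the two picks are distinct entities.
        e1, e2 = rng.sample(population, 2)
        if can_be_contemporary(e1, e2):  # step 3.2
            c += 1
    return (n - 1) * (n - c) / n**2  # step 4


# Made-up population; real input would come from Wikidata.
population = [(1564, 1616), (1600, 1680), (1900, 1980), (1950, None)]
print(estimate_temporal_falsifiability(population, n=1000, seed=42))
```

The resulting value could then be pushed to whatever metrics backend feeds Grafana; that part is left out because the task does not specify it.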

Please monitor the temporal falsifiability of Q5 instances on Grafana.

Event Timeline

abian triaged this task as Low priority. Dec 8 2018, 2:22 PM

I think this isn't urgent/important at all, but feel free to ask me anything in case you want to do it.