
[Story] JSON should (optionally) contain expanded/normalized values.
Open, Medium, Public

Description

Wikibase follows the maxim that values should be stored as given in the original source. The notation should be normalized, but the value should remain unchanged, using the original unit of measurement, calendar, time zone, reference globe, etc.

However, for consumers of the data, it is convenient to have values normalized, converted to some form that makes it easier to compare values. The solution is to optionally include additional forms of the value next to the "datavalue" field in the snak structure in JSON output. The following forms should be supported:

  • datavalue-uri: Full URI form of external identifiers that are internally represented as simple ID strings; URI form of referenced entities.
  • datavalue-normalized: Time values converted to UTC (Gregorian); quantities converted to base (SI) units.

At least datavalue-uri should be included by default in the output of Special:EntityData, since only this feature turns the output into true Linked Data. All supported expansions should probably be included in JSON dumps, for easy import and re-use. For API output, these should be optional.
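
To illustrate the proposal, here is a minimal Python sketch of how the extra fields could sit next to "datavalue" in a snak. Only the field names datavalue-uri and datavalue-normalized come from the description above. The property ID P9999, the example.org URI pattern, the unit URIs, and the function expand_snak are hypothetical placeholders, not part of Wikibase.

```python
from decimal import Decimal

# Hypothetical lookup tables (assumptions for illustration only). A real
# system would read these from property and unit metadata.
URI_PATTERNS = {"P9999": "https://example.org/id/$1"}
SI_FACTORS = {
    "http://example.org/unit/kilometre": (
        Decimal("1000"),
        "http://example.org/unit/metre",
    ),
}


def expand_snak(snak):
    """Return a copy of the snak with the proposed expansion fields added.

    The original "datavalue" is left untouched, as the description requires.
    """
    out = dict(snak)
    datavalue = snak["datavalue"]
    if snak["datatype"] == "external-id" and snak["property"] in URI_PATTERNS:
        pattern = URI_PATTERNS[snak["property"]]
        uri = pattern.replace("$1", datavalue["value"])
        out["datavalue-uri"] = {"value": uri, "type": "string"}
    elif snak["datatype"] == "quantity":
        unit = datavalue["value"]["unit"]
        if unit in SI_FACTORS:
            factor, base_unit = SI_FACTORS[unit]
            amount = Decimal(datavalue["value"]["amount"]) * factor
            out["datavalue-normalized"] = {
                "value": {"amount": f"{amount:+}", "unit": base_unit},
                "type": "quantity",
            }
    return out


id_snak = {
    "snaktype": "value",
    "property": "P9999",
    "datatype": "external-id",
    "datavalue": {"value": "abc123", "type": "string"},
}
length_snak = {
    "snaktype": "value",
    "property": "P9998",
    "datatype": "quantity",
    "datavalue": {
        "value": {"amount": "+5", "unit": "http://example.org/unit/kilometre"},
        "type": "quantity",
    },
}

expanded_id = expand_snak(id_snak)
expanded_length = expand_snak(length_snak)
```

In this sketch the original value stays in its original unit, and consumers who want comparable values read the additional field instead.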

Event Timeline

daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a project: Wikidata.
daniel subscribed.

We had a team-internal discussion on this topic today. The outcome of that meeting follows.
We largely agreed on these points:

  • Dumps: always include both the normalized and the original value, even if they are identical.
  • API: provide a flag for optionally requesting the normalized value in addition to the original value.
  • If a value cannot be normalized, the normalized value is explicitly set to “null” (API and dumps).
  • For identifiers and media datatypes, the “normalized values” would be “derived values”: media would contain a “filepage-url”, a “thumbnail-url”, and a “media-url”; identifiers would contain a “url” and a “uri”.
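
The first three points can be sketched roughly as follows in Python. This is only one reading of the agreement, not an actual Wikibase implementation. The function serialize_snak, its parameters target and want_normalized, and the two toy normalizers are hypothetical names. The field name datavalue-normalized is taken from the task description.

```python
import json


def serialize_snak(snak, normalize, target, want_normalized=False):
    """Serialize a snak for a dump or for the API.

    target          -- "dump" or "api" (hypothetical switch)
    want_normalized -- stands in for the proposed optional API flag
    normalize       -- callable returning the normalized datavalue,
                       or raising ValueError if none can be computed
    """
    out = dict(snak)
    # Dumps always carry the normalized value; the API only on request.
    if target == "dump" or want_normalized:
        try:
            out["datavalue-normalized"] = normalize(snak["datavalue"])
        except ValueError:
            # Explicit null rather than a missing key.
            out["datavalue-normalized"] = None
    return out


def identity(datavalue):
    """Toy normalizer for values that are already in normal form."""
    return datavalue


def cannot_normalize(datavalue):
    """Toy normalizer that always fails."""
    raise ValueError("no normalization available")


snak = {
    "snaktype": "value",
    "property": "P9999",  # hypothetical property ID
    "datavalue": {"value": "abc", "type": "string"},
}

dump_same = serialize_snak(snak, identity, "dump")
dump_failed = serialize_snak(snak, cannot_normalize, "dump")
api_default = serialize_snak(snak, identity, "api")
api_flagged = serialize_snak(snak, identity, "api", want_normalized=True)

# None becomes JSON null when the result is encoded.
dump_failed_json = json.dumps(dump_failed)
```

Setting the key to null instead of omitting it lets a consumer distinguish "normalization was attempted and failed" from "normalization was not requested".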

Open questions are:

  • The simple value for RDF is unclear when no normalization is possible, and
  • it is unclear in general for quantities with units.

This blocks "WikibaseDataModelSerialization with what we do with WikibaseLib", and although we have completely removed the Lib serialization, I would not consider this task complete with regard to WikibaseDataModelSerialization.

Jonas renamed this task from "JSON should (optionally) contain expanded/normalized values." to "[Story] JSON should (optionally) contain expanded/normalized values." (Aug 15 2015, 12:23 PM)
Jonas subscribed.

Does this still need discussion?

Does this still need discussion?

T89005#1242003

This blocks "WikibaseDataModelSerialization with what we do with WikibaseLib", and although we have completely removed the Lib serialization, I would not consider this task complete with regard to WikibaseDataModelSerialization.

So should we mark T73170 as resolved? This task blocks it only in a soft way: we needed to take this task into account while working on that one.

Basically, all of the blockers of T73170 are 'soft' blockers.
We should probably rename that task and all of its blockers.

The task description should be modified thus:

Replace "UTC (Gregorian)" with "Universal Time and Gregorian calendar".

UTC did not exist before about 1961: the earliest conversion from UTC to UT1 listed on page 87 of the Explanatory Supplement to the Astronomical Almanac, 3rd ed. (eds. Urban & Seidelmann), is for January 1, 1961.

An open question is what to do about precision. If a date is given as January 1, 1980, Central European Time, with a precision of one day, the usual interpretation is that the event occurred between 00:00 and 24:00 on that day in that time zone. If the normalization considers only the beginning of that range, converts it to 23:00 on December 31, 1979, and omits the time zone and precision, the result is misleading, because it appears to name the previous day.
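
The example can be reproduced with a short Python sketch that converts the whole one-day range instead of only its starting instant. The helper day_range_in_ut is a hypothetical name. CET is modelled as a fixed +01:00 offset, and Python's timezone.utc is used as a stand-in for Universal Time, which is an approximation for the reason given above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET as a fixed +01:00 offset, ignoring daylight saving time.
CET = timezone(timedelta(hours=1), "CET")


def day_range_in_ut(year, month, day, tz):
    """Return (start, end) of a day-precision date, converted to UT.

    The range is half-open: start is included, end is excluded.
    """
    start = datetime(year, month, day, tzinfo=tz)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


start, end = day_range_in_ut(1980, 1, 1, CET)
print(start.isoformat())  # 1979-12-31T23:00:00+00:00
print(end.isoformat())    # 1980-01-01T23:00:00+00:00
```

Keeping both ends of the range (or the start plus the precision) shows that most of the interval falls on January 1, 1980, which a lone start timestamp hides.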