Time-Parser should detect most likely calendar
Closed, Resolved | Public | 1 Estimated Story Points

Description

see https://bugzilla.wikimedia.org/show_bug.cgi?id=70395#c5


Version: unspecified
Severity: normal
Whiteboard: u=dev c=backend p=0

Details

Reference
bz73272

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 3:49 AM)
bzimport set Reference to bz73272.
bzimport added a subscriber: Unknown Object (MLST).
Lydia_Pintscher removed a subscriber: Unknown Object (MLST). (Dec 1 2014, 2:29 PM)

Looking at the title of this task, "Time-Parser should detect most likely calendar", I think it should state explicitly which time parser(s) it covers: the parser(s) that take date input from users in the user interface, the parser(s) that display dates to users in the user interface, the parser(s) that store dates input through an API, and the parser(s) that export dates to JSON (and any other supported output format).

A fault with the user interface input parser is that it does not allow the entry of dates that exist in the Julian calendar but not the Gregorian calendar, such as February 29, 1700.
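To illustrate why February 29, 1700 is valid in one calendar only, here is a minimal sketch of the two leap-year rules. The function names are hypothetical and not taken from Wikibase, and the sketch only handles positive years.

```python
def is_leap_year(year: int, calendar: str) -> bool:
    """Return True if `year` is a leap year in the given calendar."""
    if calendar == "julian":
        # Julian rule: every fourth year is a leap year.
        return year % 4 == 0
    if calendar == "gregorian":
        # Gregorian rule: century years are leap years only if divisible by 400.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    raise ValueError(f"unknown calendar: {calendar}")


def date_exists(year: int, month: int, day: int, calendar: str) -> bool:
    """Check whether a year/month/day combination exists in the calendar."""
    february = 29 if is_leap_year(year, calendar) else 28
    days_in_month = [31, february, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return 1 <= month <= 12 and 1 <= day <= days_in_month[month - 1]


print(date_exists(1700, 2, 29, "julian"))     # True
print(date_exists(1700, 2, 29, "gregorian"))  # False
```

An input parser that validates every date against the Gregorian rule alone would reject the first case, which is the fault described above.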

@Jc3s5h Parsers are used only for values entered by users into the user interface (processed via the wbparsevalue API module). Parsing is never applied to anything supplied via JSON or stored in the internal data blob. Parsing is also not part of the display/rendering process; that is done by formatters. (That the MediaWiki "parser" is also the renderer for wikitext is an unfortunate accident. Generally, parsing and formatting/rendering should be separate, with an abstract structure used for internal representation; in the case of Wikibase, that intermediate structure is JSON.)

As to the question of which calendars are considered: we currently only support "Gregorian" and "Julian". The automatic guess is based on the year: anything since 1582-01-01 defaults to Gregorian, anything before that defaults to Julian.
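The year-based guess described here could be sketched roughly as follows. This is an illustration, not the Wikibase code; `guess_calendar` and the constant name are hypothetical, and the cutoff is the 1582-01-01 value stated in this comment.

```python
GREGORIAN = "Gregorian"
JULIAN = "Julian"

# Assumption: the cutoff year as stated in this comment (1582-01-01).
FIRST_GREGORIAN_YEAR = 1582


def guess_calendar(year: int) -> str:
    """Default calendar for a year when the user did not specify one."""
    return GREGORIAN if year >= FIRST_GREGORIAN_YEAR else JULIAN


print(guess_calendar(1581))  # Julian
print(guess_calendar(1582))  # Gregorian
```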

If we start supporting more calendar models, we would probably try the parser for each calendar, and go with the first one that can parse the input.
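A "first parser that succeeds wins" chain might look like the sketch below. Everything in it is hypothetical: the toy parsers accept only `YYYY-MM-DD` input and differ only in their leap-year rule, and the order of the list is an assumption about which calendar would be tried first.

```python
import re
from typing import Callable, List, Optional, Tuple

ParsedDate = Tuple[str, int, int, int]  # (calendar, year, month, day)
Parser = Callable[[str], Optional[ParsedDate]]

_ISO_DATE = re.compile(r"^(\d{1,4})-(\d{2})-(\d{2})$")


def make_parser(calendar: str, is_leap: Callable[[int], bool]) -> Parser:
    """Build a toy parser that returns None when the date is invalid."""
    def parse(text: str) -> Optional[ParsedDate]:
        match = _ISO_DATE.match(text.strip())
        if match is None:
            return None
        year, month, day = (int(part) for part in match.groups())
        february = 29 if is_leap(year) else 28
        lengths = [31, february, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
        if not (1 <= month <= 12 and 1 <= day <= lengths[month - 1]):
            return None
        return (calendar, year, month, day)
    return parse


# Assumption: Gregorian is tried first, Julian second.
PARSERS: List[Parser] = [
    make_parser("Gregorian", lambda y: y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)),
    make_parser("Julian", lambda y: y % 4 == 0),
]


def parse_with_first_match(text: str, parsers: List[Parser] = PARSERS) -> Optional[ParsedDate]:
    """Return the result of the first parser that accepts the input."""
    for parser in parsers:
        result = parser(text)
        if result is not None:
            return result
    return None


print(parse_with_first_match("2014-11-22"))  # ('Gregorian', 2014, 11, 22)
print(parse_with_first_match("1700-02-29"))  # ('Julian', 1700, 2, 29)
```

One consequence of this design is that the order of the parsers decides the result for every input that more than one calendar accepts, so the ordering itself acts as the default.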

thiemowmde subscribed.

The guessing algorithm must be in line with the formatter we use to make full round trips possible.
https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/lib/includes/formatters/HtmlTimeFormatter.php#L80
A full round trip happens, for example, when a user copy-pastes a value (or transcribes it from another browser window). Currently the value does not survive this.

The calendar model is never shown when the precision is month or less precise. We agreed that this is a good thing and should not be changed.

If a year <= 1581 is marked as Gregorian and a user copy-pastes it, it will silently turn into Julian, causing data loss.

Next steps:

  • Find out what the actual edge is. Some code did <= 1581, other code did < 1583.
  • Re-evaluate and possibly re-implement the logic from the old UI: https://github.com/wmde/ValueView/blob/0.2/src/experts/TimeValue.js#L66
  • All relevant formatters must be changed to always show the calendar model if it's not identical to the auto-detected value.
  • Implement auto-detection.
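The round-trip requirement behind these steps can be sketched as follows. This is a simplified illustration and not the Wikibase formatter: it handles year values only, ignores precision, and all function names are hypothetical. Since the actual edge is still an open question, the sketch takes it as a parameter and defaults to 1582 as an assumption.

```python
import re
from typing import Tuple


def guess_calendar(year: int, first_gregorian_year: int = 1582) -> str:
    """Auto-detected calendar; the cutoff year is an assumption."""
    return "Gregorian" if year >= first_gregorian_year else "Julian"


def format_year(year: int, calendar: str) -> str:
    """Show the calendar only when it differs from the auto-detected one."""
    text = str(year)
    if calendar != guess_calendar(year):
        text += f" ({calendar})"
    return text


def parse_year(text: str) -> Tuple[int, str]:
    """Parse the output of format_year, guessing the calendar if absent."""
    match = re.fullmatch(r"(\d+)(?: \((Gregorian|Julian)\))?", text.strip())
    if match is None:
        raise ValueError(f"cannot parse: {text!r}")
    year = int(match.group(1))
    calendar = match.group(2) or guess_calendar(year)
    return year, calendar


print(format_year(1500, "Julian"))     # 1500
print(format_year(1500, "Gregorian"))  # 1500 (Gregorian)
print(parse_year(format_year(1500, "Gregorian")))  # (1500, 'Gregorian')
```

The round trip only holds because the formatter and the parser call the same `guess_calendar`; if the two used different edges, values near the edge would change calendar on re-entry.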

@Jc3s5h: Please report individual problems as individual tasks. I created T98194 for the February 29 issue.

You are right about the potential data-loss when copy&pasting the data back. A couple of thoughts about that:

  1. While allowing full round trips without loss is nice, it's not something we are absolutely committed to. In particular, the "pretty" HTML displays of our complex value types (time, geo, quantity) are all lossy (geo loses the reference globe, time loses before/after, etc).
  2. Since the display of the calendar model is omitted in cases where the calendar model isn't relevant (precision of year or larger), getting the calendar model wrong wouldn't be much of a problem. It would be nicer to have a "Grego-Julian-whatever" model for such cases, but I'm afraid that would introduce more issues than it solves.

Anyway. I agree that we could change the heuristic of when to show the calendar model so that the calendar is always shown when it's not the default (and maybe in some other cases too). That should fix the round trip issue.

I agree that we should probably change the "edge" to 1583-01-01. The switchover-date was in October, closer to 1583-01-01 than to 1582-01-01. In reality however we just don't know what calendar was used for dates given around 1600; it was probably simply a mess.

The one thing I really disagree with is that we should re-implement the old heuristic exactly. In particular, dates before 1582 (resp. 1583) should not default to Gregorian, no matter what their precision is.

daniel moved this task from Review to Done on the Wikidata-Sprint-2015-05-05 board.