
Support a "year" type in charts
Closed, Duplicate · Public · 3 Estimated Story Points

Description

Follow up from T383109

Charts created from data sets that specify a range of years treat the years as numbers instead of dates, causing display issues like the one seen here: https://commons.wikimedia.org/wiki/Data:New_Zealand_Annual_Wine_Production.chart

image.png (585×955 px, 28 KB)

We should support an explicit "year" type to prevent ambiguity.
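The screenshot is not reproduced here, but the closing comment on this task describes the problem as years being parsed as formatted numbers. A minimal sketch of that effect, assuming the axis labels pass through a locale-aware number formatter (the use of `Intl.NumberFormat` is an assumption for illustration, not a statement about how Charts renders labels):

```javascript
// Assumption: axis labels are produced by a locale-aware number formatter.
const asNumber = new Intl.NumberFormat("en-US");
// Disabling grouping separators is one way to make a year read as a year.
const asYear = new Intl.NumberFormat("en-US", { useGrouping: false });

const year = 2025;
const numberLabel = asNumber.format(year); // "2,025"
const yearLabel = asYear.format(year);     // "2025"

console.log(numberLabel, yearLabel);
```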

Event Timeline

@aude in T383109#10534395 you suggest solving this with a formatter, something like:

"xAxis": {
    "title": {
        "en": "Year"
    },
    "formatter": "year"
},

which could then be used for other types like temperature, percents, etc.

A thought I had after we discussed this was whether we should just add "date" as a Tabular data type instead. Then we would know to treat a specific series as a date instead of a number and automatically pass that on to ECharts. Essentially: solve this specific issue at the data level instead of in the chart definition. Other "formats" you mention, like temperature or percent, probably do make sense at the definition level though.

CCiufo-WMF moved this task from Up Next to Sprint 16 on the Charts board.
CCiufo-WMF edited projects, added Charts (Sprint 16); removed Charts.

Ok replying to myself: I see the tabular data types match the types supported in JSON, so it probably wouldn't make sense to add something specific to dates there.

With the years as strings, they are treated as category data. That can work if the years are evenly spaced on the x axis.
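To illustrate the caveat: a category axis places each label at an equal distance, so string years only give a faithful picture when the gaps between them are equal. A small sketch of a check for that; `isEvenlySpaced` is a hypothetical helper, not part of Charts:

```javascript
// Hypothetical helper: true when consecutive years differ by the same step.
function isEvenlySpaced(years) {
  const values = years.map(Number);
  if (values.some(Number.isNaN)) return false; // non-numeric label
  if (values.length < 3) return true;          // zero or one gap is trivially even
  const step = values[1] - values[0];
  return values.every((v, i) => i === 0 || v - values[i - 1] === step);
}

const annual = ["2019", "2020", "2021", "2022"];
const gappy = ["1990", "2000", "2005", "2020"];

console.log(isEvenlySpaced(annual)); // true
console.log(isEvenlySpaced(gappy));  // false
```

With the second data set, a category axis would draw 1990-2000 and 2000-2005 at the same width, which distorts the trend.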

Questions

Date values and formatting can come with a lot of complexity.

  • how do we support BC dates?
  • do we support other calendars (than Gregorian)?
  • do we support timezones?
  • more precise times like hours and minutes?
  • what about broader time scales like geologic time scale https://en.wikipedia.org/wiki/Geologic_time_scale (where the years go back more than 10000 BC)?

Internal (and tabular data json) representations

  • do we use timestamps with some standard like ISO-8601? ISO-8601 supports years from 1583 to 9999, though an expanded format can be used for dates outside that range (e.g. -0001 = 2 BC, since ISO-8601 counts year 0000 as 1 BC).
  • maybe we look to another standard / format like the Wikibase timevalue? The date representation (timevalue) in Wikibase is more complex, supporting Gregorian and Julian calendars and allowing more imprecision. This could add too much complexity.
  • is there some other standard we can follow?
  • or do we use another representation?

https://doc.wikimedia.org/Wikibase/master/php/docs_topics_json.html#json_datavalues_time

https://www.mediawiki.org/w/index.php?title=Wikibase/DataModel#Dates_and_times

Internally, we still need to have the date stored as either a string or an integer (or some more complex json value).

If we supported just year values, then:

It could be an integer, with negative numbers supported, like -44 for 44 BC.

Or it could be a numeric string like "-0044" or "2025" (expected to be 4 digits).
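A rough sketch of how both year representations could be accepted and normalized to an integer. `parseYear` and `formatYear` are hypothetical names, and the code follows the convention proposed in this comment (-44 means 44 BC) rather than the astronomical year numbering of ISO 8601, in which 0000 is 1 BC:

```javascript
// Hypothetical parser: accepts an integer or a signed 4-digit numeric string.
function parseYear(value) {
  if (typeof value === "number" && Number.isInteger(value)) {
    return value;
  }
  if (typeof value === "string" && /^-?\d{4}$/.test(value)) {
    return Number.parseInt(value, 10);
  }
  throw new RangeError(`Not a year: ${JSON.stringify(value)}`);
}

// Assumed convention from this thread: negative integers are BC years.
// Year 0 gets no special handling in this sketch.
function formatYear(year) {
  return year < 0 ? `${-year} BC` : String(year);
}

console.log(formatYear(parseYear("-0044"))); // 44 BC
console.log(formatYear(parseYear(2025)));    // 2025
```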

Potentially, we could also support ISO-8601 timestamps and the chart definition could specify a date formatter.

Formatting

Then there is a formatting aspect of this.

I think a simpler approach, like sticking to a number representation, could work, and then the chart definition could specify a formatter (e.g. we could have one for year, temperature in Celsius, percent, or other units).

{
    "license": "CC0-1.0",
    "version": 1,
    "type": "line",
    "xAxis": {
        "title": {
            "en": "Year"
        },
        "formatter": "year"
    },
    "yAxis": {
        "title": {
            "en": "Wine produced (million litres)"
        }
    },
    "source": "New Zealand Annual Wine Production.tab"
}

With a formatter "year", the values could be either number or string (numeric):

"xAxis": {
    "title": {
        "en": "Year"
    },
    "formatter": "year"
},

Also, dates can be localized, especially if they are more precise than a year. That is something I think we would need to support.

For date formatting:

In Wikibase, there is date formatting code that leverages MediaWiki localization:

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase/+/refs/heads/master/lib/includes/Formatters/MwTimeIsoFormatter.php

and there is more generic (or non-MediaWiki dependent) formatting in a separate composer package:

https://packagist.org/packages/data-values/time

then there is an API endpoint that exposes this formatting to the frontend:

https://www.wikidata.org/w/api.php?action=help&modules=wbformatvalue

This avoids duplicating the formatting in both PHP and JavaScript.

If we support more complex dates, then I suggest we try not to reinvent the wheel and see if any of the above would be helpful.

With the years as strings, they are treated as category data. That can work if the years are evenly spaced on the x axis.

Thanks for the explanation!

  • do we use timestamps with some standard like ISO-8601? ISO-8601 supports years from 1583 to 9999, though an expanded format can be used for dates outside that range (e.g. -0001 = 2 BC, since ISO-8601 counts year 0000 as 1 BC).

May be relevant: T382826

Isn't a date type already supported?

For example, a tab file lists dates in %Y-%m-%d format, but the corresponding chart file shows them in %m/%d/%Y format (American style). So the data is interpreted and rendered.
Since it seems to use Date(...) in the front-end, it seems that it will interpret source data as UTC but show it in the local time zone of the browser.
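A small sketch of that effect, assuming the front-end hands the tab file's strings to the `Date` constructor as suggested above. ECMAScript parses a date-only ISO string as UTC midnight, so the rendered calendar day depends on the time zone used for display; the two zones below are picked only for illustration:

```javascript
// Date-only ISO strings are parsed as UTC midnight.
const parsed = new Date("2025-01-01");
const iso = parsed.toISOString(); // "2025-01-01T00:00:00.000Z"

// The same instant rendered in two different time zones.
const inUtc = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" }).format(parsed);
const inLosAngeles = new Intl.DateTimeFormat("en-US", {
  timeZone: "America/Los_Angeles",
}).format(parsed);

console.log(iso);          // 2025-01-01T00:00:00.000Z
console.log(inUtc);        // 1/1/2025
console.log(inLosAngeles); // 12/31/2024
```

A reader west of UTC would see the previous day unless the chart formats dates with an explicit `timeZone: "UTC"`.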

I agree it would be nice to have explicit ISO-8601 support in the tab file, since that data isn't currently validated. It is worrying that someone could save an invalid date to a tab file without knowing that it breaks a chart.

Will a decision on this result in a new ADR or an amendment to ADR 0006?

Sorry for all my questions, I don't know a lot about the MediaWiki dev process.

While ISO 8601 is useful for specifying a single point in time, or a calendar day, it isn't immediately clear to me how I should make tabs/charts where I have monthly data, like e.g. this file. I erroneously added -31 to the year-month, but now some entries are invalid, as there aren't 31 days in e.g. February. JavaScript's Date constructor may still accept the value and roll it over into early March, which might make sense. But it would have been better to reject the data before it entered the system.
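One way such data could be rejected up front is a strict round-trip check: build the date from its parts and confirm that nothing overflowed into another month. This is only a sketch; `isValidIsoDate` and `isValidYearMonth` are hypothetical helpers, not existing Charts or tabular data validation code:

```javascript
// Hypothetical strict check for YYYY-MM-DD: rejects days that overflow the month.
function isValidIsoDate(text) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(0);
  // setUTCFullYear avoids the two-digit-year remapping done by Date.UTC.
  date.setUTCFullYear(year, month - 1, day);
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

// Hypothetical check for monthly data written as YYYY-MM, with no day part.
function isValidYearMonth(text) {
  const match = /^(\d{4})-(\d{2})$/.exec(text);
  if (!match) return false;
  const month = Number(match[2]);
  return month >= 1 && month <= 12;
}

console.log(isValidIsoDate("2025-02-28")); // true
console.log(isValidIsoDate("2025-02-31")); // false
console.log(isValidYearMonth("2025-02"));  // true
```

Accepting a bare year-month would remove the need to invent a day for monthly series in the first place.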

CCiufo-WMF edited projects, added Charts (Sprint 17); removed Charts (Sprint 16).
CCiufo-WMF moved this task from Incoming to Doing on the Charts (Sprint 17) board.

Some ideas for a schema that is a bit more "flexible", but not all options need to be available from day one. I think the main idea here is that there are different "types" of formats with their own configuration options.

"xAxis": {
    [...]
    "format": {
        "type": "year",
        "calendar": "gregorian",
        "notation": "BCE"
    }
}

"xAxis": {
     [...]
    "format": {
        "type": "number",
        "notation": "compact",
        "decimals": 2
    }
}

"xAxis": {
     [...]
    "format": {
        "type": "temperature",
        "scale": "celsius"
    }
}
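A sketch of how a renderer might turn such a "format" object into a label function, with each type reading only its own options. `makeFormatter` is a hypothetical name, the mapping onto `Intl.NumberFormat` is an assumption, "calendar" is ignored, and years use the integer convention from earlier in this thread (-44 means 44 BCE):

```javascript
// Hypothetical: build a label function from a chart "format" object.
function makeFormatter(format, locale = "en-US") {
  switch (format.type) {
    case "year": {
      // Assumption: "notation" selects the era label; negative years are BCE/BC.
      const era = format.notation === "BCE" ? "BCE" : "BC";
      return (value) => (value < 0 ? `${-value} ${era}` : String(value));
    }
    case "number": {
      const nf = new Intl.NumberFormat(locale, {
        notation: format.notation ?? "standard",
        maximumFractionDigits: format.decimals,
      });
      return (value) => nf.format(value);
    }
    case "temperature": {
      // "celsius" and "fahrenheit" are unit identifiers Intl.NumberFormat accepts.
      const nf = new Intl.NumberFormat(locale, { style: "unit", unit: format.scale });
      return (value) => nf.format(value);
    }
    default:
      throw new RangeError(`Unknown format type: ${format.type}`);
  }
}

const yearLabel = makeFormatter({ type: "year", calendar: "gregorian", notation: "BCE" });
const compactLabel = makeFormatter({ type: "number", notation: "compact", decimals: 2 });
const celsiusLabel = makeFormatter({ type: "temperature", scale: "celsius" });

console.log(yearLabel(-44));        // 44 BCE
console.log(compactLabel(1234567)); // 1.23M
console.log(celsiusLabel(21));
```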
CCiufo-WMF renamed this task from Support a "date" type in charts to Support a "year" type in charts. Mar 4 2025, 7:27 PM
CCiufo-WMF updated the task description.
CCiufo-WMF added a subscriber: Jdlrobson-WMF.
Seddon set the point value for this task to 3. Mar 4 2025, 7:40 PM

Discussed with @aude and @bvibber in engineering time. We think this is best achieved by allowing editors to override our type inference by providing the type themselves and by adding support for "subtypes" that provide further modification.

For years, we recommend having a type represented by ["date", "year"]

@aude is going to focus on this ticket, and then we'll evolve this to support T386028: Make number formatting in Charts configurable

After a team discussion today, we decided not to provide special formatting for years at this time. We will solve the issue of years being parsed as formatted numbers as part of T386028: Make number formatting in Charts configurable.