Define syntax for defining and embedding a chart
Closed, Resolved · Public · 3 Estimated Story Points

Description

We need to decide what the syntax for defining a chart looks like. We also need to decide how charts are embedded in articles.

Provisional recommendation:

via internal notes https://docs.google.com/document/d/1Wu7dlDmLhReglmh9pNXrt6YvYGHIp46mgyyVQrt7YQc/edit

A #chart: parser function will accept two Data: namespace parameters: one for the format definition (a Data:....chart page) and an optional one for a tabular data page (Data:....tab), so that the same definition can be reused with many data sets:

{{#chart:format=Weather monthly history.chart
|data=ncei.noaa.gov/weather/Detroit.tab}}

When only a single data set is used and it is named in the format definition, the data parameter can be omitted from the invocation.

{{#chart:}} parser function parameters:

  • format specifies the Data:.chart page with the format definition. If it is left out, a default line graph will be emitted with labels from the tabular data.
  • data specifies the Data:.tab page with the source data. In the future this could point to other subtypes of Data: pages, such as an encapsulated SPARQL query to Wikidata, hence “data” rather than “table”. If it is left out, the default data source named in the chart format is rendered.
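
The fallback rules for the two parameters can be sketched in Python as follows. This is only an illustration under stated assumptions: `resolve_chart_args`, `load_page`, `DEFAULT_LINE_FORMAT`, and the page title `Default.tab` are hypothetical names, not part of any MediaWiki API, and a plain dict stands in for the Data: namespace.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the "default line graph" used when format= is omitted.
DEFAULT_LINE_FORMAT = {"version": 1, "type": "line"}


def resolve_chart_args(
    args: dict[str, str],
    load_page: Callable[[str], dict],
) -> tuple[dict, Optional[str]]:
    """Return (format definition, data page title) for a #chart invocation.

    - format= omitted: fall back to a default line chart.
    - data= omitted: fall back to the "source" named in the format definition.
    """
    format_title = args.get("format")
    definition = load_page(format_title) if format_title else dict(DEFAULT_LINE_FORMAT)
    data_title = args.get("data") or definition.get("source")
    return definition, data_title


# In-memory dict standing in for Data: pages (assumption, for illustration only).
pages = {
    "Weather monthly history.chart": {"version": 1, "type": "line", "source": "Default.tab"},
}
definition, data_title = resolve_chart_args(
    {"format": "Weather monthly history.chart", "data": "ncei.noaa.gov/weather/Detroit.tab"},
    pages.__getitem__,
)
```

An explicit data= always wins over the definition's own source, which is what lets one format page serve many data sets.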

Tabular data structure is documented at https://www.mediawiki.org/wiki/Help:Tabular_Data

Recommend a similar JSON layout for the Data:.chart pages and their localizable text strings.

Here’s a sample election chart with its inline data definitions taken out and template parameters reworked a bit; note that the xAxis* and yAxis* params have been moved into sub-objects. Text has been extended to be localizable in the same format used for Data:.tab pages, which will fill out some column titles by default.

Invocation:

{{#chart:format=1993 Canadian federal elections.chart}}

Format description in Data:.chart page (see full list of params to come):

{
    "version": 1,
    "type": "line",
    "width": 350,
    "height": 200,
    "xAxis": {
        "title": "",
        "angle": -40,
        "type": "date"
    },
    "yAxis": {
        "title": {
            "en": "%support",
            "fr": "%soutien"
        }
    },
    "legend": {
        "en": "Party",
        "fr": "Parti"
    },
    "interpolate": "basis",
    "showSymbols": true,
    "colors": [ "#9999FF", "#EA6D6A", "#F4A460", "#87CEFA", "#3CB371", "#FF00FF" ],
    "source": "1993 Canadian federal election.tab"
}

Types and column titles are specified in the .tab page. The "source" entry names the default data set: you can override it via invocation parameters to reuse the format with different data sets, but this one is used in previews of the format page, or when the invocation does not specify one.
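
Localizable values such as yAxis.title and legend in the definition above are either a plain string or a map from language code to text. A minimal Python sketch of resolving such a value for one language follows; the function name `localize` and the fallback order (requested language, then "en", then any available entry) are assumptions made for illustration, not decided behaviour.

```python
from typing import Union

# A "string/loc" value: either plain text or a language-code -> text map.
LocalizedText = Union[str, dict[str, str]]


def localize(text: LocalizedText, lang: str, fallback: str = "en") -> str:
    """Resolve a string/loc value for one language.

    Plain strings are returned unchanged. Language maps are looked up by
    `lang`, then by `fallback` (an assumed order), then by the first entry.
    """
    if isinstance(text, str):
        return text
    if lang in text:
        return text[lang]
    if fallback in text:
        return text[fallback]
    return next(iter(text.values()), "")


y_title = {"en": "%support", "fr": "%soutien"}
print(localize(y_title, "fr"))  # %soutien
print(localize(y_title, "de"))  # %support (falls back to "en")
```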

The matching Data:.tab page would be:

{
    "license": "CC0-1.0",
    "description": {
        "en": "1993 Canadian federal election",
        "fr": "Élections fédérales canadiennes de 1993"
    },
    "schema": {
        "fields": [
            {
                "name": "date",
                "type": "string",
                "title": {
                    "en": "Date",
                    "fr": "Date"
                }
            },
            {
                "name": "pc",
                "type": "number",
                "title": {
                    "en": "PC",
                    "fr": "PC"
                }
            },
            {
                "name": "liberal",
                "type": "number",
                "title": {
                    "en": "Liberal",
                    "fr": "Libéral"
                }
            },
            {
                "name": "ndp",
                "type": "number",
                "title": {
                    "en": "NDP",
                    "fr": "NPD"
                }
            },
            {
                "name": "bq",
                "type": "number",
                "title": {
                    "en": "BQ",
                    "fr": "BQ"
                }
            },
            {
                "name": "reform",
                "type": "number",
                "title": {
                    "en": "Reform",
                    "fr": "Réform"
                }
            }
        ]
    },
    "data": [
        ["1993/09/09",35,37,8,8,10],
        ["1993/09/14",36,33,8,10,11],
        ["1993/09/20",35,35,6,11,11],
        ["1993/09/25",30,37,8,10,13],
        ["1993/09/26",31,36,7,11,13],
        ["1993/09/26",28,34,7,12,15],
        ["1993/09/30",25,39,6,12,17],
        ["1993/10/02",26,38,8,12,14],
        ["1993/10/08",22,37,8,12,18],
        ["1993/10/16",22,40,7,13,16],
        ["1993/10/19",21,39,6,14,17],
        ["1993/10/22",18,43,7,14,18],
        ["1993/10/22",16,44,7,12,19],
        ["1993/10/25",16.04,41.24,6.88,13.52,18.69]
    ]
}
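
A renderer will usually want one series per column rather than a list of rows. The following Python sketch pivots the "schema"/"data" layout shown above into per-field lists; `tab_to_series` is a hypothetical helper name, and the sample is a trimmed copy of the page above (two fields, two rows).

```python
def tab_to_series(tab: dict) -> dict[str, list]:
    """Pivot row-oriented tabular data into one list per field name."""
    names = [field["name"] for field in tab["schema"]["fields"]]
    series: dict[str, list] = {name: [] for name in names}
    for row in tab["data"]:
        if len(row) != len(names):
            raise ValueError(f"row has {len(row)} cells, expected {len(names)}")
        for name, cell in zip(names, row):
            series[name].append(cell)
    return series


# Trimmed copy of the election page: only the date and pc columns.
tab = {
    "schema": {"fields": [
        {"name": "date", "type": "string"},
        {"name": "pc", "type": "number"},
    ]},
    "data": [["1993/09/14", 36], ["1993/09/20", 35]],
}
series = tab_to_series(tab)
print(series["pc"])  # [36, 35]
```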

Format JSON summary

Based on the parameters for Module:Graph

  • version - number: 1 for now, can be incremented in case of backward-incompatible changes in future
  • width - number: canvas size in CSS pixels
  • height - number: canvas size in CSS pixels
  • type - string: "line", "area", "bar"/"rect", "pie", "stackedline", "stackedarea", "stackedrect"
  • interpolate - string: "monotone", "basis" etc (?)
  • colors - array<String>: list of hex codes ("#123456") per data column
  • xAxis - object: X axis config
    • title - string/loc (else empty?)
    • min - number (else auto)
    • max - number (else auto)
    • format - string: format strings (is this safe to expose or do it differently?)
    • angle - number: degrees to rotate the X axis off normal (else 0)
    • type - string: data type ("integer", "number", "date", "string")
    • grid - boolean: whether to show grid lines on this axis
  • yAxis - object: Y axis config
    • same as xAxis but no angle
  • legend - string/loc: title for the legend box
  • linewidth - number: css px
  • showValues - object
    • format - string: format string (is this safe to expose or do something different?)
    • fontcolor - string
    • fontsize - number
    • offset - number
    • angle - number (pie charts)
  • showSymbols - boolean: whether to show symbol markers on the data points
  • innerRadius - number (pie charts)
  • source - string: pointer to _Data:.tab_ (or other in future) source data page
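
A few of the constraints in this summary can be checked mechanically. The Python sketch below validates only version, type, width/height, and colors; `validate_format` is a hypothetical name, and treating version and type as required is an assumption, since the summary does not say which keys are mandatory.

```python
import re

CHART_TYPES = {"line", "area", "bar", "rect", "pie",
               "stackedline", "stackedarea", "stackedrect"}
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def validate_format(definition: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if definition.get("version") != 1:
        problems.append("version must be 1")
    if definition.get("type") not in CHART_TYPES:
        problems.append(f"unknown type: {definition.get('type')!r}")
    for key in ("width", "height"):
        value = definition.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if value is not None and (isinstance(value, bool)
                                  or not isinstance(value, (int, float))):
            problems.append(f"{key} must be a number")
    for color in definition.get("colors", []):
        if not isinstance(color, str) or not HEX_COLOR.fullmatch(color):
            problems.append(f"bad color: {color!r}")
    return problems


ok = validate_format({"version": 1, "type": "line", "width": 350,
                      "height": 200, "colors": ["#9999FF", "#EA6D6A"]})
bad = validate_format({"version": 2, "type": "donut", "colors": ["red"]})
```

Returning a list of problems rather than raising on the first one would let a format-page preview report everything that needs fixing at once.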

Older task notes below:

What the legacy Graph extension does

The legacy Graph extension uses a parser tag that contains JSON, like this:

<graph title="example graph">
{ "version": 2, "width": 950, "height": 400,  ..... }
</graph>

The chart definition is always inlined in the article. Reuse of graph definitions is achieved by having a template or Lua module generate the <graph> tag and the JSON inside it.

The chart data can be inlined in the graph definition, or the graph definition can refer to a page in the Data: namespace on Commons (or to multiple Data: pages, or to certain other data sources like the pageviews API or Wikidata SPARQL queries).

Option 1: Inline chart definitions

In this option, chart definitions are inlined in the article. Data is not inlined, but always lives on a Data: page.

This would most likely use a parser function with parameter-passing style, rather than JSON format, to be comfortable for template-wielding editors and for Lua module writers who want to build meta-libraries around the parser function.

The building blocks we have to work with are:

  • A parser function ({{#chart: ... }})
    • The main argument ({{#chart:foo}})
    • Unnamed parameters ({{#chart:foo|value1|value2|...}})
    • Named parameters ({{#chart:foo|param1=value1|param2=value2|...}})

Named parameters similar to what Module:Graph takes are likely to work well with many existing Graph uses and are not complex to implement. However, this means people will likely build things using templates, rather than Data: pages, for sharing chart data and definitions.

To reuse charts across articles, you'd either have to generate them with templates/Lua, or make the definitions so simple that they don't need to be templated (for example, if all it takes to render a temperature graph is {{#chart:data=Monthly temperature in San Francisco.tab|type=temperature|period=1991-2020}} then maybe that doesn't need a template).

Option 2: Chart definition on its own page

In this option, each chart definition lives on its own page in a new Chart: namespace. These Chart: pages would use a JSON content model, and could have custom preview and editing functionality to make the JSON easier to work with. These chart definitions would not include or refer to data, to allow them to be reused with different data sources on different articles. This would allow us to do things like create one Chart:Temperature that is used for all temperature/climate graphs on articles about cities: the chart would look the same on all of these articles, but the data would be different for each one.

Embedding a chart in an article would be done by referring to the Chart: page and the Data: page, perhaps like this: {{#chart:Temperature|data=Monthly temperature in San Francisco.tab}}. This would pull the chart definition from [[Chart:Temperature]] and fetch the data from [[Data:Monthly temperature in San Francisco.tab]].

This approach would rely on templates much less: reuse of charts would be accomplished by putting chart definitions on their own pages separate from the data, and chart definitions could not themselves contain templates (but maybe they could take parameters).

It is likely that existing data sets will need to be cut down for individual graphs, so allowing column selection and a range limit would be good options with shared data sets.

This would limit parameters to simply selecting the definition, data set, and subranges to render into.
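
Column selection and a row range limit over a shared data set could look roughly like the Python sketch below. The helper name `subset_tab` and its slice-style `first_row`/`last_row` arguments are assumptions for illustration; the sample rows are taken from the 1993 election table in the task description.

```python
from typing import Optional


def subset_tab(
    tab: dict,
    columns: list[str],
    first_row: int = 0,
    last_row: Optional[int] = None,
) -> dict:
    """Return a copy of `tab` limited to the named columns and a row range.

    `first_row` is inclusive and `last_row` exclusive, like a Python slice.
    """
    fields = tab["schema"]["fields"]
    index_by_name = {field["name"]: i for i, field in enumerate(fields)}
    missing = [name for name in columns if name not in index_by_name]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    picked = [index_by_name[name] for name in columns]
    return {
        **tab,
        "schema": {"fields": [fields[i] for i in picked]},
        "data": [[row[i] for i in picked] for row in tab["data"][first_row:last_row]],
    }


election = {
    "license": "CC0-1.0",
    "schema": {"fields": [
        {"name": "date", "type": "string"},
        {"name": "pc", "type": "number"},
        {"name": "liberal", "type": "number"},
    ]},
    "data": [
        ["1993/09/14", 36, 33],
        ["1993/09/20", 35, 35],
        ["1993/09/25", 30, 37],
    ],
}
liberal_only = subset_tab(election, ["date", "liberal"], first_row=1)
print(liberal_only["data"])  # [['1993/09/20', 35], ['1993/09/25', 37]]
```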

Note that Data:....tab pages can have localized field names, which we can use in the chart rendering. We would want the same for any overridden labels in the chart definition.

Note the chart definitions could live in the Data: namespace as well instead of a separate Chart: namespace, in which case it'd be Data:Foobar.chart or such.

Recommend writing up a few sample charts on each option and looking at what the difficulties look like.

Acceptance criteria
  • Decide on requirement for using separate Data: and/or Chart: definitions and settle on syntax split
  • Write a specification explaining what the syntax is
  • Write a basic ADR for this decision

Event Timeline

A very important issue to consider: (1) we need to be able to use templates and template parameters inside chart syntax (e.g. {{#chart:foo|param1={{foo}}|param2={{{1}}}}}); (2) a Lua module should be able to generate the chart tag with a variable number of parameters.

Another consideration is easy integration with WDQS results: embedding them server side needs T67626, which is too far off to consider; rendering them client side is much easier.

Catrope renamed this task from Define wikitext syntax for embedding a chart to Define syntax for defining and embedding a chart. Jun 28 2024, 8:19 PM
Catrope updated the task description.

I've expanded the task description to include another option for having chart definitions as their own pages. This would limit the need (but also the ability) to use templates in chart syntax.

LGoto triaged this task as High priority. Jul 1 2024, 5:47 PM
LGoto set the point value for this task to 5.
LGoto moved this task from Backlog to Up Next on the Charts board.

Going to get some links from Chris on past Graphs usage that'll help me in this research :D

A suggestion for supporting both server side rendering, and future interactivity at client side:

Define graphs as webcomponent like

<graph src="path to wiki page with json"></graph>

MW can consider this as parser function and add a server side rendered image in it. So it becomes the following html.

<graph src="path to wiki page with json">... server side rendered image goes here.. </graph>

This is valid html. Browsers just treat these unknown tags as div tags. The image will be rendered.

At client side, define a custom element (a web component) that enhances this feature to add interactivity. Replace the SVG with, say, <canvas> or whatever a chart library uses.

This means, our graphs can be reused anywhere in web, by including the webcomponent definition and adding this html snippet <graph src="path to wiki page with json"></graph> without MW dependency.

I would like to see charts from wikipedia easily reusable in webpages.

An example implementation: https://codepen.io/santhoshtr/pen/wvLBoLr
Screenshot based on COVID-19 cases in Santa Clara County, California:
{F56326792 size=full}

Some quick notes catching up:

We still want to decide whether to send all params and data in with the JSON data pages (fixed) or as wiki markup parameters on a parser function (meaning people can use templates and Lua modules to expand or process data).

If we push everything to the json blob we might have an invocation like this:

{{#chart:Foobar.json}}

-> Foobar.json:

{
    graph: "line",
    width: 600,
    height: 400,
    x: {
        label: {
            // Should we use the MW namespace for localization instead?
            "en": "Day",
            "fr": "Jour",
        },
        // Should data be queriable from some source?
        data: [1, 2, 3, 4]
    },
    y: {
        label: {
            "en": "Degrees C",
            "fr": "..."
        },
        data: [0, 10, 20, 30],
    }
}

The advantage of a separate page is that the same definition and data set can be reused very easily. The downside is that you have to set up more things outside your page before you can write an invocation. A potential advantage is that we could build a nice visual editor for the chart; the disadvantage is that this is out of scope and we won't get to it in this project.

Questions:

  • should labels be localizable in a shared definition? or let them be customized as copies
  • should data sets exist separately from definitions?
  • should definitions exist as standalone json data pages or be invocation parameters?

In the latter case you might see:

{{#chart:graph=line
|width=600
|height=400
|xAxisLabel=Day
|x=1, 2, 3, 4
|yAxisLabel=Degrees C
|y=0, 10, 20, 30
}}
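
To make the comparison with the JSON form concrete, here is a rough Python sketch that maps the flat parameters of such an invocation onto the nested shape used in the Foobar.json example. The function name `inline_params_to_definition` is hypothetical, and the parsing of comma-separated series is an assumption: no such syntax has been specified.

```python
def inline_params_to_definition(params: dict[str, str]) -> dict:
    """Map flat wikitext-style parameters onto the nested JSON shape."""

    def numbers(text: str) -> list[float]:
        # "1, 2, 3, 4" -> [1.0, 2.0, 3.0, 4.0]; empty input gives [].
        return [float(part) for part in text.split(",") if part.strip()]

    return {
        "graph": params["graph"],
        "width": int(params["width"]),
        "height": int(params["height"]),
        "x": {"label": params.get("xAxisLabel", ""),
              "data": numbers(params.get("x", ""))},
        "y": {"label": params.get("yAxisLabel", ""),
              "data": numbers(params.get("y", ""))},
    }


definition = inline_params_to_definition({
    "graph": "line",
    "width": "600",
    "height": "400",
    "xAxisLabel": "Day",
    "x": "1, 2, 3, 4",
    "yAxisLabel": "Degrees C",
    "y": "0, 10, 20, 30",
})
```

Note that the labels arrive as single strings here, which is why localization falls to whoever copies the invocation.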

In the inline case, localizations would not be dealt with directly -- you'd localize it when you copy the template or inline invocation. And you could potentially use Wikidata queries for the x & y data series and labels.

However, note that there's not likely to be a good cache invalidation story for such things, so I don't know if any of that's ideal right now.

Secondarily, if we need to be able to query, do we need to be able to subset/filter? The filter options in Vega are JavaScript and thus dangerous, so we want to be very careful and explicit about any filter language we define. Is it simplest to avoid this and leave it to the existing Wikidata query modules or whatever?

I think the minimum we would need is the ability to select only some columns from a larger data set. For example, for a table with ballot measure election results, you'd want to be able to pull out just "County" and "For %" to make a simple bar chart.

*nod* selecting columns by index or name should be pretty straightforward from .tab data pages, which provide a field name for each column. Tabular pages also have a facility for localizing column labels, which sounds useful and avoids having to duplicate it in the invocation.

A range limit might be useful too, but I don't want to add too much.

That also leaves us open to the possibility of generating dynamically updatable query Data: pages in the future, which act like the JSON tabular pages but are filled with data from a SPARQL query to Wikidata...

So I think it'd be very useful to support .tab Data pages now, and future stuff later. Do we _also_ need to accept raw data in a data definition page (Data:Foobar.chart ?) / raw parserfunction invocation? That's one we can't walk back, even if we provide better tools alongside. ;)

Catrope changed the point value for this task from 5 to 3. Jul 15 2024, 5:46 PM
Catrope moved this task from Sprint 1 to Sprint 2 on the Charts board.
Catrope edited projects, added Charts (Sprint 2); removed Charts (Sprint 1).

> Note the chart definitions could live in the Data: namespace

One common type of graph is the annual passengers of an airport. Data for such graphs is filled in from Wikidata.

Another type is the historical page views of a page. That data comes from an API.

There are also a number of graphs whose data comes from WDQS.

Only allowing pages in the Data: namespace on Commons may not be enough. Here are some use cases it cannot solve:

  • Some local wikis may want to store project-internal data, such as the number of open block appeals per day. They may want to take a page on the local wiki (not Commons), convert it to the Tabular Data content model, and then edit and use it like a Commons Data: page (cf. T252711).
  • A third-party wiki has installed Chart and wants to store data on its own wiki without using a dedicated namespace.
  • (maybe out of scope) A non-WMF wiki has installed Chart but wants to use data from Wikimedia Commons, so Chart should support fetching data via the API instead of direct database access.

> Some local wikis may want to store project-internal data, such as the number of open block appeals per day. They may want to take a page on the local wiki (not Commons), convert it to the Tabular Data content model, and then edit and use it like a Commons Data: page (cf. T252711).

Note that there's nothing preventing using Commons to store this data.

> A third-party wiki has installed Chart and wants to store data on its own wiki without using a dedicated namespace.

I don't think there's any particular reason to need to support it without a dedicated Data: namespace. If they just dislike the aesthetics, then it's on them to figure out how to make the necessary changes to support per-page content types and maintain that code.

> (maybe out of scope) A non-WMF wiki has installed Chart but wants to use data from Wikimedia Commons, so Chart should support fetching data via the API instead of direct database access.

This isn't at all inconsistent with us using the Data: namespace on Commons, as shown by the existence of InstantCommons for media files. Out of scope for now, but 100% consistent with our plans already.

bvibber updated the task description.

Updated task description with results of planning & discussion from last week, resolving as complete for now.