
Add support scatter plots
Open, Low, Public, Feature

Description

Feature summary (what you would like to be able to do and where):

Extension:Chart supports line charts, per https://www.mediawiki.org/wiki/Extension:Chart#Line

Scatter charts (scatter plots) are closely related to line charts. They are explained at

https://en.wikipedia.org/wiki/Scatter_plot

Note that these often have trend lines added, which would be like the line chart, but not quite the same.
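To illustrate why a trend line is not the same as a line chart, here is a minimal sketch of an ordinary least-squares fit. The line is computed from the points rather than drawn through them. The names `Point`, `fitTrendLine` and `trendLineEndpoints` are hypothetical and are not part of Extension:Chart or the chart-renderer service.

```typescript
// Hypothetical helpers for illustration only; not part of Extension:Chart.
type Point = [number, number];

// Ordinary least-squares fit of y = slope * x + intercept.
function fitTrendLine(points: Point[]): { slope: number; intercept: number } {
  const n = points.length;
  if (n < 2) {
    throw new Error("need at least two points to fit a line");
  }
  let sumX = 0;
  let sumY = 0;
  let sumXY = 0;
  let sumXX = 0;
  for (const [x, y] of points) {
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumXX += x * x;
  }
  const denom = n * sumXX - sumX * sumX;
  if (denom === 0) {
    throw new Error("all x values are identical; slope is undefined");
  }
  const slope = (n * sumXY - sumX * sumY) / denom;
  const intercept = (sumY - slope * sumX) / n;
  return { slope, intercept };
}

// A straight trend line can be drawn from just its two end points.
function trendLineEndpoints(points: Point[]): [Point, Point] {
  const { slope, intercept } = fitTrendLine(points);
  const xs = points.map(([x]) => x);
  const minX = Math.min(...xs);
  const maxX = Math.max(...xs);
  return [
    [minX, slope * minX + intercept],
    [maxX, slope * maxX + intercept],
  ];
}
```

The two end points could then be rendered as a separate two-point line series on top of the scatter points, assuming the renderer allows mixing series types.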

Benefits

This is a major type of chart.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title: Add support for scatter plots
Reference: repos/mediawiki/services/chart-renderer!78
Author: simon04
Source Branch: T392110
Dest Branch: main

Event Timeline

CCiufo-WMF renamed this task from "Extension:Chart should support scatter plots" to "Add support scatter plots". May 1 2025, 3:03 AM

I'm not sure that this is "low importance" -- it seems a little more important than that.

@Bugreporter2: Nobody ever said it's "low importance". It's currently low priority though.

Cross-posting my comment from T392106#10783292

Yes, to clarify, this is currently not a priority for the team working on Charts. The low priority does not mean it is low value.

This chart type is a really great reason to add in the functionality for symbols/patterns for accessibility and readability. I would advise completing T376198: Apply pattern functionality in order to complete this task.

Change #1141124 had a related patch set uploaded (by Simon04; author: Simon04):

[mediawiki/extensions/Chart@master] Add support for scatter plots

https://gerrit.wikimedia.org/r/1141124

aude subscribed.

With this dataset, the scatterplot works with the implementation:

https://commons.wikimedia.beta.wmflabs.org/wiki/Data:Arrhenius.tab

Screenshot 2025-05-23 at 6.18.38 PM.png (1×2 px, 149 KB)

The initial implementation is best for scientific and other data where the points do not need a label. (A future enhancement, out of scope here, could be to support showing a label for cases where the data pertains to a country or something like that.)

Also found a possible issue:

If the data has null values on the x-axis, the data ends up being plotted as category data, which doesn't look good. Do we want the extension to filter out or otherwise handle these values, or leave this to Lua transforms?

@aude -- the example that you've used is not great -- it would be better as a line chart, due to the nature of the data (I think it's one chemical reaction that has a series of measurements over time). You might want to use a better example in any testing or documentation. Something like body size v brain size would work:

https://commons.wikimedia.org/wiki/File:Brain-body_mass_ratio_for_some_animals_diagram.svg

I left some comments in the merge request for the service:

https://gitlab.wikimedia.org/repos/mediawiki/services/chart-renderer/-/merge_requests/78

For the scatterplot, should yAxisColumns only have one column? And should we validate that it is a number column?

If the data is unexpected (e.g. the y-axis has category data or there are extra columns), then the chart could render in unexpected ways. Maybe it's up to the user, though, to use Lua to transform the data into the right shape.

Also, if the xAxis has null values, then the scatterplot is treated as category data, since I think isValidNumberColumn does not consider or filter null values.

If the tabular data is cleaner, with only 2 number columns, then the implementation works okay, but maybe the service and/or extension could have more validation and handle unexpected data for the scatterplot.
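As a rough sketch of the null handling discussed above: the real isValidNumberColumn is not shown in this task, so its behaviour is not assumed here. The helpers below (`isNumberColumnIgnoringNulls`, `toScatterPoints`) and the `Cell`/`Row` types are hypothetical. They show one way a column with gaps could still be treated as numeric, with incomplete rows dropped before plotting.

```typescript
// Hypothetical types and helpers for illustration; the actual service code may differ.
type Cell = number | string | null;
type Row = Cell[];

// True if the column has at least one number and every non-null cell is a finite number.
function isNumberColumnIgnoringNulls(rows: Row[], col: number): boolean {
  let sawNumber = false;
  for (const row of rows) {
    const cell = row[col];
    if (cell == null) {
      continue; // skip null (or missing) cells instead of failing the check
    }
    if (typeof cell !== "number" || !Number.isFinite(cell)) {
      return false;
    }
    sawNumber = true;
  }
  return sawNumber;
}

// Keep only rows where both the x and y cells are finite numbers.
function toScatterPoints(rows: Row[], xCol: number, yCol: number): [number, number][] {
  const points: [number, number][] = [];
  for (const row of rows) {
    const x = row[xCol];
    const y = row[yCol];
    if (
      typeof x === "number" && Number.isFinite(x) &&
      typeof y === "number" && Number.isFinite(y)
    ) {
      points.push([x, y]);
    }
  }
  return points;
}
```

Whether filtering like this belongs in the service, the extension, or a Lua transform is exactly the open question in the comment above; the sketch only shows that the check itself is small.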

@aude, I'm not sure if I understand your concerns correctly. I mean technically scatter plots are like line plots just without the line, right? Thus, any validation requirements would equally apply to line plots. I think a deeper validation of input data is out of scope for this ticket and needs to be specified more precisely. (And on a personal note, I'm not interested in either specifying or implementing this validation. If it is a requirement for landing scatter plots, I'll abandon my patches.)

Just wanting to note publicly that I think we should consider this chart type as part of the initial offering, but I don't think we should add any other chart types after that.

I think it's important as part of resolving this ticket that we document on https://www.mediawiki.org/wiki/Extension:Chart what kind of chart types we would consider adding in future if volunteers provide patches. My concern is that we want to enter a period of stability, and certain chart types will come with their own unique challenges (e.g. optimizing for mobile, or breaking changes to the chart specification), while we have limited resources to dedicate to thinking about and addressing those problems.

simon04 subscribed.

Just for reference, here are the most basic scatter plots that are offered by Apache ECharts:
https://echarts.apache.org/handbook/en/how-to/chart-types/scatter/basic-scatter/
They allow varied dot sizes and colors.
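For readers unfamiliar with the ECharts option format, a basic scatter series might be built roughly as follows. The option keys (`xAxis`, `yAxis`, `series`, `type: "scatter"`, `symbolSize`, `itemStyle.color`) are ECharts option names. The function `buildScatterOption` and the sizing formula are hypothetical, and this is not necessarily how the chart-renderer patch constructs its options.

```typescript
// Hypothetical helper; builds a plain object in the shape of an ECharts option.
type Point = [number, number];

function buildScatterOption(points: Point[], color: string) {
  return {
    // Both axes are numeric ("value") rather than "category".
    xAxis: { type: "value" },
    yAxis: { type: "value" },
    series: [
      {
        type: "scatter",
        data: points,
        // symbolSize can be a fixed number or a callback that receives the data item;
        // this arbitrary formula makes dots grow with the magnitude of y.
        symbolSize: (value: Point) => 4 + Math.sqrt(Math.abs(value[1])),
        itemStyle: { color },
      },
    ],
  };
}
```

With an ECharts instance at hand, the returned object would be passed to its `setOption` method. Swapping `type: "scatter"` for `type: "line"` in the same series is what makes the two chart types so closely related.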

And some more advanced scatter charts and bubble charts:
https://echarts.apache.org/examples/en/index.html#chart-type-scatter
Here are 3D scatter charts:
https://echarts.apache.org/examples/en/index.html#chart-type-scatter3D

What I think would be useful in many Wikipedia articles, showing country statistics over time, is the kind of animated bubble charts that Hans Rosling used to show:
https://www.gapminder.org/tools/#$chart-type=bubbles&url=v2

Change #1141124 abandoned by Simon04:

[mediawiki/extensions/Chart@master] Add support for scatter plots

https://gerrit.wikimedia.org/r/1141124