Chart views sometimes combine multiple numeric results into one
Open, Needs Triage · Public

Description

Minimal reproducer:

#defaultView:LineChart
SELECT ?x ?y ?c WHERE {
  VALUES (?x ?y ?c) {
    (1 1 "")
    (2 2 "")
  }
}

This should show a line from (1,1) to (2,2); instead it shows a single dot at (3,3).

A more realistic example, by @Fnielsen:

#defaultView:LineChart
SELECT ?year (count(distinct ?citing_work) as ?count) ?author ?authorLabel  WHERE {
  VALUES ?author { wd:Q6758402 wd:Q20980928 }
  { 
    SELECT ?author (MIN(?work_year) AS ?first_year) WHERE {
      ?work wdt:P50 ?author .
      ?work wdt:P577 ?work_publication_datetime . 
      BIND(YEAR(?work_publication_datetime) AS ?work_year)
    }
    GROUP BY ?author
  }
  ?work wdt:P50 ?author .
  ?citing_work wdt:P2860 ?work .
  ?citing_work wdt:P577 ?date .
  BIND(YEAR(?date) - ?first_year AS ?year) 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} 
GROUP BY ?author ?year ?first_year ?authorLabel
ORDER BY (?year)

This should show more than just two points connected by a straight line.

Event Timeline

I just got bitten by this, as described here at WD:RAQ.

I was plotting a scatter plot of the difference in coordinates between two different sources (https://w.wiki/36Kk) and got a rogue point that was completely out of scale compared to everything else.

It turned out that many of my items had the label "Manor Farmhouse", and the plot had added together the values for all such items. This is not behaviour that I would ever have expected or wanted. It's a definite trap for the unwary. Even for the wary, having to ensure that all the labels are different is an unexpected burden: in many queries, duplicate labels can slip past and introduce subtle errors into the output.
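
A possible workaround until this is fixed, assuming the merging is keyed on the label column as the behaviour above suggests: make the label unique per item, for example by appending the item ID. This is only a sketch. The items, coordinates and label in the VALUES block are placeholders rather than data from the linked query, and ?key is a name I made up for the combined label.

```sparql
#defaultView:ScatterChart
SELECT ?x ?y ?key WHERE {
  # Placeholder rows: two different items that share the same label.
  VALUES (?item ?x ?y ?label) {
    (wd:Q1 1 1 "Manor Farmhouse")
    (wd:Q2 2 2 "Manor Farmhouse")
  }
  # Append the item ID so each row gets a distinct label,
  # e.g. "Manor Farmhouse (Q1)" and "Manor Farmhouse (Q2)".
  BIND(CONCAT(?label, " (", STRAFTER(STR(?item), "entity/"), ")") AS ?key)
}
```

If the assumption holds, this should plot two separate points at (1,1) and (2,2) rather than one combined point. It does not address the underlying bug; it only avoids duplicate labels.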

This is a bug, not a feature, and it ought to be sorted out.

I would also heartily support action on T185476 to allow more flexibility in the way WDQS columns are used as output. It is absurd that only one column can be chosen to describe the points (unlike, e.g., the map display mode, where multiple columns can be displayed), and it is also really unhelpful that even that single column cannot be a URL or an item link. This is very poor, and essentially cripples the usefulness of the charts that can be output.

This query looks OK in the "table view" but produces very similar, broken results in the scatterplot view:
https://w.wiki/5YyN

This is likely another manifestation of the bug.