ConcurrentModificationException on non-grouping query with aggregates in SELECT
Closed, Resolved, Public

Description

The following query results in an exception:

SELECT (COUNT(*) AS ?a) (COUNT(?x) AS ?b) (?b/?a AS ?r) {}

java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
at com.bigdata.rdf.sparql.ast.StaticAnalysis.gatherVarsToMaterialize(StaticAnalysis.java:2058)
at com.bigdata.rdf.sparql.ast.StaticAnalysis.gatherVarsToMaterialize(StaticAnalysis.java:2009)
at com.bigdata.rdf.sparql.ast.eval.AST2BOpUtility.addAggregation(AST2BOpUtility.java:4679)
at com.bigdata.rdf.sparql.ast.AST2BOpUtility.convertQueryBaseWithScopedVars(AST2BOpUtility.java:534)
at com.bigdata.rdf.sparql.ast.eval.AST2BOpUtility.convert(AST2BOpUtility.java:287)
at com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.optimizeQuery(ASTEvalHelper.java:426)

I think the query is legal, according to SPARQL, because GROUP BY is not a requirement for using aggregates (“By default a solution set consists of a single group, containing all solutions.”). If you remove the last projection (?b/?a AS ?r), it works.

Here’s the non-reduced query where I found the error: ratio of mandatory constraints

SELECT (COUNT(*) AS ?total) (COUNT(?status) AS ?mandatory) (?mandatory/?total AS ?ratio) WHERE {
  ?property p:P2302 ?statement.
  OPTIONAL { ?statement pq:P2316 ?status. }
}

Event Timeline

Restricted Application added subscribers: PokestarFan, Aklapper.

Workaround:

SELECT ?total ?mandatory (?mandatory/?total AS ?ratio) WHERE {
  {
    SELECT (COUNT(*) AS ?total) (COUNT(?status) AS ?mandatory) WHERE {
      ?property p:P2302 ?statement.
      OPTIONAL { ?statement pq:P2316 ?status. }
    }
  }
}
Smalyshev triaged this task as Medium priority. Aug 7 2017, 8:07 PM

If it's the same issue as this one, I think it broke on February 22.

Note that this can also occur on grouping queries:

SELECT ?x (COUNT(*) AS ?total) (SUM(?y) AS ?ys) (?ys/?total AS ?ratio) WHERE {
  ?x wdt:P31 wd:Q1.
  BIND(1 AS ?y)
}
GROUP BY ?x

An alternative workaround is to inline the aggregates into the SELECT expression instead of referring to the projected variables, i.e. (SUM(?y)/COUNT(*) AS ?ratio). The advantage is that you can still use ?ratio in a HAVING clause on the grouping query, whereas the subquery workaround needs a FILTER in the outer query, which is less efficient.
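
For reference, here is a sketch of the grouping query above with the aggregates inlined. I haven't run it against the query service. It assumes, as described above, that ?ratio can be used in HAVING on this engine, and the 0.5 threshold is made up purely for illustration.

```sparql
# Same pattern as the grouping query above, but ?ratio repeats the
# aggregate expressions instead of referring to ?ys and ?total.
SELECT ?x (COUNT(*) AS ?total) (SUM(?y) AS ?ys) (SUM(?y)/COUNT(*) AS ?ratio) WHERE {
  ?x wdt:P31 wd:Q1.
  BIND(1 AS ?y)
}
GROUP BY ?x
# Hypothetical threshold, only to show ?ratio being used in HAVING.
HAVING (?ratio > 0.5)
```

The same idea should carry over to the non-grouping query from the description by projecting (COUNT(?status)/COUNT(*) AS ?ratio).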

Change 533108 had a related patch set uploaded (by Igor Kim; owner: Igor Kim):
[wikidata/query/blazegraph@master] Fix Concurrent modification on non-grouping query with aggregates

https://gerrit.wikimedia.org/r/533108

Change 533108 merged by Smalyshev:
[wikidata/query/blazegraph@master] Fix Concurrent modification on non-grouping query with aggregates

https://gerrit.wikimedia.org/r/533108

debt claimed this task.