COUNT with GROUP BY timing out if used as sub query
Open, Medium, Public

Description

SELECT ?pred ?bar WHERE {
  {
    SELECT ?pred (COUNT(?value) AS ?bar) WHERE
    {
      ?subj ?pred ?value .
    } GROUP BY ?pred ORDER BY DESC(?bar) LIMIT 1000
  }
}

times out, while

SELECT ?pred (COUNT(?value) AS ?bar) WHERE
{
  ?subj ?pred ?value .
} GROUP BY ?pred ORDER BY DESC(?bar) LIMIT 1000

doesn't.

Disabling the optimizer doesn't help here.
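
For reference, one way to disable Blazegraph's join-order optimizer is the `hint:optimizer` query hint. The sketch below assumes this is the mechanism meant above; the report does not say how the optimizer was actually turned off, and the placement of the hint in the outer group is likewise an assumption.

```sparql
# Assumption: the optimizer is disabled via the Blazegraph query hint below.
PREFIX hint: <http://www.bigdata.com/queryHints#>

SELECT ?pred ?bar WHERE {
  # Ask Blazegraph not to reorder joins for this query.
  hint:Query hint:optimizer "None" .
  {
    SELECT ?pred (COUNT(?value) AS ?bar) WHERE
    {
      ?subj ?pred ?value .
    } GROUP BY ?pred ORDER BY DESC(?bar) LIMIT 1000
  }
}
```

According to the report, the sub query form still times out with the optimizer disabled, so join ordering is probably not the cause.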

Potentially related: https://jira.blazegraph.com/browse/BLZG-1252, although here the desired behavior is for the sub query not to be flattened.

It works if rewritten using a named sub query:

SELECT ?pred ?bar WITH {
    SELECT ?pred (COUNT(?value) AS ?bar) WHERE
    {
      ?subj ?pred ?value .
    } GROUP BY ?pred ORDER BY DESC(?bar) LIMIT 1000
  } AS %inner
  WHERE {
    INCLUDE %inner
} ORDER BY DESC(?bar)

Event Timeline

I think the problem might be the same as in T152606 : things get slow if ?pred isn't a specific property.

I don't see how they would be related: this query works unless you make it a subquery.

It's actually surprising that it does work. Maybe when it does, it triggers some VW mode to do the calculation.

Interesting find with the named subquery version.

Smalyshev triaged this task as Medium priority. Feb 15 2017, 1:54 AM