Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected
Closed, Resolved · Public

Description

Minimal test case:

# No results unless ?langLabel is also selected
SELECT ?lang #?langLabel
WHERE {
	{
		SELECT ?lang ?langLabel WHERE {
			BIND(wd:Q154755 AS ?lang)
			SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
		}
	}
	FILTER("Ada"@en = ?langLabel) .
}

Event Timeline

Smalyshev triaged this task as Medium priority. Dec 15 2016, 10:10 PM

Maybe also related to T147577. Not merging yet, as it is slightly different, but there is a good chance that fixing that one will fix this one too.

Doesn't look like it's T147577 since that is fixed but this one still happens.

This seems to be an optimizer ordering problem.
CompareBOp is executed several times to check whether "Ada"@en equals ?langLabel, but ?langLabel is not bound on any of these occasions:
while running ASTDeferredIVResolution
while running com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer
then while running ConditionalRoutingOp for ChunkedRunningQuery

So, in the end, the solution is discarded in
com.bigdata.rdf.internal.constraints.SPARQLConstraint.accept(IBindingSet)
and LabelService is never called.

On the other hand, if ?langLabel is uncommented in the outer projection, LabelService is called
and ?langLabel is already bound when SPARQLConstraint.accept is called.

The difference between the query execution plans is that, in the successful one, an additional statement pattern is added to the LabelService clause:

SERVICE <ConstantNode(TermId(0U)[http://wikiba.se/ontology#label])> {

  JoinGroupNode {
    StatementPatternNode(ConstantNode(TermId(0U)[http://www.bigdata.com/rdf#serviceParam]), ConstantNode(TermId(0U)[http://wikiba.se/ontology#language]), ConstantNode(TermId(0L)[en])) [scope=DEFAULT_CONTEXTS]
    StatementPatternNode(VarNode(lang), ConstantNode(Vocab(74)[http://www.w3.org/2000/01/rdf-schema#label]), VarNode(langLabel)) [scope=DEFAULT_CONTEXTS] # <<< Missing statement pattern
  }
}

If it is added manually, the query succeeds:

SELECT ?lang #?langLabel
WHERE {
	{
		SELECT ?lang ?langLabel WHERE {
			BIND(wd:Q154755 AS ?lang)
			SERVICE wikibase:label {
				bd:serviceParam wikibase:language "en" .
				?lang rdfs:label ?langLabel .
			}
		}
	}
	FILTER("Ada"@en = ?langLabel) .
}

OK, I checked and indeed the optimizer uses the query root instead of the enclosing QueryBase, which is wrong. However, the problem is that there seems to be no way to get from a JoinGroup to the enclosing QueryBase: the parent of a QueryBase is null even for subqueries. The code in AbstractJoinGroupOptimizer is:

} else if (child instanceof QueryBase) {

     final QueryBase subquery = (QueryBase) child;

     @SuppressWarnings("unchecked")
     final GraphPatternGroup<IGroupMemberNode> childGroup = (GraphPatternGroup<IGroupMemberNode>) subquery
             .getWhereClause();

     optimize(ctx, sa, bSets, childGroup);

This passes on the WHERE clause of the subquery but loses the rest of the information. I wonder how to work around it. I could of course re-scan the whole tree starting from the root and find our parent node, but this looks like a very inefficient way of doing it. Ideally there should be a way to get from a subordinate clause to its parent clause, or at least we could pass it in the optimize() arguments or the context to see which subquery we are processing.
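To make the "pass it in the optimize() arguments" idea concrete, here is a minimal sketch on a toy AST. Group, Query and Service are hypothetical stand-ins invented for illustration, not Blazegraph's JoinGroupNode, QueryBase or ServiceNode, and the optimize() signature here is an assumption, not the one in AbstractJoinGroupOptimizer. The only point is that the recursion carries the nearest enclosing query along, so a service clause sees the projection of its own subquery rather than that of the query root.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: Group, Query and Service are hypothetical stand-ins,
// not Blazegraph's JoinGroupNode, QueryBase or ServiceNode.
final class EnclosingQuerySketch {

    /** A graph pattern group whose children are Service, Query or Group objects. */
    static final class Group {
        final List<Object> children = new ArrayList<>();

        Group add(Object child) {
            children.add(child);
            return this;
        }
    }

    /** A query or subquery: a projection plus a WHERE clause. */
    static final class Query {
        final List<String> projection;
        final Group where;

        Query(List<String> projection, Group where) {
            this.projection = projection;
            this.where = where;
        }
    }

    /** A service clause that records which projection the optimizer saw for it. */
    static final class Service {
        List<String> visibleProjection;
    }

    /** Recurses through the group, always carrying the nearest enclosing query. */
    static void optimize(Query enclosing, Group group) {
        for (Object child : group.children) {
            if (child instanceof Service) {
                ((Service) child).visibleProjection = enclosing.projection;
            } else if (child instanceof Query) {
                Query subquery = (Query) child;
                // The subquery becomes the enclosing query for its own WHERE clause.
                optimize(subquery, subquery.where);
            } else if (child instanceof Group) {
                // Plain nesting (e.g. a union branch) keeps the same enclosing query.
                optimize(enclosing, (Group) child);
            }
        }
    }

    public static void main(String[] args) {
        Service label = new Service();
        Query inner = new Query(Arrays.asList("lang", "langLabel"), new Group().add(label));
        Query outer = new Query(Arrays.asList("lang"), new Group().add(inner));
        optimize(outer, outer.where);
        System.out.println(label.visibleProjection);
    }
}
```

In this toy run the service inside the subquery ends up with the subquery's projection (lang and langLabel), even though the outer query only projects lang, which mirrors the failing test case.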

Change 508725 had a related patch set uploaded (by Igor Kim; owner: Igor Kim):
[wikidata/query/rdf@master] Propagate Projection from SubqueryNode to where clause

https://gerrit.wikimedia.org/r/508725

EmptyLabelServiceOptimizer.optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode) currently takes the projection from StaticAnalysis.getQueryRoot(), because the parent of the JoinGroupNode wrapping the statement patterns of the LabelService clause is unavailable.

parent is defined in GroupMemberNodeBase as IGroupNode<IGroupMemberNode>, so it cannot give us a reference to the ServiceNode from which to reach the SubqueryBase, as neither of them is an IGroupNode descendant.

So, to get back-references from nested statement patterns, clauses, etc. to the SubqueryBase whose projection we need to extract, we would have to introduce proper annotations and propagate them through the different types of nesting nodes, as the actual service clause might be enclosed in, for example, a UnionNode.

The annotation can be assigned in com.bigdata.rdf.sail.sparql.BigdataExprBuilder.handleWhereClause(ASTQuery, QueryBase) as

queryRoot.setWhereClause(ret);
if (queryRoot instanceof SubqueryBase) {
	ret.annotations().put(QueryBase.Annotations.PROJECTION, queryRoot.getProjection());
}

Then we could use it in org.wikidata.query.rdf.blazegraph.label.EmptyLabelServiceOptimizer.optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode) as

if (!foundArg) {
    EmptyLabelServiceOptimizer.this.addResolutions(ctx, g, (ProjectionNode) service.getParent().annotations().get(QueryBase.Annotations.PROJECTION));
}

But this would require changing Blazegraph core, and additional handling would be needed to propagate the annotation properly to nested clauses.

There is another option, though: org.wikidata.query.rdf.blazegraph.label.EmptyLabelServiceOptimizer.optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode)
is already traversing the whole tree to process LabelService clauses, so we could handle the projection annotation at this point:

join.getChildren(SubqueryBase.class).stream().forEach(node->{
    SubqueryBase subqueryBase = (SubqueryBase)node;
    JoinGroupNode whereClause = (JoinGroupNode)subqueryBase.getWhereClause();
    whereClause.setProperty(QueryBase.Annotations.PROJECTION, subqueryBase.getProjection());
});

Though here we might also need some additional handling for LabelService inside of nested clauses.
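As a rough illustration of what that additional handling for nested clauses could look like, the sketch below pushes a projection annotation down from a subquery's WHERE clause into every nested group. It runs on a toy model: Group, Subquery and the "projection" annotation key are hypothetical names made up for this example and are not Blazegraph or WDQS types. Whether the real AST can be annotated this way without confusing code that scans annotations is exactly the open question in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: Group and Subquery are hypothetical stand-ins, and the
// annotation key is made up for this sketch.
final class AnnotationPushDownSketch {

    static final String PROJECTION = "projection"; // hypothetical annotation key

    /** A group (join group, union branch, ...) with annotations and children. */
    static final class Group {
        final Map<String, Object> annotations = new HashMap<>();
        final List<Object> children = new ArrayList<>();

        Group add(Object child) {
            children.add(child);
            return this;
        }
    }

    /** A subquery: its own projection plus a WHERE clause. */
    static final class Subquery {
        final List<String> projection;
        final Group where;

        Subquery(List<String> projection, Group where) {
            this.projection = projection;
            this.where = where;
        }
    }

    /**
     * Annotates the group and all nested groups with the given projection.
     * A nested subquery switches to its own projection for its WHERE clause.
     */
    static void annotate(Group group, List<String> projection) {
        group.annotations.put(PROJECTION, projection);
        for (Object child : group.children) {
            if (child instanceof Group) {
                annotate((Group) child, projection);
            } else if (child instanceof Subquery) {
                Subquery subquery = (Subquery) child;
                annotate(subquery.where, subquery.projection);
            }
        }
    }

    public static void main(String[] args) {
        Group unionBranch = new Group(); // where a label service could be nested
        Subquery inner = new Subquery(
                Arrays.asList("lang", "langLabel"), new Group().add(unionBranch));
        Group outerWhere = new Group().add(inner);
        annotate(outerWhere, Arrays.asList("lang"));
        System.out.println(unionBranch.annotations.get(PROJECTION));
    }
}
```

With this shape, an optimizer that reaches a service clause inside a union branch could read the projection from the group it is currently visiting, without having to find its way back to the subquery node.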

Created changeset https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/508725/
I had to apply the same changes to the pom as in T213375 to link with bigdata-rdf-test.

I am a bit cautious about adding stuff to annotations, because there's some code that scans all the annotations and unexpected annotations may confuse it. That's why, btw, it is not possible to add a reference to the query base as an annotation on a child node (there's parent, but it's used for another purpose). The projection may be OK, but I still wonder whether it will influence something.

Though of course, if we can get it working without patching Blazegraph core, that's preferable. I am still not sure how parent works. Maybe we should just go up through the parent group clauses until we find one that has a projection?
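The "go up through the parents until one has a projection" idea can be sketched as follows. This is a toy model: Node is a hypothetical class invented for illustration, and it assumes every node, including a subquery, has a usable parent link. The earlier comments in this thread report that the parent of a QueryBase is null in Blazegraph, so this shows the intended lookup, not something known to work against the real AST.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: Node is a hypothetical stand-in for an AST node. It assumes
// an intact parent chain, which may not hold for Blazegraph's QueryBase.
final class ParentWalkSketch {

    static final class Node {
        final Node parent;             // null for the root
        final List<String> projection; // null for nodes that are not queries

        Node(Node parent, List<String> projection) {
            this.parent = parent;
            this.projection = projection;
        }
    }

    /** Returns the projection of the nearest node (self or ancestor) that has one, or null. */
    static List<String> nearestProjection(Node start) {
        for (Node node = start; node != null; node = node.parent) {
            if (node.projection != null) {
                return node.projection;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node outer = new Node(null, Arrays.asList("lang"));
        Node inner = new Node(outer, Arrays.asList("lang", "langLabel"));
        Node where = new Node(inner, null);
        Node service = new Node(where, null);
        System.out.println(nearestProjection(service));
    }
}
```

The walk stops at the first projection it meets, so a service clause inside a subquery resolves to the subquery's projection and never reaches the query root.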

Change 508725 merged by jenkins-bot:
[wikidata/query/rdf@master] Propagate Projection from SubqueryNode to where clause

https://gerrit.wikimedia.org/r/508725