Using label service twice in one query results in obscure error message
Closed, Resolved · Public

Description

Query:

SELECT ?labelEn ?labelDe WHERE {
  BIND(wd:Q1 AS ?item)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?item rdfs:label ?labelEn.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "de".
    ?item rdfs:label ?labelDe.
  }
}

java.lang.RuntimeException: there can be only one "run last" join in any group

I happen to know why the error occurs because I saw a related patch: the label service optimizer adds a hint:Prior hint:runLast true. hint to the label service unless there's already an explicit hint. (That also means there's a simple workaround if you know about it: add hint:Prior hint:runLast false. to the second label service.) But for anyone who isn't familiar with the internal implementation of the label service, this message is extremely confusing, because there's no hint:runLast anywhere in the query (even if you can make the connection from “"run last" join” to hint:runLast).
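
To make the failure mode concrete, here is a toy model of the behaviour described above. It is only a sketch: the dictionaries and the function names apply_default_hint and check_group are hypothetical stand-ins and are not Blazegraph or WDQS code. It assumes, as described, that every label service without an explicit hint gets a default "run last" setting, and that a group with more than one "run last" join is rejected.

```python
def apply_default_hint(services):
    """Set run_last=True on every service that has no explicit hint (assumed default)."""
    return [{**svc, "run_last": svc.get("run_last", True)} for svc in services]


def check_group(services):
    """Reject a group that contains more than one "run last" join."""
    if sum(1 for svc in services if svc["run_last"]) > 1:
        raise RuntimeError('there can be only one "run last" join in any group')
    return services


# Two label services without hints: both get the default and the group is rejected.
two_plain = [{"lang": "en"}, {"lang": "de"}]

# Workaround from the description: an explicit runLast false on the second service.
with_workaround = [{"lang": "en"}, {"lang": "de", "run_last": False}]

accepted = check_group(apply_default_hint(with_workaround))

try:
    check_group(apply_default_hint(two_plain))
    rejected_message = None
except RuntimeError as err:
    rejected_message = str(err)
```

In this model the workaround passes only because the explicit hint suppresses the default on one of the two services, which is why the error message mentions a hint that never appears in the query text.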

It should be possible to get labels and descriptions for multiple languages without an explicit query hint.

Event Timeline

Smalyshev triaged this task as Medium priority. Sep 25 2017, 12:01 AM
Smalyshev moved this task from Incoming to Current work on the Wikidata-Query-Service board.

The idea for the change is to replace the runLast hint with more complicated logic, in 3 steps:

  • first, a 'most probable optimal' placement, so that EmptyLabelServiceOptimizer can see the variables to process.
  • then EmptyLabelServiceOptimizer adds the statement patterns for resolution.
  • finally, an additional optimizer step moves the LabelService to the latest possible position, just before any clauses that might use the variables bound by the LabelService.

All tests in LabelServiceUnitTest (including a new test case specific to this bug) are passing, but I think it might take some additional tuning to properly support all 'real-life' usage scenarios. For example, FILTER clauses, including those written above service calls and binds, might also need additional rearrangement.
I have not applied that yet, as it might turn into a cascade that rearranges the clauses too much.
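
The third step in the list above can be illustrated with a minimal sketch. It assumes a join group can be modelled as a flat list of clauses with known variable sets; the Clause type and move_service_late are hypothetical names for illustration and do not reflect the actual optimizer classes in the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical stand-in for one node of a join group."""
    name: str
    uses: frozenset = frozenset()   # variables the clause needs to be bound
    binds: frozenset = frozenset()  # variables the clause produces


def move_service_late(clauses, service):
    """Return a new list with `service` placed as late as possible,
    but before the first clause that uses a variable the service binds."""
    rest = [c for c in clauses if c is not service]
    for index, clause in enumerate(rest):
        if clause.uses & service.binds:
            return rest[:index] + [service] + rest[index:]
    return rest + [service]


triple = Clause("?item wdt:P17 ?country", binds=frozenset({"item", "country"}))
label = Clause(
    "SERVICE wikibase:label",
    uses=frozenset({"country"}),
    binds=frozenset({"countryLabel"}),
)
label_filter = Clause(
    "FILTER(LANG(?countryLabel) = 'en')", uses=frozenset({"countryLabel"})
)

# The service starts out first and ends up between the triple and the filter.
ordered = move_service_late([label, triple, label_filter], label)
print([c.name for c in ordered])
```

This sketch ignores nesting (OPTIONAL, subselects, property paths), which a real rearrangement step would also have to account for.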

@Igorkim78 looks like the patch is buggy. This query:

SELECT DISTINCT ?auth ?authItemLabel ?desc ?linkPattern ?countryLabel ?remoteID WHERE 
{
   ?authItem wdt:P31 ?authTypes; schema:description ?desc; wdt:P1630 ?linkPattern; wikibase:directClaim ?auth . 
  OPTIONAL { ?authItem wdt:P17 ?country. }
  wd:Q7561898 ?auth ?remoteID . 
  FILTER((LANG(?desc)) = "en") 
  FILTER(?authTypes IN (wd:Q19595382, wd:Q21745557, wd:Q55653847)) 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} 
ORDER BY ?countryLabel

produces an empty countryLabel with the new code but some non-empty ones with the old code.

If OPTIONAL is removed (i.e. its pattern is made non-optional), then it works fine. So it looks like something related to the handling of optionals.

Another broken one:

SELECT ?compound ?compoundLabel ?prop ?id ?idLabel WHERE 
{ 
  VALUES ?compound {wd:Q27295794} 
  VALUES ?prop { wdt:P231 wdt:P232 wdt:P267 wdt:P486 wdt:P592 wdt:P595 wdt:P652 wdt:P661 wdt:P662 wdt:P665 wdt:P683 wdt:P715 wdt:P2115 wdt:P2892 wdt:P3345 wdt:P3636 wdt:P2017 wdt:P274 wdt:P233 wdt:P234 wdt:P235 wdt:P129 wdt:P2175 wdt:P2868 wdt:P3489 wdt:P3780 wdt:P2275 skos:altLabel} 
  OPTIONAL {?compound ?prop ?id filter (isIRI(?id) || (lang(?id) = "en" || lang(?id) = "")) .} 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
}

Fixed optional support and added a test case for that code path.
The service's projectedVars actually include both inbound and outbound variables (those which are parameters for the service and those which are produced by the label lookup). But when checking whether the service node can be reordered ahead of clauses placed at the bottom of the query, we need to consider only the inbound variables: they must be available for the service call, and all outbound variables must be available to the later filters and other clauses.
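
The distinction between inbound and outbound variables can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: split_projected_vars and service_can_run are hypothetical helpers, and the variable sets are plain Python sets rather than Blazegraph AST nodes.

```python
def split_projected_vars(projected, produced):
    """Split a service's projected variables into (inbound, outbound).

    `produced` is the set of variables the label lookup fills in;
    everything else that is projected is treated as an input parameter.
    """
    outbound = projected & produced
    inbound = projected - produced
    return inbound, outbound


def service_can_run(bound_before, required):
    """The service is ready once every required variable is already bound."""
    return required <= bound_before


projected = {"item", "itemLabel"}
inbound, outbound = split_projected_vars(projected, produced={"itemLabel"})

bound_before = {"item"}  # variables bound by clauses placed ahead of the service

# Checking only the inbound variables lets the service run here.
ready_with_inbound = service_can_run(bound_before, inbound)

# Checking all projected variables would block it, because ?itemLabel
# cannot be bound before the service itself has run.
ready_with_all = service_can_run(bound_before, projected)
```

Under these assumptions, testing against the full projected set can never succeed for a position ahead of the service's own output, which is why only the inbound set is relevant for the placement check.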

Another broken one:

SELECT ?item ?itemLabel WHERE { 
  { 
    SELECT (MIN(?item) AS ?item) WHERE {?item wdt:P373 "A Topographical and Historical Description of London and Middlesex (1820) by Bayley, Brewer and Nightingale" 
    FILTER NOT EXISTS {?item wdt:P31 wd:Q4167836} 
  } 
  } 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } 
}

itemLabel is empty in the new code. Looking at explain, this is rewritten as:

WITH {
  QueryType: SELECT
  SELECT ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(item))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#min, valueExpr=com.bigdata.bop.rdf.aggregate.MIN(item)] AS VarNode(item) )
    JoinGroupNode {
      StatementPatternNode(VarNode(item), ConstantNode(Vocab(6)[http://www.wikidata.org/prop/direct/P]:XSDUnsignedShort(373)), ConstantNode(TermId(1466429127L)[A Topographical and Historical Description of London and Middlesex (1820) by Bayley, Brewer and Nightingale])) [scope=DEFAULT_CONTEXTS]
        AST2BOpBase.estimatedCardinality=1
        AST2BOpBase.originalIndex=POS
      QueryType: ASK
      SELECT VarNode(item) VarNode(-exists-1)[anonymous]
        JoinGroupNode {
          StatementPatternNode(VarNode(item), ConstantNode(Vocab(6)[http://www.wikidata.org/prop/direct/P]:XSDUnsignedByte(31)), ConstantNode(Vocab(2)[http://www.wikidata.org/entity/Q]:XSDUnsignedInt(4167836))) [scope=DEFAULT_CONTEXTS]
            AST2BOpBase.estimatedCardinality=4309882
            AST2BOpBase.originalIndex=POS
        } AST2BOpBase.estimatedCardinality=4309882
      @askVar=-exists-1
      FILTER( NotExistsNode(VarNode(-exists-1))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.bigdata.com/sparql-1.1-undefined-functionsnot-exists, graphPattern=
        JoinGroupNode {
          StatementPatternNode(VarNode(item), ConstantNode(Vocab(6)[http://www.wikidata.org/prop/direct/P]:XSDUnsignedByte(31)), ConstantNode(Vocab(2)[http://www.wikidata.org/entity/Q]:XSDUnsignedInt(4167836))) [scope=DEFAULT_CONTEXTS]
            AST2BOpBase.estimatedCardinality=4309882
            AST2BOpBase.originalIndex=POS
        } AST2BOpBase.estimatedCardinality=4309882, valueExpr=com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-1))] )
    }
} AS -subSelect-1 JOIN ON () DEPENDS ON ()
QueryType: SELECT
includeInferred=true
timeout=600000
SELECT ( VarNode(item) AS VarNode(item) ) ( VarNode(itemLabel) AS VarNode(itemLabel) )
  JoinGroupNode {
    SERVICE <ConstantNode(TermId(0U)[http://wikiba.se/ontology#label])> {
      JoinGroupNode {
        StatementPatternNode(ConstantNode(TermId(0U)[http://www.bigdata.com/rdf#serviceParam]), ConstantNode(TermId(0U)[http://wikiba.se/ontology#language]), ConstantNode(TermId(17380L)[en])) [scope=DEFAULT_CONTEXTS]
        StatementPatternNode(VarNode(item), ConstantNode(Vocab(74)[http://www.w3.org/2000/01/rdf-schema#label]), VarNode(itemLabel)) [scope=DEFAULT_CONTEXTS]
      }
    }
    INCLUDE -subSelect-1 JOIN ON ()
  }

Which is wrong: the service clause should go after the subselect, not before. Adding hint:Query hint:optimizer "None". doesn't change anything. Should it? Should we have some way to disable the rewriting behavior in case it fails?

Another broken one:

SELECT ?parent ?parentLabel WHERE { 
  wd:Q174097 wdt:P31/wdt:P279* ?parent . 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } 
}

Again, service comes out in front. Looks like path nodes are mishandled.

Another broken one (likely same reason):

SELECT ?count ?gender ?genderLabel WITH 
{ 
  SELECT (COUNT(DISTINCT ?researcher) AS ?count) ?gender WHERE { ?researcher ( wdt:P108 | wdt:P463 | wdt:P1416 ) / wdt:P361* wd:Q1269766 . ?researcher wdt:P21 ?gender . } GROUP BY ?gender 
} AS %result WHERE { 
  INCLUDE %result 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,ep,fr,jp,nl,no,ru,sv,zh" . } 
} 
ORDER BY DESC(?count)

Another one:

SELECT ?item ?name ?coord WHERE {
    wd:Q12988 wdt:P625 ?Firstloc .
    wd:Q184287 wdt:P625 ?Secondloc .
    SERVICE wikibase:box
    {
        ?item wdt:P625 ?coord .
        bd:serviceParam wikibase:cornerNorthEast ?Firstloc .
        bd:serviceParam wikibase:cornerSouthWest ?Secondloc .
    }
    SERVICE wikibase:label
    {
        bd:serviceParam wikibase:language "fr" .
        ?item rdfs:label ?name
    }
}
ORDER BY ASC(?name)

Here the optimizer probably doesn't recognize that wikibase:box defines ?item.

Change 520647 had a related patch set uploaded (by Igor Kim; owner: Igor Kim):
[wikidata/query/blazegraph@master] Support for TempTripleStore in GeoSpatialServiceFactory

https://gerrit.wikimedia.org/r/520647

Here's an interesting one:

SELECT ?pLabel ?prop ?val ?valLabel WHERE {
    wd:Q8486 ?prop ?val .
    ?ps wikibase:directClaim ?prop .
    ?ps rdfs:label ?pLabel .
    SERVICE wikibase:label
    {
        bd:serviceParam wikibase:language 'en' .
    }
    FILTER ((LANG(?valLabel)) = 'en' && (?prop != wdt:P18))
}

It doesn't really work (empty result), but I am not sure why not. There's a query hint that says:

SPARQL semantics is defined bottom up, and in the query we detected a variable that is used in a value expression but known not to be in scope when evaluating the value expression. To fix the problem, you may want to push the construct binding the variable (this might be a triple pattern, BIND, or VALUES clause) inside the scope in which the variable is used. The affected variable is 'valLabel', which has been renamed in the optimized AST to '-unbound-var-valLabel-0' in order to avoid conflicts.

But I am not sure why it's actually undefined. Even if I split the filter:

SELECT ?pLabel ?prop ?val ?valLabel WHERE {
    wd:Q8486 ?prop ?val .
    ?ps wikibase:directClaim ?prop .
    ?ps rdfs:label ?pLabel .
    FILTER (?prop != wdt:P18)
    SERVICE wikibase:label
    {
        bd:serviceParam wikibase:language 'en' .
    }
    FILTER (LANG(?valLabel) != 'fr')
}

it still doesn't work. I admit the last filter is useless, but there could be any number of useful checks in its place, and they would still be broken. Granted, this wasn't supported before, but if we're already fixing things...

Change 520647 merged by Smalyshev:
[wikidata/query/blazegraph@master] Support for TempTripleStore in GeoSpatialServiceFactory

https://gerrit.wikimedia.org/r/520647

This query also doesn't seem to work:

SELECT ?pLabel ?prop ?val ?valLabel WHERE {  
  wd:Q8486 ?prop ?val .   
  ?ps wikibase:directClaim ?prop .   
  ?ps rdfs:label ?pLabel .   
  SERVICE wikibase:label {     
    bd:serviceParam wikibase:language 'en'.   
  }   
  FILTER ((LANG(?pLabel )) = 'en' && (?prop != wdt:P18))}

but it may be hard to fix that.