Paste P7335

SPARQL crash on large query

Authored by Multichill on Jul 8 2018, 9:22 PM.
Python code I run:
import pywikibot.data.sparql
query = u"""SELECT ?item ?image ?creator ?institution ?invnum ?location ?url ?idurl WHERE {
?item wdt:P31 wd:Q3305213 . # /wdt:P279* wd:Q3305213 .
OPTIONAL { ?item wdt:P18 ?image } .
OPTIONAL { ?item wdt:P170 ?creator } .
OPTIONAL { ?item wdt:P195 ?institution } .
OPTIONAL { ?item wdt:P217 ?invnum } .
OPTIONAL { ?item wdt:P276 ?location } .
OPTIONAL { ?item wdt:P973 ?url } .
OPTIONAL { ?item ?identifierproperty ?identifier .
?property wikibase:directClaim ?identifierproperty .
?property wikibase:propertyType wikibase:ExternalId .
?property wdt:P1630 ?formatterurl .
BIND(IRI(REPLACE(?identifier, '^(.+)$', ?formatterurl)) AS ?idurl).
}
}
LIMIT 1000000"""
sq = pywikibot.data.sparql.SparqlQuery()
queryresult = sq.select(query)
Crashes:
~/pywikibot$ python
Python 2.7.6 (default, Nov 23 2017, 15:49:48)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pywikibot.data.sparql
>>>
>>> query = u"""SELECT ?item ?image ?creator ?institution ?invnum ?location ?url ?idurl WHERE {
... ?item wdt:P31 wd:Q3305213 . # /wdt:P279* wd:Q3305213 .
... OPTIONAL { ?item wdt:P18 ?image } .
... OPTIONAL { ?item wdt:P170 ?creator } .
... OPTIONAL { ?item wdt:P195 ?institution } .
... OPTIONAL { ?item wdt:P217 ?invnum } .
... OPTIONAL { ?item wdt:P276 ?location } .
... OPTIONAL { ?item wdt:P973 ?url } .
... OPTIONAL { ?item ?identifierproperty ?identifier .
... ?property wikibase:directClaim ?identifierproperty .
... ?property wikibase:propertyType wikibase:ExternalId .
... ?property wdt:P1630 ?formatterurl .
... BIND(IRI(REPLACE(?identifier, '^(.+)$', ?formatterurl)) AS ?idurl).
... }
... }"""
>>> sq = pywikibot.data.sparql.SparqlQuery()
>>> queryresult = sq.select(query)
>>> print len (queryresult)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'NoneType' has no len()
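The TypeError above suggests that `sq.select(query)` returned `None` rather than a list of rows. A minimal sketch of a guard, assuming `None` is how a failed query is signalled; `run_select` is a hypothetical helper, and `select` stands for any callable such as `sq.select`:

```python
def run_select(select, query):
    """Call select(query) and fail loudly if no result came back.

    `select` is any callable that takes a query string, for example the
    bound method sq.select from the session above (hypothetical wrapper).
    """
    result = select(query)
    if result is None:
        # Assumption: None means the query failed or the response could
        # not be parsed, as the TypeError above suggests.
        raise RuntimeError("SPARQL query returned no result")
    return result
```

With this in place, `len(run_select(sq.select, query))` would raise a descriptive error at the call site instead of a TypeError later on.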
If I dump the raw output, it ends with:
},
"location" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q1132918"
},
"url" : {
"type" : "uri",
"value" : "http://samling.nasjonalmuseet.no/en/object/NG.M.03374"
},
"image" : {
"type" : "uri",
"value" : "http://commons.wikimedia.org/wiki/Special:FilePath/Bernt%20Lund%20Fra%20Ulvik%20i%20Hardanger.jpg"
}
}, {
"item" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q55417839"
},
"institution" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q1132918"
},
"invnum" : {
"type" : "literal",
"value" : "NG.M.00892"
},
"location" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q1132918"
},
"url" : {
"type" : "uri",
"value" : "http://samling.nasjonalmuseet.noSPARQL-QUERY: queryStr=SELECT ?item ?image ?creator ?institution ?invnum ?location ?url ?idurl WHERE {
?item wdt:P31 wd:Q3305213 . # /wdt:P279* wd:Q3305213 .
OPTIONAL { ?item wdt:P18 ?image } .
OPTIONAL { ?item wdt:P170 ?creator } .
OPTIONAL { ?item wdt:P195 ?institution } .
OPTIONAL { ?item wdt:P217 ?invnum } .
OPTIONAL { ?item wdt:P276 ?location } .
OPTIONAL { ?item wdt:P973 ?url } .
OPTIONAL { ?item ?identifierproperty ?identifier .
?property wikibase:directClaim ?identifierproperty .
?property wikibase:propertyType wikibase:ExternalId .
?property wdt:P1630 ?formatterurl .
BIND(IRI(REPLACE(?identifier, '^(.+)$', ?formatterurl)) AS ?idurl).
}
}
LIMIT 1000000
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:293)
at com.bigdata.rdf.sail.webapp.QueryServlet.doSparqlQuery(QueryServlet.java:654)
at com.bigdata.rdf.sail.webapp.QueryServlet.doGet(QueryServlet.java:288)
at com.bigdata.rdf.sail.webapp.RESTServlet.doGet(RESTServlet.java:240)
at com.bigdata.rdf.sail.webapp.MultiTenancyServlet.doGet(MultiTenancyServlet.java:271)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667)
at org.wikidata.query.rdf.blazegraph.throttling.ThrottlingFilter.doFilter(ThrottlingFilter.java:318)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at ch.qos.logback.classic.helpers.MDCInsertingServletFilter.doFilter(MDCInsertingServletFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at org.wikidata.query.rdf.blazegraph.filters.ClientIPFilter.doFilter(ClientIPFilter.java:43)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlQueryTask.call(QueryServlet.java:865)
at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlQueryTask.call(QueryServlet.java:671)
at com.bigdata.rdf.task.ApiTaskForIndexManager.call(ApiTaskForIndexManager.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at com.bigdata.rdf.sail.Bigdata2Sesame2BindingSetIterator.hasNext(Bigdata2Sesame2BindingSetIterator.java:188)
at info.aduna.iteration.IterationWrapper.hasNext(IterationWrapper.java:68)
at org.openrdf.query.QueryResults.report(QueryResults.java:155)
at org.openrdf.repository.sail.SailTupleQuery.evaluate(SailTupleQuery.java:76)
at com.bigdata.rdf.sail.webapp.BigdataRDFContext$TupleQueryTask.doQuery(BigdataRDFContext.java:1713)
at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.innerCall(BigdataRDFContext.java:1569)
at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:1534)
at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:747)
... 4 more
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator.checkFuture(BlockingBuffer.java:1523)
at com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator._hasNext(BlockingBuffer.java:1710)
at com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator.hasNext(BlockingBuffer.java:1563)
at com.bigdata.striterator.AbstractChunkedResolverator._hasNext(AbstractChunkedResolverator.java:365)
at com.bigdata.striterator.AbstractChunkedResolverator.hasNext(AbstractChunkedResolverator.java:341)
at com.bigdata.rdf.sail.Bigdata2Sesame2BindingSetIterator.hasNext(Bigdata2Sesame2BindingSetIterator.java:134)
... 11 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator.checkFuture(BlockingBuffer.java:1454)
... 16 more
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at com.bigdata.rdf.sail.RunningQueryCloseableIterator.checkFuture(RunningQueryCloseableIterator.java:59)
at com.bigdata.rdf.sail.RunningQueryCloseableIterator.close(RunningQueryCloseableIterator.java:73)
at com.bigdata.striterator.ChunkedWrappedIterator.close(ChunkedWrappedIterator.java:180)
at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:297)
at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:197)
... 4 more
Caused by: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at com.bigdata.util.concurrent.Haltable.get(Haltable.java:273)
at com.bigdata.bop.engine.AbstractRunningQuery.get(AbstractRunningQuery.java:1516)
at com.bigdata.bop.engine.AbstractRunningQuery.get(AbstractRunningQuery.java:104)
at com.bigdata.rdf.sail.RunningQueryCloseableIterator.checkFuture(RunningQueryCloseableIterator.java:46)
... 8 more
Caused by: java.lang.Exception: task=ChunkTask{query=602d4661-4ee4-4492-a40b-572e24fec3be,bopId=24,partitionId=-1,sinkId=null,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1367)
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTaskWrapper.run(ChunkedRunningQuery.java:926)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run(ChunkedRunningQuery.java:821)
... 3 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1347)
... 8 more
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Illegal group reference
at com.bigdata.rdf.internal.constraints.TryBeforeMaterializationConstraint.accept(TryBeforeMaterializationConstraint.java:124)
at com.bigdata.bop.bset.ConditionalRoutingOp$ConditionalRouteTask.call(ConditionalRoutingOp.java:199)
at com.bigdata.bop.bset.ConditionalRoutingOp$ConditionalRouteTask.call(ConditionalRoutingOp.java:135)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1346)
... 8 more
Caused by: java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:857)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at com.bigdata.rdf.internal.constraints.ReplaceBOp.evaluate(ReplaceBOp.java:230)
at com.bigdata.rdf.internal.constraints.ReplaceBOp.get(ReplaceBOp.java:175)
at com.bigdata.rdf.internal.constraints.ReplaceBOp.get(ReplaceBOp.java:51)
at com.bigdata.rdf.internal.constraints.IVValueExpression.getAndCheckBound(IVValueExpression.java:509)
at com.bigdata.rdf.internal.constraints.IriBOp.get(IriBOp.java:86)
at com.bigdata.rdf.internal.constraints.IriBOp.get(IriBOp.java:51)
at com.bigdata.rdf.internal.constraints.ConditionalBind.get(ConditionalBind.java:133)
at com.bigdata.rdf.internal.constraints.ProjectedConstraint.accept(ProjectedConstraint.java:77)
at com.bigdata.rdf.internal.constraints.TryBeforeMaterializationConstraint.accept(TryBeforeMaterializationConstraint.java:103)
... 12 more
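The trace is long because each layer wraps the same exception; the last `Caused by:` entry names the root cause, here an `IllegalArgumentException` raised in `Matcher.appendReplacement` and reached through `ReplaceBOp`. A small sketch for pulling that line out of a dumped response; `innermost_cause` is a hypothetical helper, not part of pywikibot:

```python
def innermost_cause(trace_text):
    """Return the text of the last 'Caused by:' line, or None if absent."""
    prefix = "Caused by:"
    causes = [line.strip() for line in trace_text.splitlines()
              if line.strip().startswith(prefix)]
    if not causes:
        return None
    # The last entry in a Java stack trace is the innermost exception.
    return causes[-1][len(prefix):].strip()


# Two lines copied from the trace above, used as sample input.
sample = (
    "Caused by: java.lang.RuntimeException: "
    "java.lang.IllegalArgumentException: Illegal group reference\n"
    "Caused by: java.lang.IllegalArgumentException: Illegal group reference\n"
)
print(innermost_cause(sample))
```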

Event Timeline

My guess would be that some formatter URL contained $2, but I can’t find any such statement now… perhaps it was fixed already?

Regardless, the following version would be more robust:

BIND(IRI(REPLACE(?formatterurl, "\\$1", ?identifier)) AS ?idurl)
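
The trace ends in `java.util.regex.Matcher.appendReplacement`, where a `$N` in the replacement string is read as a reference to capture group N. The pattern `'^(.+)$'` has only one group, so a `$2` in a formatter URL would be an illegal reference. The suggested version instead takes the formatter URL as the input and matches a literal `$1`. Below is a rough Python analogue of both directions, not the Blazegraph implementation: Python writes group references as `\g<N>`, so the fragile variant translates `$N` first, and the example URLs are made up.

```python
import re


def build_url_fragile(identifier, formatter_url):
    """Analogue of REPLACE(?identifier, '^(.+)$', ?formatterurl).

    The formatter URL is used as the replacement template, so every
    "$N" in it is treated as a group reference.
    """
    # Translate "$N" into Python's "\g<N>" group-reference syntax.
    template = re.sub(r"\$(\d)", r"\\g<\1>", formatter_url)
    return re.sub(r"^(.+)$", template, identifier)


def build_url_robust(identifier, formatter_url):
    """Analogue of REPLACE(?formatterurl, "\\$1", ?identifier).

    Only the literal text "$1" is substituted; anything else in the
    formatter URL, including "$2", passes through unchanged.
    """
    return formatter_url.replace("$1", identifier)


print(build_url_fragile("NG.M.00892", "http://example.org/$1"))
print(build_url_robust("NG.M.00892", "http://example.org/$1"))

try:
    build_url_fragile("NG.M.00892", "http://example.org/$2/$1")
except re.error as exc:
    # Python rejects the reference to a non-existent group 2.
    print("fragile variant failed:", exc)

print(build_url_robust("NG.M.00892", "http://example.org/$2/$1"))
```

Both variants agree for a well-formed formatter URL; only the fragile one fails when the URL mentions a group that the pattern does not define.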