
Blazegraph crash on updater
Closed, ResolvedPublic

Assigned To
None
Authored By
Smalyshev
Apr 28 2015, 7:49 PM
Referenced Files
F158796: dumpJournal2
Apr 30 2015, 2:50 AM
F158672: dumpJournal
Apr 29 2015, 7:27 PM
F158480: dumpJournal2
Apr 29 2015, 6:15 AM
F158348: dumpJournal
Apr 28 2015, 11:40 PM
F158347: dumpJournal
Apr 28 2015, 11:39 PM
F158346: dumpJournal
Apr 28 2015, 11:38 PM
F158322: dumpJournal
Apr 28 2015, 9:48 PM
F158320: dumpJournal
Apr 28 2015, 9:39 PM

Description

The following crash happened when running the updater:

19:46:43.487 [main] DEBUG o.w.query.rdf.tool.rdf.RdfRepository - Setting last updated time to Tue Apr 28 16:47:33 UTC 2015
Exception in thread "main" org.wikidata.query.rdf.tool.exception.ContainedException: Non-200 response from triple store:  HttpResponseProxy{HTTP/1.1 500 Server Er
ror [Content-Type: text/plain, Transfer-Encoding: chunked, Server: Jetty(9.2.z-SNAPSHOT)] ResponseEntityProxy{[Content-Type: text/plain,Chunked: true]}} body=
SPARQL-UPDATE: updateStr=DELETE {
  <http://www.wikidata.org> <http://schema.org/dateModified> ?o .
}
WHERE {
  <http://www.wikidata.org> <http://schema.org/dateModified> ?o .
};
INSERT DATA {
  <http://www.wikidata.org> <http://schema.org/dateModified> "2015-04-28T16:47:33.000Z"^^<xsd:dateTime> .
}

java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem 
with entry at -711345699177889370: lastRootBlock=rootBlock{ rootBlock=0, challisField=2740925, version=3, nextOffset=3775731623797917, localTime=1430247841782 [Tu
esday, April 28, 2015 7:04:01 PM UTC], firstCommitTime=1428692901050 [Friday, April 10, 2015 7:08:21 PM UTC], lastCommitTime=1430247841781 [Tuesday, April 28, 201
5 7:04:01 PM UTC], commitCounter=2740925, commitRecordAddr={off=NATIVE:-165623078,len=422}, commitRecordIndexAddr={off=NATIVE:-155674208,len=220}, blockSequence=1
, quorumToken=-1, metaBitsAddr=2455328113296235, metaStartAddr=952388, storeType=RW, uuid=f8287971-835c-4bb1-9b83-215d38565a61, offsetBits=42, checksum=-498521427
, createTime=1428691460132 [Friday, April 10, 2015 6:44:20 PM UTC], closeTime=0}
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:261)
        at com.bigdata.rdf.sail.webapp.QueryServlet.doSparqlUpdate(QueryServlet.java:359)
        at com.bigdata.rdf.sail.webapp.QueryServlet.doPost(QueryServlet.java:165)
        at com.bigdata.rdf.sail.webapp.RESTServlet.doPost(RESTServlet.java:237)
        at com.bigdata.rdf.sail.webapp.MultiTenancyServlet.doPost(MultiTenancyServlet.java:137)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:497)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -711345699177889370: lastRootBlock=rootBlock{ rootBlock=0, challisField=2740925, version=3, nextOffset=3775731623797917, localTime=1430247841782 [Tuesday, April 28, 2015 7:04:01 PM UTC], firstCommitTime=1428692901050 [Friday, April 10, 2015 7:08:21 PM UTC], lastCommitTime=1430247841781 [Tuesday, April 28, 2015 7:04:01 PM UTC], commitCounter=2740925, commitRecordAddr={off=NATIVE:-165623078,len=422}, commitRecordIndexAddr={off=NATIVE:-155674208,len=220}, blockSequence=1, quorumToken=-1, metaBitsAddr=2455328113296235, metaStartAddr=952388, storeType=RW, uuid=f8287971-835c-4bb1-9b83-215d38565a61, offsetBits=42, checksum=-498521427, createTime=1428691460132 [Friday, April 10, 2015 6:44:20 PM UTC], closeTime=0}
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlUpdateTask.call(QueryServlet.java:460)
        at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlUpdateTask.call(QueryServlet.java:371)
        at com.bigdata.rdf.task.ApiTaskForIndexManager.call(ApiTaskForIndexManager.java:68)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at com.bigdata.rdf.task.AbstractApiTask.submitApiTask(AbstractApiTask.java:365)
        at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:258)
        ... 26 more   
Caused by: org.openrdf.query.UpdateExecutionException: java.lang.RuntimeException: Problem with entry at -711345699177889370: lastRootBlock=rootBlock{ rootBlock=0, challisField=2740925, version=3, nextOffset=3775731623797917, localTime=1430247841782 [Tuesday, April 28, 2015 7:04:01 PM UTC], firstCommitTime=1428692901050 [Friday, April 10, 2015 7:08:21 PM UTC], lastCommitTime=1430247841781 [Tuesday, April 28, 2015 7:04:01 PM UTC], commitCounter=2740925, commitRecordAddr={off=NATIVE:-165623078,len=422}, commitRecordIndexAddr={off=NATIVE:-155674208,len=220}, blockSequence=1, quorumToken=-1, metaBitsAddr=2455328113296235, metaStartAddr=952388, storeType=RW, uuid=f8287971-835c-4bb1-9b83-215d38565a61, offsetBits=42, checksum=-498521427, createTime=1428691460132 [Friday, April 10, 2015 6:44:20 PM UTC], closeTime=0}
        at com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.executeUpdate(ASTEvalHelper.java:1303)
        at com.bigdata.rdf.sail.BigdataSailUpdate.execute2(BigdataSailUpdate.java:152)
        at com.bigdata.rdf.sail.webapp.BigdataRDFContext$UpdateTask.doQuery(BigdataRDFContext.java:1683)
        at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.innerCall(BigdataRDFContext.java:1310)
        at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:1275)
        at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:517)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        ... 1 more
Caused by: java.lang.RuntimeException: Problem with entry at -711345699177889370: lastRootBlock=rootBlock{ rootBlock=0, challisField=2740925, version=3, nextOffset=3775731623797917, localTime=1430247841782 [Tuesday, April 28, 2015 7:04:01 PM UTC], firstCommitTime=1428692901050 [Friday, April 10, 2015 7:08:21 PM UTC], lastCommitTime=1430247841781 [Tuesday, April 28, 2015 7:04:01 PM UTC], commitCounter=2740925, commitRecordAddr={off=NATIVE:-165623078,len=422}, commitRecordIndexAddr={off=NATIVE:-155674208,len=220}, blockSequence=1, quorumToken=-1, metaBitsAddr=2455328113296235, metaStartAddr=952388, storeType=RW, uuid=f8287971-835c-4bb1-9b83-215d38565a61, offsetBits=42, checksum=-498521427, createTime=1428691460132 [Friday, April 10, 2015 6:44:20 PM UTC], closeTime=0}
        at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3113)
        at com.bigdata.rdf.store.LocalTripleStore.commit(LocalTripleStore.java:99)
        at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.commit2(BigdataSail.java:3370)
        at com.bigdata.rdf.sail.BigdataSailRepositoryConnection.commit2(BigdataSailRepositoryConnection.java:272)
        at com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertCommit(AST2BOpUpdate.java:353)
        at com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertUpdate(AST2BOpUpdate.java:307)
        at com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.executeUpdate(ASTEvalHelper.java:1295)
        ... 9 more
Caused by: java.lang.RuntimeException: Problem with entry at -711345699177889370
        at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:4967)
        at com.bigdata.rwstore.RWStore.checkDeferredFrees(RWStore.java:3539)
        at com.bigdata.journal.RWStrategy.checkDeferredFrees(RWStrategy.java:781)
        at com.bigdata.journal.AbstractJournal$CommitState.writeCommitRecord(AbstractJournal.java:3476)
        at com.bigdata.journal.AbstractJournal$CommitState.access$2800(AbstractJournal.java:3278)
        at com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:4088)
        at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3111)
        ... 15 more   
Caused by: java.lang.RuntimeException: addr=-157604858 : cause=java.lang.IllegalStateException: Bad Address: length requested greater than allocated slot
        at com.bigdata.rwstore.RWStore.getData(RWStore.java:2190)
        at com.bigdata.rwstore.RWStore.getData(RWStore.java:1989)
        at com.bigdata.rwstore.RWStore.getData(RWStore.java:2033)
        at com.bigdata.rwstore.RWStore.getData(RWStore.java:1989)
        at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:4857)
        at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:4953)
        ... 21 more   
Caused by: java.lang.IllegalStateException: Bad Address: length requested greater than allocated slot
        at com.bigdata.rwstore.RWStore.getData(RWStore.java:2082)
        ... 26 more   

        at org.wikidata.query.rdf.tool.rdf.RdfRepository.execute(RdfRepository.java:296)
        at org.wikidata.query.rdf.tool.rdf.RdfRepository.updateLeftOffTime(RdfRepository.java:255)
        at org.wikidata.query.rdf.tool.Update.run(Update.java:235)
        at org.wikidata.query.rdf.tool.Update.main(Update.java:100)

Event Timeline

Smalyshev raised the priority of this task from to High.
Smalyshev updated the task description.

The problem seems to be persistent and is not fixed by a restart. Looks like permanent data corruption :(

Noted. Can you replicate the situation leading up to this event? Are you doing anything that goes around the REST API? Can you publish a compressed copy of the journal file that we can download or expose the machine for ssh access?

Thanks,
Bryan

The issue is an address in the free list that cannot be recycled. Based on that stack trace, you should be able to open the journal in read-only mode.

@Thompsonbry.systap Leading up to this we had the updater tool (which fetches updated entities from Wikidata) running for a week or so, loading lots of entities. It all goes through the REST endpoint http://localhost:9999/bigdata/namespace/wdq/sparql

The journal file is 59G by now; not sure how well it compresses - will try to compress and see.

As for ssh access, I'm not sure - it's on an internal labs machine; I need to check with the Powers That Be what the policy is.

Let's boil this down to something that can be replicated by a known series of actions with known data sets. The last time we had something like this it was related to a failure to correctly rollback an UPDATE at the REST API layer. So we need to understand whether this is against a clean build, what release, and what properties are in effect.

Thanks,
Bryan

@Thompsonbry.systap could you add more details about the read-only mode? Is it going to help to fix the database? We'll need it writable since it is constantly updated...

ok. we will need access one way or another.

Please start by running the DumpJournal utility class (in the same
package as the Journal). Specify the -pages option. Post the output.

And please get us a copy of the logs.

Thanks,

Bryan


I am only suggesting that you can probably open the database in a
read-only mode and access the data. The "problem" is in the list of
deferred allocation slot addresses for recycling. See the Options
file in the Journal package for how to force a read-only open mode.

Bryan
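
For readers following along, a minimal sketch of what a forced read-only open might look like, assuming com.bigdata.journal.Options exposes FILE and READ_ONLY keys as Bryan describes (confirm the exact keys in that class before relying on this):

import java.util.Properties;

import com.bigdata.journal.Journal;
import com.bigdata.journal.Options;

// Rough sketch only, not verified against 1.5.1: open the journal read-only so
// the data can be inspected without touching the deferred free list.
public class OpenReadOnlySketch {
    public static void main(final String[] args) {
        final Properties p = new Properties();
        p.setProperty(Options.FILE, "/path/to/blazegraph.jnl"); // illustrative path
        p.setProperty(Options.READ_ONLY, "true");               // assumed option key
        final Journal journal = new Journal(p);
        try {
            // Sanity check that the store opens and reports its last commit point.
            System.out.println("lastCommitTime=" + journal.getLastCommitTime());
        } finally {
            journal.close();
        }
    }
}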


@Thompsonbry.systap replicating it from zero can be problematic... :(

The code running is release bigdata-1.5.1.jar with our extensions:

com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.DefaultBigdataVocabulary
com.bigdata.rdf.store.AbstractTripleStore.extensionFactoryClass=org.wikidata.query.rdf.blazegraph.WikibaseExtensionFactory

This DB has ~3 weeks of updates (that was the test case: see how it behaves in a production-like run), with random timeouts, failures, etc. in between, which may have caused rollbacks. The questions are: a) what do we do with the existing DB now, and b) how can I help investigate this? I'll try to get the .jnl somewhere downloadable (it is huge, though) - any ideas besides that?

Config:


Log: (sorry, not too much there)

Was group commit enabled?

I suggest that we run this long-running test in an environment where we have shared access. Martyn (per the email to you and Nik) will be the point person looking at the RWStore, but this could easily be a failure triggered by bad rollback code.

Please send the Java code for your extension. This should not impact
the rollback semantics, but we will need the class in order to open
the namespace as a triple store.

The DumpJournal is a read-only utility. It needs exclusive access, so the journal cannot be online while running this utility. The -pages option will force a full scan of all indices. That will tell us if there are any problems reading the indices.
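
For reference, a minimal sketch of invoking the utility in-process (the class and its main() appear in the stack traces on this task; the "-pages" flag is the option named above, and passing the journal path as the trailing argument is an assumption to check against the utility's usage message):

import com.bigdata.journal.DumpJournal;

// Blazegraph must be stopped first: DumpJournal needs exclusive access to the journal.
public class RunDumpJournalSketch {
    public static void main(final String[] args) throws Exception {
        DumpJournal.main(new String[] { "-pages", "/path/to/blazegraph.jnl" });
    }
}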

I suggest a hangout to talk this through. Did you get my email?


Bryan

That log file is only since the journal restart. Do you have the historical log files?

We can probably force the RWStore to discard the list of deferred deletes. Assuming that the problem is purely in that deferred deletes list, this would let you continue to apply updates.

However, what we should do is use this as an opportunity to root cause the issue and fix the underlying problem.

This ticket was a problem with a similar symptom: http://trac.bigdata.com/ticket/1026

The root cause was a problem in how we handled a SPARQL UPDATE failure. The ticket includes the fix for that. However, we refactored the REST API significantly for 1.5.1 (to support group commit). There is a non-zero chance that we re-introduced a similar problem either for SPARQL UPDATE or for a different REST mutation method.

It would be very helpful to understand what was the last successfully applied mutation, what kind of operation it was, whether there have been failed mutation operations, and which operations had those failures.

@Thompsonbry.systap no, unfortunately I didn't keep the log file from the previous Blazegraph run :( It printed to the console, but it turns out I forgot to redirect it to the log, sorry. Will do DumpJournal now. Is it supposed to take a long time? It dumped some data and has then sat there for ~10 minutes without doing anything else.

The source is at https://github.com/wikimedia/wikidata-query-rdf - however, the version that was running was not current but from 3 weeks ago, so I'd use something like https://github.com/wikimedia/wikidata-query-rdf/tree/9d0b0e6263ac36bc3402709ccca5333bc1b152ac/
The specific class used as the extension factory is https://github.com/wikimedia/wikidata-query-rdf/blob/9d0b0e6263ac36bc3402709ccca5333bc1b152ac/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/WikibaseExtensionFactory.java.

It will take a long time with the -pages option. It is doing a
sequential scan of each index. You will see the output update each
time it finishes with one index and starts on the next. At the end it
will write out a table with statistics about the indices. It should
also write out the allocation information at the start of the dump.

We should add something to scan the deferred deletes list for the RWStore. The utility is mostly written at a higher, backend-agnostic level, but we could add logic that is backend-specific.

Can you reconstruct what was happening before the last restart? What kinds of update/fail/restart scenarios might it have been subject to recently, and what REST API methods would have been involved?

Was group commit enabled?

Bryan

You should be able to see the IOs as it pages through the indices if you are concerned that it might not be making progress. It is probably just busy scanning the disk.

Also, here's how we run Blazegraph, for full reproduction:

Run script:


JAR files:

These, together with the RWStore.properties above, should completely cover all the code.

What JVM (version and vendor) and OS?

What happened before the failure was just regular loading of updates (i.e. 10 threads sending a mix of INSERT/DELETE statements through the REST API). I was checking on it from time to time, and the last time I checked, instead of the regular update flow I saw a stream of failure exceptions like the above. Unfortunately, since it's an experimental setup we don't have log storage, etc. set up, so no record of the events immediately preceding the failures was preserved :(
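
For context, a minimal sketch of what each of those updater threads does against the REST endpoint quoted earlier, assuming the endpoint accepts a standard SPARQL 1.1 Protocol POST with Content-Type application/sparql-update; the update text is illustrative only, not the exact statements the updater sends:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostUpdateSketch {
    public static void main(final String[] args) throws Exception {
        // Endpoint quoted earlier in this task; the update body is illustrative.
        final String endpoint = "http://localhost:9999/bigdata/namespace/wdq/sparql";
        final String update =
                "DELETE { <http://www.wikidata.org> <http://schema.org/dateModified> ?o }\n"
              + "WHERE  { <http://www.wikidata.org> <http://schema.org/dateModified> ?o }";
        final HttpURLConnection conn =
                (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Assumes the endpoint honors the SPARQL 1.1 Protocol media type.
        conn.setRequestProperty("Content-Type", "application/sparql-update");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(update.getBytes(StandardCharsets.UTF_8));
        }
        // The updater treats anything other than HTTP 200 as a ContainedException.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}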

We can do a hangout if you'd like, my email is smalyshev@wikimedia.org - any time today till ~6pm PDT works for me.

What JVM (version and vendor) and OS?

Linux db01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux

java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-2)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)

Ok. We have much less experience with OpenJDK in production. Normally people deploy the Oracle JVM. I am not aware of any specific problems. It is just that I do not have as much confidence around OpenJDK deployments. But this looks like an internals bug, not a JVM bug.

Bryan

I think this: https://wikitech.wikimedia.org/wiki/Help:Getting_Started can get you set up for shell access to the test machine in question.

What JVM (version and vendor) and OS?

Linux db01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) x86_64 GNU/Linux

java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-2)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)

For reference, we're 100% wedded to OpenJDK and a bit stuck on Java 7 because it's what Debian supports at the moment. I've started making grumblings about getting us to 8.

ok. like I said, no known issues.

Bryan

As for ssh access, I'm not sure - it's on an internal labs machine; I need to check with the Powers That Be what the policy is.

Anyone who helps with stuff like this can get access to labs: sign up on https://wikitech.wikimedia.org/, then wait until your shell access is granted by a human. The resulting user can then be given access to the labs project.

Stas notes that they had recently deleted a large namespace (one of two) from the workbench GUI. We should check the deferred frees in depth and make sure that the deleted namespace's deletes are no longer on the deferred deletes list. Stas also notes that some error message was reported in the GUI for this operation.

Since dumping the journal takes a lot of time, here's a partial one:

- will update when it's done.

Dump script failed with:

java.lang.UnsupportedOperationException
        at com.bigdata.btree.Checkpoint.getHeight(Checkpoint.java:124)
        at com.bigdata.btree.BaseIndexStats.<init>(BaseIndexStats.java:101)
        at com.bigdata.bop.solutions.SolutionSetStream.dumpPages(SolutionSetStream.java:464)
        at com.bigdata.journal.DumpJournal.dumpNamedIndicesMetadata(DumpJournal.java:763)
        at com.bigdata.journal.DumpJournal.dumpJournal(DumpJournal.java:616)
        at com.bigdata.journal.DumpJournal.main(DumpJournal.java:286)
Error: java.lang.UnsupportedOperationException on file: blazegraph.jnl

The output before the exception is:

I have created a ticket for this. See http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). Please subscribe for updates.

Thanks,
Bryan

Can I get the full stack trace for that failure? It looks like it does not understand the type of index. I would like to figure out which index has this problem.

Try DumpJournal without the -pages option. It will run through the indices very quickly. Let me know if you can identify which index has this problem.

See below for the code that threw the exception in Checkpoint.java

    public final int getHeight() {

        switch (indexType) {
        case BTree:
            return height;
        default:
            throw new UnsupportedOperationException(); // The exception is thrown here.
        }

    }

The problem might be an issue with DumpJournal and the SolutionSetStream class. The last thing that it visited was a SolutionSetStream, not a BTree.

Can you modify the code to test the index type (an interface test) and skip things that are not BTree classes, and then re-do the DumpJournal? Do it first without -pages and then with it.

The change would be to this line:

at com.bigdata.journal.DumpJournal.dumpNamedIndicesMetadata(DumpJournal.java:763)

Change this:

{
    final BaseIndexStats stats = ndx.dumpPages(dumpPages);

to this:

if (ndx instanceof BTree) {
    final BaseIndexStats stats = ndx.dumpPages(dumpPages);

Here is the last thing dumped - a SolutionSetStream:
name=wdq.solutionSets.%s1
Checkpoint{indexType=Stream,nnodes=0,nleaves=0,nentries=3093,counter=0,addrRoot=-283343519412625538,addrMetadata=-283389196389841655,addrBloomFilter=-283262408955264007,addrCheckpoint=-36326833389348}
addrMetadata=0, name=wdq.solutionSets.%s1, indexType=Stream, indexUUID=e57227f1-58f7-4efa-bce2-10b2be799041, branchingFactor=32, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@40d672c5{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.btree.DefaultTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.DefaultKeyBuilderFactory{ initialCapacity=0, collator=ICU, locale=en_US, strength=null, decomposition=null}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@3f0904c4{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.CanonicalHuffmanRabaCoder@69180030}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}, streamClassName=com.bigdata.bop.solutions.SolutionSetStream, streamCompressionType=Zip

Journal dump with no pages:

It looks like it hit the same error. At least, it has not listed out the
rest of the indices and stops on that solution set stream. We will need to
redo this with the patch to DumpJournal.

@Thompsonbry.systap please check again - I uploaded the wrong file at first; fixed since.

Ok, I see the same number of entries in the forward and reverse indices and across the various statement indices.

dumpJournal with pages:

Should we also create another bug for DumpJournal being unable to handle SolutionSetStream?

Martyn and I talked this through this morning. He is proceeding per http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1). His first step will be to understand the problem in terms of the RWStore internal state so we can generate some theories about how the specific situation was able to manifest. We are also planning to create an RWStore specific utility for extracting information about the allocators, deferred free list, etc. This can then be hooked into the DumpJournal utility to provide more information.

Can you please run DumpJournal -pages with the modified version of DumpJournal? I would like to verify that the data are consistent in more depth (all index pages can be read).

Thanks,
Bryan

Our plan is to implement a utility that overwrites the address of the deferred free list. So you would run this utility. It would overwrite the address as 0L so that the deferred free list is empty. You would then re-open the journal normally.

The head of the deferred frees list is stored in the ICommitRecord. It is accessed by the RWStore in checkDeferredFrees().

final long blockAddr = record
        .getRootAddr(AbstractJournal.DELETEBLOCK);

if (blockAddr != 0) {

    totalFreed += freeDeferrals(blockAddr,
            record.getTimestamp());
    
}

The proposal is simply to overwrite the root addr for the delete blocks to zero in the utility class for each commit point retained in the commit record index. This will allow the normal processing to not recycle those allocations.
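
To make the mechanics concrete, an illustrative read-side sketch (this is not the SYSTAP utility) of where that address lives; treating getCommitRecord() as returning the most recent commit record is an assumption:

import com.bigdata.journal.AbstractJournal;
import com.bigdata.journal.ICommitRecord;

public final class DeferredFreesCheckSketch {
    // Reports whether the most recent commit record still points at a non-empty
    // deferred free list. The proposed utility would overwrite this root address
    // with 0L for each retained commit point so checkDeferredFrees() skips
    // recycling entirely.
    public static boolean hasDeferredFrees(final AbstractJournal journal) {
        final ICommitRecord record = journal.getCommitRecord(); // assumed accessor
        final long blockAddr = record.getRootAddr(AbstractJournal.DELETEBLOCK);
        return blockAddr != 0L; // 0L means the deferred free list is empty
    }
}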

Note: The exception indicates that the correspondence between the allocators as they were when the deferred frees list was written and the currently committed allocators is broken. Thus we must not release the slots associated with the deferred frees in this situation, because it is possible that they have been allocated to data committed since the point at which the problem occurred and thus are not available for recycling.

We will continue to track down the root cause. We are still discussing some possible scenarios that could lead to this outcome. They all involved a failure to undergo a correct abort at some point followed by successfully applied writes and another commit. We are looking for more mechanisms to allow us to lock out such potential problems at the AbstractJournal layer.

One of the questions I had is why #1021 would not have caught this problem.

So, #1021 only protects against a call to abort() that fails. If there is a write set and

  • (a) a failed commit where the code does not then go on to invoke abort(), or
  • (b) the code simply leaves the scope in which the write set was applied without either committing or aborting,

then we can have a situation in which buffered writes eventually become durable.

In #1026, the situation was (a). The mutations were not correctly discarded since abort() was not called. This led to the same symptoms as in this ticket.

We will look for ways to improve protection against this circumstance and test coverage around such abnormal code paths.

See #1206 SPARQL UPDATE with runtime errors causes problems with lexicon indices (similar symptom)
See #1021 Add critical section protection to AbstractJournal?.abort() and BigdataSailConnection?.rollback()

Where can I check the new code out? The latest commit I see on https://sourceforge.net/p/bigdata/git/ci/master/tree/ in master is from April 15...

Can you apply the patch to 1.5.1? The changes that I made are
committed against the 1.5.2 development branch. The change to get
past that UnsupportedOperationException thrown from
Checkpoint.getHeight() is just to return the height field. That is:

public final int getHeight() {

   return height;

}

Once we get to a root cause on the source of the original exception we
can look at patch to 1.5.1, but we are not there yet - this is what I
am looking into now. In the meantime, I do need to get the
DumpJournal -pages output to verify that the indices can be fully
scanned. Martyn will work on the utility to get past the exception
(by overwrite of some entries in the commit record index). He plans
to have this for our morning tomorrow. Note that there is no
guarantee that you can roll forward successfully after this step. I
think it is pretty likely, but we will not know until we actually try
it. The thing that could go wrong is if some slots were incorrectly
recycled (the same slot size, but the wrong allocator state). If this
happens, we would actually wind up releasing data that should not have
been recycled. This is the sort of thing that could show up in the
DumpJournal -pages scan.

Thanks,
Bryan

OK, I hand-patched the source; the output without -pages is here:

Will add one with -pages as soon as it's ready.

I've pushed the change to Checkpoint.getHeight() to SF git as branch TICKET_1228.

Commit 44ed00d6fd348b2be9bea73f9b8a8c646744a6fd.

See http://trac.bigdata.com/ticket/1228 for some further thoughts on a root cause, some thoughts on how to create a stress test to replicate the problem, and some thoughts on how to fix this.

At this point I think we should focus on tests that interrupt the CreateKB or DestroyKB tasks (perhaps with other tasks to write on the existing namespaces per the tests referenced on that ticket). Those tests should be run both with group commit enabled and with it disabled. I suspect that the problem might only exist when group commit is NOT in use (and that it is due to the refactor to support group commit). The tests need to prompt abnormal failures in order to uncover incorrect abort handling logic.

Once we can trigger the problem we can find ways to patch it in 1.5.1. I would like to change how this is handled in 1.5.2 in order to make such problems significantly less likely - effectively by building in some of the protections of the group commit code paths into the non-group commit code path.

Full dumpJournal output with pages:

@Thompsonbry.systap looks like using the utility fixes the journal, at least to the point where it now accepts updates again. Thanks a lot for the quick turnaround!

New exception now:

ERROR: DefaultResourceLocator.java:531: java.lang.IllegalArgumentException: fromKey > toKey
java.lang.IllegalArgumentException: fromKey > toKey
        at com.bigdata.btree.ChildIterator.<init>(ChildIterator.java:124)
        at com.bigdata.btree.Node.childIterator(Node.java:3384)
        at com.bigdata.btree.Node.postOrderIterator2(Node.java:3047)
        at com.bigdata.btree.Node.access$100(Node.java:86)
        at com.bigdata.btree.Node$2.expand(Node.java:3073)
        at cutthecrap.utils.striterators.Expanderator.getNext(Expanderator.java:59)
        at cutthecrap.utils.striterators.Prefetch.checkInit(Prefetch.java:27)
        at cutthecrap.utils.striterators.Prefetch.hasNext(Prefetch.java:35)
        at cutthecrap.utils.striterators.Striterator.hasNext(Striterator.java:89)
        at cutthecrap.utils.striterators.Appenderator.getNext(Appenderator.java:47)
        at cutthecrap.utils.striterators.Prefetch.checkInit(Prefetch.java:27)
        at cutthecrap.utils.striterators.Prefetch.hasNext(Prefetch.java:35)
        at cutthecrap.utils.striterators.Striterator.hasNext(Striterator.java:89)
        at cutthecrap.utils.striterators.Expanderator.getNext(Expanderator.java:54)
        at cutthecrap.utils.striterators.Prefetch.checkInit(Prefetch.java:27)
        at cutthecrap.utils.striterators.Prefetch.hasNext(Prefetch.java:35)
        at cutthecrap.utils.striterators.Striterator.hasNext(Striterator.java:89)
        at com.bigdata.btree.AbstractNode$PostOrderEntryIterator.hasNext(AbstractNode.java:654)
        at com.bigdata.sparse.AbstractAtomicRowReadOrWrite.atomicRead(AbstractAtomicRowReadOrWrite.java:250)
        at com.bigdata.sparse.AbstractAtomicRowReadOrWrite.atomicRead(AbstractAtomicRowReadOrWrite.java:157)
        at com.bigdata.sparse.AtomicRowRead.apply(AtomicRowRead.java:98)
        at com.bigdata.sparse.AtomicRowRead.apply(AtomicRowRead.java:36)
        at com.bigdata.btree.AbstractBTree.submit(AbstractBTree.java:3184)
        at com.bigdata.sparse.SparseRowStore.read(SparseRowStore.java:531)
        at com.bigdata.sparse.SparseRowStore.read(SparseRowStore.java:414)
        at com.bigdata.relation.locator.DefaultResourceLocator.locateResourceOn(DefaultResourceLocator.java:778)
        at com.bigdata.relation.locator.DefaultResourceLocator.locateResource(DefaultResourceLocator.java:499)
        at com.bigdata.relation.locator.DefaultResourceLocator.cacheMiss(DefaultResourceLocator.java:383)
        at com.bigdata.relation.locator.DefaultResourceLocator.locate(DefaultResourceLocator.java:335)
        at com.bigdata.rdf.store.AbstractTripleStore.getSPORelation(AbstractTripleStore.java:2002)
        at com.bigdata.rdf.store.AbstractTripleStore.getAccessPath(AbstractTripleStore.java:3138)
        at com.bigdata.rdf.sparql.ast.optimizers.ASTRangeCountOptimizer.estimateCardinalities(ASTRangeCountOptimizer.java:199)
        at com.bigdata.rdf.sparql.ast.optimizers.ASTRangeCountOptimizer.estimateCardinality(ASTRangeCountOptimizer.java:191)
        at com.bigdata.rdf.sparql.ast.optimizers.ASTRangeCountOptimizer$RangeCountTask.call(ASTRangeCountOptimizer.java:171)
        at com.bigdata.rdf.sparql.ast.optimizers.ASTRangeCountOptimizer$RangeCountTask.call(ASTRangeCountOptimizer.java:154)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Not sure. There is really no reliable way to carry the database forward
from an error like the one that triggered originally. This might be
something we didn't think through in the utility to unblock the journal. Or
it might be bad data in the indices from the original problem.

We should focus on a test to reproduce this problem and a fix.

The probable root cause remains that drop-namespace operation.

Bryan

Can you also include the operation that was executed for this error?

Seems to be a transient error; other updates work fine. We don't yet have logging of the SPARQL statements sent - I'll add it now.

Your branching factors are all set to 128 in this namespace. You should override them per the results I had posted on a different ticket to target 8k pages with few blobs (pages that exceed 8k). Just FYI. This can be done when the namespace is created.
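
A hedged sketch of what that override could look like when creating the namespace; the key below is the standard BTree branching-factor property, and the value 400 is a placeholder, not the target Bryan posted:

import java.util.Properties;

public class NamespaceBranchingFactorSketch {
    // Illustrative only: properties supplied at namespace-creation time to raise
    // the default B+Tree branching factor above 128. The real target values come
    // from the results referenced on the other ticket.
    public static Properties tunedProperties() {
        final Properties p = new Properties();
        p.setProperty("com.bigdata.btree.BTree.branchingFactor", "400");
        return p;
    }
}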

We have migrated to JIRA. See http://jira.blazegraph.com/browse/BLZG-1236 for the ticket that corresponds to http://trac.bigdata.com/ticket/1228 (Recycler error in 1.5.1).

We have not yet been able to reproduce this. I have developed, and am continuing to extend, a stress test around the REST API operations. Martyn is looking at some code paths that might be implicated in that end state.