Apparently a deadlock inside Blazegraph itself:
Found one Java-level deadlock:
=============================
"GASEngine4":
  waiting for ownable synchronizer 0x00007fcbf9dbc3c0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "com.bigdata.journal.Journal.executorService1539347"
"com.bigdata.journal.Journal.executorService1539347":
  waiting to lock monitor 0x00007fc555798e18 (object 0x00007fcfda000320, a java.lang.Object),
  which is held by "GASEngine2"
"GASEngine2":
  waiting to lock monitor 0x00007fc57c22e358 (object 0x00007fcbf9b97710, a java.lang.Object),
  which is held by "com.bigdata.journal.Journal.executorService1539347"
Full stack trace: P10117
The problem went undetected by the system, but started around 2020-01-10T15:44.
The machine stopped handling updates and queries, and the lag stopped being reported as well.
Blazegraph was restarted around 19:44.
The root cause is likely related to the GAS engine spawning its own thread pool.
Possible workarounds:
- disable the GAS engine service (user impact is high)
- detect this deadlock state and restart the service
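For the second workaround, a minimal sketch of in-JVM deadlock detection, using the standard `java.lang.management.ThreadMXBean` API. The class name `DeadlockProbe` and the exit code are hypothetical. This sketch inspects only the JVM it runs in, so it assumes the check is either loaded inside the Blazegraph JVM or adapted to connect to it over JMX; the restart itself is left to whatever supervises the service.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Blazegraph.
public class DeadlockProbe {

    /** Returns the names of threads currently involved in a deadlock (empty if none). */
    static List<String> findDeadlocked() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() covers object monitors and ownable
        // synchronizers (e.g. ReentrantLock); it returns null when no
        // deadlock is found.
        long[] ids = threads.findDeadlockedThreads();
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            // An entry is null if the thread terminated in the meantime.
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> deadlocked = findDeadlocked();
        if (deadlocked.isEmpty()) {
            System.out.println("no deadlock detected");
        } else {
            System.out.println("deadlocked threads: " + deadlocked);
            // Assumption: a non-zero exit code tells a supervisor to restart the service.
            System.exit(2);
        }
    }
}
```

A periodic check along these lines could feed an alert or trigger an automatic restart. An external alternative, assuming the dump above came from `jstack`, would be to search its output for the "Found one Java-level deadlock" line.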
AC:
- determine how to work around this problem, create a new ticket, and decline this one, as we won't be fixing this bug inside Blazegraph