**Symptoms**:
* Gerrit is not responsive
* Thread count skyrocketted
* Monitoring probe `Gerrit Health Check on gerrit.wikimedia.org` is CRITICAL
Most probably indicate that HTTP threads are all deadlocked waiting for a lock that is never release. An obvious symptom is a `SendEmail` task being stuck (from `ssh -p 29418 gerrit.wikimedia.org gerrit show-queue -w -q`.
#upstream tasks:
* Gerrit: https://bugs.chromium.org/p/gerrit/issues/detail?id=7645
* Guava (com.google.common.cache): https://github.com/google/guava/issues/3602
**Workaround**:
Restart Gerrit :-\
ssh cobalt.wikimedia.org sudo systemctl restart gerrit
And log:
!log Restarting Gerrit T224448
----
Gerrit threads have been getting stuck behind a single thread `SendEmail`
{F29260640}
(see the threaddump: https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDUvMjcvLS1qc3RhY2stMTktMDUtMjctMjItNTItNTIuZHVtcC0tMjItNTMtMjA=)
The only way to resolve this issue is to restart Gerrit.
This issue is superficially similar to T131189; however:
1. send-email doesn't show up in `gerrit show-queue -w --by-queue` (tried using various flags)
2. `lsof` for the gerrit process at the time of these problems didn't show any smtp connections
I've tried killing the offending thread via the JavaMelody interface to no avail.
Ongoing upstream discussion: https://groups.google.com/forum/#!msg/repo-discuss/pBMh09-XJsw/vuhDiuTWAAAJ
----
Summary by @hashar
```
- parking to wait for <0x00000006d75ed7e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2089)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2046)
at com.google.common.cache.LocalCache.get(LocalCache.java:3943)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3967)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4952)
at com.google.gerrit.server.account.AccountCacheImpl.get(AccountCacheImpl.java:85)
at com.google.gerrit.server.account.InternalAccountDirectory.fillAccountInfo(InternalAccountDirectory.java:69)
at com.google.gerrit.server.account.AccountLoader.fill(AccountLoader.java:91)
at com.google.gerrit.server.change.ChangeJson.formatQueryResults(ChangeJson.java:
```
That is still in the local account cache, it might well just be a nasty deadlock bug in com.google.common.cache / Guava. Or the SendEmail thread has some bug and sometime fails to release the local account cache lock :-\ Which bring us back to #upstream bug https://bugs.chromium.org/p/gerrit/issues/detail?id=7645
it had a patch https://gerrit-review.googlesource.com/c/gerrit/+/154130/ but that got reverted https://gerrit-review.googlesource.com/c/gerrit/+/162870/ :-\