Gerrit internal queue is filled / not processing
Closed, Resolved (Public)

Description

gerrit show-queue shows 110 tasks, most being git-upload-pack related. Started around 11:17 UTC.

Event Timeline

Mentioned in SAL [2016-07-18T12:03:55Z] <hashar> Gerrit was slow processing requests such as git pull since 11:17 UTC. Fixed by killing all idling/waiting tasks T140604

hashar claimed this task.
hashar added a project: WorkType-Maintenance.

I have killed the tasks that were idling, along with the remaining waiting tasks, as listed by gerrit show-queue.

Gerrit works reliably now.

Did we get any sort of logs on this prior to killing? It would be useful to know whether these were generated by a batch job of some sort (jenkins, zuul?), a third party bot (translatewiki, labs), just higher traffic than normal or whether it was someone attacking. Was it all the same repo or a bunch of them?

What status were they in? Were they "waiting"? Those are fine. Was it all hung up on a single job that prevented the later ones from running? Usually you can kill the top hung job and the rest will flow out on their own.
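If the output of gerrit show-queue is saved to a file before anything is killed, the questions above (which state, which repo) can be answered afterwards. The following is only a rough sketch: the sample text, its column layout, the user names and the summarize helper are all hypothetical and are not data from this incident. It counts git-upload-pack tasks per repository and state.

```python
import re
from collections import Counter

# Hypothetical sample of saved `gerrit show-queue` output. The column
# layout and every task line are assumptions for illustration only;
# none of this is data from the incident discussed here.
SAMPLE = """\
Task     State        StartTime         Command
------------------------------------------------------------------------------
0a1b2c3d              11:17:02.101      git-upload-pack /operations/puppet (alice)
1b2c3d4e waiting      11:17:05.220      git-upload-pack /operations/puppet (bob)
2c3d4e5f waiting      11:17:09.874      git-upload-pack /mediawiki/core (ci-bot)
3d4e5f60 waiting      11:18:00.000      git-receive-pack /mediawiki/core (carol)
------------------------------------------------------------------------------
  4 tasks
"""

# Capture the repository path that follows git-upload-pack,
# tolerating an optional quote and leading slash.
UPLOAD_PACK = re.compile(r"git-upload-pack\s+'?/?([^\s']+)")


def summarize(queue_text):
    """Count git-upload-pack tasks per (repository, state)."""
    counts = Counter()
    for line in queue_text.splitlines():
        match = UPLOAD_PACK.search(line)
        if not match:
            continue  # header, separator, footer, or another command
        repo = match.group(1)
        # Treat a task as waiting only if that word appears before the command.
        before_command = line[:match.start()].split()
        state = "waiting" if "waiting" in before_command else "other"
        counts[(repo, state)] += 1
    return counts


if __name__ == "__main__":
    for (repo, state), n in sorted(summarize(SAMPLE).items()):
        print(f"{n:4d}  {state:8s} {repo}")
```

A summary like this would show at a glance whether the backlog was concentrated on one repository or spread across many, which is hard to reconstruct once the tasks have been killed.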

The queue had roughly a hundred tasks in the waiting state. I found out about it because a git pull on operations/puppet stalled.

sshd_log and error_log were not showing anything interesting around or prior to 2016-07-18T11:17:00Z, though they might need a second look.

I don't have access to the Apache logs, syslog, etc. (I am a simple user), so it is hard to do any further diagnosis.

One thing is sure: killing the tasks unblocked it. I am not sure how many tasks Gerrit processes in parallel, but there were more than a handful of upload-pack tasks that were just idling.