Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files")
Closed, Resolved (Public)

Description

Hi, when I try to view other users' changes, I now get an HTTP 500 error.

I can't view my dashboard either; the whole site is returning 500 for me.

Event Timeline

Paladox triaged this task as Unbreak Now! priority. Jun 20 2017, 8:28 AM

Setting Unbreak Now! as I can't seem to do anything now.

Caused by: java.nio.file.FileSystemException: .../_1ut5m.nvm: Too many open files

I'm fixing the systemd unit by hand and restarting gerrit.

So what I did:

  • raised LimitNOFILE to 60000 manually
  • didn't bother with all the other ulimits that the shell script tries to set
  • restarted gerrit
  • disabled puppet
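
To confirm that a manual change like the one above took effect, the limit can be read back from the running process. A minimal sketch, assuming a Linux host with /proc; the function names are hypothetical and not part of Gerrit or its packaging:

```shell
# Read the soft "Max open files" value from /proc/<pid>/limits text on stdin.
# The row looks like: "Max open files   <soft>   <hard>   files",
# so the soft limit is the fourth whitespace-separated field.
soft_nofile_from_limits() {
    awk '/^Max open files/ { print $4 }'
}

# Count the descriptors a process currently holds (Linux /proc layout).
# $1: process ID. Inspecting another user's process generally needs root.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}
```

Usage would be along the lines of `soft_nofile_from_limits < /proc/<pid>/limits`, with `<pid>` standing for the PID of the Gerrit JVM.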

Now lowering the priority to high and leaving this to whoever worked on converting this to systemd.

Joe removed Joe as the assignee of this task. Jun 20 2017, 8:35 AM
Joe lowered the priority of this task from Unbreak Now! to High.
Joe added subscribers: Dzahn, Joe.
Aklapper renamed this task from "Gerrit is now constently throwing 500 for me when reviewing patches" to "Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files")". Jun 20 2017, 9:21 AM

Ah, systemd caused this? I thought gerrit was started by the init script, as we had problems with systemd before.

Also it was me who worked on systemd.

Change 360312 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/debs/gerrit@master] Fix systemd script to use a higher LimitNOFile value

https://gerrit.wikimedia.org/r/360312

Mentioned in SAL (#wikimedia-operations) [2017-06-20T17:52:59Z] <mutante> cobalt (gerrit) - re-enabling puppet, running it. nothing should change, the system unit file mentioned in T168360#3362314 does not get installed by puppet, it comes from the deb

Thanks joe! So that systemd unit file is installed from the .deb, not by puppet. I said we should remove it and not ship it until/unless we also switch puppet to use systemd and it has had more testing, because this is like half-converting it and it causes confusion.

I re-enabled puppet because that doesn't revert the manual change. And for now we still use /etc/init.d/gerrit and it works as before.

The interesting part is that there are also ulimit lines in the init script, so either that's not the same limit or it wasn't actually working?

ulimit -n $GERRIT_FDS ; # open files
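
One thing that may be relevant to the question above: `ulimit -n` only changes the limit of the shell that runs it and of processes started from that shell. A service that is not started through that script does not pick the value up. A small sketch of that scoping (the function name is hypothetical; this is not code from the Gerrit init script):

```shell
# Set a soft open-file limit inside a subshell and let a child report it.
# The parentheses keep the change from leaking into the calling shell.
limit_seen_by_child() {
    # $1: soft open-file limit to set before starting the child
    (
        ulimit -S -n "$1"       # applies to this subshell and what it starts
        sh -c 'ulimit -S -n'    # the child prints the limit it inherited
    )
}
```

Calling `limit_seen_by_child 256` prints the value the child inherited, while `ulimit -S -n` in the calling shell still reports its previous limit.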

I would say let's delete the unit file that is installed now and remove it from the package the next time we build a new version anyway. Instead, let puppet install it and actually set service{} or base::service_unit{} to use systemd. But that was my original suggestion and it got downvotes in Gerrit, so I'm not sure now.

Well, I found that the init script adds 1024 to core.packedGitOpenFiles, so for us it's basically 1024 + 6000, but in systemd we have a single value, which is 6000. So either the ulimits never worked in gerrit, or they worked but we never reached the limit because of the extra 1024 on top.

So I say put the limit a lot higher in systemd, something like 20000 or 30000. (That future-proofs things and should help prevent the problem from happening again.)
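
A rough way to judge how much headroom a given limit leaves is to compare the number of open descriptors with the limit. A minimal sketch; the helper name is hypothetical, and the descriptor count used in the example is an assumed figure, not a measurement from this incident:

```shell
# Hypothetical helper: share of the descriptor limit in use, as an
# integer percentage (the result is truncated, not rounded).
fd_usage_percent() {
    # $1: descriptors currently open, $2: configured limit
    echo $(( $1 * 100 / $2 ))
}
```

With an assumed 6000 open descriptors, `fd_usage_percent 6000 6000` reports a process sitting right at a limit of 6000, whereas `fd_usage_percent 6000 20000` shows the same load using under a third of a limit of 20000.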

Hmm, all this started after we tried swapping SysV init for systemd. Funny how that correlates 🤔 😏

Better to raise it (https://gerrit.wikimedia.org/r/#/c/360312/) than not raise it. I am happy to build the new deb and upload it if that is the easy route for right now. But also still https://gerrit.wikimedia.org/r/#/c/356516/ ..

According to bin/gerrit.sh status, this is what the init script has:

GERRIT_FDS             =  12000

(The base value is set from puppet and the init script doubles it, which is why we did not see the problem. Systemd does not let us double the value the way the init script does, so we have to set it to a fixed number. 20000 seems better, as it future-proofs things.)
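
The arithmetic behind that difference can be sketched as follows. This follows the description in this thread only; it is not the actual code from gerrit.sh, and the function and variable names are hypothetical:

```shell
# Hypothetical helper mirroring the doubling described in the thread.
init_script_fd_limit() {
    # $1: the value configured through puppet
    echo $(( $1 * 2 ))
}

configured=6000                                    # value mentioned earlier in the thread
init_limit=$(init_script_fd_limit "$configured")   # limit under the init script
systemd_limit=$configured                          # unit file used the value as-is

echo "init script: $init_limit, systemd unit: $systemd_limit"
```

Under these assumptions the init script ends up with 12000, matching the GERRIT_FDS value reported above, while the unit file stayed at 6000.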

Change 360312 merged by Dzahn:
[operations/debs/gerrit@master] Fix systemd script to use a higher LimitNOFile value

https://gerrit.wikimedia.org/r/360312

19:39 mutante: copper: building gerrit_2.13.8+git1-wmf.6 for jessie
21:05 mutante: apt.wm.org - reprepro, include gerrit_2.13.8+git1-wmf.6 for jessie-wikimedia

The new version including this fix can be installed now.

Mentioned in SAL (#wikimedia-operations) [2017-06-21T21:40:01Z] <RainbowSprinkles> gerrit2001: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)

Mentioned in SAL (#wikimedia-operations) [2017-06-21T21:44:16Z] <RainbowSprinkles> cobalt: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)

demon claimed this task.

Change 384617 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/puppet@production] Gerrit: Raise git_open_file to 20000

https://gerrit.wikimedia.org/r/384617

Change 384617 merged by Dzahn:
[operations/puppet@production] Gerrit: Raise git_open_file to 20000

https://gerrit.wikimedia.org/r/384617