
Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files")
Closed, Resolved · Public

Description

Hi, when I try to view other users' changes, I am now getting a 500.

I can't view my dashboard either, and the whole site is throwing 500 for me.

Details

Related Gerrit Patches:
[operations/puppet@production] Gerrit: Raise git_open_file to 20000

Event Timeline

Paladox created this task.Jun 20 2017, 8:28 AM
Restricted Application added a subscriber: Aklapper.Jun 20 2017, 8:28 AM
Paladox triaged this task as Unbreak Now! priority.Jun 20 2017, 8:28 AM

Setting Unbreak Now! as I can't seem to do anything now.

Restricted Application added subscribers: Jay8g, TerraCodes.Jun 20 2017, 8:28 AM
Joe claimed this task.Jun 20 2017, 8:29 AM
Caused by: java.nio.file.FileSystemException: .../_1ut5m.nvm: Too many open files

I'm fixing the systemd unit by hand and restarting gerrit.

Mentioned in SAL (#wikimedia-operations) [2017-06-20T08:30:19Z] <_joe_> restarting gerrit T168360

phuedx added a subscriber: phuedx.Jun 20 2017, 8:33 AM
Joe added a comment.Jun 20 2017, 8:35 AM

So what I did:

  • raised LimitNOFILE to 60000 manually
  • didn't bother with all the other ulimits that the shell script tries to set
  • restarted gerrit
  • disabled puppet
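One way to confirm that a manually raised limit actually applies to the running process is to read the "Max open files" row of `/proc/<pid>/limits` on Linux. The following is only a minimal sketch: the function names and the sample text are illustrative assumptions, not something taken from this task.

```python
def _to_int(value):
    """Convert a limits field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # The remainder of the row is: soft limit, hard limit, units.
            fields = line[len(label):].split()
            return _to_int(fields[0]), _to_int(fields[1])
    raise ValueError("no 'Max open files' row found")


# Illustrative sample in the layout the Linux kernel uses for this file.
SAMPLE = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            60000                60000                files
"""

print(parse_max_open_files(SAMPLE))  # (60000, 60000)
```

On a real host the text would come from `open(f"/proc/{pid}/limits").read()`, where `pid` is the process ID of the Gerrit JVM.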

Now lowering the priority to High and leaving this to whoever worked on converting this to systemd.

Joe removed Joe as the assignee of this task.Jun 20 2017, 8:35 AM
Joe lowered the priority of this task from Unbreak Now! to High.
Joe added subscribers: Dzahn, Joe.
Aklapper renamed this task from Gerrit is now constently throwing 500 for me when reviewing patches to Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files").Jun 20 2017, 9:21 AM

Ah, systemd caused this? Oh, I thought gerrit was started by the init script, as we had problems with systemd before.

Also it was me who worked on systemd.

Change 360312 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/debs/gerrit@master] Fix systemd script to use a higher LimitNOFile value

https://gerrit.wikimedia.org/r/360312

@Joe thanks for fixing this :)

phuedx removed a subscriber: phuedx.Jun 20 2017, 12:51 PM

Mentioned in SAL (#wikimedia-operations) [2017-06-20T17:52:59Z] <mutante> cobalt (gerrit) - re-enabling puppet, running it. nothing should change, the system unit file mentioned in T168360#3362314 does not get installed by puppet, it comes from the deb

Dzahn added a comment.Jun 20 2017, 5:59 PM

Thanks joe! So.. that systemd unit file is installed from the .deb, not by puppet. I said we should remove it and not ship it until/unless we also switch puppet to use systemd and it has had more testing, as this is like half-converting it and it causes confusion.

I re-enabled puppet because that doesn't revert the manual change. And for now we still use /etc/init.d/gerrit and it works as before.

The interesting part is that there are also ulimit lines in the init script, but either that's not the same limit or it wasn't actually working?

ulimit -n $GERRIT_FDS ; # open files
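As background to the question above: an open-files limit set with `ulimit -n` is a per-process setting that child processes inherit, so it only takes effect when the service is actually launched through the script that sets it. The sketch below illustrates that inheritance on a Unix system. The helper name `child_soft_nofile` and the value 256 are made up for illustration, and the value is assumed to be below the hard limit.

```python
import resource
import subprocess
import sys


def child_soft_nofile(soft_limit):
    """Start a child with the given soft open-files limit; return what the child sees."""

    def lower_limit():
        # Runs in the child just before exec, similar to `ulimit -n` in a wrapper script.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        preexec_fn=lower_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # The parent keeps its own limit; only the child started via lower_limit changes.
    print("parent soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("child soft limit: ", child_soft_nofile(256))
```

A process started some other way, for example directly by systemd from a unit file, never runs the wrapper and so gets whatever limit its own launcher configures.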

Dzahn added a comment.Jun 20 2017, 6:04 PM

I would say let's delete the unit file that is installed now, and remove it from the package the next time we build a new version anyway.. and instead let puppet install it and actually set service{} or base::service_unit{} to use systemd... but that was my original suggestion and it got downvotes in Gerrit, so not sure now...

Well, I found that the init script adds 1024 to core.packedGitOpenFiles, so basically for us it's 1024 + 6000, but in systemd we have one value, which is 6000. So the ulimits either never worked in gerrit, or they worked but we never reached the limit because of the 1024 added on top.

So I say put the limit a lot higher in systemd, something like 20000 or 30000. (That future-proofs things and should prevent the problem from happening again.)

demon added a comment.Jun 21 2017, 5:20 AM

Hmm, all this started after we tried swapping SysV init for systemd. Funny how that correlates 🤔 😏

Dzahn added a comment.EditedJun 21 2017, 5:23 AM

Better to raise it (https://gerrit.wikimedia.org/r/#/c/360312/) than not raise it. I am happy to build the new deb and upload it if that is the easy route for right now. But also still https://gerrit.wikimedia.org/r/#/c/356516/ ..

According to bin/gerrit.sh status, this is what the init script has:

GERRIT_FDS             =  12000

This is set from puppet, and the init script doubles the value, which is why we did not see the problem before. Systemd does not let us double the value the way the init script does, so we have to set it to a fixed number. 20000 seems better as it future-proofs things.
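The arithmetic in this comment can be written out as a small sketch. It assumes, as described above, that the init script doubles the configured value while the systemd unit uses one static number. The function name is hypothetical, and 6000 is the value mentioned earlier in this thread.

```python
def init_script_fd_limit(packed_git_open_files):
    """Effective limit under the init script, assuming it doubles the configured value."""
    return 2 * packed_git_open_files


CONFIGURED = 6000      # value set from puppet, per this thread
SYSTEMD_STATIC = 6000  # static value the systemd unit used before the fix
PROPOSED = 20000       # fixed value proposed for the systemd unit

print("init script:     ", init_script_fd_limit(CONFIGURED))  # 12000
print("systemd (before):", SYSTEMD_STATIC)
print("systemd (after): ", PROPOSED)
```

Under those assumptions the switch to systemd halved the effective limit, and the proposed 20000 sits above what the init script had been providing.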

Change 360312 merged by Dzahn:
[operations/debs/gerrit@master] Fix systemd script to use a higher LimitNOFile value

https://gerrit.wikimedia.org/r/360312

Dzahn added a comment.Jun 21 2017, 9:27 PM

19:39 mutante: copper: building gerrit_2.13.8+git1-wmf.6 for jessie
21:05 mutante: apt.wm.org - reprepro, include gerrit_2.13.8+git1-wmf.6 for jessie-wikimedia

The new version including this fix can be installed now.

Mentioned in SAL (#wikimedia-operations) [2017-06-21T21:40:01Z] <RainbowSprinkles> gerrit2001: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)

Mentioned in SAL (#wikimedia-operations) [2017-06-21T21:44:16Z] <RainbowSprinkles> cobalt: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)

demon closed this task as Resolved.Jun 21 2017, 9:47 PM
demon claimed this task.

Change 384617 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/puppet@production] Gerrit: Raise git_open_file to 20000

https://gerrit.wikimedia.org/r/384617

Change 384617 merged by Dzahn:
[operations/puppet@production] Gerrit: Raise git_open_file to 20000

https://gerrit.wikimedia.org/r/384617