Toolforge ulimit insufficient for normal work
Open, Needs Triage · Public · BUG REPORT

Description

I'm trying to set up wikibugs on k8s, and have the following open:

  • One ssh session running webservice shell
  • One ssh session to call toolforge-jobs / kubectl commands
  • One ssh session to view log files

However, kubectl calls regularly fail with the following error:

runtime: failed to create new OS thread (have 42 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

This is on tools-sgebastion-10.
I don't know whether kubectl uses an absurd number of threads, but as it stands the ulimit setting is too low.

Event Timeline

kubectl/golang does consume a ridiculous number of threads, which then trips the limits we have established on the bastions for per-user (not per-connection) resource consumption. There is a note at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#%22failed_to_create_new_OS_thread%22_from_kubectl on how to set GOMAXPROCS=1 in the environment to tame kubectl's thread consumption when needed. However, actively running 3 shells is very likely to eat up your user's quota even with Go being a bit less greedy.
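For illustration, a minimal sketch of applying GOMAXPROCS=1 to kubectl calls from a shell. The wrapper name kubectl_low_threads is hypothetical, and "get pods" is only an example subcommand. GOMAXPROCS limits how many OS threads run Go code at the same time. Threads blocked in system calls are not counted, so this reduces thread use rather than capping it hard.

```shell
# One-off form: the variable applies to this single command only.
#   GOMAXPROCS=1 kubectl get pods

# Hypothetical wrapper (the name is an assumption, not an existing tool):
# every kubectl call made through it runs with GOMAXPROCS=1.
kubectl_low_threads() {
    GOMAXPROCS=1 kubectl "$@"
}

# Usage:
#   kubectl_low_threads get pods
```

Putting "export GOMAXPROCS=1" in the shell profile would have a similar effect for every Go program started from that shell, which may or may not be what you want.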

Brooke and Arturo went through several rounds of tuning the systemd limits applied on the bastions after introducing them over 2 years ago. Your report is not the only one we have had asking for more headroom, but there has by no means been a large number of complaints either. This is anecdote rather than empirical fact, but I would place the 3-5 folks I can think of who have come looking for answers about this issue in the last 2 years in a "power user" category. On the flip side, the number of complaints about general slowness on the bastions and alerts for completely locked-up bastions has functionally dropped to zero with the limits in place.

The main workaround today, other than GOMAXPROCS=1, is to use both the login.toolforge.org and dev.toolforge.org bastions. The limits are per user per host, so this effectively doubles your ability to spawn processes on the bastions.
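Since the limits are per user per host, it can help to see roughly how many threads your user currently has on a given bastion before opening another session there. The sketch below is Linux-only and sums the "Threads:" field from /proc for processes owned by the current user. The function name user_thread_count is hypothetical, and it is an assumption that this count corresponds closely to what the bastion limits actually measure.

```shell
# Sum the "Threads:" field of /proc/<pid>/status for every process whose
# real UID matches the current user (Linux only).
user_thread_count() {
    awk -v uid="$(id -u)" '
        FNR == 1                 { mine = 0 }            # new status file: reset
        $1 == "Uid:"             { mine = ($2 == uid) }  # $2 is the real UID
        $1 == "Threads:" && mine { total += $2 }
        END                      { print total + 0 }
    ' /proc/[0-9]*/status 2>/dev/null
}

echo "threads owned by $(id -un) on $(uname -n): $(user_thread_count)"
```

Running this on each bastion shows which one has more headroom for your user at that moment.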