Add CPU quota to stat and notebook hosts
Closed, Resolved (Public)

Description

Similar to what we did for memory in T212824, we should add a reasonable maximum limit to CPU usage for a single user on stat/notebook hosts, to avoid cases such as:

  1. scripts forking a ton of processes / threads
  2. scripts using so much CPU that the host becomes unresponsive.

The goal is to reduce the ops workload and keep users from impacting each other.

Event Timeline

fdans triaged this task as Medium priority. Dec 23 2019, 5:05 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Change 561675 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::client::limits: add cpu limits to Analytics clients

https://gerrit.wikimedia.org/r/561675

elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to In Code Review on the Analytics-Kanban board.

Change 561675 merged by Elukey:
[operations/puppet@production] profile::analytics::client::limits: add cpu limits to Analytics clients

https://gerrit.wikimedia.org/r/561675

Tested on stat1004 (16 cores as reported by nproc) with a series of dd if=/dev/zero of=/dev/null commands chained in a pipe. With 20 of them I can see several processes running, but the average CPU usage tops out at around 70-80%, and the host stays usable.
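For anyone wanting to repeat a test like the one above, here is a minimal sketch of a load generator. It assumes "in pipe" means the dd commands were joined with `|` (each dd ignores its stdin/stdout because if= and of= are set, so all stages spin at once). The function names `dd_pipeline` and `run_load` are made up for illustration and are not part of the puppet change.

```python
import os
import signal
import subprocess
import time

DD_CMD = "dd if=/dev/zero of=/dev/null"


def dd_pipeline(n: int) -> str:
    """Build a shell pipeline of n dd stages: `dd ... | dd ... | ...`."""
    if n < 1:
        raise ValueError("need at least one dd stage")
    return " | ".join([DD_CMD] * n)


def run_load(n: int, seconds: float) -> None:
    """Run n dd processes for `seconds`, then terminate the whole group.

    Hypothetical helper: start_new_session=True puts the shell and every
    dd stage in one fresh process group, so a single killpg stops them all.
    """
    proc = subprocess.Popen(
        dd_pipeline(n),
        shell=True,
        start_new_session=True,
        stderr=subprocess.DEVNULL,  # dd prints transfer stats when signalled
    )
    try:
        time.sleep(seconds)
    finally:
        os.killpg(proc.pid, signal.SIGTERM)
        proc.wait()
```

Calling `run_load(20, 60)` from a regular user session while watching top in another terminal should show whether the quota caps the aggregate usage; the 60 second duration is an arbitrary choice.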

elukey moved this task from In Code Review to Done on the Analytics-Kanban board.

To keep archives happy: the solution above was not enough, so we had to do multiple things:

  1. move the limits to user.slice, so that they apply to all logged-in users taken together (even root)
  2. move the notebook units under user.slice (they were in system.slice together with nagios etc.) - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/577320/
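The two steps above could look roughly like the systemd fragments rendered by the sketch below. This is not the content of the merged patches: the 75% fraction and the helper names are assumptions for illustration. What is standard systemd behaviour is that CPUQuota= is expressed relative to a single CPU (so 1200% means twelve full cores), that a limit on user.slice bounds all user sessions in aggregate, and that Slice= in a unit's [Service] section places it under the named slice.

```python
def cpu_quota(cores: int, fraction: float) -> str:
    """Return a CPUQuota= value; 100% corresponds to one full core."""
    if cores < 1 or not 0 < fraction <= 1:
        raise ValueError("cores must be >= 1 and fraction in (0, 1]")
    return f"{round(cores * fraction * 100)}%"


def user_slice_dropin(cores: int, fraction: float) -> str:
    """Render a drop-in for user.slice.

    Assumed location: /etc/systemd/system/user.slice.d/<name>.conf
    The fraction is a hypothetical value, not the one used in production.
    """
    return (
        "[Slice]\n"
        "CPUAccounting=yes\n"
        f"CPUQuota={cpu_quota(cores, fraction)}\n"
    )


def notebook_service_dropin() -> str:
    """Render a drop-in that moves a service unit under user.slice."""
    return "[Service]\nSlice=user.slice\n"
```

For a 16-core host such as stat1004, `user_slice_dropin(16, 0.75)` yields a [Slice] section with CPUQuota=1200%. After installing drop-ins like these, `systemctl daemon-reload` is needed for systemd to pick them up.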