We can make stat host resource contention less painful by moving SSH and other critical system processes to their own cgroup, similar to what's described here. This will prevent SSH from becoming unresponsive due to user-created processes, which enables the following:
- Any individual user will be able to log in and identify the process or processes causing contention. If it's their process, they can stop it. If it's not, they can ping the user who started the job.
- SREs can log in and kill any process instead of rebooting or trying to pick out processes from an extremely unresponsive prompt.
Creating this ticket to:
- Create a critical system process cgroup.
- Verify that it works as intended.
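For the verification step, one possible check is to read `/proc/<pid>/cgroup` for the SSH daemon and confirm it sits under the new cgroup. The sketch below is only an illustration, not the agreed implementation: the cgroup name `critical.slice` is a hypothetical placeholder, and the helper names are made up for this example. It relies on the `hierarchy-ID:controller-list:cgroup-path` line format of `/proc/<pid>/cgroup`.

```python
from pathlib import Path
from typing import List


def cgroup_paths(cgroup_text: str) -> List[str]:
    """Return the cgroup paths listed in the contents of /proc/<pid>/cgroup."""
    paths = []
    for line in cgroup_text.splitlines():
        # Each line looks like "hierarchy-ID:controller-list:cgroup-path".
        # Split at most twice so colons inside the path are preserved.
        parts = line.strip().split(":", 2)
        if len(parts) == 3:
            paths.append(parts[2])
    return paths


def in_cgroup(cgroup_text: str, expected: str) -> bool:
    """True if any listed path is `expected` or a descendant of it."""
    expected = "/" + expected.strip("/")
    return any(
        path == expected or path.startswith(expected + "/")
        for path in cgroup_paths(cgroup_text)
    )


def pid_in_cgroup(pid: int, expected: str) -> bool:
    """Check a live process by reading its /proc entry (Linux only)."""
    return in_cgroup(Path(f"/proc/{pid}/cgroup").read_text(), expected)


if __name__ == "__main__":
    # "critical.slice" is a hypothetical name; substitute the real cgroup.
    sample = "0::/critical.slice/ssh.service\n"
    print(in_cgroup(sample, "critical.slice"))
```

On a host, `pid_in_cgroup(<sshd pid>, "critical.slice")` would perform the same check against the running daemon. This only confirms placement; checking that SSH stays responsive under load would still need a separate contention test.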