
System OOM causes random mmap() failure rather than oom-killer
Open, Needs Triage, Public

Description

In T297517 it was observed that excessive system memory usage, caused by a PHP memory leak, led to kernel allocation failures. The oom-killer was apparently not invoked. This is not an acceptable failure mode, because arbitrary running processes are affected (we saw failures in gpg-agent and prometheus), while the processes with high memory usage that caused the problem were left running. Even when a culprit process received an mmap() failure, it did not respond by freeing a significant amount of memory: a PHP fatal error was delivered to the user, but the worker process was not restarted.
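One way to check whether the oom-killer ran at all during an incident like this is to sample the kernel's cumulative `oom_kill` counter in `/proc/vmstat` (present on newer kernels) before and after. The sketch below is a minimal, hypothetical helper (`parse_oom_kill` and `read_oom_kill_count` are illustrative names, not existing tooling); it returns `None` when the kernel does not expose the counter.

```python
from pathlib import Path
from typing import Optional


def parse_oom_kill(vmstat_text: str) -> Optional[int]:
    """Extract the cumulative oom_kill counter from /proc/vmstat-formatted text.

    Each line of /proc/vmstat is "<name> <value>". Returns None if the
    oom_kill line is absent (older kernels do not expose it).
    """
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name == "oom_kill":
            return int(value)
    return None


def read_oom_kill_count(path: str = "/proc/vmstat") -> Optional[int]:
    """Read the live counter from the kernel; only meaningful on Linux."""
    return parse_oom_kill(Path(path).read_text())


if __name__ == "__main__":
    # Guarded so the sketch does not crash on non-Linux systems.
    if Path("/proc/vmstat").exists():
        print("oom_kill =", read_oom_kill_count())
```

If the counter is unchanged after an episode in which processes saw mmap() failures, the kernel refused allocations without killing anything, which is the behaviour described above.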

  • Confirm the issue by generating a system OOM
  • Increase vm.min_free_kbytes in soft state
  • Verify correct OOM behaviour
  • Puppetize
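The "soft state" step amounts to writing a larger value to `/proc/sys/vm/min_free_kbytes` by hand, without persisting it, so that it can be reverted before being puppetized. A minimal sketch, assuming root access on the target host; `raise_min_free_kbytes` is a hypothetical helper, and no particular target value is implied here. It returns the previous value so the change can be undone after testing.

```python
from pathlib import Path

MIN_FREE_KBYTES = "/proc/sys/vm/min_free_kbytes"


def raise_min_free_kbytes(new_kb: int, path: str = MIN_FREE_KBYTES) -> int:
    """Raise vm.min_free_kbytes to new_kb in soft state (lost on reboot).

    Refuses to lower the value, since the task is to increase it.
    Returns the previous value so it can be restored later.
    Writing the real /proc path requires root.
    """
    p = Path(path)
    old_kb = int(p.read_text().strip())
    if new_kb <= old_kb:
        raise ValueError(f"new value {new_kb} must exceed current value {old_kb}")
    p.write_text(f"{new_kb}\n")
    return old_kb
```

Keeping the old value makes it easy to restore the original setting if the OOM test does not behave as expected; the persistent version of the change would then be handled in Puppet rather than by this script.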