In T297517, excessive system memory usage caused by a PHP memory leak led to kernel allocation failures, and the oom-killer was apparently not invoked. This is not an acceptable failure mode: arbitrary running processes are affected (we saw failures in gpg-agent and prometheus), while the processes with high memory usage that caused the problem were left running. Even when a culprit process received an mmap() failure, it did not respond by freeing a significant amount of memory. A PHP fatal error was delivered to the user, but the worker process was not restarted.
- Confirm the issue by generating a system OOM
- Increase vm.min_free_kbytes in soft state
- Verify correct OOM behaviour
- Puppetize
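The soft-state and verification steps could be scripted roughly as follows. This is a minimal sketch, assuming root on the test host and a kernel that logs oom-killer kills with the usual "Killed process <pid> (<comm>)" wording. Writing to /proc/sys/vm/min_free_kbytes has the same effect as `sysctl -w vm.min_free_kbytes=N` and is not persisted across reboots, which is what "soft state" needs here. The function names are hypothetical, and no specific target value is implied.

```python
import re
from pathlib import Path

# procfs file backing the vm.min_free_kbytes sysctl; writing it requires root.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")

# Matches the oom-killer's "Killed process <pid> (<comm>)" kernel log line.
# Assumption: surrounding wording differs between kernel versions, but this
# substring appears in both the older and newer message formats.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def read_min_free_kbytes(path: Path = MIN_FREE_PATH) -> int:
    """Return the current vm.min_free_kbytes value in kB."""
    return int(path.read_text().strip())


def raise_min_free_kbytes(target_kb: int, path: Path = MIN_FREE_PATH) -> int:
    """Raise vm.min_free_kbytes to target_kb (soft state, not persisted).

    Never lowers the existing value. Returns the value in effect afterwards.
    """
    if target_kb > read_min_free_kbytes(path):
        path.write_text(f"{target_kb}\n")
    return read_min_free_kbytes(path)


def oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, command) pairs for each oom-killer kill in a log dump."""
    return [(int(pid), comm) for pid, comm in KILLED_RE.findall(kernel_log)]
```

After reproducing the memory exhaustion, the output of `dmesg` or `journalctl -k` could be passed to `oom_kills()`. Correct behaviour would show the leaking PHP worker being killed, not bystanders such as gpg-agent or prometheus.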