
consider running bastion Prometheis inside cgroups
Closed, Invalid (Public)

Description

Today on bast3002, Prometheus did something (possibly a very large query?) that got it OOM-killed:

Jun 27 20:05:32 bast3002 kernel: [812018.018273] Killed process 3745 (prometheus) total-vm:50336320kB, anon-rss:7298804kB, file-rss:0kB, shmem-rss:0kB

In the meantime, the load on bast3002 was 120+ and the host was unusable as a bastion.

Maybe we should run Prometheus on these machines inside a memory-limited cgroup? The limit should still be quite high, since Prometheus will sometimes need a lot of RAM, but getting the machine into the OOM-killer state isn't ideal.
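
To get a feel for how high such a limit would need to be, here is a minimal Python sketch that pulls the memory counters out of the kernel line quoted above and converts them to GiB. The function names (`parse_oom_line`, `kb_to_gib`) are made up for illustration, and the regexes only assume the line format shown in this task; other kernel versions may log it differently.

```python
import re

# Kernel OOM-killer line, copied from the task description.
OOM_LINE = (
    "Jun 27 20:05:32 bast3002 kernel: [812018.018273] Killed process 3745 "
    "(prometheus) total-vm:50336320kB, anon-rss:7298804kB, file-rss:0kB, "
    "shmem-rss:0kB"
)

KB_PER_GIB = 1024 * 1024  # the kernel reports these counters in units of 1024 bytes


def parse_oom_line(line):
    """Return pid, process name, and memory counters (in kB) from an OOM-kill line.

    Hypothetical helper: it assumes the "Killed process PID (NAME) key:NkB, ..."
    layout seen in the line above.
    """
    head = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if head is None:
        raise ValueError("not an OOM-kill line")
    # Collect every "name:<digits>kB" pair, e.g. "anon-rss:7298804kB".
    counters = {key: int(val) for key, val in re.findall(r"([a-z-]+):(\d+)kB", line)}
    return {"pid": int(head.group(1)), "name": head.group(2), **counters}


def kb_to_gib(kb):
    """Convert a kB counter to GiB."""
    return kb / KB_PER_GIB


if __name__ == "__main__":
    info = parse_oom_line(OOM_LINE)
    for key in ("total-vm", "anon-rss"):
        print(f"{info['name']} {key}: {kb_to_gib(info[key]):.2f} GiB")
```

For the line above this works out to roughly 7 GiB of anonymous RSS at the time of the kill, so any cgroup limit would presumably have to sit above typical usage but below whatever pushes the whole host into the OOM killer.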

Event Timeline

I'm told the plan is to move these onto Ganeti in PoPs, so that seems just as good.

faidon renamed this task from consider running bastion Prometheis inside cgroups to consider running bastion Prometheus inside cgroups. Jun 27 2019, 10:00 PM
faidon renamed this task from consider running bastion Prometheus inside cgroups to consider running bastion Prometheis inside cgroups. Jun 27 2019, 10:03 PM

Agreed, memory limiting in the interim while Ganeti is being set up sounds good to me.