consider running bastion Prometheis inside cgroups
Closed, Invalid · Public

Assigned To

None

Authored By

	CDanis
	Jun 27 2019, 8:42 PM

Description

Today on bast3002, Prometheus did something (possibly a very large query?) that got it OOM-killed:

Jun 27 20:05:32 bast3002 kernel: [812018.018273] Killed process 3745 (prometheus) total-vm:50336320kB, anon-rss:7298804kB, file-rss:0kB, shmem-rss:0kB
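To size a cgroup limit, it helps to know how much memory Prometheus was actually holding when it was killed. The following is a minimal Python sketch that pulls the figures out of an OOM-killer line. It assumes the field order shown in the log line above; other kernel versions may append extra fields (which this regex ignores) or use a different wording (which it will not match).

```python
import re

# Matches the kernel's "Killed process" OOM message in the format shown above.
# Trailing fields added by newer kernels are ignored because the match is unanchored.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB, shmem-rss:(?P<shmem_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024


def parse_oom_kill(line):
    """Return the victim's pid, name and memory figures (in GiB), or None."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        # The kernel prints these as "kB" but they are KiB; convert to GiB.
        "total_vm_gib": int(m["total_vm"]) / KIB_PER_GIB,
        "anon_rss_gib": int(m["anon_rss"]) / KIB_PER_GIB,
        "file_rss_gib": int(m["file_rss"]) / KIB_PER_GIB,
        "shmem_rss_gib": int(m["shmem_rss"]) / KIB_PER_GIB,
    }


line = ("Jun 27 20:05:32 bast3002 kernel: [812018.018273] Killed process 3745 "
        "(prometheus) total-vm:50336320kB, anon-rss:7298804kB, file-rss:0kB, "
        "shmem-rss:0kB")
info = parse_oom_kill(line)
print(f"{info['comm']} (pid {info['pid']}): "
      f"anon-rss {info['anon_rss_gib']:.2f} GiB, total-vm {info['total_vm_gib']:.1f} GiB")
# prints: prometheus (pid 3745): anon-rss 6.96 GiB, total-vm 48.0 GiB
```

So roughly 7 GiB was resident (anonymous) memory. The much larger total-vm is virtual address space, which a cgroup memory limit does not charge; the limit applies to pages actually in use, including page cache.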

Meanwhile, the load on bast3002 was 120+ and the host was unusable as a bastion.

Maybe we should run Prometheus on these machines inside a memory-limited cgroup? The limit should still be quite high, since Prometheus sometimes needs a lot of RAM, but getting the whole machine into an OOM-killer state isn't ideal.
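Because systemd already places each service in its own cgroup, one way to get the memory-limited cgroup proposed above is a systemd drop-in on the Prometheus unit. The following is a minimal Python sketch that renders such a drop-in, capping the unit at total RAM minus a reserve for the bastion's own users. The unit name `prometheus.service`, the drop-in filename and the 4 GiB reserve are hypothetical placeholders, not values from this task. `MemoryMax=` is the cgroup v2 setting; `MemoryLimit=` is its deprecated cgroup v1 counterpart, emitted optionally for hosts still on the legacy hierarchy.

```python
import os

# Hypothetical values for illustration only; the task does not specify them.
UNIT = "prometheus.service"              # placeholder unit name
DROPIN_NAME = "memory-limit.conf"        # placeholder drop-in filename
RESERVE_BYTES = 4 * 1024**3              # RAM kept back for SSH users and the OS


def total_ram_bytes():
    """Physical RAM on this Linux host, via sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def dropin_path(unit=UNIT, name=DROPIN_NAME):
    """Standard location for a local systemd drop-in for `unit`."""
    return f"/etc/systemd/system/{unit}.d/{name}"


def memory_limit_dropin(total_bytes, reserve_bytes=RESERVE_BYTES, legacy_v1=True):
    """Render a drop-in capping the unit at (total RAM - reserve), in MiB.

    When the cap is reached, the kernel reclaims memory inside the unit's
    cgroup and, failing that, OOM-kills a process in that cgroup rather
    than choosing a victim from the whole machine.
    """
    limit_mib = (total_bytes - reserve_bytes) // (1024 * 1024)
    if limit_mib <= 0:
        raise ValueError("reserve is larger than total RAM")
    lines = ["[Service]", f"MemoryMax={limit_mib}M"]   # cgroup v2 (unified hierarchy)
    if legacy_v1:
        lines.append(f"MemoryLimit={limit_mib}M")      # deprecated cgroup v1 setting
    return "\n".join(lines) + "\n"


# Example for a hypothetical 16 GiB host: the cap comes out at 12288 MiB.
print(memory_limit_dropin(16 * 1024**3), end="")
```

On a real host the output of `memory_limit_dropin(total_ram_bytes())` would be written to `dropin_path()`, followed by `systemctl daemon-reload` and a restart of the unit. A fixed reserve, rather than a percentage, keeps the limit "quite high" on large machines while always leaving the bastion some headroom.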

Event Timeline

CDanis created this task. Jun 27 2019, 8:42 PM

Restricted Application added a subscriber: Aklapper. Jun 27 2019, 8:42 PM

I'm told the plan is to move these onto Ganeti in PoPs, so that seems just as good.

faidon renamed this task from consider running bastion Prometheis inside cgroups to consider running bastion Prometheus inside cgroups. Jun 27 2019, 10:00 PM

faidon renamed this task from consider running bastion Prometheus inside cgroups to consider running bastion Prometheis inside cgroups. Jun 27 2019, 10:03 PM

Agreed; memory limiting in the interim, while Ganeti is being set up, sounds good to me.
