
Record OOM kills as a metric with mtail
Closed, Declined · Public

Event Timeline

Some analysis of OOM kills from syslog on the Thumbor machines, counted per day:

thumbor1001# zgrep -h 'thumbor invoked oom-killer' syslog.7.gz syslog.6.gz syslog.5.gz syslog.4.gz syslog.3.gz syslog.2.gz syslog.1 syslog  | uniq -w6 -c  | less -FSRX
    117 Nov  3 06:59:03 thumbor1001 kernel: [1098077.719957] thumbor invoked oom-killer: 
    126 Nov  4 00:12:34 thumbor1001 kernel: [1160089.127066] thumbor invoked oom-killer: 
    278 Nov  5 00:01:16 thumbor1001 kernel: [1245811.288420] thumbor invoked oom-killer: 
    357 Nov  6 00:15:37 thumbor1001 kernel: [1333072.682151] thumbor invoked oom-killer: 
    130 Nov  7 00:00:11 thumbor1001 kernel: [1418547.649780] thumbor invoked oom-killer: 
    214 Nov  8 00:16:16 thumbor1001 kernel: [1505913.256708] thumbor invoked oom-killer: 
    154 Nov  9 00:04:21 thumbor1001 kernel: [1591598.241859] thumbor invoked oom-killer: 
    156 Nov 10 00:01:10 thumbor1001 kernel: [1677808.183941] thumbor invoked oom-killer:
thumbor1002# zgrep -h 'thumbor invoked oom-killer' syslog.7.gz syslog.6.gz syslog.5.gz syslog.4.gz syslog.3.gz syslog.2.gz syslog.1 syslog  | uniq -w6 -c  | less -FSRX
    127 Nov  3 06:56:34 thumbor1002 kernel: [1097581.656841] thumbor invoked oom-killer: 
    121 Nov  4 00:06:51 thumbor1002 kernel: [1159393.526557] thumbor invoked oom-killer: 
    240 Nov  5 00:05:39 thumbor1002 kernel: [1245714.152180] thumbor invoked oom-killer: 
    342 Nov  6 00:23:37 thumbor1002 kernel: [1333185.459516] thumbor invoked oom-killer: 
    129 Nov  7 00:08:29 thumbor1002 kernel: [1418670.596597] thumbor invoked oom-killer: 
    192 Nov  8 00:15:38 thumbor1002 kernel: [1505492.501601] thumbor invoked oom-killer: 
    153 Nov  9 00:06:34 thumbor1002 kernel: [1591341.463592] thumbor invoked oom-killer: 
    151 Nov 10 00:00:13 thumbor1002 kernel: [1677353.633742] thumbor invoked oom-killer:
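
The same per-day count can be sketched in Python, which avoids relying on `uniq` only merging adjacent lines. This is a rough sketch, not something that was run for this task: `count_oom_by_day` and `read_log` are hypothetical helper names, and keying on the first six characters (the syslog month and day) mirrors what `uniq -w6` does above.

```python
import gzip
import re
from collections import Counter

# Same fixed string the zgrep above matches on.
OOM_PATTERN = re.compile(r"thumbor invoked oom-killer")


def count_oom_by_day(lines):
    """Count oom-killer invocations per day.

    The key is the first 6 characters of each line ("Nov  3", "Nov 10"),
    i.e. the syslog month/day prefix, like `uniq -w6 -c`.
    """
    counts = Counter()
    for line in lines:
        if OOM_PATTERN.search(line):
            counts[line[:6]] += 1
    return counts


def read_log(path):
    """Yield lines from a plain or gzip-compressed syslog file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        yield from handle
```

For example, `count_oom_by_day(read_log("syslog.1"))` would return a `Counter` mapping each day prefix to its number of oom-killer lines in that file.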

So, no change after the IM limits were introduced? Maybe the difference between 900M and 1G isn't enough. I should check how much memory Thumbor consumes when it's idle.

There's still the possibility of significant memory leaks. I guess it would be nice to be able to graph the memory consumption of each Thumbor process over time.
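
As a starting point for graphing per-process memory, something like the following could sample the resident set size of each Thumbor process from `/proc`. It is only a sketch under assumptions: the function names are made up for illustration, it assumes Linux, and it assumes the process name reported in `/proc/<pid>/status` is `thumbor` (which is what the kernel log lines above suggest). Feeding the samples into a metrics system is left out.

```python
import os
import re

# VmRSS is reported in kB in /proc/<pid>/status on Linux.
VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)
NAME_RE = re.compile(r"^Name:\s+(.+)$", re.MULTILINE)


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from a /proc status dump, or None."""
    match = VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def sample_rss(process_name="thumbor", proc_root="/proc"):
    """Return {pid: rss_kb} for every process whose name matches.

    process_name="thumbor" is an assumption based on the syslog lines.
    """
    samples = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                status = handle.read()
        except OSError:
            continue  # process exited between listdir() and open()
        name = NAME_RE.search(status)
        if name and name.group(1).strip() == process_name:
            rss = parse_vm_rss_kb(status)
            if rss is not None:
                samples[int(entry)] = rss
    return samples
```

Calling `sample_rss()` at a fixed interval and recording the result per PID would give the time series needed to tell a slow leak apart from short spikes.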

Gilles lowered the priority of this task from Medium to Low. (Nov 15 2016, 12:30 PM)

Mtail is too unstable at the moment.