While looking at an unrelated issue (cron spam), I noticed that on the Kubernetes worker hosts the oom-killer is invoked quite frequently, although not on all of them. Reporting it in case it's not already known.
$ sudo cumin -x 'A:kubernetes-workers' 'dmesg -T | grep -c "oom-killer"'
IGNORE EXIT CODES mode enabled, all commands executed will be considered successful
12 hosts will be targeted:
kubernetes[2001-2006].codfw.wmnet,kubernetes[1001-1006].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(1) kubernetes2004.codfw.wmnet
----- OUTPUT of 'dmesg -T | grep -c "oom-killer"' -----
40
===== NODE GROUP =====
(1) kubernetes2002.codfw.wmnet
----- OUTPUT of 'dmesg -T | grep -c "oom-killer"' -----
47
===== NODE GROUP =====
(2) kubernetes[2001,2003].codfw.wmnet
----- OUTPUT of 'dmesg -T | grep -c "oom-killer"' -----
43
===== NODE GROUP =====
(4) kubernetes[2005-2006].codfw.wmnet,kubernetes[1005-1006].eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep -c "oom-killer"' -----
0
===== NODE GROUP =====
(2) kubernetes[1001,1003].eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep -c "oom-killer"' -----
71
===== NODE GROUP =====
(1) kubernetes1004.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep -c "oom-killer"' -----
70
===== NODE GROUP =====
(1) kubernetes1002.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep -c "oom-killer"' -----
67
================
PASS: |████████████████████████████████| 100% (12/12) [00:00<00:00, 13.43hosts/s]
FAIL: |                                |   0% (0/12) [00:00<?, ?hosts/s]
100.0% (12/12) success ratio (>= 100.0% threshold) for command: 'dmesg -T | grep -c "oom-killer"'.
100.0% (12/12) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
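The counts above only say how often the oom-killer ran, not what triggered it. As a possible next step, here is a minimal Python sketch that groups "invoked oom-killer" lines by the process named at the start of each line. The function name `count_oom_by_process`, the regular expression, and the sample lines are all hypothetical and only illustrate the usual shape of such kernel messages; the actual dmesg lines on these hosts have not been checked and may differ.

```python
import re
from collections import Counter

# Assumed line shape (illustrative, not copied from any host):
#   [Mon Mar  5 10:00:00 2018] java invoked oom-killer: gfp_mask=..., order=0
# The leading "[...]" timestamp is optional so plain `dmesg` output also matches.
OOM_LINE = re.compile(r"^(?:\[[^\]]*\]\s*)?(?P<proc>.+?) invoked oom-killer")


def count_oom_by_process(lines):
    """Return a Counter mapping process name -> number of oom-killer invocations."""
    counts = Counter()
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            counts[match.group("proc")] += 1
    return counts


# Made-up sample input, for illustration only.
SAMPLE = [
    "[Mon Mar  5 10:00:00 2018] java invoked oom-killer: gfp_mask=0x14000c0, order=0",
    "[Mon Mar  5 10:00:01 2018] Out of memory: Kill process 1234 (java) score 1968",
    "[Mon Mar  5 10:05:00 2018] nodejs invoked oom-killer: gfp_mask=0x14000c0, order=0",
    "java invoked oom-killer: gfp_mask=0x14000c0, order=0",
]

for proc, n in count_oom_by_process(SAMPLE).most_common():
    print(f"{n:5d} {proc}")
```

To try it on a single worker, the `SAMPLE` list could be replaced with the captured output of `dmesg -T`. A breakdown dominated by one process name would suggest a single workload is responsible rather than general memory pressure on the host.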