API apache servers OOMing: mw1134 mw1132 mw1139 mw1138
Closed, Duplicate · Public

Description

This week four different app servers have OOMed and killed random processes:

mw1134
mw1132
mw1139
mw1138

I have rebooted all but mw1138, which I've depooled and left intact for investigation.

Event Timeline

Andrew renamed this task from apache servers OOMing: mw1134 mw1132 mw1139 mw1138 to API apache servers OOMing: mw1134 mw1132 mw1139 mw1138. Apr 16 2016, 2:36 PM
Andrew triaged this task as Unbreak Now! priority.

@Andrew mw1138 is not depooled (anymore); its CPU and network graphs show it is serving traffic. According to http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=mw1132.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=network_report&c=API+application+servers+eqiad it was idle for a few hours, but then it somehow got repooled.

Andrew lowered the priority of this task from Unbreak Now! to Medium. Apr 18 2016, 1:53 PM