extdist-03.extdist.eqiad.wmflabs has high user/system CPU
Closed, Resolved (Public)

Description

The extdist-03.extdist.eqiad.wmflabs instance has very high system and user CPU usage. Most probably there is a runaway process on it.

https://grafana-labs.wikimedia.org/dashboard/db/labs-project-board?panelId=45&fullscreen&orgId=1&var-project=extdist&var-server=All&from=now-30d&to=now

The view for the last ~5 months shows it started at the beginning of July.

extdist-03_highcpu.png (343×799 px, 33 KB)

It would be great to investigate what is going on and kill / clean up the process. That will free up some CPU for others :]

Event Timeline

hashar renamed this task from extdist-03.extdist.eqiad.wmflabs has high user/system CPU sinc to extdist-03.extdist.eqiad.wmflabs has high user/system CPU. Oct 31 2017, 12:57 PM

Woah.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18545 root      20   0  284868  56128    808 R  26.2  1.4  18207:42 puppet
28222 extdist   20   0  350536  56736   1332 R  26.2  1.4  23275:42 puppet
30529 extdist   20   0  351516  58132   2772 R  26.2  1.4  17394:21 puppet
 1722 root      20   0  366952  71512    784 R  19.9  1.8  71048:39 puppet
 7158 root      20   0  284840  57468   2112 R  19.9  1.4  17879:46 puppet
27122 root      20   0  350616  56172    720 R  19.9  1.4  24453:15 puppet
27933 extdist   20   0  368088  71040   1192 R  19.9  1.8  68246:56 puppet
25325 root      20   0  364336  70968    904 R  19.6  1.8  48309:22 puppet

wtf?

I kill -9'd all of the random puppet processes. But how could the extdist user be running puppet? I'm really confused as to what happened.
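
For anyone who wants to spot this kind of pile-up before it runs for months, here is a minimal Python sketch that parses pasted `top` output and flags processes by accumulated CPU time. It assumes the default 12-column `top` layout shown in the paste above and a TIME+ value of the form `minutes:seconds`. The names `parse_top_rows` and `runaway`, and the one-day threshold, are illustrative and not part of any tool used on this instance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopRow:
    pid: int
    user: str
    cpu_percent: float
    cpu_minutes: float
    command: str


def parse_cpu_time(value: str) -> float:
    """Convert a TIME+ value such as '18207:42' or '3:07.51' to minutes."""
    minutes, _, seconds = value.partition(":")
    return int(minutes) + float(seconds or 0) / 60


def parse_top_rows(text: str) -> List[TopRow]:
    """Parse process rows from top output, skipping the header line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: data rows have at least 12 columns and a numeric PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append(TopRow(
            pid=int(fields[0]),
            user=fields[1],
            cpu_percent=float(fields[8]),
            cpu_minutes=parse_cpu_time(fields[10]),
            command=" ".join(fields[11:]),
        ))
    return rows


def runaway(rows: List[TopRow], command: str = "puppet",
            min_minutes: float = 24 * 60) -> List[TopRow]:
    """Return rows for `command` with at least `min_minutes` of CPU time."""
    return [r for r in rows
            if r.command == command and r.cpu_minutes >= min_minutes]


# Three rows copied from the paste above.
SAMPLE = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18545 root      20   0  284868  56128    808 R  26.2  1.4  18207:42 puppet
28222 extdist   20   0  350536  56736   1332 R  26.2  1.4  23275:42 puppet
 1722 root      20   0  366952  71512    784 R  19.9  1.8  71048:39 puppet
"""

if __name__ == "__main__":
    for row in runaway(parse_top_rows(SAMPLE)):
        days = row.cpu_minutes / (24 * 60)
        print(f"{row.pid} {row.user} {row.command}: {days:.1f} days of CPU time")
```

The sketch only reports candidates. Deciding whether to kill them, and working out why the extdist user owned some of them, still needs a human look.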

Legoktm claimed this task.

The CPU spikes that remain are expected: they are caused by extdist (big spikes) and skindist (small spikes).