Page MenuHomePhabricator

logstash::collector apache high cpu usage
Closed, ResolvedPublic

Description

I noticed Apache is sitting at 100% CPU consistently across the active logstash collector hosts.

Details below pulled from a sample collector host, all are showing the same symptom:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 215821 www-data  20   0 1947972  11476    136 S  99.3   0.1  98555:06 apache2
3787499 www-data  20   0 1950092  25408   3972 S  97.0   0.3   2724:34 apache2

However utilization according to mod status looks pretty light

logstash1025:~# apachectl status
                  Apache Server Status for localhost (via ::1)

   Server Version: Apache/2.4.56 (Debian)

   Server MPM: event

   Server Built: 2023-04-02T03:06:01

     ----------------------------------------------------------------------

   Current Time: Thursday, 02-Nov-2023 14:44:33 UTC

   Restart Time: Thursday, 24-Aug-2023 19:15:26 UTC

   Parent Server Config. Generation: 71

   Parent Server MPM Generation: 70

   Server uptime: 69 days 19 hours 29 minutes 6 seconds

   Server load: 4.50 4.94 4.54

   Total accesses: 2495568 - Total Traffic: 39.4 GB - Total Duration:
   43623082

   CPU Usage: u1105.81 s276.17 cu2234.57 cs1731.77 - .0887% CPU load

   .414 requests/sec - 6.9 kB/second - 16.6 kB/request - 17.4802 ms/request

   1 requests currently being processed, 49 idle workers

   +--------------------------------------------------------------------------+
   |    |       |        |Connections    |Threads  |Async connections         |
   |Slot|PID    |Stopping|---------------+---------+--------------------------|
   |    |       |        |total|accepting|busy|idle|writing|keep-alive|closing|
   |----+-------+--------+-----+---------+----+----+-------+----------+-------|
   |2   |4163628|no      |0    |yes      |0   |25  |0      |0         |0      |
   |----+-------+--------+-----+---------+----+----+-------+----------+-------|
   |3   |3286906|yes (old|25   |no       |0   |0   |0      |0         |0      |
   |    |       |gen)    |     |         |    |    |       |          |       |
   |----+-------+--------+-----+---------+----+----+-------+----------+-------|
   |4   |4163629|no      |1    |yes      |1   |24  |0      |0         |1      |
   |----+-------+--------+-----+---------+----+----+-------+----------+-------|
   |Sum |3      |1       |26   |         |1   |49  |0      |0         |1      |
   +--------------------------------------------------------------------------+

 ..................................................______________
 ___________GGGGGGGGGGGGGGGGGGGGGGGGG_________W_______________...
 ......................

   Scoreboard Key:
   "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
   "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
   "C" Closing connection, "L" Logging, "G" Gracefully finishing,
   "I" Idle cleanup of worker, "." Open slot with no current process

Event Timeline

herron triaged this task as Medium priority.Nov 2 2023, 2:53 PM
herron created this task.

Mentioned in SAL (#wikimedia-operations) [2023-11-02T14:56:17Z] <herron> logstash1025 systemctl restart apache2.service T350402

Mentioned in SAL (#wikimedia-operations) [2023-11-02T14:56:17Z] <herron> logstash1025 systemctl restart apache2.service T350402

After a service restart cpu usage fell to 0.1%, status looks like:

logstash1025:~# apachectl status
                  Apache Server Status for localhost (via ::1)

   Server Version: Apache/2.4.56 (Debian)

   Server MPM: event

   Server Built: 2023-04-02T03:06:01

     ----------------------------------------------------------------------

   Current Time: Thursday, 02-Nov-2023 14:55:52 UTC

   Restart Time: Thursday, 02-Nov-2023 14:54:37 UTC

   Parent Server Config. Generation: 1

   Parent Server MPM Generation: 0

   Server uptime: 1 minute 15 seconds

   Server load: 3.50 3.91 4.19

   Total accesses: 31 - Total Traffic: 561 kB - Total Duration: 1441

   CPU Usage: u.18 s.07 cu0 cs0 - .333% CPU load

   .413 requests/sec - 7.5 kB/second - 18.1 kB/request - 46.4839 ms/request

   1 requests currently being processed, 49 idle workers

   +------------------------------------------------------------------------+
   |    |     |        |Connections    |Threads  |Async connections         |
   |Slot|PID  |Stopping|---------------+---------+--------------------------|
   |    |     |        |total|accepting|busy|idle|writing|keep-alive|closing|
   |----+-----+--------+-----+---------+----+----+-------+----------+-------|
   |0   |80322|no      |0    |yes      |0   |25  |0      |0         |0      |
   |----+-----+--------+-----+---------+----+----+-------+----------+-------|
   |1   |80323|no      |1    |yes      |1   |24  |0      |1         |0      |
   |----+-----+--------+-----+---------+----+----+-------+----------+-------|
   |Sum |2    |0       |1    |         |1   |49  |0      |1         |0      |
   +------------------------------------------------------------------------+

 __________________________________W_______________..............
 ................................................................
 ......................

   Scoreboard Key:
   "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
   "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
   "C" Closing connection, "L" Logging, "G" Gracefully finishing,
   "I" Idle cleanup of worker, "." Open slot with no current process
herron closed this task as Resolved.EditedNov 2 2023, 3:27 PM
herron claimed this task.

After a rolling restart, apache2 cpu utilization is down to essentially 0. Resolving for now, will investigate further if cpu util rises again.