00:01:18.014 07:55:51 Waiting for canary traffic...
00:01:38.036 07:56:12 Executing check 'Logstash Error rate for deployment-mediawiki01.deployment-prep.eqiad.wmflabs'
00:01:48.141 07:56:22 Check 'Logstash Error rate for deployment-mediawiki01.deployment-prep.eqiad.wmflabs' failed: Traceback (most recent call last):
00:01:48.141   File "/usr/local/bin/logstash_checker.py", line 268, in <module>
00:01:48.141     main()
00:01:48.141   File "/usr/local/bin/logstash_checker.py", line 264, in main
00:01:48.141     sys.exit(checker.run())
00:01:48.141   File "/usr/local/bin/logstash_checker.py", line 166, in run
00:01:48.141     body=self._logstash_query()
00:01:48.141   File "/usr/local/bin/logstash_checker.py", line 85, in fetch_url
00:01:48.141     "downloading {}".format(url))
00:01:48.141 __main__.CheckServiceError: Timeout on connection while downloading deployment-logstash2.deployment-prep.eqiad.wmflabs:9200/logstash-*/_search
00:01:48.141
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | hashar | T143982 scap on beta cluster does not run anymore due to logstash being down
Resolved | | dancy | T144033 handle logstash timeouts separately from spikes in errors reported by logstash
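The subtask T144033 asks for a timeout to be treated differently from a genuine spike in errors. A minimal sketch of that idea follows, assuming a check that receives the error count from a callable. The exit codes, the `run_check` function, and its parameters are hypothetical and are not taken from the real `logstash_checker.py`; only the `CheckServiceError` name appears in the traceback above.

```python
# Hypothetical exit codes; the real logstash_checker.py may use other values.
EXIT_OK = 0
EXIT_ERROR_SPIKE = 1
EXIT_UNREACHABLE = 3


class CheckServiceError(Exception):
    """Raised when the logstash backend cannot be queried (e.g. a timeout)."""


def run_check(query_error_count, threshold):
    """Classify the outcome of one check run.

    query_error_count: callable returning the number of errors logstash
        reports, or raising CheckServiceError if logstash is unreachable.
    threshold: maximum number of errors tolerated before reporting a spike.
    """
    try:
        errors = query_error_count()
    except CheckServiceError as exc:
        # Logstash itself is down: report that separately instead of
        # letting the traceback look like a failed canary.
        print("UNKNOWN: could not query logstash: {}".format(exc))
        return EXIT_UNREACHABLE

    if errors > threshold:
        print("CRITICAL: {} errors (threshold {})".format(errors, threshold))
        return EXIT_ERROR_SPIKE

    print("OK: {} errors (threshold {})".format(errors, threshold))
    return EXIT_OK
```

With a distinct return value for the unreachable case, a caller such as a deployment job could decide whether an unavailable logstash should block the deployment or only produce a warning.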
Event Timeline
Debian GNU/Linux 8 deployment-logstash2 ttyS0
deployment-logstash2 login: [2348521.020179] INFO: task jbd2/vda3-8:167 blocked for more than 120 seconds.
[2348521.027931] Not tainted 3.16.0-4-amd64 #1
[2348521.028429] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.029991] INFO: task jbd2/dm-0-8:311 blocked for more than 120 seconds.
[2348521.031129] Not tainted 3.16.0-4-amd64 #1
[2348521.031619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.032620] INFO: task nscd:612 blocked for more than 120 seconds.
[2348521.033298] Not tainted 3.16.0-4-amd64 #1
[2348521.033788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.035209] INFO: task nscd:613 blocked for more than 120 seconds.
[2348521.035865] Not tainted 3.16.0-4-amd64 #1
[2348521.036377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.037296] INFO: task java:9783 blocked for more than 120 seconds.
[2348521.037950] Not tainted 3.16.0-4-amd64 #1
[2348521.039307] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.040611] INFO: task java:9823 blocked for more than 120 seconds.
[2348521.041290] Not tainted 3.16.0-4-amd64 #1
[2348521.041775] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.043008] INFO: task java:11191 blocked for more than 120 seconds.
[2348521.043684] Not tainted 3.16.0-4-amd64 #1
[2348521.044197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.045073] INFO: task java:18437 blocked for more than 120 seconds.
[2348521.045743] Not tainted 3.16.0-4-amd64 #1
[2348521.046447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.047419] INFO: task <tcp:18449 blocked for more than 120 seconds.
[2348521.048132] Not tainted 3.16.0-4-amd64 #1
[2348521.048614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2348521.049571] INFO: task <syslog:18450 blocked for more than 120 seconds.
[2348521.051502] Not tainted 3.16.0-4-amd64 #1
[2348521.051995] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Tasks such as nscd, syslog, tcp and jbd2/vda3 are all blocked somehow.
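For reference, the list of blocked tasks can be pulled out of console output like the above with a short script. This is only an illustrative sketch: the `blocked_tasks` helper and the regular expression are assumptions based on the message format shown in this console log, not an existing tool.

```python
import re

# Matches kernel hung-task lines such as:
#   INFO: task nscd:612 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds\."
)


def blocked_tasks(console_text):
    """Return (task_name, pid) pairs for every hung-task message found."""
    return [
        (name, int(pid))
        for name, pid, _seconds in HUNG_TASK_RE.findall(console_text)
    ]
```

Feeding it the console dump would return one `(name, pid)` pair per hung-task message, including the journal threads `jbd2/vda3-8` and `jbd2/dm-0-8`.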
Mentioned in SAL [2016-08-26T08:07:00Z] <hashar> rebooting deployment-logstash02 via Horizon. Kernel hang apparently T143982
Mentioned in SAL [2016-08-26T08:10:28Z] <hashar> deployment-logstash2 is back after a hard reboot. T143982
Mentioned in SAL [2016-08-26T08:10:58Z] <hashar> beta-scap-eqiad job is back in operation. Was blocked on logstash not being reachable. T143982