After rolling out T251293, we've noticed an increase in alerting from smart-data-dump runs: https://logstash.wikimedia.org/goto/d786c2b219de372785760d2ecda1b71f
Sep 19 08:11:00 logstash1010 systemd[1]: Started Collect SMART information from all physical disks and report as Prometheus metrics.
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]: Command '['/usr/bin/timeout', '60', '/usr/bin/facter', '--puppet', '--json', '-l', 'error', 'raid_mgmt_tools']' returned non-zero exit status 124.
Traceback (most recent call last):
  File "/usr/local/sbin/smart-data-dump", line 124, in _check_output
    return subprocess.check_output(cmd, stderr=stderr) \
  File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/timeout', '60', '/usr/bin/facter', '--puppet', '--json', '-l', 'error', 'raid_mgmt_tools']' returned non-zero exit status 124.
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]: Traceback (most recent call last):
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:   File "/usr/local/sbin/smart-data-dump", line 475, in <module>
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:     sys.exit(main())
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:   File "/usr/local/sbin/smart-data-dump", line 444, in main
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:     raid_drivers = get_fact('raid_mgmt_tools')
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:   File "/usr/local/sbin/smart-data-dump", line 137, in get_fact
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:     raw_output = _check_output(command, stderr=subprocess.DEVNULL)
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:   File "/usr/local/sbin/smart-data-dump", line 124, in _check_output
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:     return subprocess.check_output(cmd, stderr=stderr) \
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:   File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:     **kwargs).stdout
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:   File "/usr/lib/python3.7/subprocess.py", line 487, in run
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]:     output=stdout, stderr=stderr)
Sep 19 08:12:35 logstash1010 smart-data-dump[31887]: subprocess.CalledProcessError: Command '['/usr/bin/timeout', '60', '/usr/bin/facter', '--puppet', '--json', '-l', 'error', 'raid_mgmt_tools']' returned non-zero exit status 124.
Sep 19 08:12:35 logstash1010 systemd[1]: export_smart_data_dump.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 08:12:35 logstash1010 systemd[1]: export_smart_data_dump.service: Failed with result 'exit-code'.
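
Exit status 124 is what GNU `timeout` returns when the wrapped command runs past its limit, so the trace above means the `facter` call did not finish within 60 seconds and the resulting `CalledProcessError` propagated out of `main()`. One possible mitigation, sketched below, is to treat that status as a soft failure instead of letting the unit fail. This is only an illustration: `run_fact_command` and `TIMEOUT_EXIT_STATUS` are hypothetical names, not part of the actual smart-data-dump script, and whether a timed-out run should be skipped quietly is an open question.

```python
import json
import logging
import subprocess

log = logging.getLogger("smart-data-dump-sketch")

# Exit status GNU timeout(1) uses when the wrapped command timed out.
TIMEOUT_EXIT_STATUS = 124


def run_fact_command(command):
    """Run a command that prints JSON and return the parsed result.

    Hypothetical helper (not the real get_fact): returns None when the
    command exits with status 124, re-raises any other failure.
    """
    try:
        raw_output = subprocess.check_output(command, stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError as e:
        if e.returncode == TIMEOUT_EXIT_STATUS:
            # Log a warning instead of a traceback; the caller decides what to do.
            log.warning("Command timed out: %s", " ".join(command))
            return None
        raise
    return json.loads(raw_output.decode("utf-8"))
```

A caller would pass the same command list seen in the log and skip metric collection for that run when it gets `None` back. The trade-off is that persistent timeouts would then need to be surfaced some other way, for example through a dedicated metric.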