smart-data-dump --syslog producing errors and spamming root@
Closed, Resolved (Public)


Several hosts are sending a lot of mail to root@ with the following error:

/usr/local/sbin/smart-data-dump --syslog --outfile /var/lib/prometheus/node.d/device_smart.prom

Traceback (most recent call last):
  File "/usr/local/sbin/smart-data-dump", line 459, in <module>
  File "/usr/local/sbin/smart-data-dump", line 438, in main
    for pd in handler():
  File "/usr/local/sbin/smart-data-dump", line 182, in hpsa_list_pd
    return hpsa_parse(raw_output, lsscsi_list_dev())
  File "/usr/local/sbin/smart-data-dump", line 227, in lsscsi_list_dev
    return lsscsi_parse(_check_output('/usr/bin/lsscsi -t -g'))
  File "/usr/local/sbin/smart-data-dump", line 243, in lsscsi_parse
    output[m[1]] = m[2]
TypeError: '_sre.SRE_Match' object is not subscriptable

Event Timeline

Marostegui triaged this task as Medium priority. May 12 2020, 6:37 AM
Marostegui moved this task from Backlog to Acknowledged on the SRE board.
colewhite claimed this task.

Thanks for the report!

The updated hpsa parser had a bug on initial deployment that triggered these emails. It was caught the same day and was fixed in
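For reference, the TypeError in the report is what Python 3.5 raises when a regex match object is indexed with `m[1]`; subscript access on match objects only arrived in Python 3.6, whereas `m.group(1)` works on both. Below is a minimal sketch of a parser written that way. The regex, the function name `lsscsi_parse_sketch` and the sample line are hypothetical, made up for illustration, and are not taken from smart-data-dump or from real lsscsi output.

```python
import re

# Hypothetical pattern for illustration only; the real smart-data-dump
# regex and the real `lsscsi -t -g` output format may differ.
LINE_RE = re.compile(r'^\[([\d:]+)\]\s+\S+\s+\S+\s+(\S+)')


def lsscsi_parse_sketch(text):
    """Map the bracketed SCSI address on each line to its device path."""
    output = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            # Skip lines that do not look like a device entry.
            continue
        # m.group(n) works on Python 3.5; m[n] needs Python 3.6 or newer.
        output[m.group(1)] = m.group(2)
    return output


# Made-up sample line, not captured from a real host.
sample = "[0:0:0:0]    disk    sas:0x5000c50012345678    /dev/sda    /dev/sg0\n"
parsed = lsscsi_parse_sketch(sample)
```

Using `group()` keeps the script compatible with hosts that still run Python 3.5, which is the interpreter version shown in the second traceback in this task.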

There are reports from earlier today, from dbprov2001 for instance.

I found the email you are referring to. Logs:

Traceback (most recent call last):
  File "/usr/local/sbin/smart-data-dump", line 459, in <module>
  File "/usr/local/sbin/smart-data-dump", line 429, in main
    raid_drivers = get_fact('raid')
  File "/usr/local/sbin/smart-data-dump", line 134, in get_fact
    facter_version = int(_check_output('/usr/bin/facter --version', stderr=subprocess.DEVNULL)
  File "/usr/local/sbin/smart-data-dump", line 123, in _check_output
    return subprocess.check_output(cmd, stderr=stderr) \
  File "/usr/lib/python3.5/", line 316, in check_output
  File "/usr/lib/python3.5/", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/timeout', '60', '/usr/bin/facter', '--version']' returned non-zero exit status 124

The output from dbprov2001 indicates a different issue: a timeout while fetching facter data. It is a bit surprising, though, that facter timed out just fetching its version.
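Exit status 124 in the traceback is what GNU `timeout` returns when the wrapped command runs past its limit. A rough sketch of how a caller could tell that case apart from other failures follows. The helper names `timed_out` and `check_output_or_none` are hypothetical and are not how smart-data-dump handles it; note also that a command could in principle exit with 124 on its own.

```python
import subprocess

# GNU timeout exits with 124 when the command it wraps times out.
TIMEOUT_EXIT_STATUS = 124


def timed_out(exc):
    """Return True if exc looks like a timeout reported by /usr/bin/timeout."""
    return (isinstance(exc, subprocess.CalledProcessError)
            and exc.returncode == TIMEOUT_EXIT_STATUS)


def check_output_or_none(cmd, seconds=60):
    """Run cmd (a list of arguments) under timeout; return None on a timeout."""
    wrapped = ['/usr/bin/timeout', str(seconds)] + list(cmd)
    try:
        return subprocess.check_output(
            wrapped, stderr=subprocess.DEVNULL).decode()
    except subprocess.CalledProcessError as exc:
        if timed_out(exc):
            # Treat a timeout as "no data" instead of letting the traceback
            # propagate; any other failure is re-raised unchanged.
            return None
        raise
```

A caller could then skip the facter lookup for that run when `check_output_or_none(['/usr/bin/facter', '--version'])` returns `None`, rather than exiting with a traceback.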

There is an existing task for facter timeouts: T251293. I will copy these logs to that task as well.

Edit: At the time smart-data-dump ran, the disk was saturated (see 09:12).

jijiki added a project: DBA.
jijiki added a subscriber: jijiki.

Reopened the wrong task, re-closing. Nothing to see here, move along.