Handle SMART for multiple shelves and controllers
Open, Medium, Public

Description

labstore1006's controller now has multiple shelves attached to it, but smart-data-dump isn't smart enough (hah!) to handle this case. The host also has multiple controllers, and we should support that too.

root@labstore1006:~# smart-data-dump --debug
DEBUG:__main__:Fact 'raid' discovered: ['hpsa']
DEBUG:__main__:Gathering SMART data from physical disks: ['cciss,0', 'cciss,1', 'cciss,2', 'cciss,3', 'cciss,4', 'cciss,5', 'cciss,6', 'cciss,7', 'cciss,8', 'cciss,9', 'cciss,10', 'cciss,11', 'cciss,12', 'cciss,13', 'cciss,14', 'cciss,15', 'cciss,16', 'cciss,17', 'cciss,18', 'cciss,19', 'cciss,20', 'cciss,21', 'cciss,22', 'cciss,23', 'cciss,0', 'cciss,1', 'cciss,2', 'cciss,3', 'cciss,4', 'cciss,5', 'cciss,6', 'cciss,7', 'cciss,8', 'cciss,9', 'cciss,10', 'cciss,11', 'cciss,12', 'cciss,13']

The disk list repeats cciss,0 through cciss,13, and every smartctl call targets /dev/sda. Starting with cciss,14 the smartctl invocations fail and the disks are reported as not healthy:

DEBUG:__main__:Running: /usr/bin/timeout 60 /usr/sbin/smartctl --info --health -d cciss,14 /dev/sda
DEBUG:__main__:Running: /usr/bin/timeout 60 /usr/sbin/smartctl --attributes -d cciss,14 /dev/sda
DEBUG:__main__:Running: /usr/bin/timeout 60 /usr/sbin/smartctl --info --health -d cciss,15 /dev/sda
DEBUG:__main__:Running: /usr/bin/timeout 60 /usr/sbin/smartctl --attributes -d cciss,15 /dev/sda
DEBUG:__main__:Running: /usr/bin/timeout 60 /usr/sbin/smartctl --info --health -d cciss,16 /dev/sda
...
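
A minimal sketch of one possible direction, not smart-data-dump's actual code: detect the repeated disk ids in the flat list, and number disks per device so that each (device, cciss id) pair is unique. The function names and the device-to-disk-count mapping are hypothetical, and which device node (if any) reaches the second group of disks on labstore1006 is an assumption that would need checking on the host.

```python
from collections import Counter


def duplicated_targets(disk_ids):
    """Return the disk ids that appear more than once, in first-seen order."""
    counts = Counter(disk_ids)
    return [disk_id for disk_id in counts if counts[disk_id] > 1]


def smartctl_targets(disks_per_device):
    """Yield (device, cciss_id) pairs, restarting the numbering per device.

    disks_per_device is a hypothetical mapping of device node to the number
    of physical disks reachable through it.
    """
    for device, count in disks_per_device.items():
        for index in range(count):
            yield device, "cciss,%d" % index


def smartctl_command(device, cciss_id):
    """Build the same health command seen in the debug log, for one target."""
    return [
        "/usr/bin/timeout", "60",
        "/usr/sbin/smartctl", "--info", "--health",
        "-d", cciss_id, device,
    ]


if __name__ == "__main__":
    # The flat list from the debug output: 24 ids followed by 14 repeats.
    flat = ["cciss,%d" % i for i in range(24)] + ["cciss,%d" % i for i in range(14)]
    print("ambiguous ids:", duplicated_targets(flat))

    # Made-up mapping purely for illustration.
    for device, cciss_id in smartctl_targets({"/dev/sda": 2, "/dev/sdb": 1}):
        print(" ".join(smartctl_command(device, cciss_id)))
```

Keying each disk on the pair rather than on the cciss id alone would also keep any per-disk output from colliding between controllers or shelves.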

Event Timeline

fgiunchedi triaged this task as Medium priority. Jul 10 2018, 3:33 PM
fgiunchedi created this task.
fgiunchedi renamed this task from "Handle SMART for multiple shelves attached to a single smartarray controller" to "Handle SMART for multiple shelves and controllers". Jul 11 2018, 10:23 AM
fgiunchedi updated the task description.