helium, the bacula server, shows up in Icinga with a SMART not healthy alert:
cluster=misc device=megaraid,10 instance=helium:9100 job=node site=eqiad
Thanks Chris!
The Icinga alert is green again: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=helium&service=Device+not+healthy+-SMART-
See also T206004.
Unfortunately, the RAID is still reported as "partially degraded", even though the Icinga SMART alert has recovered.
@Dzahn the disk was replaced, but it shows up as "unconfigured good". I tried to add it back to the array without success. Can you give it a go, please?
I don't know how to do that. How did you try it? Are there maybe docs or examples of how that is usually done?
I followed http://erikimh.com/megacli-cheatsheet/ to do so
and
megacli -PdReplaceMissing -PhysDrv [15:9] -Array0 -row9 -a0
Adapter: 0: Failed to replace Missing PD at Array 0, Row 9.
FW error description:
  The specified physical drive does not have the appropriate attributes to complete the requested command.
Exit Code: 0x26
That had me wondering what on earth was going on, and then I found https://www.thomas-krenn.com/de/wiki/MegaCLI_Error_Messages, which says:
0x26 Unable to use SATA(SAS) drive to replace SAS(SATA)
And sure enough:
megacli -PDList -aALL | grep 'PD Type'
PD Type: SAS
PD Type: SAS
PD Type: SAS
PD Type: SAS
PD Type: SAS
PD Type: SAS
PD Type: SAS
PD Type: SAS
PD Type: SAS
PD Type: SATA
PD Type: SAS
PD Type: SAS
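The grep above shows that one drive is SATA, but not which enclosure/slot it sits in. A minimal sketch of a helper that keeps the slot next to the type, assuming each drive block in `megacli -PDList -aALL` output contains "Enclosure Device ID:", "Slot Number:" and "PD Type:" lines in that order. The function names `pd_types` and `odd_pd_types` are hypothetical, not part of MegaCLI.

```shell
#!/bin/sh
# pd_types (hypothetical helper): read `megacli -PDList -aALL` output on stdin
# and print one line per physical drive as "<enclosure>:<slot> <PD Type>".
# Assumes "Enclosure Device ID:", "Slot Number:" and "PD Type:" appear in that
# order within each drive's block.
pd_types() {
  awk -F': *' '
    /^Enclosure Device ID:/ { enc = $2 }
    /^Slot Number:/         { slot = $2 }
    /^PD Type:/             { print enc ":" slot, $2 }
  '
}

# odd_pd_types (hypothetical helper): from pd_types output, print only the
# drives whose type differs from the most common type (e.g. a lone SATA disk
# among SAS disks).
odd_pd_types() {
  awk '
    { line[NR] = $0; type[NR] = $2; count[$2]++ }
    END {
      for (t in count) if (count[t] > max) { max = count[t]; common = t }
      for (i = 1; i <= NR; i++) if (type[i] != common) print line[i]
    }
  '
}

# Usage on the RAID host (as root):
#   megacli -PDList -aALL | pd_types | odd_pd_types
```

Running a check like this before `-PdReplaceMissing` would show a SAS/SATA mismatch up front instead of via exit code 0x26.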
@Cmjohnson where did that disk come from?
The disk was a spare... I didn't even check whether it was a SATA disk.
This server is out of warranty and we'll need to buy 4TB SAS disks
That's more or less what we've been doing up to now. But it doesn't look good timewise either. See T203827 (I'll add it as a blocker on T196478).
And now we get:
sudo /usr/local/lib/nagios/plugins/check_raid
OK: optimal, 1 logical, 12 physical OK
Great. Thanks @Cmjohnson
CRITICAL (for 9d 15h 51m 18s) cluster=misc device=megaraid,14 instance=helium:9100 job=node site=eqiad