restbase1006 faulty disk controller
Closed, ResolvedPublic

Description

mdadm kicked the disk (sdc) out of the array, and mpt2sas keeps logging a recurring error:

[   20.740199] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[  378.060626] Fusion MPT base driver 3.04.20
[  378.060633] Copyright (c) 1999-2008 LSI Corporation
[  378.061620] Fusion MPT misc device (ioctl) driver 3.04.20
[  378.061758] mptctl: Registered with Fusion MPT base driver
[  378.061765] mptctl: /dev/mptctl @ (major,minor=10,220)
[  434.092398] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[ 2371.097391] end_request: I/O error, dev sdc, sector 0
[ 2397.139314] end_request: I/O error, dev sdc, sector 0
[ 2420.413883] end_request: I/O error, dev sdc, sector 0
[ 2427.474924] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[ 2427.794821] md: md0: resync done.
[ 2427.818740] RAID1 conf printout:
[ 2427.818745]  --- wd:3 rd:3
[ 2427.818749]  disk 0, wo:0, o:1, dev:sda1
[ 2427.818752]  disk 1, wo:0, o:1, dev:sdb1
[ 2427.818754]  disk 2, wo:0, o:1, dev:sdc1
[ 2431.163691] end_request: I/O error, dev sdc, sector 0
[13923.724894] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[51161.987273] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[51166.686945] end_request: I/O error, dev sdc, sector 2056
[51166.710585] md: super_written gets error=-5, uptodate=0
[51166.710592] md/raid1:md0: Disk failure on sdc1, disabling device.
md/raid1:md0: Operation continuing on 2 devices.
[51166.770264] RAID1 conf printout:
[51166.770270]  --- wd:2 rd:3
[51166.770274]  disk 0, wo:0, o:1, dev:sda1
[51166.770276]  disk 1, wo:0, o:1, dev:sdb1
[51166.770279]  disk 2, wo:1, o:0, dev:sdc1
[51166.794657] RAID1 conf printout:
[51166.794663]  --- wd:2 rd:3
[51166.794666]  disk 0, wo:0, o:1, dev:sda1
[51166.794669]  disk 1, wo:0, o:1, dev:sdb1
[94407.911741] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[94407.911751] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
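
For reference, a minimal sketch of how the recurring word 0x31120303 splits into the fields the driver prints. The bit layout (bus type, originator, code, sub-code) is an assumption based on my reading of the mpt2sas driver's loginfo union, and `decode_log_info` is a hypothetical helper name, not part of any tool:

```python
# Assumed layout of the 32-bit log_info word (from the mpt2sas loginfo union):
#   bits 31-28 bus_type, bits 27-24 originator, bits 23-16 code, bits 15-0 sub_code
ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}


def decode_log_info(value: int) -> dict:
    """Split a log_info word into the fields shown in the dmesg line."""
    originator = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get(originator, hex(originator)),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


fields = decode_log_info(0x31120303)
print({k: (hex(v) if isinstance(v, int) else v) for k, v in fields.items()})
```

The decoded originator, code and sub-code agree with what the kernel already prints (PL, 0x12, 0x0303), so this only confirms the fields; it does not say what the firmware means by them.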

Event Timeline

fgiunchedi claimed this task.
fgiunchedi raised the priority of this task from to Needs Triage.
fgiunchedi updated the task description. (Show Details)
fgiunchedi subscribed.

Might be a HW issue, either with the controller or the disk or somewhere in between. Maybe try to rewire the disks differently to see if the issue persists on the same disk? (Naive troubleshooting pays off fairly regularly)

Found some info here too:
http://serverfault.com/questions/407703/deciphering-continuing-mpt2sas-syslog-messages
and launched a SMART long self-test.

@Cmjohnson can you try reseating the disks and the controller/backplane? The machine is not in service yet. We'll stress the disk and controller and see; otherwise we will treat the controller/disk as DOA.

restbase1006:~$ sudo smartctl -a /dev/sdc
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 850 PRO 1TB
Serial Number:    S2BBNEAG109263A
LU WWN Device Id: 5 002538 85013cb06
Firmware Version: EXM02B6Q
User Capacity:    1,024,209,543,168 bytes [1.02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 16 11:29:15 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249)	Self-test routine in progress...
					90% of test remaining.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 543) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       224
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       6
177 Wear_Leveling_Count     0x0013   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   077   070   000    Old_age   Always       -       23
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       274
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       4
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       181499743

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

restbase1006:~$
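
A quick way to read the attribute table above: the media-related counters are all zero, while UDMA_CRC_Error_Count has a raw value of 274. That attribute is generally understood to count CRC errors on the link between the drive and the host, which would fit a problem somewhere between disk and controller rather than in the flash itself. A minimal sketch that pulls the raw values out of three rows copied from the output above (`raw_values` is a hypothetical helper, and the choice of which attributes count as "media" is my own assumption):

```python
# Three rows copied verbatim from the smartctl -a output above.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       274
"""


def raw_values(table: str) -> dict:
    """Map ATTRIBUTE_NAME to its integer RAW_VALUE column."""
    values = {}
    for line in table.splitlines():
        parts = line.split()
        # Attribute rows have 10 whitespace-separated columns and a numeric ID.
        if len(parts) == 10 and parts[0].isdigit():
            values[parts[1]] = int(parts[9])
    return values


values = raw_values(SMART_ROWS)
# Assumption: these two attributes stand in for "media" problems.
media_errors = values["Reallocated_Sector_Ct"] + values["Reported_Uncorrect"]
link_errors = values["UDMA_CRC_Error_Count"]
print(f"media errors: {media_errors}, link CRC errors: {link_errors}")
```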

The disk has reappeared as sdd, so I'm rebooting; however, this looks more like the controller :(

[543994.227854] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[543994.228172] mpt2sas0: removing handle(0x000d), sas_addr(0x4433221105000000)
[543994.246068] RAID1 conf printout:
[543994.246074]  --- wd:1 rd:3
[543994.246078]  disk 1, wo:0, o:1, dev:sdb1
[543994.252689] md: unbind<sda1>
[543994.278015] md: export_rdev(sda1)
[543997.513088] scsi 0:0:3:0: Direct-Access     ATA      Samsung SSD 850  2B6Q PQ: 0 ANSI: 6
[543997.513099] scsi 0:0:3:0: SATA: handle(0x000d), sas_addr(0x4433221105000000), phy(5), device_name(0x0000000000000000)
[543997.513103] scsi 0:0:3:0: SATA: enclosure_logical_id(0x500605b009388620), slot(6)
[543997.513180] scsi 0:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[543997.513184] scsi 0:0:3:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
[543997.513983] sd 0:0:3:0: Attached scsi generic sg0 type 0
[543997.514124] sd 0:0:3:0: [sdd] 2000409264 512-byte logical blocks: (1.02 TB/953 GiB)
[543997.515898] sd 0:0:3:0: [sdd] Write Protect is off
[543997.515901] sd 0:0:3:0: [sdd] Mode Sense: 7f 00 10 08
[543997.516170] sd 0:0:3:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[543997.519416]  sdd: sdd1 sdd2 sdd3
[543997.522662] sd 0:0:3:0: [sdd] Attached SCSI disk
[544070.788235] sd 0:0:1:0: [sdb] Device not ready
[544070.788255] sd 0:0:1:0: [sdb]  
[544070.788260] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[544070.788265] sd 0:0:1:0: [sdb]  
[544070.788268] Sense Key : Not Ready [current] 
[544070.788275] sd 0:0:1:0: [sdb]  
[544070.788280] Add. Sense: Logical unit not ready, cause not reportable
[544070.788285] sd 0:0:1:0: [sdb] CDB: 
[544070.788288] Read(10): 28 00 03 9b e0 08 00 00 08 00
[544070.788300] end_request: I/O error, dev sdb, sector 60547080
[544070.825410] sd 0:0:1:0: [sdb] Device not ready
[544070.825425] sd 0:0:1:0: [sdb]  
[544070.825429] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[544070.825434] sd 0:0:1:0: [sdb]  
[544070.825437] Sense Key : Not Ready [current] 
[544070.825447] sd 0:0:1:0: [sdb]  
[544070.825451] Add. Sense: Logical unit not ready, cause not reportable
[544070.825455] sd 0:0:1:0: [sdb] CDB: 
[544070.825458] Read(10): 28 00 03 9b e0 08 00 00 08 00
[544070.825469] end_request: I/O error, dev sdb, sector 60547080
[544070.860372] sd 0:0:1:0: [sdb] Device not ready
[544070.860387] sd 0:0:1:0: [sdb]  
[544070.860391] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[544070.860396] sd 0:0:1:0: [sdb]  
[544070.860400] Sense Key : Not Ready [current] 
[544070.860408] sd 0:0:1:0: [sdb]  
[544070.860412] Add. Sense: Logical unit not ready, cause not reportable
[544070.860416] sd 0:0:1:0: [sdb] CDB: 
[544070.860419] Read(10): 28 00 03 9b e0 08 00 00 08 00
[544070.860430] end_request: I/O error, dev sdb, sector 60547080
[544070.888831] sd 0:0:1:0: [sdb] Device not ready
[544070.888843] sd 0:0:1:0: [sdb]  
[544070.888847] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[544070.888851] sd 0:0:1:0: [sdb]  
[544070.888854] Sense Key : Not Ready [current] 
[544070.888864] sd 0:0:1:0: [sdb]  
[544070.888868] Add. Sense: Logical unit not ready, cause not reportable
[544070.888872] sd 0:0:1:0: [sdb] CDB: 
[544070.888875] Read(10): 28 00 00 00 08 08 00 00 08 00
[544070.888886] end_request: I/O error, dev sdb, sector 2056
[544070.914934] sd 0:0:1:0: [sdb] Device not ready
[544070.914942] sd 0:0:1:0: [sdb]  
[544070.914945] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[544070.914949] sd 0:0:1:0: [sdb]  
[544070.914952] Sense Key : Not Ready [current] 
[544070.914957] sd 0:0:1:0: [sdb]  
[544070.914960] Add. Sense: Logical unit not ready, cause not reportable
[544070.914965] sd 0:0:1:0: [sdb] CDB: 
[544070.914968] Read(10): 28 00 03 7e 10 08 00 00 08 00
[544070.914977] end_request: I/O error, dev sdb, sector 58593288
[544076.593691] end_request: I/O error, dev sdb, sector 2056
[544076.618568] md: super_written gets error=-5, uptodate=0
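
As a cross-check on the sdb errors above, the CDB bytes printed after `Read(10):` can be decoded by hand. A minimal sketch, assuming the standard READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8); `decode_read10` is a hypothetical helper name:

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode the LBA and block count from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# The three distinct CDBs from the log above.
for cdb in (
    "28 00 03 9b e0 08 00 00 08 00",
    "28 00 00 00 08 08 00 00 08 00",
    "28 00 03 7e 10 08 00 00 08 00",
):
    print(cdb, "->", decode_read10(cdb))
```

The decoded LBAs (60547080, 2056 and 58593288) match the sector numbers in the corresponding `end_request: I/O error` lines, each for an 8-block read.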

Same error right after booting up again. Let's go with controller DOA; not sure what the next step for replacement is, @Cmjohnson.

[   81.334942] md: export_rdev(sda1)
[   81.342542] md: bind<sda1>
[   81.347018] RAID1 conf printout:
[   81.347024]  --- wd:1 rd:3
[   81.347028]  disk 0, wo:1, o:1, dev:sda1
[   81.347030]  disk 1, wo:0, o:1, dev:sdb1
[   81.347190] md: recovery of RAID array md0
[   81.347197] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   81.347201] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[   81.347217] md: using 128k window, over a total of 29279232k.
[   84.537227] md: md0: recovery interrupted.
[   84.543949] md: export_rdev(sdc1)
[   84.553195] md: bind<sdc1>
[   84.557819] RAID1 conf printout:
[   84.557825]  --- wd:1 rd:3
[   84.557829]  disk 0, wo:1, o:1, dev:sda1
[   84.557832]  disk 1, wo:0, o:1, dev:sdb1
[   84.557835] RAID1 conf printout:
[   84.557836]  --- wd:1 rd:3
[   84.557838]  disk 0, wo:1, o:1, dev:sda1
[   84.557840]  disk 1, wo:0, o:1, dev:sdb1
[   84.557846] RAID1 conf printout:
[   84.557847]  --- wd:1 rd:3
[   84.557849]  disk 0, wo:1, o:1, dev:sda1
[   84.557851]  disk 1, wo:0, o:1, dev:sdb1
[   84.557853]  disk 2, wo:1, o:1, dev:sdc1
[   84.557975] md: recovery of RAID array md0
[   84.557980] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   84.557983] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[   84.558000] md: using 128k window, over a total of 29279232k.
[   86.206539] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   86.206550] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   86.206557] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   87.131597] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   87.131607] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   87.435398] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   87.435409] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   87.435416] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   88.477180] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   88.477190] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   88.477197] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   88.722523] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   88.722534] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   88.722540] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   98.669284] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   98.669295] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   98.669319] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   99.610617] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   99.610644] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   99.864180] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[   99.864190] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[  100.765685] end_request: I/O error, dev sdc, sector 2056
[  100.790119] md: super_written gets error=-5, uptodate=0
[  100.790125] md/raid1:md0: Disk failure on sdc1, disabling device.
md/raid1:md0: Operation continuing on 1 devices.
[  101.182567] md: md0: recovery interrupted.
[  101.192058] RAID1 conf printout:
[  101.192063]  --- wd:1 rd:3
[  101.192067]  disk 0, wo:1, o:1, dev:sda1
[  101.192069]  disk 1, wo:0, o:1, dev:sdb1
[  101.192072]  disk 2, wo:1, o:0, dev:sdc1
[  101.226651] RAID1 conf printout:
[  101.226656]  --- wd:1 rd:3
[  101.226660]  disk 0, wo:1, o:1, dev:sda1
[  101.226663]  disk 1, wo:0, o:1, dev:sdb1
[  101.226671] RAID1 conf printout:
[  101.226673]  --- wd:1 rd:3
[  101.226675]  disk 0, wo:1, o:1, dev:sda1
[  101.226677]  disk 1, wo:0, o:1, dev:sdb1
[  101.226678] RAID1 conf printout:
[  101.226680]  --- wd:1 rd:3
[  101.226682]  disk 0, wo:1, o:1, dev:sda1
[  101.226684]  disk 1, wo:0, o:1, dev:sdb1
[  101.226815] md: recovery of RAID array md0
[  101.226820] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  101.226824] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  101.226843] md: using 128k window, over a total of 29279232k.
[  101.226846] md: resuming recovery of md0 from checkpoint.
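
For anyone reading the `RAID1 conf printout` blocks: as I read the raid1 driver, `wd` is the number of working disks, `rd` the number of configured raid disks, `wo:1` marks a member that is not in sync, and `o:0` marks a faulty member. A minimal sketch that summarises one printout under that assumption (`parse_conf` is a hypothetical helper; the sample is copied from the first dmesg paste in the description):

```python
import re

# One printout copied from the description, right after sdc1 was disabled.
CONF = """\
[51166.770264] RAID1 conf printout:
[51166.770270]  --- wd:2 rd:3
[51166.770274]  disk 0, wo:0, o:1, dev:sda1
[51166.770276]  disk 1, wo:0, o:1, dev:sdb1
[51166.770279]  disk 2, wo:1, o:0, dev:sdc1
"""

SUMMARY = re.compile(r"wd:(\d+) rd:(\d+)")
DISK = re.compile(r"disk (\d+), wo:(\d), o:(\d), dev:(\S+)")


def parse_conf(text: str) -> dict:
    """Summarise a single RAID1 conf printout."""
    working, total = map(int, SUMMARY.search(text).groups())
    disks = [
        # Assumed meaning: wo:0 means in sync, o:1 means operational (not faulty).
        {"slot": int(slot), "in_sync": wo == "0", "operational": o == "1", "dev": dev}
        for slot, wo, o, dev in DISK.findall(text)
    ]
    return {"working": working, "total": total, "degraded": working < total, "disks": disks}


state = parse_conf(CONF)
faulty = [d["dev"] for d in state["disks"] if not d["operational"]]
print(f"{state['working']}/{state['total']} working, faulty: {faulty}")
```
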
fgiunchedi renamed this task from "/dev/sdc offline in restbase1006, recurring mpt2sas message in dmesg" to "restbase1006 faulty disk controller". Feb 20 2015, 10:12 AM
fgiunchedi set Security to None.

After re-seating the controller, I am no longer seeing the mpt2sas error, but I do see this:
[ 7.256431] md: kicking non-fresh sdc1 from array!
[ 7.279858] md: unbind<sdc1>
[ 7.328188] md: export_rdev(sdc1)

Cut/paste of the boot log below:

[ 5.558385] ata4: SATA link down (SStatus 0 SControl 300)
[ 5.592455] ehci-pci 0000:00:1a.0: debug port 2
[ 5.614600] sd 0:0:0:0: [sda] 2000409264 512-byte logical blocks: (1.02 TB/953 GiB)
[ 5.614603] sd 0:0:2:0: [sdc] 2000409264 512-byte logical blocks: (1.02 TB/953 GiB)
[ 5.614606] sd 0:0:1:0: [sdb] 2000409264 512-byte logical blocks: (1.02 TB/953 GiB)
[ 5.616431] sd 0:0:2:0: [sdc] Write Protect is off
[ 5.616452] sd 0:0:1:0: [sdb] Write Protect is off
[ 5.616722] sd 0:0:2:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 5.616741] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 5.618537] ehci-pci 0000:00:1a.0: irq 21, io mem 0xf6460000
[ 5.620046] sdb: sdb1 sdb2 sdb3
[ 5.620096] sdc: sdc1 sdc2 sdc3
[ 5.623035] sd 0:0:1:0: [sdb] Attached SCSI disk
[ 5.623094] sd 0:0:2:0: [sdc] Attached SCSI disk
[ 5.630452] ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
[ 5.630534] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[ 5.630536] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 5.630538] usb usb2: Product: EHCI Host Controller
[ 5.630540] usb usb2: Manufacturer: Linux 3.16.0-4-amd64 ehci_hcd
[ 5.630542] usb usb2: SerialNumber: 0000:00:1a.0
[ 5.630753] hub 2-0:1.0: USB hub found
[ 5.630763] hub 2-0:1.0: 2 ports detected
[ 5.631195] ehci-pci 0000:00:1d.0: EHCI Host Controller
[ 5.631207] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 3
[ 5.631232] ehci-pci 0000:00:1d.0: debug port 2
[ 5.635182] ehci-pci 0000:00:1d.0: irq 20, io mem 0xf6450000
[ 5.646480] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[ 5.646547] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[ 5.646548] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 5.646550] usb usb3: Product: EHCI Host Controller
[ 5.646552] usb usb3: Manufacturer: Linux 3.16.0-4-amd64 ehci_hcd
[ 5.646553] usb usb3: SerialNumber: 0000:00:1d.0
[ 5.646740] hub 3-0:1.0: USB hub found
[ 5.646749] hub 3-0:1.0: 2 ports detected
[ 5.878685] ata5: SATA link down (SStatus 0 SControl 300)
[ 5.942745] usb 2-1: new high-speed USB device number 2 using ehci-pci
[ 6.075161] usb 2-1: New USB device found, idVendor=8087, idProduct=0024
[ 6.075162] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 6.075315] hub 2-1:1.0: USB hub found
[ 6.075408] hub 2-1:1.0: 6 ports detected
[ 6.186981] usb 3-1: new high-speed USB device number 2 using ehci-pci
[ 6.199007] ata6: SATA link down (SStatus 0 SControl 300)
[ 6.319406] usb 3-1: New USB device found, idVendor=8087, idProduct=0024
[ 6.319408] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 6.319558] hub 3-1:1.0: USB hub found
[ 6.319655] hub 3-1:1.0: 8 ports detected
[ 6.591431] usb 3-1.3: new high-speed USB device number 3 using ehci-pci
[ 6.683898] usb 3-1.3: New USB device found, idVendor=0424, idProduct=2660
[ 6.683899] usb 3-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 6.684046] hub 3-1.3:1.0: USB hub found
[ 6.684145] hub 3-1.3:1.0: 2 ports detected
[ 6.965246] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 6.968118] sd 0:0:0:0: [sda] Write Protect is off
[ 6.968377] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 6.971263] sda: sda1 sda2 sda3
[ 6.973832] sd 0:0:0:0: [sda] Attached SCSI disk
[ 7.094801] sd 0:0:1:0: Attached scsi generic sg1 type 0
[ 7.120641] sd 0:0:2:0: Attached scsi generic sg2 type 0
[ 7.167629] md: bind<sda2>
[ 7.181577] md: bind<sdc1>
[ 7.195824] md: bind<sda1>
[ 7.210759] md: bind<sda3>
[ 7.225956] md: bind<sdc2>
[ 7.242456] md: bind<sdb1>
[ 7.256431] md: kicking non-fresh sdc1 from array!
[ 7.279858] md: unbind<sdc1>
[ 7.328188] md: export_rdev(sdc1)
[ 7.345997] md: raid1 personality registered for level 1
[ 7.372394] md/raid1:md0: active with 2 out of 3 mirrors
[ 7.398631] md0: detected capacity change from 0 to 29981933568
[ 7.430006] md0: unknown partition table
[ 7.449728] md: bind<sdb3>
[ 7.465894] md: bind<sdc3>
[ 7.481968] md: raid0 personality registered for level 0
[ 7.508145] md/raid0:md2: md_size is 5818798080 sectors.
[ 7.533882] md: RAID0 configuration for md2 - 1 zone
[ 7.557980] md: zone0=[sda3/sdb3/sdc3]
[ 7.576483] zone-offset= 0KB, device-offset= 0KB, size=2909399040KB
[ 7.616507]
[ 7.623865] md2: detected capacity change from 0 to 2979224616960
[ 7.655788] md2: unknown partition table
[ 7.657542] md: bind<sdb2>
[ 7.662032] md/raid1:md1: active with 3 out of 3 mirrors
[ 7.662080] md1: detected capacity change from 0 to 999751680
[ 7.662573] md1: unknown partition table
[ 7.772632] random: nonblocking pool is initialized
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Assembling all MD arrays ... mdadm: Found some drive for an array that is already active: /dev/md/0
mdadm: giving up.
Failure: failed to assemble all arrays.
done.
[ 7.861322] device-mapper: uevent: version 1.0.3
[ 7.883496] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-devel@redhat.com
done.
Begin: Running /scripts/local-premount ... [ 7.936546] PM: Star

Still seeing the errors :(

Feb 25 09:37:08 restbase1006 kernel: [50303.994353] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 25 09:37:08 restbase1006 kernel: [50303.994362] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 25 09:37:08 restbase1006 kernel: [50303.994367] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 25 09:37:08 restbase1006 kernel: [50303.994371] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 25 09:37:08 restbase1006 kernel: [50303.994375] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)

Let's engage HP on this and see what we can do.

coren triaged this task as High priority. Mar 2 2015, 4:13 PM
coren subscribed.

Setting to High as this is a blocker for a High priority task.

Due to the weather conditions in our area as well as at the HP distribution center in Kentucky, the HP field engineer Edwin Robles will be bringing the new controller on Monday between 9 and 10am.

Replaced the disk controller today.

Sending over to Filippo to get it working.

restbase1006 reinstalled and back in service