
labvirt1003 raid warning
Closed, Resolved (Public)

Description

WARNING: Slot 0: Predictive Failure: 2I:1:17 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:13, 2I:1:14, 2I:1:15, 2I:1:16, 2I:1:18 - Controller: OK - Battery/Capacitor: OK

I don't really know how to read that, but I assume it's telling us we need to replace a drive.
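For anyone else unsure how to read that check line, here is a rough Python sketch that splits it into its parts. It assumes the layout seen in the alert above: a severity word, then " - " separated fields of the form "name: value", where drive addresses read as port:box:bay (as the hpssacli output later in this task spells out). The function name `parse_raid_check` and the returned dictionary layout are made up for illustration and are not part of any monitoring tool.

```python
import re

# The alert text exactly as pasted in the task description.
WARNING = (
    "WARNING: Slot 0: Predictive Failure: 2I:1:17 - OK: 1I:1:1, 1I:1:2, "
    "1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, "
    "1I:1:11, 1I:1:12, 1I:1:13, 2I:1:14, 2I:1:15, 2I:1:16, 2I:1:18 - "
    "Controller: OK - Battery/Capacitor: OK"
)

# A drive address looks like "2I:1:17" (port:box:bay).
DRIVE_RE = re.compile(r"^\d+[A-Z]:\d+:\d+$")


def parse_raid_check(line):
    """Hypothetical helper: split the check line into drives and components."""
    severity, _, rest = line.partition(": ")
    report = {"severity": severity, "slot": None, "drives": {}, "components": {}}
    for field in rest.split(" - "):
        # Only the first field carries the "Slot N: " prefix.
        slot = re.match(r"Slot (\d+): (.*)", field)
        if slot:
            report["slot"] = int(slot.group(1))
            field = slot.group(2)
        key, _, value = field.partition(": ")
        items = [item.strip() for item in value.split(",")]
        if all(DRIVE_RE.match(item) for item in items):
            # A list of drive addresses grouped under a state such as "OK".
            report["drives"][key] = items
        else:
            # A single component status such as "Controller: OK".
            report["components"][key] = value
    return report


report = parse_raid_check(WARNING)
for state, drives in report["drives"].items():
    print(f"{state}: {len(drives)} drive(s): {', '.join(drives)}")
for name, status in report["components"].items():
    print(f"{name}: {status}")
```

Read this way, the alert says one drive (port 2I, box 1, bay 17) is flagged as a predictive failure, while the remaining drives, the controller, and the battery/capacitor report OK.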

Related Objects

Status: Resolved
Assigned: Cmjohnson

Event Timeline

Cmjohnson added subscribers: RobH, Cmjohnson.

This server is out of warranty (it expired in March 2018), so we will need to order a new disk. @RobH can you order a 2.5" 300GB disk, model EH0300JDYTH?

I see we have a bunch of spares: Intel 320 Series SSDSA2CW300G3 2.5" 300GB

Can we try putting in an SSD and rebuilding, rather than buying an expensive (but slower) SAS disk?

@Andrew and @RobH I replaced the disk with an SSD. Let me know if it works.

Status is listed as "failed" at the moment on the web interface. I'll check if there is anything else to be found.

Yeah, hpssacli says similar. Doesn't seem to like that drive:

physicaldrive 2I:1:17 (port 2I:box 1:bay 17, Solid State SATA, 300.0 GB, Failed)

It's a SATA disk, and the other disks in the server are SAS. I will look around, but I don't think I have a spare 2.5" SAS disk.
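To make the SATA versus SAS mismatch easier to spot, a small Python sketch along these lines can parse `physicaldrive` lines and flag any drive whose interface differs from the majority. The sample text is assembled for illustration from the failed-drive line above and two neighbouring bays as they appear in the later listing in this task. The names `parse_drives`, `protocol`, and `odd_ones_out` are hypothetical. Whether the controller rejected the SSD because of the interface mix is only the suspicion raised in this thread, not something the output proves.

```python
import re
from collections import Counter

# Sample assembled from lines pasted in this task (illustrative only).
SAMPLE = """\
physicaldrive 2I:1:16 (port 2I:box 1:bay 16, SAS, 300 GB, OK)
physicaldrive 2I:1:17 (port 2I:box 1:bay 17, Solid State SATA, 300.0 GB, Failed)
physicaldrive 2I:1:18 (port 2I:box 1:bay 18, SAS, 300 GB, OK)
"""

PD_RE = re.compile(
    r"physicaldrive (?P<addr>\S+) "
    r"\(port (?P<port>\w+):box (?P<box>\d+):bay (?P<bay>\d+), "
    r"(?P<kind>[^,]+), (?P<size>[^,]+), (?P<status>[^)]+)\)"
)


def parse_drives(text):
    """Return one dict per physicaldrive line found in the text."""
    return [m.groupdict() for m in PD_RE.finditer(text)]


def protocol(kind):
    """Reduce "Solid State SATA" to "SATA"; "SAS" stays "SAS"."""
    return kind.split()[-1]


def odd_ones_out(drives):
    """Return drives whose protocol differs from the most common one."""
    counts = Counter(protocol(d["kind"]) for d in drives)
    majority, _ = counts.most_common(1)[0]
    return [d for d in drives if protocol(d["kind"]) != majority]


for drive in odd_ones_out(parse_drives(SAMPLE)):
    print(f'{drive["addr"]}: {drive["kind"]}, {drive["size"]}, {drive["status"]}')
```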

Yeah, it seems likely we are going to be buying a disk @RobH

Note: The old disk was a predictive failure. @Cmjohnson it could probably be put back for the time being just to prevent a more serious issue before we can get a replacement.

Ok, so we'll need to buy some 300GB SFF SAS disks, correct? I'll create a procurement task and link to this.

Now on Icinga:

Service - Device not healthy -SMART-
On Host labvirt1003

cluster=labvirt device=cciss,17 instance=labvirt1003:9100 job=node site=eqiad

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=labvirt1003&service=Device+not+healthy+-SMART-

Cmjohnson closed subtask Unknown Object (Task) as Resolved. Aug 28 2018, 2:47 PM

@Bstorm The disk has been swapped. Please resolve this once satisfied.

Looks happy now.

Smart Array P420i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 2.2 TB
         Fault Tolerance: 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 2048 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001CB70F92E7806618A3FBD3
         Disk Name: /dev/sdb
         Mount Points: /var/lib/nova/instances 2.2 TB Partition Number 2
         OS Status: LOCKED
         Logical Drive Label: A3DEEC80001438033EE3FD0F4CB
         Mirror Group 1:
            physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
            physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 300 GB, OK)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 300 GB, OK)
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 300 GB, OK)
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 300 GB, OK)
            physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 300 GB, OK)
            physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 300 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 300 GB, OK)
            physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 300 GB, OK)
            physicaldrive 1I:1:13 (port 1I:box 1:bay 13, SAS, 300 GB, OK)
            physicaldrive 2I:1:14 (port 2I:box 1:bay 14, SAS, 300 GB, OK)
            physicaldrive 2I:1:15 (port 2I:box 1:bay 15, SAS, 300 GB, OK)
            physicaldrive 2I:1:16 (port 2I:box 1:bay 16, SAS, 300 GB, OK)
            physicaldrive 2I:1:17 (port 2I:box 1:bay 17, SAS, 300 GB, OK)
            physicaldrive 2I:1:18 (port 2I:box 1:bay 18, SAS, 300 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache
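As a quick sanity check on output like the above, the following Python sketch groups the drives by mirror group, confirms every member reports OK, and works through the capacity arithmetic. The sample is an abbreviated excerpt (two drives per group) of the listing above; `mirror_groups` and `raid10_usable_gb` are hypothetical helper names. The capacity calculation assumes that RAID 1+0 exposes half of the raw capacity and that the tool's "2.2 TB" is in binary units; neither is stated in the output itself.

```python
import re

# Abbreviated excerpt of the logical drive detail pasted above.
SAMPLE = """\
         Mirror Group 1:
            physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
            physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 300 GB, OK)
            physicaldrive 2I:1:17 (port 2I:box 1:bay 17, SAS, 300 GB, OK)
"""


def mirror_groups(text):
    """Return {group_number: [(address, status), ...]}."""
    groups = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"Mirror Group (\d+):", line)
        if header:
            current = int(header.group(1))
            groups[current] = []
            continue
        # The status is the last comma-separated item inside the parentheses.
        drive = re.match(r"physicaldrive (\S+) \(.*, ([^,)]+)\)$", line)
        if drive and current is not None:
            groups[current].append((drive.group(1), drive.group(2)))
    return groups


def raid10_usable_gb(n_drives, drive_gb):
    """Usable capacity under the assumption that half the drives hold mirrors."""
    return (n_drives // 2) * drive_gb


groups = mirror_groups(SAMPLE)
all_ok = all(status == "OK" for members in groups.values() for _, status in members)
print("all drives OK:", all_ok)

# The full listing has 16 drives of 300 GB each (8 per mirror group).
usable_gb = raid10_usable_gb(16, 300)
usable_tib = usable_gb * 10**9 / 2**40
print(f"usable: {usable_gb} GB, about {usable_tib:.1f} TiB")
```

With 16 drives of 300 GB, the mirrored capacity is 2400 GB, which is roughly 2.2 when expressed in binary terabytes and so is consistent with the reported size of 2.2 TB.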