
ms-be2013 - swift-storage/sdc1 is not accessible: Input/output error
Closed, Resolved (Public)

Description

on ms-be2013

Current Status:

CRITICAL

(for 0d 1h 24m 58s)

DISK CRITICAL - /srv/swift-storage/sdc1 is not accessible: Input/output error

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=ms-be2013&service=Disk+space

Event Timeline

Dzahn raised the priority of this task from to Needs Triage.
Dzahn updated the task description.
Dzahn subscribed.
Dzahn renamed this task from ms-be2013 - to ms-be2013 - swift-storage/sdc1 is not accessible: Input/output error. Jul 8 2015, 6:35 PM
Dzahn set Security to None.

also: CRITICAL: Puppet has 1 failures

Warning: /Stage[main]/Role::Swift::Storage/Swift_new::Init_device[/dev/sdc]/Swift_new::Mount_filesystem[/dev/sdc1]/File[mountpoint-/srv/swift-storage/sdc1]: Skipping because of failed dependencies

etc

Okay, did you remove the disk already?

fgiunchedi added subscribers: Papaul, Cmjohnson.

nevermind Chris, I misread the hostname; this machine is several km away from you :) Moving to @Papaul

okay, so the disk at slot 2, which would be /dev/sdc, is missing altogether from megacli

the others look good
cmjohnson@ms-be2013:~$ sudo megacli -PDList -aALL |grep "Firmware state"
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
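
For anyone wanting to script the same check, here is a minimal sketch that tallies the `Firmware state` lines from the grep above and compares the total against an expected drive count. The function names and the expected count of 14 are assumptions for illustration; the task itself does not state how many drives the chassis holds.

```python
from collections import Counter


def summarize_firmware_states(grep_output: str) -> Counter:
    """Tally drives per state from `megacli -PDList -aALL | grep "Firmware state"` output."""
    states = Counter()
    for line in grep_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Firmware state":
            states[value.strip()] += 1
    return states


def missing_drive_count(states: Counter, expected_drives: int) -> int:
    """Number of drives the controller did not report (never negative)."""
    return max(expected_drives - sum(states.values()), 0)


# The 13 lines pasted above, all reporting the same state.
sample = "Firmware state: Online, Spun Up\n" * 13
states = summarize_firmware_states(sample)

# expected_drives=14 is a hypothetical value, not taken from this task.
print(states)
print(missing_drive_count(states, expected_drives=14))
```

A drive that has dropped off the controller entirely does not show up as "Failed"; it just is not listed, so comparing the total against the expected count is what catches it.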

yep I've tried umount but of course it is stuck, I'll reboot the machine

@Papaul, I've located the disk on the controller so it should be blinking once the replacement comes in, thanks!

disk @ slot 2: status LEDs are off (neither green nor amber)

thanks @Papaul, I made a brown paper bag mistake and cleared the RAID config (not the foreign config) on reboot.
I'll take care of reimaging the machine tomorrow. Meanwhile, this is what dmesg had to say; the controller BIOS doesn't seem to see the PD at all, though

[19455817.919316] sd 0:2:2:0: [sdc] CDB: 
[19455817.919317] Read(16): 88 00 00 00 00 00 00 00 00 25 00 00 00 01 00 00
[19455817.919325] end_request: I/O error, dev sdc, sector 37
[19455817.925355] sd 0:2:2:0: [sdc] Unhandled error code
[19455817.925357] sd 0:2:2:0: [sdc]  
[19455817.925359] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[19455817.925362] sd 0:2:2:0: [sdc] CDB: 
[19455817.925363] Read(16): 88 00 00 00 00 00 00 00 00 26 00 00 00 01 00 00
[19455817.925389] end_request: I/O error, dev sdc, sector 38
[19455817.931426] sd 0:2:2:0: [sdc] Unhandled error code
[19455817.931427] sd 0:2:2:0: [sdc]  
[19455817.931428] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[19455817.931430] sd 0:2:2:0: [sdc] CDB: 
[19455817.931431] Read(16): 88 00 00 00 00 00 00 00 00 27 00 00 00 01 00 00
[19455817.931439] end_request: I/O error, dev sdc, sector 39
[19455817.937471] sd 0:2:2:0: [sdc] Unhandled error code
[19455817.937473] sd 0:2:2:0: [sdc]  
[19455817.937475] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[19455817.937478] sd 0:2:2:0: [sdc] CDB: 
[19455817.937479] Read(16): 88 00 00 00 00 00 00 00 00 28 00 00 00 01 00 00
[19455817.937493] end_request: I/O error, dev sdc, sector 40
[19455817.943531] sd 0:2:2:0: [sdc] Unhandled error code
[19455817.943532] sd 0:2:2:0: [sdc]  
[19455817.943533] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[19455817.943535] sd 0:2:2:0: [sdc] CDB: 
[19455817.943536] Read(16): 88 00 00 00 00 00 00 00 00 29 00 00 00 01 00 00
[19455817.943544] end_request: I/O error, dev sdc, sector 41
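
As a cross-check on the dmesg excerpt, the CDB bytes can be decoded by hand: in a SCSI READ(16) command, byte 0 is the opcode (0x88), bytes 2-9 are the big-endian logical block address, and bytes 10-13 are the transfer length in blocks. A minimal sketch follows; `decode_read16` is a hypothetical helper name, and the LBA lining up with the kernel's "sector" number assumes 512-byte logical blocks.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode the LBA and transfer length from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2-9
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13
    }


# First and last CDBs from the dmesg output above.
first = decode_read16("88 00 00 00 00 00 00 00 00 25 00 00 00 01 00 00")
last = decode_read16("88 00 00 00 00 00 00 00 00 29 00 00 00 01 00 00")
print(first, last)
```

The decoded LBAs (0x25 to 0x29) correspond to the sectors 37 to 41 in the `end_request` lines, so these are consecutive single-block reads, each failing with DID_BAD_TARGET.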
Dzahn triaged this task as Medium priority. Jul 9 2015, 5:20 PM

@Papaul please go ahead and order a replacement

the installation assumes all disks are present before it can go ahead; if a disk is missing, the disks presented to the OS don't match the names we're expecting
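
To illustrate the naming problem, here is a simplified model, assuming the kernel hands out sda, sdb, ... in slot order to whatever drives the controller presents. Real enumeration order depends on the driver and controller, and the 14-slot layout and the `enumerate_devices` helper are assumptions for this sketch, not something stated in this task.

```python
from string import ascii_lowercase


def enumerate_devices(present_slots):
    """Map each present slot to the sdX name it would get if names follow slot order."""
    return {
        slot: f"sd{ascii_lowercase[i]}"
        for i, slot in enumerate(sorted(present_slots))
    }


TOTAL_SLOTS = 14  # hypothetical chassis size

healthy = enumerate_devices(range(TOTAL_SLOTS))
degraded = enumerate_devices(s for s in range(TOTAL_SLOTS) if s != 2)

# With slot 2 absent, every later disk shifts down by one letter.
print(healthy[3], degraded[3])
```

Under this model the disk in slot 3 is sdd on a healthy machine but sdc when slot 2 is absent, so anything keyed on device names would target the wrong disks.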

Will have the replacement disk on site on Monday.

Disk replacement complete.

the machine is replicating objects back onto the replaced disk