
I/O issues for /dev/sdd on analytics1047.eqiad.wmnet
Closed, Resolved · Public · 8 Estimated Story Points

Description

dmesg on analytics1047.eqiad.wmnet:

[3899221.881936] sd 0:2:3:0: [sdd]
[3899221.881944] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[3899221.881945] sd 0:2:3:0: [sdd]
[3899221.881946] Sense Key : Medium Error [current]
[3899221.881949] sd 0:2:3:0: [sdd]
[3899221.881950] Add. Sense: No additional sense information
[3899221.881951] sd 0:2:3:0: [sdd] CDB:
[3899221.881952] Read(16): 88 00 00 00 00 00 e2 80 08 08 00 00 00 08 00 00
[3899221.881958] end_request: I/O error, dev sdd, sector 3800041480
[3899221.888699] EXT4-fs error (device sdd1): ext4_wait_block_bitmap:476: comm kworker/u289:2: Cannot read block bitmap - block_group = 14497, block_bitmap = 475004929
[3899221.905274] EXT4-fs (sdd1): Delayed block allocation failed for inode 193462982 at logical offset 153 with max blocks 18 with error 5
[3899221.918908] EXT4-fs (sdd1): This should not happen!! Data will be lost

Trying to run fsck on /dev/sdd1, but I am not feeling too confident that it will work. I stopped the Hadoop services and scheduled downtime for the host.
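For reference, a minimal sketch of how one might confirm which physical slot is throwing the errors before asking for a swap, using the same megacli/smartctl tooling available on these hosts (the smartctl device id N is a placeholder for the drive's "Device Id" from -PDList):

# Per-drive error counters; the failing disk usually shows non-zero
# Media Error / Predictive Failure counts or a degraded firmware state.
sudo megacli -PDList -aALL | egrep "Slot Number|Media Error Count|Predictive Failure Count|Firmware state"

# Optional cross-check through the RAID controller (N = Device Id of the drive).
sudo smartctl -a -d megaraid,N /dev/sdd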

Would it be possible to check/swap the disk?

Thanks!

Luca

Event Timeline

Restricted Application added subscribers: Southparkfan, Aklapper.

@Cmjohnson: Hi Chris, any news about when the disk could be replaced? Thanks a lot! (Sorry for the ping)

@elukey sorry, I haven't looked into this yet. I did request a new disk from Dell.

Congratulations: Work Order SR929923300 was successfully submitted.

@elukey @Ottomata I have the disk on-site; let me know when you're available to coordinate the replacement. I am 99% certain it's slot 5.

Swapped the disk in slot 2, which ottomata and I determined to be the location of /dev/sdd.

Return part shipping information

USPS
9202 3946 5301 2432 0845 53
FEDEX
9611918 2393026 53576230

megacli -CfgForeign -Scan -a0

There are 1 foreign configuration(s) on controller 0.

@elukey this is what I have for notes on how to get the disk back online. Be sure to change the disk information.

megacli -CfgForeign -Clear -a0
  Foreign configuration 0 is cleared on controller 0.

megacli -PDMakeJBOD -PhysDrv\[32:2\] -a0

Then format /dev/sd? as Linux RAID and add it to the array via:

mdadm --manage /dev/md2 --add /dev/sd?
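(For completeness, a hedged sketch of what that last step looks like on a host that really does use an md software array; /dev/sdX is a placeholder, and as it turned out analytics1047 uses single-disk hardware RAID 0 virtual drives instead, see below.)

# Partition the replacement disk and flag the partition as Linux RAID.
sudo parted /dev/sdX --script mklabel gpt
sudo parted /dev/sdX --script mkpart primary 0% 100%
sudo parted /dev/sdX --script set 1 raid on

# Add the new member to the existing array and watch the rebuild.
sudo mdadm --manage /dev/md2 --add /dev/sdX1
cat /proc/mdstat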

I checked the commands that @Cmjohnson provided and executed only the -PDMakeJBOD one, since the previous two were a no-op (no foreign config found). I also double-checked that enclosure ID 32, slot 2 was correct:

sudo megacli -PDList -aALL | grep Unconfigured -B 30 -A 40
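As an extra sanity check, a sketch for mapping the /dev/sdd address from dmesg back to the controller (an assumption here is that on this controller the SCSI target id reported by the kernel matches the virtual drive's Target Id, which the later findings in this task are consistent with):

# dmesg reported "sd 0:2:3:0: [sdd]" -> host 0, channel 2, target 3, lun 0.
ls -l /sys/block/sdd/device        # symlink ends in .../0:2:3:0
lsscsi | grep sdd                  # same mapping, if lsscsi is installed

# Target 3 should therefore correspond to Virtual Drive (Target Id) 3:
sudo megacli -LDInfo -L3 -a0
# (at this point VD 3 is missing, which matches the JBOD state of slot 2)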

Now during boot I can see an error stating "something terribly wrong happened while mounting /var/lib/hadoop/data/b".

That seems to be due to:

elukey@analytics1047:~$ cat /proc/mounts  | grep sdb
/dev/sdb1 /boot ext4 rw,relatime,data=ordered 0 0
/dev/sdb1 /var/lib/hadoop/data/b ext4 rw,noatime,data=ordered 0 0
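The double mount above suggests the kernel device names were reshuffled by the swap (the /boot disk and the data disks changed letters). A hedged sketch for checking what actually sits behind each name; keying mounts on UUIDs or labels instead of /dev/sdX names avoids this class of problem:

# Show the filesystem UUIDs/labels behind each kernel device name.
sudo blkid

# Stable identifiers that survive device-name reshuffles.
ls -l /dev/disk/by-uuid/ /dev/disk/by-id/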

Analytics config:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Worker_Nodes_.28DataNode_.26_NodeManager.29

I am probably a bit too tired now to figure out exactly what the problem is, so I've disabled puppet and Yarn/HDFS for the moment. I will restart the work tomorrow EU time.

I was able to create an ext4 partition on the new disk, but it appeared under /dev/sda rather than /dev/sdd.

Quick recap about the analytics config from https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Worker_Nodes_.28DataNode_.26_NodeManager.29:

12 disk, 2 flex bay drives (analytics1028-analytics1059):

These nodes come with 2 x 2.5" drives on which the OS and JournalNode partitions are installed. 
This leaves all of the space on the 12 4TB HDDs for DataNode use.

I think that JBOD is not the right config; it should probably be a Virtual Drive instead (a single-disk RAID 0 seems to be the trick used, from what I've read). I compared various analytics hosts:

elukey@analytics1047:~$ sudo megacli -PDList -aAll | egrep "Enclosure Device ID:|Slot Number:|Firmware state"

Enclosure Device ID: 32
Slot Number: 0
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 1
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 2
Firmware state: JBOD         <==================

Enclosure Device ID: 32
Slot Number: 3
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 4
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 5
Firmware state: Online, Spun Up

[...] 

Enclosure Device ID: 32
Slot Number: 13
Firmware state: Online, Spun Up

elukey@analytics1046:~$ sudo megacli -PDList -aAll | egrep "Enclosure Device ID:|Slot Number:|Firmware state"

Enclosure Device ID: 32
Slot Number: 0
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 1
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 2
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 3
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 4
Firmware state: Online, Spun Up

Enclosure Device ID: 32
Slot Number: 5
Firmware state: Online, Spun Up

[..]

Enclosure Device ID: 32
Slot Number: 13
Firmware state: Online, Spun Up

Yes, a single-disk RAID 0 virtual drive seems to be the way:

elukey@analytics1047:~$ sudo megacli -LDInfo -L2 -a0


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 2 (Target Id: 2)
Name                :
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 3.637 TB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 0
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 1
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
PI type: No PI

Is VD Cached: No



Exit Code: 0x00

I tried to run the following and found that analytics1047 is missing Virtual Drive 3 (compared with analytics1046).

sudo megacli -LDInfo -LAll -a0
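A quick way to see the gap (a sketch, run on both hosts and compared):

# List / count the virtual drives; analytics1047 had no Target Id 3.
sudo megacli -LDInfo -LAll -a0 | grep "Virtual Drive:"
sudo megacli -LDInfo -LAll -a0 | grep -c "Virtual Drive:"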

I have no idea how to remove the JBOD config and add the Virtual Drive one.

Fixed the issue with:

sudo megacli -PDMakeGood -PhysDrv '[32:2]' -Force -a0
sudo megacli -CfgLdAdd -r0 [32:2] -a0

After the reboot the new disk showed up as /dev/sdd but was not yet partitioned as expected, so I just did:

sudo parted /dev/sdd --script mklabel gpt
sudo parted /dev/sdd --script mkpart primary ext4 0% 100%
sudo mkfs.ext4 /dev/sdd1
sudo tune2fs -m 0 /dev/sdd1
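After the mkfs, the new filesystem still has to be mounted at its Hadoop datadir. A hedged sketch (the UUID below is a placeholder, and the real fstab entry on these hosts may well be puppet-managed):

# Find the UUID of the new filesystem and mount it at its datadir.
sudo blkid /dev/sdd1
sudo mkdir -p /var/lib/hadoop/data/d
# Hypothetical /etc/fstab line, keyed on UUID so device renames don't matter:
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /var/lib/hadoop/data/d ext4 defaults,noatime 0 2
sudo mount /var/lib/hadoop/data/d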

All good from megacli state:

elukey@analytics1047:/var/log/hadoop-hdfs$ sudo megacli -PDList -aAll | egrep "Enclosure Device ID:|Slot Number:|Firmware state"
Enclosure Device ID: 32
Slot Number: 0
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 1
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 2
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 3
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 4
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 5
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 6
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 7
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 8
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 9
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 10
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 11
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 12
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 13
Firmware state: Online, Spun Up

And mountpoints look good too:

elukey@analytics1047:/var/log/hadoop-hdfs$ df -h
Filesystem                                 Size  Used Avail Use% Mounted on
udev                                        32G  4.0K   32G   1% /dev
tmpfs                                      6.3G  1.2M  6.3G   1% /run
/dev/mapper/analytics1047--vg-root          28G  6.2G   20G  24% /
none                                       4.0K     0  4.0K   0% /sys/fs/cgroup
none                                       5.0M     0  5.0M   0% /run/lock
none                                        32G     0   32G   0% /run/shm
none                                       100M     0  100M   0% /run/user
/dev/mapper/analytics1047--vg-journalnode  9.8G   23M  9.7G   1% /var/lib/hadoop/journal
/dev/sda1                                  268M  124M  127M  50% /boot
/dev/sdb1                                  3.6T  2.1T  1.5T  59% /var/lib/hadoop/data/b
/dev/sdk1                                  3.6T  2.1T  1.6T  57% /var/lib/hadoop/data/k
/dev/sdl1                                  3.6T  2.1T  1.6T  57% /var/lib/hadoop/data/l
/dev/sdm1                                  3.6T  2.1T  1.6T  58% /var/lib/hadoop/data/m
/dev/sdc1                                  3.6T  2.1T  1.6T  58% /var/lib/hadoop/data/c
/dev/sdd1                                  3.6T  178M  3.6T   1% /var/lib/hadoop/data/d
/dev/sde1                                  3.6T  2.1T  1.5T  59% /var/lib/hadoop/data/e
/dev/sdf1                                  3.6T  2.1T  1.6T  57% /var/lib/hadoop/data/f
/dev/sdg1                                  3.6T  2.1T  1.5T  59% /var/lib/hadoop/data/g
/dev/sdh1                                  3.6T  2.1T  1.6T  58% /var/lib/hadoop/data/h
/dev/sdi1                                  3.6T  2.1T  1.6T  57% /var/lib/hadoop/data/i
/dev/sdj1                                  3.6T  2.1T  1.6T  58% /var/lib/hadoop/data/j

Note for future readers: I had to create the "/var/lib/hadoop/data/hdfs" and "/var/lib/hadoop/yarn" directories to avoid failures from the hadoop namenode daemon.
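A sketch of that step (the hdfs/yarn owners are an assumption based on standard Hadoop packaging; puppet may normally handle this):

# Recreate the directories mentioned above with the expected owners
# (ownership is an assumption; adjust to match the other datadirs).
sudo mkdir -p /var/lib/hadoop/data/hdfs /var/lib/hadoop/yarn
sudo chown hdfs:hdfs /var/lib/hadoop/data/hdfs
sudo chown yarn:yarn /var/lib/hadoop/yarn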

elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to Done on the Analytics-Kanban board.
elukey added a subscriber: Cmjohnson.
elukey set the point value for this task to 8. · May 26 2016, 3:49 PM