furud - DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error
Closed, Resolved · Public

Description

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=furud&service=Disk+space

host furud: DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error

# Backup system, see T176506.
# This is a reserved system. Ask Otto or Faidon.
node 'furud.codfw.wmnet' {
    role(analytics_cluster::hadoop::client)
}

16:07 <+icinga-wm> PROBLEM - Disk space on furud is CRITICAL: DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space

16:09 < mutante> ottomata: furud ^ seems out of disk. site.pp says it's a hadoop client and backup and to ask you

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2019-04-19T22:47:11Z] <mutante> furud - remounted /mnt/hdfs for T221483

Dzahn claimed this task.

Followed the docs at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Fixing_HDFS_mount_at_/mnt/hdfs

but had to run the commands twice before the mount came back:

[furud:~] $  sudo umount -f /mnt/hdfs
[furud:~] $  sudo fusermount -uz /mnt/hdfs
fusermount: failed to unmount /mnt/hdfs: Invalid argument
[furud:~] $  sudo mount /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option dev
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option suid
[furud:~] $  sudo fusermount -uz /mnt/hdfs
[furud:~] $  sudo mount /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option dev
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option suid
[furud:~] $
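
Since the first unmount/mount pass did not clear the alert, the retry could be scripted. The following is only a rough sketch, not what was run on furud: it reuses the `umount -f`, `fusermount -uz` and `mount` commands from the transcript, while the function names (`is_accessible`, `remount_once`, `remount_with_retry`) and the default of 3 attempts are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: repeat the unmount/mount cycle for a FUSE mount until it is readable.
# All function names and the default attempt count are illustrative assumptions.

# Return 0 if the mount point can be listed. A stale FUSE mount typically
# fails here with "Input/output error".
is_accessible() {
    ls "$1" >/dev/null 2>&1
}

# One unmount/mount cycle, using the same commands as in the transcript.
# Unmount failures are ignored because the mount may already be gone
# (compare the "Invalid argument" message from fusermount).
remount_once() {
    sudo umount -f "$1" 2>/dev/null || true
    sudo fusermount -uz "$1" 2>/dev/null || true
    sudo mount "$1"
}

# Repeat the cycle up to $2 times (default 3) and stop as soon as the
# mount point is readable again. Returns non-zero if it never recovers.
remount_with_retry() {
    mount_point=$1
    max_attempts=${2:-3}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if is_accessible "$mount_point"; then
            return 0
        fi
        echo "attempt $attempt: remounting $mount_point"
        remount_once "$mount_point" || true
        attempt=$((attempt + 1))
    done
    is_accessible "$mount_point"
}
```

With the functions loaded, `remount_with_retry /mnt/hdfs` would run at most three cycles and return non-zero if the mount is still unreadable afterwards, which is the point at which to look at the Hadoop side instead.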

18:46 <+icinga-wm> RECOVERY - Disk space on furud is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space

Thanks! We should probably unpuppetize the Hadoop part of these nodes and unmount /mnt/hdfs until we need them again.