
furud - DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error
Closed, Resolved, Public

Description

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=furud&service=Disk+space

host furud: DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error

# Backup system, see T176506.
# This is a reserved system. Ask Otto or Faidon.
node 'furud.codfw.wmnet' {
    role(analytics_cluster::hadoop::client)

16:07 <+icinga-wm> PROBLEM - Disk space on furud is CRITICAL: DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space

16:09 < mutante> ottomata: furud ^ seems out of disk. site.pp says it's a hadoop client and backup and to ask you

Event Timeline

Dzahn created this task. Apr 19 2019, 10:44 PM
Restricted Application added a project: Analytics. Apr 19 2019, 10:44 PM
Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2019-04-19T22:47:11Z] <mutante> furud - remounted /mnt/hdfs for T221483

Dzahn closed this task as Resolved. Apr 19 2019, 10:47 PM
Dzahn claimed this task.

I followed the docs at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Fixing_HDFS_mount_at_/mnt/hdfs

but had to run the unmount and mount commands twice:

[furud:~] $  sudo umount -f /mnt/hdfs
[furud:~] $  sudo fusermount -uz /mnt/hdfs
fusermount: failed to unmount /mnt/hdfs: Invalid argument
[furud:~] $  sudo mount /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option dev
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option suid
[furud:~] $  sudo fusermount -uz /mnt/hdfs
[furud:~] $  sudo mount /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option dev
INFO /data/jenkins/workspace/generic-package-debian64-8-0/CDH5.16.1-Packaging-Hadoop-2018-11-21_20-46-52/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option suid
[furud:~] $

18:46 <+icinga-wm> RECOVERY - Disk space on furud is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
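
The sequence above could be wrapped in a small retry loop. This is a minimal sketch only, not what was run on furud and not part of the wikitech docs: the function names `mount_is_readable` and `remount_with_retry`, the default of three attempts, and the use of `mountpoint -q` plus `ls` as the health check are all assumptions. It also repeats `umount -f` on every round, whereas the second round above only used `fusermount -uz`.

```shell
#!/bin/sh
# Sketch: retry the unmount/mount sequence until the mount point is usable.
# Function names and the default attempt count are illustrative assumptions.

mount_is_readable() {
    # Must be an active mount point AND listable; a broken FUSE mount
    # typically fails the listing with "Input/output error".
    mountpoint -q "$1" && ls "$1" >/dev/null 2>&1
}

remount_with_retry() {
    mnt="$1"
    max_attempts="${2:-3}"
    attempt=1

    while [ "$attempt" -le "$max_attempts" ]; do
        # Either unmount can fail if the mount is already gone
        # (e.g. "fusermount: failed to unmount ...: Invalid argument"),
        # so failures are ignored and the health check decides.
        sudo umount -f "$mnt" || true
        sudo fusermount -uz "$mnt" || true
        sudo mount "$mnt" || true

        if mount_is_readable "$mnt"; then
            echo "remounted $mnt after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
    done

    echo "giving up on $mnt after $max_attempts attempts" >&2
    return 1
}
```

Usage would be `remount_with_retry /mnt/hdfs`, which relies on /mnt/hdfs having an fstab entry, as the bare `sudo mount /mnt/hdfs` above does. The health check matters because a plain `ls` also succeeds on an unmounted, empty directory.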

Peachey88 updated the task description. Apr 20 2019, 12:43 AM
Ottomata claimed this task. Apr 23 2019, 2:16 PM

Thanks! We should probably unpuppetize the Hadoop part of these nodes and unmount /mnt/hdfs until we need them again.

Ottomata reassigned this task from Ottomata to Dzahn. Apr 23 2019, 2:16 PM