Page MenuHomePhabricator

xfs_db blocked / timeout on ms-be2023
Closed, InvalidPublic

Description

Noticed this yesterday when ms-be2023 got its disk replaced (T184787), ATM we're using xfs_admin / xfs_db to read the filesystem label and decide whether or not to mkfs.

Jan 18 17:34:01 ms-be2023 CRON[21196]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Jan 18 17:34:51 ms-be2023 puppet-agent[13573]: (/Stage[main]/Role::Swift::Storage/Swift::Init_device[/dev/sdn]/Exec[mkfs-/dev/sdn1]/unless) Check "xfs_admin -l /dev/sdn1" exceeded timeout
Jan 18 17:34:59 ms-be2023 puppet-agent[13573]: Applied catalog in 317.03 seconds
Jan 18 17:35:01 ms-be2023 CRON[22776]: (swift) CMD (  test -x /usr/bin/swift-recon-cron && test -r /etc/swift/object-server.conf && /usr/bin/swift-recon-cron /etc/swift/object-server.conf)
Jan 18 17:35:01 ms-be2023 CRON[22777]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jan 18 17:35:01 ms-be2023 CRON[22778]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Jan 18 17:35:42 ms-be2023 kernel: [ 2538.932446] INFO: task xfs_db:14654 blocked for more than 120 seconds.
Jan 18 17:35:42 ms-be2023 kernel: [ 2538.964128]       Not tainted 4.9.0-5-amd64 #1 Debian 4.9.65-3+deb9u2
Jan 18 17:35:42 ms-be2023 kernel: [ 2538.994513] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032753] xfs_db          D    0 14654      1 0x00000000
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032759]  ffff94e9346b8800 0000000000000000 ffff94e7e784e080 ffff94e93f398940
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032762]  ffff94e938fd80c0 ffffbd79e76d3cf0 ffffffff8ac02923 ffff94e935a714c0
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032764]  0000000000000246 ffff94e93f398940 ffff94e935a717a0 ffff94e7e784e080
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032766] Call Trace:
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032774]  [<ffffffff8ac02923>] ? __schedule+0x233/0x6d0
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032776]  [<ffffffff8ac02df2>] ? schedule+0x32/0x80
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032780]  [<ffffffff8a82e68e>] ? wb_wait_for_completion+0x5e/0x90
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032782]  [<ffffffff8a6b86c0>] ? prepare_to_wait_event+0xf0/0xf0
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032784]  [<ffffffff8a8305cf>] ? sync_inodes_sb+0xaf/0x280
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032787]  [<ffffffff8a77dcf4>] ? __filemap_fdatawrite_range+0xd4/0x100
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032789]  [<ffffffff8a8368a5>] ? __sync_filesystem+0x15/0x50
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032791]  [<ffffffff8a83ebbf>] ? fsync_bdev+0x1f/0x50
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032794]  [<ffffffff8a90b1c4>] ? blkdev_ioctl+0x594/0x940
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032796]  [<ffffffff8a83e5d9>] ? block_ioctl+0x39/0x40
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032797]  [<ffffffff8a816c5f>] ? do_vfs_ioctl+0x9f/0x600
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032799]  [<ffffffff8a807faf>] ? SYSC_newfstat+0x2f/0x50
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032800]  [<ffffffff8a817234>] ? SyS_ioctl+0x74/0x80
Jan 18 17:35:42 ms-be2023 kernel: [ 2539.032803]  [<ffffffff8ac0761e>] ? system_call_fast_compare_end+0xc/0xb7

From the stack trace it looks like the ioctl by xfs_admin causes a fsync_bdev. Since all we need really is to check for a FS label we should be asking udev instead (e.g. via lsblk) since all of that is cached and shouldn't bother if the disk or os are busy.