
Host analytics1073 is DOWN
Closed, Resolved, Public

Description

I was unable to log in to the host via SSH or from the console. The console showed the following kernel messages:

[20647450.915277] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [kworker/u98:3:31950]
[20647450.924499] Modules linked in: cpuid binfmt_misc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_NFLOG xt_limit xt_tcpudp xt_pkttype nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter ip6_tables nfnetlink_log nfnetlink intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt ttm iTCO_vendor_support mxm_wmi dcdbas drm_kms_helper kvm drm irqbypass i2c_algo_bit crct10dif_pclmul crc32_pclmul sg ghash_clmulni_intel mei_me lpc_ich pcspkr evdev mfd_core mei shpchp ipmi_si wmi button ipmi_devintf ipmi_msghandler nf_conntrack ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache dm_mod sd_mod ahci libahci ehci_pci aesni_intel ehci_hcd aes_x86_64 glue_helper lrw gf128mul bnx2x ablk_helper cryptd mdio libcrc32c tg3 libata crc32c_generic usbcore megaraid_sas ptp crc32c_intel usb_common pps_core libphy scsi_mod
[20647451.010391] CPU: 5 PID: 31950 Comm: kworker/u98:3 Tainted: G      D W    L  4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u2
[20647451.022424] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[20647451.031263] Workqueue: writeback wb_workfn (flush-8:96)
[20647451.037392] task: ffff94288c06e0c0 task.stack: ffffbcdb67e84000
[20647451.044286] RIP: 0010:[<ffffffff908c5b2f>]  [<ffffffff908c5b2f>] native_queued_spin_lock_slowpath+0x5f/0x1a0
[20647451.055556] RSP: 0018:ffffbcdb67e87bc8  EFLAGS: 00000202
[20647451.061772] RAX: 0000000000000101 RBX: ffff941b31a105a8 RCX: ffff941c08ae81e8
[20647451.070025] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff941c08ae8270
[20647451.078278] RBP: ffff941b33a1d000 R08: ffff941b92cfc648 R09: ffff941fadb02810
[20647451.086531] R10: ffff941fadb02810 R11: 0000000000000008 R12: ffff941c08ae82d8
[20647451.094783] R13: 0000000000000000 R14: ffffbcdb67e87db0 R15: ffff941b31a10580
[20647451.103035] FS:  0000000000000000(0000) GS:ffff942b3f080000(0000) knlGS:0000000000000000
[20647451.112354] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20647451.119055] CR2: 00007f0ef2bec000 CR3: 000000152a408000 CR4: 0000000000360670
[20647451.127307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20647451.135558] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[20647451.143809] Stack:
[20647451.146340]  ffffffff90e1a79d ffffffff90a3b94e ffff941b31a105d8 ffff94288c06e0c0
[20647451.154923]  ffff941b31a105b8 0000000233a97319 ffff941c08ae8270 ffff941c08ae81e8
[20647451.163503]  ffff941b92cfc558 0000000000000000 0000000000000000 0000000000000000
[20647451.172086] Call Trace:
[20647451.175107]  [<ffffffff90e1a79d>] ? _raw_spin_lock+0x1d/0x20
[20647451.181712]  [<ffffffff90a3b94e>] ? writeback_sb_inodes+0x13e/0x4f0
[20647451.188996]  [<ffffffff90a3bd87>] ? __writeback_inodes_wb+0x87/0xb0
[20647451.196281]  [<ffffffff90a3c0fe>] ? wb_writeback+0x27e/0x310
[20647451.202890]  [<ffffffff90a2798c>] ? get_nr_inodes+0x3c/0x60
[20647451.209397]  [<ffffffff90a3ca64>] ? wb_workfn+0x2b4/0x380
[20647451.215713]  [<ffffffff9089460a>] ? process_one_work+0x18a/0x430
[20647451.222706]  [<ffffffff908948fd>] ? worker_thread+0x4d/0x490
[20647451.229309]  [<ffffffff908948b0>] ? process_one_work+0x430/0x430
[20647451.236302]  [<ffffffff9089a969>] ? kthread+0xd9/0xf0
[20647451.242229]  [<ffffffff90e1a9a4>] ? __switch_to_asm+0x34/0x70
[20647451.248930]  [<ffffffff9089a890>] ? kthread_park+0x60/0x60
[20647451.255342]  [<ffffffff90e1aa37>] ? ret_from_fork+0x57/0x70
[20647451.261847] Code: 75 42 f0 0f ba 2f 08 b8 00 01 00 00 0f 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 75 1b 85 f6 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 c3 81 e6 00 ff 00 00 75 04

[20647459.139205] NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [DiskHealthMonit:27744]
[20647459.148720] Modules linked in: cpuid binfmt_misc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_NFLOG xt_limit xt_tcpudp xt_pkttype nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter ip6_tables nfnetlink_log nfnetlink intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt ttm iTCO_vendor_support mxm_wmi dcdbas drm_kms_helper kvm drm irqbypass i2c_algo_bit crct10dif_pclmul crc32_pclmul sg ghash_clmulni_intel mei_me lpc_ich pcspkr evdev mfd_core mei shpchp ipmi_si wmi button ipmi_devintf ipmi_msghandler nf_conntrack ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache dm_mod sd_mod ahci libahci ehci_pci aesni_intel ehci_hcd aes_x86_64 glue_helper lrw gf128mul bnx2x ablk_helper cryptd mdio libcrc32c tg3 libata crc32c_generic usbcore megaraid_sas ptp crc32c_intel usb_common pps_core libphy scsi_mod
[20647459.234565] CPU: 34 PID: 27744 Comm: DiskHealthMonit Tainted: G      D W    L  4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u2
[20647459.246891] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[20647459.255724] task: ffff940ccb71c0c0 task.stack: ffffbcdb4fc30000
[20647459.262620] RIP: 0010:[<ffffffff908c5bda>]  [<ffffffff908c5bda>] native_queued_spin_lock_slowpath+0x10a/0x1a0
[20647459.273986] RSP: 0018:ffffbcdb4fc33b18  EFLAGS: 00000246
[20647459.280203] RAX: 0000000000000000 RBX: ffff941b31a10580 RCX: 0000000000000000
[20647459.288456] RDX: ffff941b3fc59480 RSI: 00000000008c0000 RDI: ffff941b31a105d8
[20647459.296708] RBP: ffff940f51a39658 R08: 0000000000000000 R09: 000000000000007c
[20647459.304961] R10: ffff942b38219000 R11: ffffeb71397193c0 R12: ffff940f51a396e0
[20647459.313214] R13: ffffffff91202e90 R14: ffff941b31a105d8 R15: 0000000000000000
[20647459.321467] FS:  00007fb2cbdd0700(0000) GS:ffff941b3fc40000(0000) knlGS:0000000000000000
[20647459.330785] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20647459.337487] CR2: 00007fb3085c11f8 CR3: 00000019c2736000 CR4: 0000000000360670
[20647459.345740] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20647459.353993] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[20647459.362245] Stack:
[20647459.364777]  ffffffff90e1a79d ffffffff90a3ab3a ffffbcdb4fc33cb8 0000000000000000
[20647459.373357]  0000000000000003 ffff940f51a39658 ffff940f51a396e0 0000000000000000
[20647459.381936]  ffffffff90a3b10d ffff942b33bbd800 ffff940f51a39658 ffffbcdb4fc33c94
[20647459.390516] Call Trace:
[20647459.393536]  [<ffffffff90e1a79d>] ? _raw_spin_lock+0x1d/0x20
[20647459.400142]  [<ffffffff90a3ab3a>] ? locked_inode_to_wb_and_lock_list+0x5a/0x180
[20647459.408589]  [<ffffffff90a3b10d>] ? __mark_inode_dirty+0x24d/0x360
[20647459.415790]  [<ffffffffc0466b48>] ? ext4_mb_new_blocks+0xe8/0xaf0 [ext4]
[20647459.423567]  [<ffffffffc04570c4>] ? ext4_find_extent+0x264/0x310 [ext4]
[20647459.431246]  [<ffffffffc045bca2>] ? ext4_ext_map_blocks+0xb82/0x1320 [ext4]
[20647459.439313]  [<ffffffffc045ed05>] ? __ext4_handle_dirty_metadata+0x45/0x1c0 [ext4]
[20647459.448057]  [<ffffffffc0429964>] ? ext4_map_blocks+0x164/0x5d0 [ext4]
[20647459.455637]  [<ffffffffc042a860>] ? ext4_getblk+0x50/0x190 [ext4]
[20647459.462733]  [<ffffffffc042a9bf>] ? ext4_bread+0x1f/0xb0 [ext4]
[20647459.469635]  [<ffffffffc04344e8>] ? ext4_append+0x48/0xd0 [ext4]
[20647459.476634]  [<ffffffffc0439956>] ? ext4_mkdir+0x276/0x470 [ext4]
[20647459.483725]  [<ffffffff90a18e0c>] ? vfs_mkdir+0x10c/0x1a0
[20647459.490037]  [<ffffffff90a1d7fe>] ? SyS_mkdir+0xce/0x120
[20647459.496255]  [<ffffffff90803b7d>] ? do_syscall_64+0x8d/0x100
[20647459.502860]  [<ffffffff90e1a88e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[20647459.510918] Code: c9 74 41 c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 04 48 63 c9 48 05 80 94 01 00 48 03 04 cd e0 f3 26 91 48 89 10 8b 42 08 85 c0 75 09 <f3> 90 8b 42 08 85 c0 74 f7 4c 8b 02 4d 85 c0 74 08 41 0f 0d 08

[20647466.787140] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [DataXceiver for:7272]
[20647466.796477] Modules linked in: cpuid binfmt_misc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_NFLOG xt_limit xt_tcpudp xt_pkttype nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter ip6_tables nfnetlink_log nfnetlink intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt ttm iTCO_vendor_support mxm_wmi dcdbas drm_kms_helper kvm drm irqbypass i2c_algo_bit crct10dif_pclmul crc32_pclmul sg ghash_clmulni_intel mei_me lpc_ich pcspkr evdev mfd_core mei shpchp ipmi_si wmi button ipmi_devintf ipmi_msghandler nf_conntrack ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache dm_mod sd_mod ahci libahci ehci_pci aesni_intel ehci_hcd aes_x86_64 glue_helper lrw gf128mul bnx2x ablk_helper cryptd mdio libcrc32c tg3 libata crc32c_generic usbcore megaraid_sas ptp crc32c_intel usb_common pps_core libphy scsi_mod
[20647466.882342] CPU: 0 PID: 7272 Comm: DataXceiver for Tainted: G      D W    L  4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u2
[20647466.894473] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[20647466.903307] task: ffff9414d4e7c100 task.stack: ffffbcdb6286c000
[20647466.910203] RIP: 0010:[<ffffffff908c5bf5>]  [<ffffffff908c5bf5>] native_queued_spin_lock_slowpath+0x125/0x1a0
[20647466.921563] RSP: 0018:ffffbcdb6286fc98  EFLAGS: 00000202
[20647466.927779] RAX: 0000000000000000 RBX: ffff941b31a10580 RCX: 00000000008c0101
[20647466.936032] RDX: ffff941b3f819480 RSI: 0000000000040000 RDI: ffff941b31a105d8
[20647466.944284] RBP: ffff94161fbeb518 R08: 0000000000000000 R09: 000000000000007c
[20647466.952536] R10: ffff942b38219000 R11: ffff941b3326ec38 R12: ffff94161fbeb5a0
[20647466.960788] R13: ffffffff91202e90 R14: ffff941b31a105d8 R15: 0000000000000000
[20647466.969041] FS:  00007f81c032e700(0000) GS:ffff941b3f800000(0000) knlGS:0000000000000000
[20647466.978359] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20647466.985060] CR2: 00007f81c0318db8 CR3: 00000002a471a000 CR4: 0000000000360670
[20647466.993313] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20647467.001564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[20647467.009816] Stack:
[20647467.012347]  ffffffff90e1a79d ffffffff90a3ab3a ffff94161fbeb518 0000000000000000
[20647467.020928]  0000000000000001 ffff94161fbeb518 ffff94161fbeb5a0 0000000000000000
[20647467.029509]  ffffffff90a3b10d ffff94161fbeb518 ffffbcdb6286fd40 0000000000000000
[20647467.038090] Call Trace:
[20647467.041110]  [<ffffffff90e1a79d>] ? _raw_spin_lock+0x1d/0x20
[20647467.047719]  [<ffffffff90a3ab3a>] ? locked_inode_to_wb_and_lock_list+0x5a/0x180
[20647467.056167]  [<ffffffff90a3b10d>] ? __mark_inode_dirty+0x24d/0x360
[20647467.063355]  [<ffffffff90a284d9>] ? generic_update_time+0x79/0xd0
[20647467.070443]  [<ffffffff90a286e6>] ? current_time+0x36/0x70
[20647467.076854]  [<ffffffff90a287df>] ? file_update_time+0xbf/0x110
[20647467.083750]  [<ffffffff90984ef9>] ? __generic_file_write_iter+0x99/0x1b0
[20647467.091540]  [<ffffffffc0423200>] ? ext4_file_write_iter+0x90/0x380 [ext4]
[20647467.099495]  [<ffffffff90cf3a07>] ? sock_write_iter+0x87/0x100
[20647467.106295]  [<ffffffff90a0b450>] ? new_sync_write+0xe0/0x130
[20647467.111134] NMI watchdog: BUG: soft lockup - CPU#30 stuck for 23s! [DataXceiver for:7241]
[20647467.111173] Modules linked in: cpuid binfmt_misc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_NFLOG xt_limit xt_tcpudp xt_pkttype nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter ip6_tables nfnetlink_log nfnetlink intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt ttm iTCO_vendor_support mxm_wmi dcdbas drm_kms_helper kvm drm irqbypass i2c_algo_bit crct10dif_pclmul crc32_pclmul sg ghash_clmulni_intel mei_me lpc_ich pcspkr evdev mfd_core mei shpchp ipmi_si wmi button ipmi_devintf ipmi_msghandler nf_conntrack ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache dm_mod sd_mod ahci libahci ehci_pci aesni_intel ehci_hcd aes_x86_64 glue_helper lrw gf128mul bnx2x ablk_helper cryptd mdio libcrc32c tg3 libata crc32c_generic usbcore megaraid_sas
[20647467.111176]  ptp crc32c_intel usb_common pps_core libphy scsi_mod
[20647467.111177] CPU: 30 PID: 7241 Comm: DataXceiver for Tainted: G      D W    L  4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u2
[20647467.111178] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[20647467.111179] task: ffff94171f8fb140 task.stack: ffffbcdb6076c000
[20647467.111182] RIP: 0010:[<ffffffff908c5b2b>]  [<ffffffff908c5b2b>] native_queued_spin_lock_slowpath+0x5b/0x1a0
[20647467.111183] RSP: 0018:ffffbcdb6076fc98  EFLAGS: 00000202
[20647467.111183] RAX: 00000000008c0101 RBX: ffff941b31a10580 RCX: 0000000000000000
[20647467.111184] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff941b31a105d8
[20647467.111185] RBP: ffff9416ae0faa08 R08: 0000000000000000 R09: 000000000000007c
[20647467.111185] R10: ffff942b38219000 R11: ffff941b33bdcb38 R12: ffff9416ae0faa90
[20647467.111186] R13: ffffffff91202e90 R14: ffff941b31a105d8 R15: 0000000000000000
[20647467.111187] FS:  00007f81c204b700(0000) GS:ffff941b3fbc0000(0000) knlGS:0000000000000000
[20647467.111188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20647467.111189] CR2: 00007f81c2035da8 CR3: 00000002a471a000 CR4: 0000000000360670
[20647467.111190] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20647467.111191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[20647467.111191] Stack:
[20647467.111193]  ffffffff90e1a79d ffffffff90a3ab3a ffff9416ae0faa08 0000000000000000
[20647467.111195]  0000000000000001 ffff9416ae0faa08 ffff9416ae0faa90 0000000000000000
[20647467.111197]  ffffffff90a3b10d ffff9416ae0faa08 ffffbcdb6076fd40 0000000000000000
[20647467.111197] Call Trace:
[20647467.111200]  [<ffffffff90e1a79d>] ? _raw_spin_lock+0x1d/0x20
[20647467.111202]  [<ffffffff90a3ab3a>] ? locked_inode_to_wb_and_lock_list+0x5a/0x180
[20647467.111204]  [<ffffffff90a3b10d>] ? __mark_inode_dirty+0x24d/0x360
[20647467.111205]  [<ffffffff90a284d9>] ? generic_update_time+0x79/0xd0
[20647467.111207]  [<ffffffff90a286e6>] ? current_time+0x36/0x70
[20647467.111208]  [<ffffffff90a287df>] ? file_update_time+0xbf/0x110
[20647467.111210]  [<ffffffff90984ef9>] ? __generic_file_write_iter+0x99/0x1b0
[20647467.111220]  [<ffffffffc0423200>] ? ext4_file_write_iter+0x90/0x380 [ext4]
[20647467.111222]  [<ffffffff90cf3a07>] ? sock_write_iter+0x87/0x100
[20647467.111225]  [<ffffffff90a0b450>] ? new_sync_write+0xe0/0x130
[20647467.111227]  [<ffffffff90a0bc40>] ? vfs_write+0xb0/0x190
[20647467.111229]  [<ffffffff90a0d082>] ? SyS_write+0x52/0xc0
[20647467.111231]  [<ffffffff90803b7d>] ? do_syscall_64+0x8d/0x100
[20647467.111233]  [<ffffffff90e1a88e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[20647467.111254] Code: 30 f6 85 f6 75 42 f0 0f ba 2f 08 b8 00 01 00 00 0f 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 75 1b 85 f6 74 0e 8b 07 84 c0 74 08 <f3> 90 8b 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 81 e6 00 ff
[20647467.488202]  [<ffffffff90a0bc40>] ? vfs_write+0xb0/0x190
[20647467.494419]  [<ffffffff90a0d082>] ? SyS_write+0x52/0xc0
[20647467.500538]  [<ffffffff90803b7d>] ? do_syscall_64+0x8d/0x100
[20647467.507143]  [<ffffffff90e1a88e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[20647467.515209] Code: 04 cd e0 f3 26 91 48 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 4c 8b 02 4d 85 c0 74 08 41 0f 0d 08 eb 02 f3 90 8b 0f <66> 85 c9 75 f7 89 c8 66 31 c0 39 c6 74 50 4d 85 c0 c6 07 01 74

<snip>
[20648243.132271] NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [DiskHealthMonit:27744]
[20648243.141787] Modules linked in: cpuid binfmt_misc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_NFLOG xt_limit xt_tcpudp xt_pkttype nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter ip6_tables nfnetlink_log nfnetlink intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt ttm iTCO_vendor_support mxm_wmi dcdbas drm_kms_helper kvm drm irqbypass i2c_algo_bit crct10dif_pclmul crc32_pclmul sg ghash_clmulni_intel mei_me lpc_ich pcspkr evdev mfd_core mei shpchp ipmi_si wmi button ipmi_devintf ipmi_msghandler nf_conntrack ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache dm_mod sd_mod ahci libahci ehci_pci aesni_intel ehci_hcd aes_x86_64 glue_helper lrw gf128mul bnx2x ablk_helper cryptd mdio libcrc32c tg3 libata crc32c_generic usbcore megaraid_sas ptp crc32c_intel usb_common pps_core libphy scsi_mod
[20648243.227654] CPU: 34 PID: 27744 Comm: DiskHealthMonit Tainted: G      D W    L  4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u2
[20648243.239979] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[20648243.248813] task: ffff940ccb71c0c0 task.stack: ffffbcdb4fc30000
[20648243.255708] RIP: 0010:[<ffffffff908c5bdc>]  [<ffffffff908c5bdc>] native_queued_spin_lock_slowpath+0x10c/0x1a0
[20648243.267076] RSP: 0018:ffffbcdb4fc33b18  EFLAGS: 00000246
[20648243.273292] RAX: 0000000000000000 RBX: ffff941b31a10580 RCX: 0000000000000000
[20648243.281545] RDX: ffff941b3fc59480 RSI: 00000000008c0000 RDI: ffff941b31a105d8
[20648243.289798] RBP: ffff940f51a39658 R08: 0000000000000000 R09: 000000000000007c
[20648243.298051] R10: ffff942b38219000 R11: ffffeb71397193c0 R12: ffff940f51a396e0
[20648243.306303] R13: ffffffff91202e90 R14: ffff941b31a105d8 R15: 0000000000000000
[20648243.314557] FS:  00007fb2cbdd0700(0000) GS:ffff941b3fc40000(0000) knlGS:0000000000000000
[20648243.323875] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20648243.330575] CR2: 00007fb3085c11f8 CR3: 00000019c2736000 CR4: 0000000000360670
[20648243.338826] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20648243.347079] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[20648243.355331] Stack:
[20648243.357863]  ffffffff90e1a79d ffffffff90a3ab3a ffffbcdb4fc33cb8 0000000000000000
[20648243.366444]  0000000000000003 ffff940f51a39658 ffff940f51a396e0 0000000000000000
[20648243.375025]  ffffffff90a3b10d ffff942b33bbd800 ffff940f51a39658 ffffbcdb4fc33c94
[20648243.383606] Call Trace:
[20648243.386628]  [<ffffffff90e1a79d>] ? _raw_spin_lock+0x1d/0x20
[20648243.393227]  [<ffffffff90a3ab3a>] ? locked_inode_to_wb_and_lock_list+0x5a/0x180
[20648243.401675]  [<ffffffff90a3b10d>] ? __mark_inode_dirty+0x24d/0x360
[20648243.408884]  [<ffffffffc0466b48>] ? ext4_mb_new_blocks+0xe8/0xaf0 [ext4]
[20648243.416662]  [<ffffffffc04570c4>] ? ext4_find_extent+0x264/0x310 [ext4]
[20648243.424340]  [<ffffffffc045bca2>] ? ext4_ext_map_blocks+0xb82/0x1320 [ext4]
[20648243.432406]  [<ffffffffc045ed05>] ? __ext4_handle_dirty_metadata+0x45/0x1c0 [ext4]
[20648243.441149]  [<ffffffffc0429964>] ? ext4_map_blocks+0x164/0x5d0 [ext4]
[20648243.448728]  [<ffffffffc042a860>] ? ext4_getblk+0x50/0x190 [ext4]
[20648243.455824]  [<ffffffffc042a9bf>] ? ext4_bread+0x1f/0xb0 [ext4]
[20648243.462726]  [<ffffffffc04344e8>] ? ext4_append+0x48/0xd0 [ext4]
[20648243.469726]  [<ffffffffc0439956>] ? ext4_mkdir+0x276/0x470 [ext4]
[20648243.476815]  [<ffffffff90a18e0c>] ? vfs_mkdir+0x10c/0x1a0
[20648243.483129]  [<ffffffff90a1d7fe>] ? SyS_mkdir+0xce/0x120
[20648243.489348]  [<ffffffff90803b7d>] ? do_syscall_64+0x8d/0x100
[20648243.495953]  [<ffffffff90e1a88e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[20648243.504010] Code: 41 c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 04 48 63 c9 48 05 80 94 01 00 48 03 04 cd e0 f3 26 91 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 4c 8b 02 4d 85 c0 74 08 41 0f 0d 08 eb 02

[20648245.096252] INFO: rcu_sched self-detected stall on CPU
[20648245.100292] INFO: rcu_sched detected stalls on CPUs/tasks:
[20648245.100301] 	5-...: (314314 ticks this GP) idle=765/140000000000001/0 softirq=932844528/932844528 fqs=155780
[20648245.100303]
[20648245.100307] (detected by 15, t=320352 jiffies, g=1017466716, c=1017466715, q=2398236)
[20648245.100308] Task dump for CPU 5:
[20648245.100309] kworker/u98:3   R
[20648245.100310]   running task        0 31950      2 0x00000008
[20648245.100321] Workqueue: writeback wb_workfn
[20648245.100322]  (flush-8:96) 0000000000000000
[20648245.100325]  0000000000000000 0000000000000000 d6a7379563f3a6bd ffff941b31a10708
[20648245.100329]  ffff942b38184840 ffff942b3ed2e000 ffff942b37fced00 0000000000000000
[20648245.100333]  ffff941b31a10710 ffffffff9089460a 000000008c06e0c0Call Trace:
[20648245.100343]  [<ffffffff9089460a>] ? process_one_work+0x18a/0x430
[20648245.100346]  [<ffffffff908948fd>] ? worker_thread+0x4d/0x490
[20648245.100349]  [<ffffffff908948b0>] ? process_one_work+0x430/0x430
[20648245.100353]  [<ffffffff9089a969>] ? kthread+0xd9/0xf0
[20648245.100357]  [<ffffffff90e1a9a4>] ? __switch_to_asm+0x34/0x70
[20648245.100361]  [<ffffffff9089a890>] ? kthread_park+0x60/0x60
[20648245.100364]  [<ffffffff90e1aa37>] ? ret_from_fork+0x57/0x70
[20648245.225904] 	5-...: (314314 ticks this GP) idle=765/140000000000001/0 softirq=932844528/932844528 fqs=155796
[20648245.237258] 	 (t=320386 jiffies g=1017466716 c=1017466715 q=2398236)
[20648245.244648] Task dump for CPU 5:
[20648245.248537] kworker/u98:3   R  running task        0 31950      2 0x00000008
[20648245.256716] Workqueue: writeback wb_workfn (flush-8:96)
[20648245.262853]  ffffffff915198c0 ffffffff908a822b 0000000000000005 ffffffff915198c0
[20648245.271435]  ffffffff90981a3e ffff942b3f0996c0 ffffffff9144fd40 0000000000000000
[20648245.280018]  ffffffff915198c0 00000000ffffffff ffffffff908e3cba 0000000000000001
[20648245.288598] Call Trace:
[20648245.291615]  <IRQ> [20648245.294051]  [<ffffffff908a822b>] ? sched_show_task+0xcb/0x130
[20648245.300857]  [<ffffffff90981a3e>] ? rcu_dump_cpu_stacks+0x92/0xb2
[20648245.307949]  [<ffffffff908e3cba>] ? rcu_check_callbacks+0x75a/0x8b0
[20648245.315233]  [<ffffffff908fa260>] ? tick_sched_do_timer+0x30/0x30
[20648245.322322]  [<ffffffff908ea8a8>] ? update_process_times+0x28/0x50
[20648245.329509]  [<ffffffff908f9c60>] ? tick_sched_handle.isra.12+0x20/0x50
[20648245.337180]  [<ffffffff908fa298>] ? tick_sched_timer+0x38/0x70
[20648245.343978]  [<ffffffff908eb37e>] ? __hrtimer_run_queues+0xde/0x250
[20648245.351260]  [<ffffffff908eba5c>] ? hrtimer_interrupt+0x9c/0x1a0
[20648245.358255]  [<ffffffff90e1de17>] ? smp_apic_timer_interrupt+0x47/0x60
[20648245.365830]  [<ffffffff90e1c6a6>] ? apic_timer_interrupt+0x96/0xa0
[20648245.373015]  <EOI> [20648245.375450]  [<ffffffff908c5b2f>] ? native_queued_spin_lock_slowpath+0x5f/0x1a0
[20648245.383903]  [<ffffffff90e1a79d>] ? _raw_spin_lock+0x1d/0x20
[20648245.390507]  [<ffffffff90a3b94e>] ? writeback_sb_inodes+0x13e/0x4f0
[20648245.397791]  [<ffffffff90a3bd87>] ? __writeback_inodes_wb+0x87/0xb0
[20648245.405076]  [<ffffffff90a3c0fe>] ? wb_writeback+0x27e/0x310
[20648245.411684]  [<ffffffff90a2798c>] ? get_nr_inodes+0x3c/0x60
[20648245.418191]  [<ffffffff90a3ca64>] ? wb_workfn+0x2b4/0x380
[20648245.424506]  [<ffffffff9089460a>] ? process_one_work+0x18a/0x430
[20648245.431498]  [<ffffffff908948fd>] ? worker_thread+0x4d/0x490
[20648245.438103]  [<ffffffff908948b0>] ? process_one_work+0x430/0x430
[20648245.445096]  [<ffffffff9089a969>] ? kthread+0xd9/0xf0
[20648245.451023]  [<ffffffff90e1a9a4>] ? __switch_to_asm+0x34/0x70
[20648245.457724]  [<ffffffff9089a890>] ? kthread_park+0x60/0x60
[20648245.464136]  [<ffffffff90e1aa37>] ? ret_from_fork+0x57/0x70

The host was shut down for further investigation.
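When triaging a console dump like the one above, it can help to pull out just the soft-lockup headers (CPU, stall duration, task name and PID) and decode the taint flags (`G D W L`) from the `CPU: ... Tainted:` lines. The following is a minimal sketch, assuming the log has been saved as plain text in the format shown; the function names `summarize_lockups` and `decode_taint` are hypothetical helpers, and the taint table covers only a subset of the letters documented in the kernel's tainted-kernels documentation.

```python
import re

# Matches watchdog headers such as:
#   NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [kworker/u98:3:31950]
# The task name may contain spaces or colons ("DataXceiver for", "kworker/u98:3"),
# so the greedy (?P<comm>.+) backtracks to the LAST ":<digits>]" on the line.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)

# Captures the space-separated taint letters that precede the kernel version,
# e.g. "Tainted: G      D W    L  4.9.0-9-amd64".
TAINT_RE = re.compile(r"Tainted: (?P<flags>[A-Z ]+?)\s+\d")

# Subset of the taint letters described in the kernel documentation
# (Documentation/admin-guide/tainted-kernels.rst).
TAINT_MEANING = {
    "G": "all loaded modules are GPL-compatible",
    "P": "proprietary module loaded",
    "D": "kernel died recently (OOPS or BUG)",
    "W": "kernel issued a warning",
    "L": "soft lockup occurred",
}


def summarize_lockups(log_text):
    """Return one (cpu, seconds, comm, pid) tuple per soft-lockup header."""
    return [
        (int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
        for m in LOCKUP_RE.finditer(log_text)
    ]


def decode_taint(log_line):
    """Return (letter, meaning) pairs for the taint flags on a 'Tainted:' line."""
    m = TAINT_RE.search(log_line)
    if m is None:
        return []
    return [(f, TAINT_MEANING.get(f, "not in this table")) for f in m["flags"].split()]


if __name__ == "__main__":
    sample = (
        "[20647450.915277] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [kworker/u98:3:31950]\n"
        "[20647466.787140] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [DataXceiver for:7272]\n"
    )
    for cpu, secs, comm, pid in summarize_lockups(sample):
        print(f"CPU {cpu:>2}: {secs}s  {comm} (pid {pid})")
    for flag, meaning in decode_taint("CPU: 5 PID: 31950 Comm: kworker/u98:3 Tainted: G      D W    L  4.9.0-9-amd64 #1"):
        print(flag, "-", meaning)
```

Run over the full dump, this kind of summary makes it easier to see that the stuck tasks (the writeback kworker, `DiskHealthMonit`, and several `DataXceiver for` threads) are all spinning in `native_queued_spin_lock_slowpath` via the writeback/inode-dirtying paths.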

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2020-02-01T16:30:05Z] <elukey> powerup analytics1073 (attempt to see if it was only a kernel-related crash) - T244064

elukey claimed this task.

The host has been stable since then; let's reopen this task if it happens again.