
Kernel panics on Jessie (3.16.0-4-amd64) during IPsec load test
Closed, Resolved, Public

Description

While using the following config in /etc/ipsec.conf:
ike=aes128gcm16-null-prfsha384-ecp384bp!
esp=aes128gcm16-null-ecp384bp-esn!

Test from wget on berkelium to nginx on curium:
while true ; do wget -nv -O /dev/null http://10.64.0.170/index.nginx-debian.html ; sleep 1 ; done
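
The loop above runs until a host panics or it is interrupted. A bounded variant that counts successes and failures makes runs easier to compare between configs. This is only a sketch, not the script used for this test; the function names `fetch_once` and `run_load_test` are made up for illustration.

```shell
# Hypothetical helper: fetch the URL once and discard the body,
# using the same wget flags as the loop above.
fetch_once() {
    wget -nv -O /dev/null "$1"
}

# Hypothetical helper: send COUNT requests to URL, sleeping DELAY
# seconds after each one, then print a one-line summary.
run_load_test() {
    url=$1
    count=$2
    delay=$3
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if fetch_once "$url"; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "requests=$count ok=$ok failed=$failed"
}
```

For example, `run_load_test http://10.64.0.170/index.nginx-debian.html 600 1` would correspond to roughly ten minutes at one request per second. If the local host panics, no summary is printed.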

This caused kernel panics on one host or the other within a few seconds. In one case the kernel panicked when I simply issued 'service ipsec restart'.

I haven't finished isolating the cause, but removing those two lines of config (and so using the default ciphers) has allowed this test to run for 10 minutes without errors.

I didn't manage to capture every crash; here are two of them:

First time on curium:

[  143.362136] general protection fault: 0000 [#1] SMP
[  143.367108] Modules linked in: binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm crc32_pclmul dcdbas ghash_clmulni_intel aesni_intel ttm aes_x86_64 lrw gf128mul drm_kms_helper glue_helper ablk_helper cryptd psmouse drm evdev joydev serio_raw pcspkr i2c_algo_bit i2c_core tpm_tis lpc_ich tpm ipmi_si mfd_core ipmi_msghandler acpi_power_meter button i7core_edac edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid mptsas scsi_transport_sas mptscsih uhci_hcd ehci_pci mptbase ehci_hcd scsi_mod crct10dif_pclmul usbcore crct10dif_common crc32c_intel usb_common bnx2
[  143.477059] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[  143.485477] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[  143.493029] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
[  143.500494] RIP: 0010:[<ffffffff810a7198>]  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[  143.508839] RSP: 0018:ffff88082f203a88  EFLAGS: 00010096
[  143.514138] RAX: 0000000000000246 RBX: ffff8800ca574cc0 RCX: 0000000000000000
[  143.521256] RDX: 8300000000000000 RSI: 0000000000000001 RDI: ffff8800ca574cc0
[  143.528374] RBP: ffff8800ca574cc8 R08: 0000000000000000 R09: 0000000000000000
[  143.535492] R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
[  143.542608] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88080ca45470
[  143.549727] FS:  0000000000000000(0000) GS:ffff88082f200000(0000) knlGS:0000000000000000
[  143.557797] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  143.563527] CR2: 000000000198afc8 CR3: 0000000001813000 CR4: 00000000000007f0
[  143.570645] Stack:
[  143.572648]  0000000068a64b3c ffff8800ca574cc0 0000000000000246 0000000000000001
[  143.580063]  0000000000000000 0000000000000000 ffff88080ca45470 ffffffff810a7404
[  143.587481]  ffff88080d1eb180 ffff88080d1eb798 ffff88080ca45478 0000000000000078
[  143.594898] Call Trace:
[  143.597335]  <IRQ>
[  143.599252]  [<ffffffff810a7404>] ? __wake_up+0x34/0x50
[  143.604659]  [<ffffffff8140588c>] ? sock_def_wakeup+0x2c/0x30
[  143.610394]  [<ffffffff81466259>] ? tcp_fin+0x179/0x1f0
[  143.615607]  [<ffffffff81468638>] ? tcp_data_queue+0x758/0xcf0
[  143.621427]  [<ffffffff8146b255>] ? tcp_rcv_established+0x1e5/0x6c0
[  143.627681]  [<ffffffff812b574d>] ? csum_partial+0xd/0x20
[  143.633067]  [<ffffffff8147514f>] ? tcp_v4_do_rcv+0x1af/0x4c0
[  143.638799]  [<ffffffff81476f06>] ? tcp_v4_rcv+0x696/0x7a0
[  143.644273]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[  143.650787]  [<ffffffff814a2128>] ? xfrm4_transport_finish+0x78/0xf0
[  143.657127]  [<ffffffff814ae70f>] ? xfrm_input+0x50f/0x560
[  143.662600]  [<ffffffff814a2bee>] ? xfrm4_esp_rcv+0x2e/0x70
[  143.668159]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[  143.674672]  [<ffffffff8141b933>] ? __netif_receive_skb_core+0x533/0x750
[  143.681357]  [<ffffffff8141bbcf>] ? netif_receive_skb_internal+0x1f/0x90
[  143.688043]  [<ffffffff8141c6b0>] ? napi_gro_receive+0xb0/0xe0
[  143.693870]  [<ffffffffa000d424>] ? bnx2_poll_work+0x7e4/0x1230 [bnx2]
[  143.700383]  [<ffffffff810bae73>] ? handle_irq_event+0x43/0x60
[  143.706207]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[  143.712460]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[  143.718193]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[  143.723751]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[  143.728876]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[  143.733829]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[  143.739646]  <EOI>
[  143.741564]  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[  143.748449]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[  143.754528]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[  143.760607]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[  143.766686]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[  143.772331]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[  143.777804]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[  143.783970]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[  143.790221] Code: 00 00 00 66 66 66 66 90 41 57 41 56 41 89 f6 41 55 41 89 cd 41 54 4d 89 c4 55 48 8d 6f 08 53 48 83 ec 08 89 54 24 04 48 8b 57 08 <48> 8b 0a 48 39 d5 48 8d 42 e8 4c 8d 79 e8 75 0e eb 3e 66 0f 1f
[  143.809590] RIP  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[  143.815602]  RSP <ffff88082f203a88>
[  143.819082] ---[ end trace 860ccb00e1b74b7d ]---
[  143.827245] Kernel panic - not syncing: Fatal exception in interrupt
[  143.833769] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[  143.847477] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[  143.854605] ------------[ cut here ]------------
[  143.859213] WARNING: CPU: 0 PID: 0 at /build/linux-SAvLSw/linux-3.16.7-ckt7/arch/x86/kernel/smp.c:124 update_process_times+0x59/0x70()
[  143.871272] Modules linked in: binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm crc32_pclmul dcdbas ghash_clmulni_intel aesni_intel ttm aes_x86_64 lrw gf128mul drm_kms_helper glue_helper ablk_helper cryptd psmouse drm evdev joydev serio_raw pcspkr i2c_algo_bit i2c_core tpm_tis lpc_ich tpm ipmi_si mfd_core ipmi_msghandler acpi_power_meter button i7core_edac edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid mptsas scsi_transport_sas mptscsih uhci_hcd ehci_pci mptbase ehci_hcd scsi_mod crct10dif_pclmul usbcore crct10dif_common crc32c_intel usb_common bnx2
[  143.981211] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D       3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[  143.990669] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[  143.998220]  0000000000000009 ffffffff81509e7c 0000000000000000 ffffffff81067727
[  144.005637]  ffffffff8181a460 0000000000000000 0000000000000000 ffff88082f20d1a0
[  144.013052]  ffff88082f203860 ffffffff81074a19 ffff88082f203898 ffff88082f20db40
[  144.020469] Call Trace:
[  144.022906]  <IRQ>  [<ffffffff81509e7c>] ? dump_stack+0x41/0x51
[  144.028834]  [<ffffffff81067727>] ? warn_slowpath_common+0x77/0x90
[  144.035000]  [<ffffffff81074a19>] ? update_process_times+0x59/0x70
[  144.041168]  [<ffffffff810cfb50>] ? tick_sched_handle.isra.16+0x20/0x60
[  144.047768]  [<ffffffff810cfbcc>] ? tick_sched_timer+0x3c/0x60
[  144.053587]  [<ffffffff8108aee7>] ? __run_hrtimer+0x67/0x1c0
[  144.059233]  [<ffffffff8108b299>] ? hrtimer_interrupt+0xe9/0x220
[  144.065224]  [<ffffffff81512e0b>] ? smp_apic_timer_interrupt+0x3b/0x60
[  144.071737]  [<ffffffff81510efd>] ? apic_timer_interrupt+0x6d/0x80
[  144.077903]  [<ffffffff8150704b>] ? panic+0x1b8/0x1fc
[  144.082944]  [<ffffffff810163f1>] ? oops_end+0xd1/0xe0
[  144.088070]  [<ffffffff81511f48>] ? general_protection+0x28/0x30
[  144.094063]  [<ffffffff810a7198>] ? __wake_up_common+0x28/0x90
[  144.099882]  [<ffffffff810a7404>] ? __wake_up+0x34/0x50
[  144.105096]  [<ffffffff8140588c>] ? sock_def_wakeup+0x2c/0x30
[  144.110828]  [<ffffffff81466259>] ? tcp_fin+0x179/0x1f0
[  144.116041]  [<ffffffff81468638>] ? tcp_data_queue+0x758/0xcf0
[  144.121859]  [<ffffffff8146b255>] ? tcp_rcv_established+0x1e5/0x6c0
[  144.128113]  [<ffffffff812b574d>] ? csum_partial+0xd/0x20
[  144.133498]  [<ffffffff8147514f>] ? tcp_v4_do_rcv+0x1af/0x4c0
[  144.139229]  [<ffffffff81476f06>] ? tcp_v4_rcv+0x696/0x7a0
[  144.144703]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[  144.151215]  [<ffffffff814a2128>] ? xfrm4_transport_finish+0x78/0xf0
[  144.157553]  [<ffffffff814ae70f>] ? xfrm_input+0x50f/0x560
[  144.163027]  [<ffffffff814a2bee>] ? xfrm4_esp_rcv+0x2e/0x70
[  144.168585]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[  144.175098]  [<ffffffff8141b933>] ? __netif_receive_skb_core+0x533/0x750
[  144.181783]  [<ffffffff8141bbcf>] ? netif_receive_skb_internal+0x1f/0x90
[  144.188467]  [<ffffffff8141c6b0>] ? napi_gro_receive+0xb0/0xe0
[  144.194292]  [<ffffffffa000d424>] ? bnx2_poll_work+0x7e4/0x1230 [bnx2]
[  144.200806]  [<ffffffff810bae73>] ? handle_irq_event+0x43/0x60
[  144.206629]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[  144.212881]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[  144.218613]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[  144.224172]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[  144.229297]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[  144.234248]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[  144.240066]  <EOI>  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[  144.247376]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[  144.253456]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[  144.259535]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[  144.265615]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[  144.271260]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[  144.276733]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[  144.282898]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[  144.289150] ---[ end trace 860ccb00e1b74b7e ]---
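
One possible reading of the trace above, not something confirmed in this task: the `Code:` line marks the faulting bytes as `<48> 8b 0a`, which decode to `mov (%rdx),%rcx`, and RDX holds 8300000000000000. On x86-64 with 48-bit virtual addresses (assumed here), dereferencing a non-canonical address raises a general protection fault rather than a page fault, which would match the first line of the oops. The sketch below checks whether a 16-digit lowercase hex value, as printed in the register dump, is canonical; `is_canonical` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the 16-digit lowercase hex address is
# canonical under 48-bit virtual addressing, i.e. bits 63..47 are all
# zero (0000 followed by 0-7) or all one (ffff followed by 8-f).
is_canonical() {
    case $1 in
        0000[0-7]???????????|ffff[89a-f]???????????) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a verdict for one register value copied from the dump.
check_register() {
    if is_canonical "$2"; then
        echo "$1=$2 canonical"
    else
        echo "$1=$2 non-canonical"
    fi
}
```

For the dump above, `check_register RDX 8300000000000000` reports the value as non-canonical, while `check_register RDI ffff8800ca574cc0` reports an ordinary kernel address.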

Second time on curium:

[  792.954927] general protection fault: 0000 [#1] SMP
[  792.959901] Modules linked in: esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport binfmt_misc 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 deflate ctr twofish_generic nfsd auth_rpcgss oid_registry nfs_acl nfs twofish_x86_64_3way lockd twofish_x86_64 twofish_common fscache camellia_generic sunrpc camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm crc32_pclmul ghash_clmulni_intel dcdbas aesni_intel aes_x86_64 lrw ttm gf128mul glue_helper ablk_helper drm_kms_helper cryptd psmouse drm evdev serio_raw joydev pcspkr i2c_algo_bit i2c_core lpc_ich mfd_core ipmi_si tpm_tis ipmi_msghandler tpm acpi_power_meter button i7core_edac shpchp edac_core processor thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid mptsas scsi_transport_sas mptscsih uhci_hcd ehci_pci mptbase ehci_hcd scsi_mod crct10dif_pclmul usbcore crct10dif_common crc32c_intel usb_common bnx2
[  793.069841] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[  793.078261] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[  793.085814] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
[  793.093277] RIP: 0010:[<ffffffff810a7198>]  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[  793.101623] RSP: 0018:ffff88082f203d18  EFLAGS: 00010096
[  793.106921] RAX: 0000000000000282 RBX: ffff88080d732a40 RCX: 0000000000000001
[  793.114039] RDX: 8800000000000000 RSI: 0000000000000001 RDI: ffff88080d732a40
[  793.121156] RBP: ffff88080d732a48 R08: 0000000000000304 R09: ffffffff812cdf30
[  793.128274] R10: ffffffff81adf368 R11: 0000000000000000 R12: 0000000000000304
[  793.135391] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88080b77f500
[  793.142510] FS:  0000000000000000(0000) GS:ffff88082f200000(0000) knlGS:0000000000000000
[  793.150580] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  793.156310] CR2: 00007fb54d33e001 CR3: 0000000001813000 CR4: 00000000000007f0
[  793.163429] Stack:
[  793.165431]  00000001818e7860 ffff88080d732a40 0000000000000001 0000000000000001
[  793.172848]  0000000000000304 0000000000000282 ffff88080b77f500 ffffffff810a76fd
[  793.180263]  ffff88080a574b00 ffff88080a574c14 0000000000000000 0000000000005962
[  793.187679] Call Trace:
[  793.190117]  <IRQ>
[  793.192034]  [<ffffffff810a76fd>] ? __wake_up_sync_key+0x3d/0x60
[  793.198224]  [<ffffffff814058d6>] ? sock_def_write_space+0x46/0x90
[  793.204390]  [<ffffffff814074f3>] ? sock_wfree+0x53/0x60
[  793.209690]  [<ffffffff8140b297>] ? skb_release_head_state+0x57/0xf0
[  793.216029]  [<ffffffff8140bbee>] ? skb_release_all+0xe/0x30
[  793.221674]  [<ffffffff8140be27>] ? consume_skb+0x27/0x80
[  793.227112]  [<ffffffffa000ce90>] ? bnx2_poll_work+0x250/0x1230 [bnx2]
[  793.233627]  [<ffffffff810bae73>] ? handle_irq_event+0x43/0x60
[  793.239448]  [<ffffffff810bd8f5>] ? handle_edge_irq+0x85/0x150
[  793.245266]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[  793.250224]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[  793.256476]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[  793.262209]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[  793.267767]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[  793.272891]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[  793.277845]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[  793.283662]  <EOI>
[  793.285578]  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[  793.292462]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[  793.298542]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[  793.304622]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[  793.310702]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[  793.316346]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[  793.321819]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[  793.327984]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[  793.334234] Code: 00 00 00 66 66 66 66 90 41 57 41 56 41 89 f6 41 55 41 89 cd 41 54 4d 89 c4 55 48 8d 6f 08 53 48 83 ec 08 89 54 24 04 48 8b 57 08 <48> 8b 0a 48 39 d5 48 8d 42 e8 4c 8d 79 e8 75 0e eb 3e 66 0f 1f
[  793.353599] RIP  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[  793.359600]  RSP <ffff88082f203d18>
[  793.363410] ---[ end trace 33e13b62ac4bc73b ]---
[  793.371516] Kernel panic - not syncing: Fatal exception in interrupt
[  793.378015] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[  793.391665] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[  793.398795] ------------[ cut here ]------------
[  793.403403] WARNING: CPU: 0 PID: 0 at /build/linux-SAvLSw/linux-3.16.7-ckt7/arch/x86/kernel/smp.c:124 update_process_times+0x59/0x70()
[  793.415462] Modules linked in: esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport binfmt_misc 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 deflate ctr twofish_generic nfsd auth_rpcgss oid_registry nfs_acl nfs twofish_x86_64_3way lockd twofish_x86_64 twofish_common fscache camellia_generic sunrpc camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm crc32_pclmul ghash_clmulni_intel dcdbas aesni_intel aes_x86_64 lrw ttm gf128mul glue_helper ablk_helper drm_kms_helper cryptd psmouse drm evdev serio_raw joydev pcspkr i2c_algo_bit i2c_core lpc_ich mfd_core ipmi_si tpm_tis ipmi_msghandler tpm acpi_power_meter button i7core_edac shpchp edac_core processor thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid mptsas scsi_transport_sas mptscsih uhci_hcd ehci_pci mptbase ehci_hcd scsi_mod crct10dif_pclmul usbcore crct10dif_common crc32c_intel usb_common bnx2
[  793.525392] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D       3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[  793.534850] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[  793.542402]  0000000000000009 ffffffff81509e7c 0000000000000000 ffffffff81067727
[  793.549820]  ffffffff8181a460 0000000000000000 0000000000000000 ffff88082f20d1a0
[  793.557236]  ffff88082f203af0 ffffffff81074a19 ffff88082f203b28 ffff88082f20db40
[  793.564652] Call Trace:
[  793.567088]  <IRQ>  [<ffffffff81509e7c>] ? dump_stack+0x41/0x51
[  793.573017]  [<ffffffff81067727>] ? warn_slowpath_common+0x77/0x90
[  793.579184]  [<ffffffff81074a19>] ? update_process_times+0x59/0x70
[  793.585352]  [<ffffffff810cfb50>] ? tick_sched_handle.isra.16+0x20/0x60
[  793.591950]  [<ffffffff810cfbcc>] ? tick_sched_timer+0x3c/0x60
[  793.597768]  [<ffffffff8108aee7>] ? __run_hrtimer+0x67/0x1c0
[  793.603413]  [<ffffffff8108b299>] ? hrtimer_interrupt+0xe9/0x220
[  793.609406]  [<ffffffff81512e0b>] ? smp_apic_timer_interrupt+0x3b/0x60
[  793.615918]  [<ffffffff81510efd>] ? apic_timer_interrupt+0x6d/0x80
[  793.622083]  [<ffffffff8150704b>] ? panic+0x1b8/0x1fc
[  793.627125]  [<ffffffff810163f1>] ? oops_end+0xd1/0xe0
[  793.632251]  [<ffffffff81511f48>] ? general_protection+0x28/0x30
[  793.638244]  [<ffffffff812cdf30>] ? unmap_single+0x30/0x30
[  793.643715]  [<ffffffff810a7198>] ? __wake_up_common+0x28/0x90
[  793.649536]  [<ffffffff810a76fd>] ? __wake_up_sync_key+0x3d/0x60
[  793.655528]  [<ffffffff814058d6>] ? sock_def_write_space+0x46/0x90
[  793.661693]  [<ffffffff814074f3>] ? sock_wfree+0x53/0x60
[  793.666992]  [<ffffffff8140b297>] ? skb_release_head_state+0x57/0xf0
[  793.673332]  [<ffffffff8140bbee>] ? skb_release_all+0xe/0x30
[  793.678977]  [<ffffffff8140be27>] ? consume_skb+0x27/0x80
[  793.684368]  [<ffffffffa000ce90>] ? bnx2_poll_work+0x250/0x1230 [bnx2]
[  793.690880]  [<ffffffff810bae73>] ? handle_irq_event+0x43/0x60
[  793.696699]  [<ffffffff810bd8f5>] ? handle_edge_irq+0x85/0x150
[  793.702519]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[  793.707477]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[  793.713728]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[  793.719460]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[  793.725018]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[  793.730141]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[  793.735091]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[  793.740908]  <EOI>  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[  793.748216]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[  793.754293]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[  793.760369]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[  793.766446]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[  793.772091]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[  793.777560]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[  793.783724]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[  793.789973] ---[ end trace 33e13b62ac4bc73c ]---
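
Both traces above stop at the same place, `__wake_up_common+0x28/0x90`, although they get there through different call paths. When comparing several captured crashes, it can help to pull out just the faulting symbol from each oops. The sketch below does that for console output in the format shown here; `faulting_symbols` is a hypothetical helper name, and the pattern only matches the summary `RIP  [<...>]` line near the end of each oops, not the earlier `RIP: 0010:` line.

```shell
# Hypothetical helper: read saved console output on stdin and print the
# symbol+offset from each oops summary line of the form
#   [  143.809590] RIP  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
faulting_symbols() {
    sed -n -E 's/^\[[ 0-9.]+\] RIP  \[<[0-9a-f]+>\] ([^ ]+)$/\1/p'
}
```

Usage would be something like `faulting_symbols < console.log`, where `console.log` stands for wherever the serial console output was saved.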

Event Timeline

Gage created this task. Apr 2 2015, 10:18 AM

Got another one: I changed the ciphers on berkelium and restarted the daemon there; before I had a chance to restart the daemon on curium for the corresponding change, curium panicked. Admittedly, this is not a circumstance we'll often encounter in prod:

[43728.489797] ------------[ cut here ]------------
[43728.494405] kernel BUG at /build/linux-SAvLSw/linux-3.16.7-ckt7/net/xfrm/xfrm_policy.c:307!
[43728.502736] invalid opcode: 0000 [#1] SMP
[43728.506839] Modules linked in: dm_mod authenc binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c intel_powerclamp ipmi_devintf coretemp kvm crc32_pclmul iTCO_wdt ghash_clmulni_intel iTCO_vendor_support dcdbas aesni_intel ttm aes_x86_64 lrw gf128mul drm_kms_helper glue_helper ablk_helper cryptd drm evdev i2c_algo_bit psmouse joydev i2c_core pcspkr serio_raw ipmi_si tpm_tis ipmi_msghandler tpm lpc_ich mfd_core acpi_power_meter button i7core_edac edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid uhci_hcd ehci_pci mptsas scsi_transport_sas mptscsih ehci_hcd crct10dif_pclmul crct10dif_common mptbase crc32c_intel usbcore scsi_mod usb_common bnx2
[43728.618088] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[43728.626506] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[43728.634057] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
[43728.641521] RIP: 0010:[<ffffffff814a3676>]  [<ffffffff814a3676>] xfrm_policy_destroy+0x46/0x50
[43728.650127] RSP: 0018:ffff88082f203b98  EFLAGS: 00010246
[43728.655424] RAX: 0000000000000000 RBX: ffff88080b308c00 RCX: 0000000000000000
[43728.662541] RDX: ffff88082f203bd0 RSI: 00000000fffffe01 RDI: ffff88080b308c00
[43728.669658] RBP: 0000000000000002 R08: 0000000000000001 R09: ffff88080b308c00
[43728.676776] R10: 00000000aa00400a R11: 00000000ffffffff R12: ffff88080b129000
[43728.683894] R13: ffffffff81aead20 R14: 0000000000000000 R15: ffff88080d0f5880
[43728.691012] FS:  0000000000000000(0000) GS:ffff88082f200000(0000) knlGS:0000000000000000
[43728.699083] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[43728.704813] CR2: 00007fdda324f148 CR3: 0000000001813000 CR4: 00000000000007f0
[43728.711930] Stack:
[43728.713934]  0000000000000000 ffffffff814a8add ffffffff818b9080 000288082f203c18
[43728.721350]  ffff88080b308c00 ffff88080b308c00 ffff88080ad58000 0000000000000000
[43728.728853]  ffffffff8149b645 0000000000000000 ffff88082f203c18 ffff88080e431f40
[43728.736265] Call Trace:
[43728.738704]  <IRQ>
[43728.740621]  [<ffffffff814a8add>] ? __xfrm_policy_check+0x5dd/0x640
[43728.747069]  [<ffffffff8149b645>] ? __fib_lookup+0x45/0x80
[43728.752544]  [<ffffffff81490aa1>] ? fib_validate_source+0x321/0x450
[43728.758799]  [<ffffffff81480853>] ? udp_queue_rcv_skb+0x53/0x400
[43728.764792]  [<ffffffff8148149a>] ? __udp4_lib_rcv+0x43a/0x730
[43728.770614]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[43728.777126]  [<ffffffff8141b933>] ? __netif_receive_skb_core+0x533/0x750
[43728.783812]  [<ffffffff8141bbcf>] ? netif_receive_skb_internal+0x1f/0x90
[43728.790497]  [<ffffffff8141c6b0>] ? napi_gro_receive+0xb0/0xe0
[43728.796325]  [<ffffffffa000d424>] ? bnx2_poll_work+0x7e4/0x1230 [bnx2]
[43728.802842]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[43728.809095]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[43728.814830]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[43728.820389]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[43728.825515]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[43728.830469]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[43728.836287]  <EOI>
[43728.838203]  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[43728.845090]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[43728.851169]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[43728.857249]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[43728.863329]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[43728.868974]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[43728.874445]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[43728.880610]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[43728.886861] Code: 85 c0 75 25 48 8d bb 70 01 00 00 e8 d5 03 bd ff 85 c0 75 15 48 8b bb d0 01 00 00 e8 35 f3 d8 ff 48 89 df 5b e9 bc a5 ce ff 0f 0b <0f> 0b 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 89 f0 81 ce 00
[43728.906312] RIP  [<ffffffff814a3676>] xfrm_policy_destroy+0x46/0x50
[43728.912574]  RSP <ffff88082f203b98>
[43728.916398] ---[ end trace 69b8fc936ab2b0b7 ]---
[43728.924591] Kernel panic - not syncing: Fatal exception in interrupt
[43728.931087] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[43728.944828] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[43728.951951] ------------[ cut here ]------------
[43728.956559] WARNING: CPU: 0 PID: 0 at /build/linux-SAvLSw/linux-3.16.7-ckt7/arch/x86/kernel/smp.c:124 update_process_times+0x59/0x70()
[43728.968616] Modules linked in: dm_mod authenc binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c intel_powerclamp ipmi_devintf coretemp kvm crc32_pclmul iTCO_wdt ghash_clmulni_intel iTCO_vendor_support dcdbas aesni_intel ttm aes_x86_64 lrw gf128mul drm_kms_helper glue_helper ablk_helper cryptd drm evdev i2c_algo_bit psmouse joydev i2c_core pcspkr serio_raw ipmi_si tpm_tis ipmi_msghandler tpm lpc_ich mfd_core acpi_power_meter button i7core_edac edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid uhci_hcd ehci_pci mptsas scsi_transport_sas mptscsih ehci_hcd crct10dif_pclmul crct10dif_common mptbase crc32c_intel usbcore scsi_mod usb_common bnx2
[43729.079845] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D       3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[43729.089303] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[43729.096854]  0000000000000009 ffffffff81509e7c 0000000000000000 ffffffff81067727
[43729.104266]  ffffffff8181a460 0000000000000000 0000000000000000 ffff88082f20d1a0
[43729.111678]  ffff88082f2038c0 ffffffff81074a19 ffff88082f2038f8 ffff88082f20db40
[43729.119093] Call Trace:
[43729.121531]  <IRQ>  [<ffffffff81509e7c>] ? dump_stack+0x41/0x51
[43729.127458]  [<ffffffff81067727>] ? warn_slowpath_common+0x77/0x90
[43729.133623]  [<ffffffff81074a19>] ? update_process_times+0x59/0x70
[43729.139791]  [<ffffffff810cfb50>] ? tick_sched_handle.isra.16+0x20/0x60
[43729.146390]  [<ffffffff810cfbcc>] ? tick_sched_timer+0x3c/0x60
[43729.152210]  [<ffffffff8108aee7>] ? __run_hrtimer+0x67/0x1c0
[43729.157855]  [<ffffffff8108b299>] ? hrtimer_interrupt+0xe9/0x220
[43729.163848]  [<ffffffff81512e0b>] ? smp_apic_timer_interrupt+0x3b/0x60
[43729.170360]  [<ffffffff81510efd>] ? apic_timer_interrupt+0x6d/0x80
[43729.176525]  [<ffffffff8150704b>] ? panic+0x1b8/0x1fc
[43729.181566]  [<ffffffff810163f1>] ? oops_end+0xd1/0xe0
[43729.186691]  [<ffffffff81013810>] ? do_error_trap+0x70/0xe0
[43729.192252]  [<ffffffff814a3676>] ? xfrm_policy_destroy+0x46/0x50
[43729.198332]  [<ffffffff814a72cc>] ? xfrm_policy_lookup_bytype+0x10c/0x240
[43729.205105]  [<ffffffff814a7460>] ? __xfrm_policy_lookup+0x60/0x60
[43729.211271]  [<ffffffff814a7487>] ? xfrm_policy_lookup+0x27/0x80
[43729.217263]  [<ffffffff815119be>] ? invalid_op+0x1e/0x30
[43729.222563]  [<ffffffff814a3676>] ? xfrm_policy_destroy+0x46/0x50
[43729.228645]  [<ffffffff814a8add>] ? __xfrm_policy_check+0x5dd/0x640
[43729.234896]  [<ffffffff8149b645>] ? __fib_lookup+0x45/0x80
[43729.240367]  [<ffffffff81490aa1>] ? fib_validate_source+0x321/0x450
[43729.246619]  [<ffffffff81480853>] ? udp_queue_rcv_skb+0x53/0x400
[43729.252611]  [<ffffffff8148149a>] ? __udp4_lib_rcv+0x43a/0x730
[43729.258429]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[43729.264939]  [<ffffffff8141b933>] ? __netif_receive_skb_core+0x533/0x750
[43729.271623]  [<ffffffff8141bbcf>] ? netif_receive_skb_internal+0x1f/0x90
[43729.278306]  [<ffffffff8141c6b0>] ? napi_gro_receive+0xb0/0xe0
[43729.284128]  [<ffffffffa000d424>] ? bnx2_poll_work+0x7e4/0x1230 [bnx2]
[43729.290641]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[43729.296890]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[43729.302622]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[43729.308179]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[43729.313303]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[43729.318253]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[43729.324071]  <EOI>  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[43729.331379]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[43729.337456]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[43729.343534]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[43729.349612]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[43729.355256]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[43729.360726]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[43729.366891]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[43729.373142] ---[ end trace 69b8fc936ab2b0b8 ]---

OK, this is reproducible and seems to be the primary problem I was having yesterday: enabling Extended Sequence Numbers (ESN, http://kernelnewbies.org/Linux_2_6_39#head-87ffd4407af29460251c521e0228fe0ac9219d4b) causes a crash within 5 wget requests.

Two traces follow the config line below: the first time, the HTTP-serving host (curium) crashed; the second time, it was the wget client host (berkelium).

I tested every other change between the default and the original crashing config for 10 minutes before narrowing it down to ESN, which is a config parameter applied to ESP (child) security associations:

esp=aes128gcm16-null-ecp384bp-esn!
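
To retest with everything else unchanged, the `-esn` keyword can be dropped from the `esp=` proposal. The sketch below assumes that removing the keyword is enough to turn ESN off in this setup (the default config, which did not crash, does not set it); `strip_esn` is a hypothetical helper name. It only rewrites text on stdin and leaves `ike=` and all other lines alone.

```shell
# Hypothetical helper: copy an ipsec.conf from stdin to stdout, removing
# the "-esn" keyword from any esp= proposal line.
strip_esn() {
    sed -e '/^[[:space:]]*esp=/ {
        s/-esn-/-/
        s/-esn!$/!/
        s/-esn$//
    }'
}
```

For example, `strip_esn < /etc/ipsec.conf > /tmp/ipsec.conf.noesn` writes a candidate file to review (the output path is arbitrary) before copying it into place and running 'service ipsec restart' on both hosts.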

[15747.762402] general protection fault: 0000 [#1] SMP
[15747.767375] Modules linked in: binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper ttm cryptd drm_kms_helper psmouse drm serio_raw pcspkr joydev evdev i2c_algo_bit i2c_core tpm_tis tpm ipmi_si lpc_ich ipmi_msghandler mfd_core i7core_edac acpi_power_meter button edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid mptsas scsi_transport_sas uhci_hcd ehci_pci mptscsih ehci_hcd mptbase crct10dif_pclmul crct10dif_common usbcore scsi_mod crc32c_intel usb_common bnx2
[15747.877313] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[15747.885732] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[15747.893283] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
[15747.900746] RIP: 0010:[<ffffffff810a7198>]  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[15747.909092] RSP: 0018:ffff88082f203a88  EFLAGS: 00010096
[15747.914390] RAX: 0000000000000246 RBX: ffff88080b8bcf00 RCX: 0000000000000000
[15747.921508] RDX: 0b00000000000000 RSI: 0000000000000001 RDI: ffff88080b8bcf00
[15747.928627] RBP: ffff88080b8bcf08 R08: 0000000000000000 R09: 0000000000000000
[15747.935744] R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
[15747.942863] R13: 0000000000000000 R14: 0000000000000001 R15: ffff880809703670
[15747.949982] FS:  0000000000000000(0000) GS:ffff88082f200000(0000) knlGS:0000000000000000
[15747.958055] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[15747.963787] CR2: 00007fef60699000 CR3: 0000000001813000 CR4: 00000000000007f0
[15747.970904] Stack:
[15747.972907]  000000008ff31d19 ffff88080b8bcf00 0000000000000246 0000000000000001
[15747.980321]  0000000000000000 0000000000000000 ffff880809703670 ffffffff810a7404
[15747.987735]  ffff88080dc62080 ffff88080dc62698 ffff880809703678 0000000000000078
[15747.995152] Call Trace:
[15747.997590]  <IRQ>
[15747.999508]  [<ffffffff810a7404>] ? __wake_up+0x34/0x50
[15748.004918]  [<ffffffff8140588c>] ? sock_def_wakeup+0x2c/0x30
[15748.010651]  [<ffffffff81466259>] ? tcp_fin+0x179/0x1f0
[15748.015865]  [<ffffffff81468638>] ? tcp_data_queue+0x758/0xcf0
[15748.021683]  [<ffffffff8146b255>] ? tcp_rcv_established+0x1e5/0x6c0
[15748.027936]  [<ffffffff812b574d>] ? csum_partial+0xd/0x20
[15748.033323]  [<ffffffff8147514f>] ? tcp_v4_do_rcv+0x1af/0x4c0
[15748.039056]  [<ffffffff81476f06>] ? tcp_v4_rcv+0x696/0x7a0
[15748.044529]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[15748.051042]  [<ffffffff814a2128>] ? xfrm4_transport_finish+0x78/0xf0
[15748.057382]  [<ffffffff814ae70f>] ? xfrm_input+0x50f/0x560
[15748.062855]  [<ffffffff814a2bee>] ? xfrm4_esp_rcv+0x2e/0x70
[15748.068414]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[15748.074927]  [<ffffffff8141b933>] ? __netif_receive_skb_core+0x533/0x750
[15748.081611]  [<ffffffff8141bbcf>] ? netif_receive_skb_internal+0x1f/0x90
[15748.088297]  [<ffffffff8141c6b0>] ? napi_gro_receive+0xb0/0xe0
[15748.094122]  [<ffffffffa000d424>] ? bnx2_poll_work+0x7e4/0x1230 [bnx2]
[15748.100638]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[15748.106891]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[15748.112625]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[15748.118184]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[15748.123311]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[15748.128265]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[15748.134083]  <EOI>
[15748.136001]  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[15748.142886]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[15748.148964]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[15748.155042]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[15748.161121]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[15748.166767]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[15748.172240]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[15748.178405]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[15748.184656] Code: 00 00 00 66 66 66 66 90 41 57 41 56 41 89 f6 41 55 41 89 cd 41 54 4d 89 c4 55 48 8d 6f 08 53 48 83 ec 08 89 54 24 04 48 8b 57 08 <48> 8b 0a 48 39 d5 48 8d 42 e8 4c 8d 79 e8 75 0e eb 3e 66 0f 1f
[15748.204019] RIP  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[15748.210020]  RSP <ffff88082f203a88>
[15748.213831] ---[ end trace d12e1b76d2fbaf47 ]---
[15748.221954] Kernel panic - not syncing: Fatal exception in interrupt
[15748.228487] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[15748.242167] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[15748.249296] ------------[ cut here ]------------
[15748.253905] WARNING: CPU: 0 PID: 0 at /build/linux-SAvLSw/linux-3.16.7-ckt7/arch/x86/kernel/smp.c:124 update_process_times+0x59/0x70()
[15748.265963] Modules linked in: binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper ttm cryptd drm_kms_helper psmouse drm serio_raw pcspkr joydev evdev i2c_algo_bit i2c_core tpm_tis tpm ipmi_si lpc_ich ipmi_msghandler mfd_core i7core_edac acpi_power_meter button edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid mptsas scsi_transport_sas uhci_hcd ehci_pci mptscsih ehci_hcd mptbase crct10dif_pclmul crct10dif_common usbcore scsi_mod crc32c_intel usb_common bnx2
[15748.375899] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D       3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[15748.385357] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[15748.392909]  0000000000000009 ffffffff81509e7c 0000000000000000 ffffffff81067727
[15748.400325]  ffffffff8181a460 0000000000000000 0000000000000000 ffff88082f20d1a0
[15748.407741]  ffff88082f203860 ffffffff81074a19 ffff88082f203898 ffff88082f20db40
[15748.415154] Call Trace:
[15748.417591]  <IRQ>  [<ffffffff81509e7c>] ? dump_stack+0x41/0x51
[15748.423519]  [<ffffffff81067727>] ? warn_slowpath_common+0x77/0x90
[15748.429685]  [<ffffffff81074a19>] ? update_process_times+0x59/0x70
[15748.435852]  [<ffffffff810cfb50>] ? tick_sched_handle.isra.16+0x20/0x60
[15748.442451]  [<ffffffff810cfbcc>] ? tick_sched_timer+0x3c/0x60
[15748.448270]  [<ffffffff8108aee7>] ? __run_hrtimer+0x67/0x1c0
[15748.453916]  [<ffffffff8108b299>] ? hrtimer_interrupt+0xe9/0x220
[15748.459908]  [<ffffffff81512e0b>] ? smp_apic_timer_interrupt+0x3b/0x60
[15748.466420]  [<ffffffff81510efd>] ? apic_timer_interrupt+0x6d/0x80
[15748.472585]  [<ffffffff8150704b>] ? panic+0x1b8/0x1fc
[15748.477628]  [<ffffffff810163f1>] ? oops_end+0xd1/0xe0
[15748.482755]  [<ffffffff81511f48>] ? general_protection+0x28/0x30
[15748.488749]  [<ffffffff810a7198>] ? __wake_up_common+0x28/0x90
[15748.494567]  [<ffffffff810a7404>] ? __wake_up+0x34/0x50
[15748.499780]  [<ffffffff8140588c>] ? sock_def_wakeup+0x2c/0x30
[15748.505514]  [<ffffffff81466259>] ? tcp_fin+0x179/0x1f0
[15748.510727]  [<ffffffff81468638>] ? tcp_data_queue+0x758/0xcf0
[15748.516546]  [<ffffffff8146b255>] ? tcp_rcv_established+0x1e5/0x6c0
[15748.522798]  [<ffffffff812b574d>] ? csum_partial+0xd/0x20
[15748.528184]  [<ffffffff8147514f>] ? tcp_v4_do_rcv+0x1af/0x4c0
[15748.533917]  [<ffffffff81476f06>] ? tcp_v4_rcv+0x696/0x7a0
[15748.539390]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[15748.545901]  [<ffffffff814a2128>] ? xfrm4_transport_finish+0x78/0xf0
[15748.552241]  [<ffffffff814ae70f>] ? xfrm_input+0x50f/0x560
[15748.557713]  [<ffffffff814a2bee>] ? xfrm4_esp_rcv+0x2e/0x70
[15748.563272]  [<ffffffff814528be>] ? ip_local_deliver_finish+0x9e/0x200
[15748.569784]  [<ffffffff8141b933>] ? __netif_receive_skb_core+0x533/0x750
[15748.576470]  [<ffffffff8141bbcf>] ? netif_receive_skb_internal+0x1f/0x90
[15748.583156]  [<ffffffff8141c6b0>] ? napi_gro_receive+0xb0/0xe0
[15748.588980]  [<ffffffffa000d424>] ? bnx2_poll_work+0x7e4/0x1230 [bnx2]
[15748.595497]  [<ffffffffa000de9d>] ? bnx2_poll_msix+0x2d/0xb0 [bnx2]
[15748.601748]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[15748.607480]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[15748.613038]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[15748.618161]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[15748.623115]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[15748.628931]  <EOI>  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[15748.636245]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[15748.642325]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[15748.648402]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[15748.654482]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[15748.660128]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[15748.665600]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[15748.671765]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[15748.678015] ---[ end trace d12e1b76d2fbaf48 ]---

second time:

[60774.924284] general protection fault: 0000 [#1] SMP
[60774.929255] Modules linked in: seqiv binfmt_misc esp6 xfrm6_mode_transport authenc xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache deflate sunrpc ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm crc32_pclmul ghash_clmulni_intel dcdbas ttm aesni_intel drm_kms_helper aes_x86_64 lrw gf128mul drm glue_helper ablk_helper cryptd i2c_algo_bit evdev i2c_core joydev psmouse pcspkr serio_raw lpc_ich mfd_core ipmi_si tpm_tis ipmi_msghandler tpm acpi_power_meter button i7core_edac edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid uhci_hcd ehci_pci ehci_hcd mptsas scsi_transport_sas mptscsih crct10dif_pclmul mptbase crct10dif_common crc32c_intel usbcore scsi_mod usb_common bnx2
[60775.039910] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[60775.048330] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[60775.055882] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
[60775.063347] RIP: 0010:[<ffffffff810a7198>]  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[60775.071692] RSP: 0018:ffff88082f203d10  EFLAGS: 00010082
[60775.076990] RAX: 0000000000000286 RBX: ffff88080b935700 RCX: 0000000000000001
[60775.084109] RDX: 0c00000000000000 RSI: 0000000000000001 RDI: ffff88080b935700
[60775.091227] RBP: ffff88080b935708 R08: 0000000000000304 R09: ffffffff812cdf30
[60775.098345] R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000304
[60775.105463] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88080a5b9e00
[60775.112582] FS:  0000000000000000(0000) GS:ffff88082f200000(0000) knlGS:0000000000000000
[60775.120653] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[60775.126384] CR2: 00007fc662f92d98 CR3: 0000000001813000 CR4: 00000000000007f0
[60775.133502] Stack:
[60775.135505]  0000000100000000 ffff88080b935700 0000000000000001 0000000000000001
[60775.142919]  0000000000000304 0000000000000286 ffff88080a5b9e00 ffffffff810a76fd
[60775.150335]  ffff88080de3ab00 ffff88080de3ac14 0000000000000000 0000000000001160
[60775.157750] Call Trace:
[60775.160189]  <IRQ>
[60775.162106]  [<ffffffff810a76fd>] ? __wake_up_sync_key+0x3d/0x60
[60775.168292]  [<ffffffff814058d6>] ? sock_def_write_space+0x46/0x90
[60775.174460]  [<ffffffff814074f3>] ? sock_wfree+0x53/0x60
[60775.179759]  [<ffffffff8140b297>] ? skb_release_head_state+0x57/0xf0
[60775.186097]  [<ffffffff8140bbee>] ? skb_release_all+0xe/0x30
[60775.191741]  [<ffffffff8140be27>] ? consume_skb+0x27/0x80
[60775.197135]  [<ffffffffa000ce90>] ? bnx2_poll_work+0x250/0x1230 [bnx2]
[60775.203653]  [<ffffffffa000df6b>] ? bnx2_poll+0x4b/0x277 [bnx2]
[60775.209558]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[60775.215291]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[60775.220850]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[60775.225976]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[60775.230929]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[60775.236747]  <EOI>
[60775.238663]  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[60775.245546]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[60775.251625]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[60775.257705]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[60775.263784]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[60775.269429]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[60775.274902]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[60775.281067]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[60775.287317] Code: 00 00 00 66 66 66 66 90 41 57 41 56 41 89 f6 41 55 41 89 cd 41 54 4d 89 c4 55 48 8d 6f 08 53 48 83 ec 08 89 54 24 04 48 8b 57 08 <48> 8b 0a 48 39 d5 48 8d 42 e8 4c 8d 79 e8 75 0e eb 3e 66 0f 1f
[60775.306685] RIP  [<ffffffff810a7198>] __wake_up_common+0x28/0x90
[60775.312686]  RSP <ffff88082f203d10>
[60775.316498] ---[ end trace c397b1dd1ab3d5cd ]---
[60775.324649] Kernel panic - not syncing: Fatal exception in interrupt
[60775.331149] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[60775.344841] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[60775.351968] ------------[ cut here ]------------
[60775.356575] WARNING: CPU: 0 PID: 0 at /build/linux-SAvLSw/linux-3.16.7-ckt7/arch/x86/kernel/smp.c:124 update_process_times+0x59/0x70()
[60775.368632] Modules linked in: seqiv binfmt_misc esp6 xfrm6_mode_transport authenc xfrm4_mode_transport 8021q garp stp mrp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache deflate sunrpc ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic hmac crypto_null af_key xfrm_algo xfs libcrc32c ipmi_devintf intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm crc32_pclmul ghash_clmulni_intel dcdbas ttm aesni_intel drm_kms_helper aes_x86_64 lrw gf128mul drm glue_helper ablk_helper cryptd i2c_algo_bit evdev i2c_core joydev psmouse pcspkr serio_raw lpc_ich mfd_core ipmi_si tpm_tis ipmi_msghandler tpm acpi_power_meter button i7core_edac edac_core processor shpchp thermal_sys autofs4 ext4 crc16 mbcache jbd2 raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid uhci_hcd ehci_pci ehci_hcd mptsas scsi_transport_sas mptscsih crct10dif_pclmul mptbase crct10dif_common crc32c_intel usbcore scsi_mod usb_common bnx2
[60775.479265] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D       3.16.0-4-amd64 #1 Debian 3.16.7-ckt7-1
[60775.488721] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[60775.496270]  0000000000000009 ffffffff81509e7c 0000000000000000 ffffffff81067727
[60775.503678]  ffffffff8181a460 0000000000000000 0000000000000000 ffff88082f20d1a0
[60775.511085]  ffff88082f203af0 ffffffff81074a19 ffff88082f203b28 ffff88082f20db40
[60775.518495] Call Trace:
[60775.520931]  <IRQ>  [<ffffffff81509e7c>] ? dump_stack+0x41/0x51
[60775.526855]  [<ffffffff81067727>] ? warn_slowpath_common+0x77/0x90
[60775.533018]  [<ffffffff81074a19>] ? update_process_times+0x59/0x70
[60775.539185]  [<ffffffff810cfb50>] ? tick_sched_handle.isra.16+0x20/0x60
[60775.545782]  [<ffffffff810cfbcc>] ? tick_sched_timer+0x3c/0x60
[60775.551600]  [<ffffffff8108aee7>] ? __run_hrtimer+0x67/0x1c0
[60775.557243]  [<ffffffff8108b299>] ? hrtimer_interrupt+0xe9/0x220
[60775.563235]  [<ffffffff81512e0b>] ? smp_apic_timer_interrupt+0x3b/0x60
[60775.569746]  [<ffffffff81510efd>] ? apic_timer_interrupt+0x6d/0x80
[60775.575911]  [<ffffffff8150704b>] ? panic+0x1b8/0x1fc
[60775.580950]  [<ffffffff810163f1>] ? oops_end+0xd1/0xe0
[60775.586074]  [<ffffffff81511f48>] ? general_protection+0x28/0x30
[60775.592066]  [<ffffffff812cdf30>] ? unmap_single+0x30/0x30
[60775.597538]  [<ffffffff810a7198>] ? __wake_up_common+0x28/0x90
[60775.603354]  [<ffffffff810a76fd>] ? __wake_up_sync_key+0x3d/0x60
[60775.609347]  [<ffffffff814058d6>] ? sock_def_write_space+0x46/0x90
[60775.615512]  [<ffffffff814074f3>] ? sock_wfree+0x53/0x60
[60775.620809]  [<ffffffff8140b297>] ? skb_release_head_state+0x57/0xf0
[60775.627147]  [<ffffffff8140bbee>] ? skb_release_all+0xe/0x30
[60775.632792]  [<ffffffff8140be27>] ? consume_skb+0x27/0x80
[60775.638180]  [<ffffffffa000ce90>] ? bnx2_poll_work+0x250/0x1230 [bnx2]
[60775.644694]  [<ffffffffa000df6b>] ? bnx2_poll+0x4b/0x277 [bnx2]
[60775.650597]  [<ffffffff8141bf60>] ? net_rx_action+0x140/0x240
[60775.656328]  [<ffffffff8106c5e1>] ? __do_softirq+0xf1/0x290
[60775.661886]  [<ffffffff8106c9b5>] ? irq_exit+0x95/0xa0
[60775.667009]  [<ffffffff81512d42>] ? do_IRQ+0x52/0xe0
[60775.671960]  [<ffffffff81510bed>] ? common_interrupt+0x6d/0x6d
[60775.677777]  <EOI>  [<ffffffff8108ab8d>] ? __hrtimer_start_range_ns+0x1cd/0x390
[60775.685086]  [<ffffffff813dcdbf>] ? cpuidle_enter_state+0x4f/0xc0
[60775.691163]  [<ffffffff813dcdb8>] ? cpuidle_enter_state+0x48/0xc0
[60775.697240]  [<ffffffff810a7d58>] ? cpu_startup_entry+0x2f8/0x400
[60775.703317]  [<ffffffff81902071>] ? start_kernel+0x492/0x49d
[60775.708960]  [<ffffffff81901a04>] ? set_init_arg+0x4e/0x4e
[60775.714430]  [<ffffffff81901120>] ? early_idt_handlers+0x120/0x120
[60775.720594]  [<ffffffff8190171f>] ? x86_64_start_kernel+0x14d/0x15c
[60775.726843] ---[ end trace c397b1dd1ab3d5ce ]---
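
When comparing several captured traces, it can help to pull out just the faulting symbol and any general-purpose register that holds a non-canonical address, since dereferencing such an address raises a general protection fault on x86-64. The following is a minimal sketch, not something used in this task: the function name `oops_summary` is hypothetical, it expects a saved console log in the format pasted above, and it assumes 48-bit virtual addressing.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original investigation).
# Usage: oops_summary FILE
# Prints the faulting symbol from each "RIP:" line, plus every 64-bit
# register value that is not a canonical address. Assumption: 48-bit
# virtual addresses, so a canonical value starts with 0000[0-7] or
# ffff[8-f] when written as 16 hex digits.
oops_summary() {
    awk '
        / RIP: / { print "fault at " $NF }
        {
            for (i = 1; i < NF; i++) {
                # Register labels look like "RAX:" or "R08:"; the value is
                # the next field. RIP/RSP carry a segment prefix and are
                # skipped by the length check.
                if ($i ~ /^R[A-Z0-9]+:$/ && length($(i + 1)) == 16 &&
                    $(i + 1) !~ /^0000[0-7]/ && $(i + 1) !~ /^ffff[89a-f]/)
                    print "  non-canonical " $i " " $(i + 1)
            }
        }
    ' "$1"
}
```

Run against a file holding the two ESN traces above, this should report `__wake_up_common+0x28/0x90` twice, with RDX flagged each time (0b00000000000000 and 0c00000000000000). That is consistent with the bytes marked `<48> 8b 0a` in the Code: lines, which decode to a load through RDX. A flagged register is only a hint about where corrupted data was read, not a diagnosis of what corrupted it.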
BBlack added a comment. Apr 3 2015, 1:00 PM

Could you try the 3.19 kernel, in case that makes the problem go away? That's the kernel the caches run anyway, and it is in our repo: apt-get install linux-image-3.19

I took a look at the kernel commit log for v3.16..v3.19. Of course, many changes touch the code paths in your traces in some way and could intentionally or inadvertently fix this, but the two that stood out as most likely to be relevant on the surface are:

commit 2dc49d1680b534877fd20cce52557ea542bb06b6
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date:   Mon Dec 22 18:22:48 2014 +0100

    tcp6: don't move IP6CB before xfrm6_policy_check()

    When xfrm6_policy_check() is used, _decode_session6() is called after some
    intermediate functions. This function uses IP6CB(), thus TCP_SKB_CB() must be
    prepared after the call of xfrm6_policy_check().

    Before this patch, scenarii with IPv6 + TCP + IPsec Transport are broken.

    Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    Reported-by: Huaibin Wang <huaibin.wang@6wind.com>
    Suggested-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit f293a5e33e084ef84b802e3002e5fe3eef086171
Author: dingzhi <zhi.ding@6wind.com>
Date:   Thu Oct 30 09:39:36 2014 +0100

    xfrm: add XFRMA_REPLAY_VAL attribute to SA messages

    After this commit, the attribute XFRMA_REPLAY_VAL is added when no ESN replay
    value is defined. Thus sequence number values are always notified to userspace.

    Signed-off-by: dingzhi <zhi.ding@6wind.com>
    Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
faidon added a comment. Apr 6 2015, 5:36 AM

Well, I obviously agree with @BBlack that we should test with 3.19 (and, in general, have a test environment that resembles production as much as possible — the trusty/jessie disparity revealed another severe issue as well).

That said, it'd be best to better understand this crash because it might just be harder but not impossible to reproduce in 3.19. Moreover, we know we want to enable IPsec for other hosts as well in the future (e.g. Kafka brokers) and it would be best to not add a 3.19 dependency for that.

So, as a -secondary- priority, we should find a fix or a workaround for this: e.g. identifying the commit that fixed it, so that it can go into upstream's stable branch, and/or avoiding ESN.
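
For reference, avoiding ESN would amount to dropping the `-esn` keyword from the ESP proposal and leaving the rest as it is in the task description. The fragment below is only a sketch of that variant, on the assumption that strongSwan falls back to non-extended sequence numbers when the keyword is omitted; it has not been run as part of this comment.

```
# /etc/ipsec.conf (sketch): proposals from the task description, minus ESN
ike=aes128gcm16-null-prfsha384-ecp384bp!
esp=aes128gcm16-null-ecp384bp!
```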

@Gage, was the HTTP transfer you did over IPv6 (which would match @BBlack's first commit candidate)? Could you list the exact steps to reproduce?

Andrew triaged this task as High priority. Apr 6 2015, 4:35 PM

Thanks for the feedback. The steps to reproduce are in the task description; I used IPv4:

while true ; do wget -nv -O /dev/null http://10.64.0.170/index.nginx-debian.html ; sleep 1 ; done

With the following ciphers in /etc/ipsec.conf:

ike=aes128gcm16-null-prfsha384-ecp384bp!
esp=aes128gcm16-null-ecp384bp-esn!

I make sure to shut down the strongswan service on both hosts before starting it again, to avoid any mismatched cipher problems.
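
To make request counts easier to compare between runs, the loop above can be wrapped so that it reports how many requests succeeded before the first failure. This is a minimal sketch and was not used for the results in this task: the function name `count_until_failure` and its arguments are hypothetical. It can only report a failure that the client host survives; if the client itself panics, the last progress line left on its console is the count.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test): run a command up to
# MAX times, INTERVAL seconds apart, and report how many runs succeeded
# before the first failure.
# Usage: count_until_failure MAX INTERVAL CMD [ARGS...]
count_until_failure() {
    max=$1
    interval=$2
    shift 2
    n=0
    while [ "$n" -lt "$max" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "failed after $n successful requests"
            return 1
        fi
        n=$((n + 1))
        # Print progress so the count is still visible if this host dies.
        echo "request $n ok"
        sleep "$interval"
    done
    echo "completed $n requests without failure"
}

# Example invocation with the URL from the task description:
# count_until_failure 600 1 wget -nv -O /dev/null http://10.64.0.170/index.nginx-debian.html
```

Adding `-T 5 -t 1` to the wget arguments would make a dead server show up as a failure within a few seconds instead of waiting on wget's default timeout and retries.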

I've just tried upgrading both machines to 3.19. On the first attempt, both machines crashed after just 9 requests.

berkelium (http client):

[  854.482796] general protection fault: 0000 [#1] SMP
[  854.487820] Modules linked in: binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp mrp stp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 deflate ctr twofish_ge2
[  854.595325] CPU: 2 PID: 1968 Comm: bash Not tainted 3.19.0-trunk-amd64 #1 Debian 3.19.1-1~exp1
[  854.603918] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[  854.611470] task: ffff88080b3e0c20 ti: ffff880809484000 task.ti: ffff880809484000
[  854.618935] RIP: 0010:[<ffffffff81188d8e>]  [<ffffffff81188d8e>] anon_vma_clone+0x7e/0x1f0
[  854.627192] RSP: 0018:ffff880809487d60  EFLAGS: 00010282
[  854.632489] RAX: ffff880809f45680 RBX: ffff88080a58f748 RCX: 0000000000000000
[  854.639607] RDX: 0000000000000000 RSI: ffff88080f25b188 RDI: 0000000000000246
[  854.646723] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000001b188
[  854.653840] R10: 0000000000000003 R11: ffff88080a6c11c0 R12: ffff880809f1bf80
[  854.660957] R13: ffff880809f45680 R14: ffff88080a58f7c0 R15: 2a00000000000000
[  854.668075] FS:  00007faa4a6f7700(0000) GS:ffff88080f240000(0000) knlGS:0000000000000000
[  854.676146] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  854.681878] CR2: 0000000000f2ce90 CR3: 00000008075ce000 CR4: 00000000000007e0
[  854.689016] Stack:
[  854.691021]  ffff8808075ce800 ffff880809e23cc8 ffff880809e23c50 0000000000000286
[  854.698431]  00007faa4a092fff ffff88080a58f748 ffff880809e23c50 ffff880809e23c50
[  854.705843]  0000000000000004 00007faa4a6f79d0 0000000000000000 ffffffff81188f2d
[  854.713254] Call Trace:
[  854.715693]  [<ffffffff81188f2d>] ? anon_vma_fork+0x2d/0x140
[  854.721342]  [<ffffffff8106b8bb>] ? copy_process.part.27+0x159b/0x1ae0
[  854.727857]  [<ffffffff811c0410>] ? get_empty_filp+0xd0/0x1c0
[  854.733590]  [<ffffffff8106bfd0>] ? do_fork+0xe0/0x3d0
[  854.738718]  [<ffffffff811dc32c>] ? __alloc_fd+0x7c/0x120
[  854.744106]  [<ffffffff815507b9>] ? stub_clone+0x69/0x90
[  854.749405]  [<ffffffff8155046d>] ? system_call_fast_compare_end+0xc/0x11
[  854.756178] Code: 8d 60 f0 0f 84 b4 00 00 00 48 8b 3d f5 05 95 00 be 00 02 00 00 e8 e3 d1 01 00 48 85 c0 49 89 c5 0f 84 c7 00 00 00 4d 8b 7c 24 08 <49> 8b 17 48 39 ea 74 15 48 85 ed 0f 85 f1 00 00 00
[  854.775557] RIP  [<ffffffff81188d8e>] anon_vma_clone+0x7e/0x1f0
[  854.781473]  RSP <ffff880809487d60>
[  854.785308] ---[ end trace 6e6a0aa2cef5345a ]---

curium (http server):

[  354.527909] WARNING: CPU: 1 PID: 0 at /build/linux-YAn3Sd/linux-3.19.1/arch/x86/kernel/smp.c:124 update_process_times+0x4e/0x60()
[  354.539533] Modules linked in: binfmt_misc esp6 xfrm6_mode_transport seqiv xfrm4_mode_transport 8021q garp mrp stp llc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 deflate ctr twofish_ge2
[  354.647006] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D        3.19.0-trunk-amd64 #1 Debian 3.19.1-1~exp1
[  354.656898] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[  354.664448]  0000000000000000 ffffffff81717478 ffffffff8154a9c1 0000000000000000
[  354.671857]  ffffffff8106ced1 ffff88080bb129e0 0000000000000000 ffff88080f223b78
[  354.679265]  ffff88080f22e6c0 ffff88080f223b38 ffffffff810cc82e 000000527ebc51b8
[  354.686674] Call Trace:
[  354.689111]  <IRQ>  [<ffffffff8154a9c1>] ? dump_stack+0x40/0x50
[  354.695033]  [<ffffffff8106ced1>] ? warn_slowpath_common+0x81/0xb0
[  354.701197]  [<ffffffff810cc82e>] ? update_process_times+0x4e/0x60
[  354.707362]  [<ffffffff810db494>] ? tick_sched_handle.isra.16+0x24/0x60
[  354.713960]  [<ffffffff810db50b>] ? tick_sched_timer+0x3b/0x70
[  354.719778]  [<ffffffff810cd42b>] ? __run_hrtimer+0x6b/0x1c0
[  354.725424]  [<ffffffff8101cc45>] ? read_tsc+0x5/0x10
[  354.730461]  [<ffffffff810cd839>] ? hrtimer_interrupt+0xf9/0x230
[  354.736452]  [<ffffffff810bab32>] ? wake_up_klogd+0x32/0x50
[  354.742009]  [<ffffffff8150b4d0>] ? fib6_flush_trees+0x50/0x50
[  354.747828]  [<ffffffff81553419>] ? smp_apic_timer_interrupt+0x39/0x50
[  354.754338]  [<ffffffff815514fd>] ? apic_timer_interrupt+0x6d/0x80
[  354.760502]  [<ffffffff8154963e>] ? panic+0x1b9/0x1fb
[  354.765539]  [<ffffffff8154963a>] ? panic+0x1b5/0x1fb
[  354.770578]  [<ffffffff81017691>] ? oops_end+0xd1/0xe0
[  354.775701]  [<ffffffff81552548>] ? general_protection+0x28/0x30
[  354.781692]  [<ffffffff8150b4d0>] ? fib6_flush_trees+0x50/0x50
[  354.787511]  [<ffffffff8150b210>] ? fib6_walk_continue+0x120/0x1d0
[  354.793675]  [<ffffffff8150b28e>] ? fib6_walk_continue+0x19e/0x1d0
[  354.799840]  [<ffffffff8150b369>] ? fib6_walk+0x59/0x80
[  354.805051]  [<ffffffff8150b3d9>] ? fib6_clean_tree+0x49/0x50
[  354.810780]  [<ffffffff8150d900>] ? fib6_del+0x2c0/0x2c0
[  354.816079]  [<ffffffff81096263>] ? try_to_wake_up+0xd3/0x330
[  354.821810]  [<ffffffff8150b4d0>] ? fib6_flush_trees+0x50/0x50
[  354.827629]  [<ffffffff8150b44b>] ? __fib6_clean_all+0x6b/0xa0
[  354.833446]  [<ffffffff8150db60>] ? fib6_run_gc+0xf0/0xf0
[  354.838829]  [<ffffffff8150dabe>] ? fib6_run_gc+0x4e/0xf0
[  354.844213]  [<ffffffff81082d30>] ? __queue_work+0x340/0x340
[  354.849858]  [<ffffffff810ca7b0>] ? call_timer_fn+0x30/0x100
[  354.855502]  [<ffffffff8150db60>] ? fib6_run_gc+0xf0/0xf0
[  354.860885]  [<ffffffff810cc499>] ? run_timer_softirq+0x209/0x2f0
[  354.866962]  [<ffffffff810711da>] ? __do_softirq+0x11a/0x290
[  354.872606]  [<ffffffff810714b5>] ? irq_exit+0x95/0xa0
[  354.877729]  [<ffffffff8155341e>] ? smp_apic_timer_interrupt+0x3e/0x50
[  354.884240]  [<ffffffff815514fd>] ? apic_timer_interrupt+0x6d/0x80
[  354.890403]  <EOI>  [<ffffffff81047b4d>] ? lapic_next_event+0x1d/0x30
[  354.896841]  [<ffffffff814232de>] ? cpuidle_enter_state+0x5e/0x160
[  354.903005]  [<ffffffff814232ce>] ? cpuidle_enter_state+0x4e/0x160
[  354.909169]  [<ffffffff810a928d>] ? cpu_startup_entry+0x34d/0x3f0
[  354.915248]  [<ffffffff810d9be0>] ? tick_check_new_device+0xe0/0x110
[  354.921584]  [<ffffffff8104645e>] ? start_secondary+0x19e/0x1d0
[  354.927489] ---[ end trace e4c1de7f9d27efb7 ]---

On the second attempt, berkelium segfaulted and then crashed after 8 requests:

[  144.140349] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 2.2.10 11/09/2010
[  144.147900] task: ffffffff8181a540 ti: ffffffff81800000 task.ti: ffffffff81800000
[  144.155364] RIP: 0010:[<ffffffff8150b210>]  [<ffffffff8150b210>] fib6_walk_continue+0x120/0x1d0
[  144.164058] RSP: 0018:ffff88080f203d78  EFLAGS: 00010297
[  144.169355] RAX: 1200000000000000 RBX: ffff88080f203db0 RCX: 0000000000000000
[  144.176472] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffff88080f203db0
[  144.183591] RBP: ffffffff818c9d80 R08: 0000000000000000 R09: 0000000000000000
[  144.190708] R10: 0000000000000000 R11: 00000000b9000000 R12: 0000000000000000
[  144.197825] R13: 0000000000000000 R14: ffffffff8150b4d0 R15: ffff8808079872d4
[  144.204945] FS:  0000000000000000(0000) GS:ffff88080f200000(0000) knlGS:0000000000000000
[  144.213018] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  144.218749] CR2: 00007f0d54ae8148 CR3: 0000000001813000 CR4: 00000000000007f0
[  144.225867] Stack:
[  144.227871]  ffff88080f203db0 ffffffff8150b369 ffffffff8101d166 000000000f214140
[  144.235284]  ffff8808079872c0 ffffffff8150b3d9 ffff88080f214140 ffffffff818d4100
[  144.242697]  ffffffff818d4100 ffff8808079872e0 1200000000000000 0000000000000000
[  144.250108] Call Trace:
[  144.252545]  <IRQ>
[  144.254462]  [<ffffffff8150b369>] ? fib6_walk+0x59/0x80
[  144.259873]  [<ffffffff8101d166>] ? native_sched_clock+0x26/0x90
[  144.265865]  [<ffffffff8150b3d9>] ? fib6_clean_tree+0x49/0x50
[  144.271597]  [<ffffffff8150d900>] ? fib6_del+0x2c0/0x2c0
[  144.276899]  [<ffffffff81096263>] ? try_to_wake_up+0xd3/0x330
[  144.282632]  [<ffffffff8150b4d0>] ? fib6_flush_trees+0x50/0x50
[  144.288470]  [<ffffffff8150b44b>] ? __fib6_clean_all+0x6b/0xa0
[  144.294289]  [<ffffffff8150db60>] ? fib6_run_gc+0xf0/0xf0
[  144.299674]  [<ffffffff8150dabe>] ? fib6_run_gc+0x4e/0xf0
[  144.305059]  [<ffffffff81082d30>] ? __queue_work+0x340/0x340
[  144.310705]  [<ffffffff810ca7b0>] ? call_timer_fn+0x30/0x100
[  144.316351]  [<ffffffff8150db60>] ? fib6_run_gc+0xf0/0xf0
[  144.321738]  [<ffffffff810cc499>] ? run_timer_softirq+0x209/0x2f0
[  144.327819]  [<ffffffff810711da>] ? __do_softirq+0x11a/0x290
[  144.333463]  [<ffffffff810714b5>] ? irq_exit+0x95/0xa0
[  144.338590]  [<ffffffff8155341e>] ? smp_apic_timer_interrupt+0x3e/0x50
[  144.345102]  [<ffffffff815514fd>] ? apic_timer_interrupt+0x6d/0x80
[  144.351265]  <EOI>
[  144.353182]  [<ffffffff814232de>] ? cpuidle_enter_state+0x5e/0x160
[  144.359542]  [<ffffffff814232ce>] ? cpuidle_enter_state+0x4e/0x160
[  144.365711]  [<ffffffff810a928d>] ? cpu_startup_entry+0x34d/0x3f0
[  144.371790]  [<ffffffff81918f64>] ? start_kernel+0x476/0x481
[  144.377435]  [<ffffffff81918120>] ? early_idt_handlers+0x120/0x120
[  144.383602]  [<ffffffff81918120>] ? early_idt_handlers+0x120/0x120
[  144.389769]  [<ffffffff8191860d>] ? x86_64_start_kernel+0x150/0x15f
[  144.396019] Code: 85 d2 75 10 c7 43 28 02 00 00 00 48 8b 50 10 48 85 d2 74 77 48 89 53 18 c7 43 28 00 00 00 00 48 89 d0 e9 f4 fe ff ff 0f 1f 40 00 <48> 8b 50 18 48 85 d2 74 bf 48 89 53 18 48 89 d0 e9
[  144.415472] RIP  [<ffffffff8150b210>] fib6_walk_continue+0x120/0x1d0
[  144.421821]  RSP <ffff88080f203d78>
[  144.425309] ---[ end trace d60273003aab6bfc ]---
[  144.433648] Kernel panic - not syncing: Fatal exception in interrupt
[  144.440148] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[  144.454047] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Over the weekend I stress-tested kernel 3.16 with ESN disabled, using the following:

while true ; do wget -nv -O /dev/null http://10.64.0.170/index.nginx-debian.html ; sleep .1 ; done
while true ; do wget -O /dev/null --limit-rate=40M  http://10.64.0.170/ubuntu-14.04.2-server-amd64.iso ; sleep 1 ; done
siege -c 1025 http://10.64.0.170/index.nginx-debian.html

(Get tiny file with 0.1s sleep, get large file at 40MB/s, get tiny file with 1025 clients in parallel.)

In total I ran tests for >48 hours without experiencing any crashes, and the only change was disabling ESN.

Initial testing on 3.19 shows the same behavior: quick crash with ESN enabled, no crash with ESN disabled.

I've searched LKML archives but didn't find any discussion of problems with ESN in the past year.

ESN has never been a hard requirement for us; it just sounded like a good idea: "This option permits transmission of very large volumes of data at high speeds over an IPsec Security Association, without rekeying to avoid sequence number space exhaustion." (https://tools.ietf.org/html/rfc4304)

It seems like we can work around any need for this by simply re-keying connections with sufficient frequency.
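
To put a rough number on "sufficient frequency", here is a back-of-the-envelope sketch of how long a 32-bit ESP sequence number space would last on a single SA at a steady rate. The average packet sizes are assumptions for illustration, not measurements from these hosts, and the function name is hypothetical.

```python
# Rough estimate of how long a 32-bit ESP sequence number space lasts
# on one SA (one direction) at a steady throughput.
# ASSUMPTION: the average bytes per packet below are illustrative guesses,
# not values measured during the tests in this task.

SEQ_SPACE_32 = 2 ** 32  # sequence numbers available without ESN


def seconds_until_exhaustion(bytes_per_second: float, bytes_per_packet: int) -> float:
    """Time until a single SA has sent 2**32 packets at a constant rate."""
    packets_per_second = bytes_per_second / bytes_per_packet
    return SEQ_SPACE_32 / packets_per_second


if __name__ == "__main__":
    rate = 40 * 1000 * 1000  # 40 MB/s, the rate of the large-file test
    for size in (1400, 576, 64):  # hypothetical average packet sizes
        hours = seconds_until_exhaustion(rate, size) / 3600
        print(f"{size:5d} B/packet: about {hours:.1f} hours")
```

With near-MTU packets at 40MB/s this comes out on the order of 40 hours, so a re-key interval of a few hours or less should leave a wide margin. Very small packets shrink that margin considerably, so the interval is worth re-checking against real traffic.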

One good thing from this week's stress testing: I lowered the re-key time from default 2 hours to 4 minutes to approximate the CPU load of re-keying many connections, but even under these conditions the CPU load from using a 384-bit key for ECDH was extremely low: 99% idle & loadavg 0.04-0.11 even at 40MB/s (340Mb/s). So it seems unlikely that we'll need to lower the key size due to CPU load.

Gage closed this task as Resolved. Apr 21 2015, 10:21 PM
Gage claimed this task.

This seems to be fixed in linux-image-3.19.0-trunk-amd64 version 3.19.3-1~exp1, currently in Debian/Experimental.

  • 3.16.7-ckt7-1 with ESN enabled: kernel panics within 10 seconds
  • 3.19.1-1~exp1 with ESN enabled: kernel panics within 10 seconds
  • 3.19.3-1~exp1 with ESN enabled: kernels remained stable for >11,000 pings