
kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756 for analytics1044 and analytics1043
Closed, Resolved (Public)

Description

analytics1044 and analytics1043 got fried at Sat Aug 1 19:33:40 2015 and Sun Aug 2 12:30:15 2015, respectively. This is either a hardware or a kernel issue: some processes (such as SSH) kept working, but others, such as ps and top, hung indefinitely, and shutdown didn't work.

dmesg logs show very relevant entries on both servers:

analytics1043
=============

[910456.738494] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[910456.745883] invalid opcode: 0000 [#17] SMP
[910456.750675] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel acpi_power_meter aesni_intel aes_x86_64 ipmi_devintf mei_me lrw mei gf128mul glue_helper ablk_helper dcdbas lpc_ich wmi cryptd ipmi_si shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[910456.785161] CPU: 7 PID: 146567 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[910456.794783] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[910456.803533] task: ffff88090af32fe0 ti: ffff88090af2e000 task.ti: ffff88090af2e000
[910456.811982] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[910456.821526] RSP: 0018:ffff88090af2fd98 EFLAGS: 00010246
[910456.827556] RAX: 0000000000000100 RBX: 00000007f027db88 RCX: ffff88090af2fb18
[910456.835625] RDX: ffff88090af32fe0 RSI: 0000000000000000 RDI: 8000000bb72009e6
[910456.843693] RBP: ffff88090af2fe20 R08: 0000000000000000 R09: 00000000000000a9
[910456.851762] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880811272c08
[910456.859830] R13: ffff880820dcf380 R14: ffff88105031a300 R15: 0000000000000080
[910456.867893] FS: 00007fa1d4dfd700(0000) GS:ffff88105ec60000(0000) knlGS:0000000000000000
[910456.877030] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[910456.883536] CR2: 000000072a562168 CR3: 000000096a93f000 CR4: 00000000001407e0
[910456.891606] Stack:
[910456.893948] 0000000000000001 ffff88090af2fdb0 ffffffff8109a780 ffff88090af2fdd0
[910456.902356] ffffffff810d7ad6 0000000000000001 ffffffff81f1e978 ffff88090af2fe78
[910456.910759] ffffffff810d983d ffff88090af2fe48 00000000000000a9 00000001ffffffff
[910456.919158] Call Trace:
[910456.922000] [<ffffffff8109a780>] ? wake_up_state+0x10/0x20
[910456.928325] [<ffffffff810d7ad6>] ? wake_futex+0x66/0x90
[910456.934362] [<ffffffff810d983d>] ? futex_wake_op+0x4ed/0x620
[910456.940881] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[910456.947403] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[910456.954406] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[910456.961117] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[910456.967926] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[910456.974058] [<ffffffff8171e288>] page_fault+0x28/0x30
[910456.979894] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[910457.001783] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[910457.008704] RSP <ffff88090af2fd98>
[910457.013428] ---[ end trace 5564632be8836958 ]---

[Sun Aug 2 12:30:15 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sun Aug 2 12:30:15 2015] invalid opcode: 0000 [#17] SMP
[Sun Aug 2 12:30:15 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel acpi_power_meter aesni_intel aes_x86_64 ipmi_devintf mei_me lrw mei gf128mul glue_helper ablk_helper dcdbas lpc_ich wmi cryptd ipmi_si shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sun Aug 2 12:30:15 2015] CPU: 7 PID: 146567 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sun Aug 2 12:30:15 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sun Aug 2 12:30:15 2015] task: ffff88090af32fe0 ti: ffff88090af2e000 task.ti: ffff88090af2e000
[Sun Aug 2 12:30:15 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sun Aug 2 12:30:15 2015] RSP: 0018:ffff88090af2fd98 EFLAGS: 00010246
[Sun Aug 2 12:30:15 2015] RAX: 0000000000000100 RBX: 00000007f027db88 RCX: ffff88090af2fb18
[Sun Aug 2 12:30:15 2015] RDX: ffff88090af32fe0 RSI: 0000000000000000 RDI: 8000000bb72009e6
[Sun Aug 2 12:30:15 2015] RBP: ffff88090af2fe20 R08: 0000000000000000 R09: 00000000000000a9
[Sun Aug 2 12:30:15 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880811272c08
[Sun Aug 2 12:30:15 2015] R13: ffff880820dcf380 R14: ffff88105031a300 R15: 0000000000000080
[Sun Aug 2 12:30:15 2015] FS: 00007fa1d4dfd700(0000) GS:ffff88105ec60000(0000) knlGS:0000000000000000
[Sun Aug 2 12:30:15 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Aug 2 12:30:15 2015] CR2: 000000072a562168 CR3: 000000096a93f000 CR4: 00000000001407e0
[Sun Aug 2 12:30:15 2015] Stack:
[Sun Aug 2 12:30:15 2015] 0000000000000001 ffff88090af2fdb0 ffffffff8109a780 ffff88090af2fdd0
[Sun Aug 2 12:30:15 2015] ffffffff810d7ad6 0000000000000001 ffffffff81f1e978 ffff88090af2fe78
[Sun Aug 2 12:30:15 2015] ffffffff810d983d ffff88090af2fe48 00000000000000a9 00000001ffffffff
[Sun Aug 2 12:30:15 2015] Call Trace:
[Sun Aug 2 12:30:15 2015] [<ffffffff8109a780>] ? wake_up_state+0x10/0x20
[Sun Aug 2 12:30:15 2015] [<ffffffff810d7ad6>] ? wake_futex+0x66/0x90
[Sun Aug 2 12:30:15 2015] [<ffffffff810d983d>] ? futex_wake_op+0x4ed/0x620
[Sun Aug 2 12:30:15 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sun Aug 2 12:30:15 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sun Aug 2 12:30:15 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sun Aug 2 12:30:15 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sun Aug 2 12:30:15 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sun Aug 2 12:30:15 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sun Aug 2 12:30:15 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sun Aug 2 12:30:16 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sun Aug 2 12:30:16 2015] RSP <ffff88090af2fd98>
[Sun Aug 2 12:30:16 2015] ---[ end trace 5564632be8836958 ]---


analytics1044
=============

[Sat Aug 1 19:33:40 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:40 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:40 2015] invalid opcode: 0000 [#1] SMP
[Sat Aug 1 19:33:40 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:40 2015] CPU: 6 PID: 14579 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:40 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:40 2015] task: ffff8801f9d817f0 ti: ffff8801e7154000 task.ti: ffff8801e7154000
[Sat Aug 1 19:33:40 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:40 2015] RSP: 0018:ffff8801e7155d98 EFLAGS: 00010246
[Sat Aug 1 19:33:40 2015] RAX: 0000000000000100 RBX: 000000078520fd48 RCX: ffff8801e7155b18
[Sat Aug 1 19:33:40 2015] RDX: ffff8801f9d817f0 RSI: 0000000000000000 RDI: 80000001ebc009e6
[Sat Aug 1 19:33:40 2015] RBP: ffff8801e7155e20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:40 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976148
[Sat Aug 1 19:33:40 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:40 2015] FS: 00007fab551bb700(0000) GS:ffff88085f460000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:40 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:40 2015] CR2: 00000007fe800000 CR3: 000000012f1da000 CR4: 00000000001407e0
[Sat Aug 1 19:33:40 2015] Stack:
[Sat Aug 1 19:33:40 2015] ffff8801e7155e20 ffff880036058000 ffff8801e7155f20 0000000000000001
[Sat Aug 1 19:33:40 2015] 00007fab4d216950 0000000000000001 000000000000000a 0000000000000001
[Sat Aug 1 19:33:40 2015] 0000000080000000 0000000000000000 ffff8800000000a9 0000000000000004
[Sat Aug 1 19:33:40 2015] Call Trace:
[Sat Aug 1 19:33:40 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:40 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:40 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:40 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:40 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:40 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:40 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:40 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:40 2015] RSP <ffff8801e7155d98>
[Sat Aug 1 19:33:41 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:41 2015] ---[ end trace b7e84915c2f4b0c8 ]---
[Sat Aug 1 19:33:41 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:41 2015] invalid opcode: 0000 [#2] SMP
[Sat Aug 1 19:33:41 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:41 2015] CPU: 20 PID: 14573 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:41 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:41 2015] task: ffff8801fa9a5fc0 ti: ffff88010eb8e000 task.ti: ffff88010eb8e000
[Sat Aug 1 19:33:41 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:41 2015] RSP: 0018:ffff88010eb8fd98 EFLAGS: 00010246
[Sat Aug 1 19:33:41 2015] RAX: 0000000000000100 RBX: 0000000785207d30 RCX: ffff88010eb8fb18
[Sat Aug 1 19:33:41 2015] RDX: ffff8801fa9a5fc0 RSI: 0000000000000000 RDI: 80000001ebc009e6
[Sat Aug 1 19:33:41 2015] RBP: ffff88010eb8fe20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:41 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976148
[Sat Aug 1 19:33:41 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:41 2015] FS: 00007fab557c1700(0000) GS:ffff88085f540000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:41 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:41 2015] CR2: 0000000785207d30 CR3: 000000012f1da000 CR4: 00000000001407e0
[Sat Aug 1 19:33:41 2015] Stack:
[Sat Aug 1 19:33:41 2015] ffff88010eb8fe20 ffff88010eb8fdb0 ffff88010eb8ff20 0000000078dcd510
[Sat Aug 1 19:33:41 2015] 0000000000000000 0000000078b411a8 0000000078b411b8 0000000078b411c8
[Sat Aug 1 19:33:41 2015] 0000000080000000 0000000000000000 ffff8800000000a9 0000000000000004
[Sat Aug 1 19:33:41 2015] Call Trace:
[Sat Aug 1 19:33:41 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:41 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:41 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:41 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:41 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:41 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:41 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:41 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:41 2015] RSP <ffff88010eb8fd98>
[Sat Aug 1 19:33:41 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:41 2015] ---[ end trace b7e84915c2f4b0c9 ]---
[Sat Aug 1 19:33:41 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:41 2015] invalid opcode: 0000 [#3] SMP
[Sat Aug 1 19:33:41 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:41 2015] CPU: 13 PID: 14584 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:41 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:41 2015] task: ffff8801e70e0000 ti: ffff880651942000 task.ti: ffff880651942000
[Sat Aug 1 19:33:41 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:41 2015] RSP: 0018:ffff880651943d98 EFLAGS: 00010246
[Sat Aug 1 19:33:41 2015] RAX: 0000000000000100 RBX: 000000078500f718 RCX: ffff880651943b18
[Sat Aug 1 19:33:41 2015] RDX: ffff8801e70e0000 RSI: 0000000000000000 RDI: 80000001eb0009e6
[Sat Aug 1 19:33:41 2015] RBP: ffff880651943e20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:41 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976140
[Sat Aug 1 19:33:41 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:41 2015] FS: 00007fab54cb6700(0000) GS:ffff88105ecc0000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:41 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:41 2015] CR2: 00007fdbd44e8c50 CR3: 000000012f1da000 CR4: 00000000001407e0
[Sat Aug 1 19:33:41 2015] Stack:
[Sat Aug 1 19:33:41 2015] 0000000000000001 ffff880651943db0 ffff880651943f20 ffff880651943dd0
[Sat Aug 1 19:33:41 2015] 0000000000000283 ffffffffc14c9620 ffffffffc14c9630 ffffffffc14c9640
[Sat Aug 1 19:33:41 2015] 0000000080000000 0000000000000000 ffff8800000000a9 0000000000000006
[Sat Aug 1 19:33:41 2015] Call Trace:
[Sat Aug 1 19:33:41 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:41 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:41 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:41 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:41 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:41 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:41 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:41 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:41 2015] RSP <ffff880651943d98>
[Sat Aug 1 19:33:41 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:41 2015] ---[ end trace b7e84915c2f4b0ca ]---
[Sat Aug 1 19:33:41 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:41 2015] invalid opcode: 0000 [#4] SMP
[Sat Aug 1 19:33:41 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:41 2015] CPU: 5 PID: 14586 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:41 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:41 2015] task: ffff8804e3778000 ti: ffff8805d51f6000 task.ti: ffff8805d51f6000
[Sat Aug 1 19:33:41 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:41 2015] RSP: 0018:ffff8805d51f7d98 EFLAGS: 00010246
[Sat Aug 1 19:33:41 2015] RAX: 0000000000000100 RBX: 000000078520dd48 RCX: ffff8805d51f7b18
[Sat Aug 1 19:33:41 2015] RDX: ffff8804e3778000 RSI: 0000000000000000 RDI: 80000001ebc009e6
[Sat Aug 1 19:33:41 2015] RBP: ffff8805d51f7e20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:41 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976148
[Sat Aug 1 19:33:41 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:41 2015] FS: 00007fab54ab4700(0000) GS:ffff88105ec40000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:41 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:41 2015] CR2: 00007f20a334ee78 CR3: 000000012f1da000 CR4: 00000000001407e0
[Sat Aug 1 19:33:41 2015] Stack:
[Sat Aug 1 19:33:41 2015] ffff8805d51f7e20 0000000000000000 8000000da592e966 0000000000000000
[Sat Aug 1 19:33:41 2015] ffffea0036964b80 0000000000000000 ffff88084f649c60 0000000000000002
[Sat Aug 1 19:33:41 2015] 000000000000000c 0000000000000001 ffff8800000000a9 ffffffffffffff03
[Sat Aug 1 19:33:41 2015] Call Trace:
[Sat Aug 1 19:33:41 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:41 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:41 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:41 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:41 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:41 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:41 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:41 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:41 2015] RSP <ffff8805d51f7d98>
[Sat Aug 1 19:33:41 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:41 2015] ---[ end trace b7e84915c2f4b0cb ]---
[Sat Aug 1 19:33:42 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:42 2015] invalid opcode: 0000 [#5] SMP
[Sat Aug 1 19:33:42 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:42 2015] CPU: 0 PID: 14585 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:42 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:42 2015] task: ffff8801e70e17f0 ti: ffff8805d51f4000 task.ti: ffff8805d51f4000
[Sat Aug 1 19:33:42 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:42 2015] RSP: 0018:ffff8805d51f5d98 EFLAGS: 00010246
[Sat Aug 1 19:33:42 2015] RAX: 0000000000000100 RBX: 0000000785209d40 RCX: ffff8805d51f5b18
[Sat Aug 1 19:33:42 2015] RDX: ffff8801e70e17f0 RSI: 0000000000000000 RDI: 80000001ebc009e6
[Sat Aug 1 19:33:42 2015] RBP: ffff8805d51f5e20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:42 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976148
[Sat Aug 1 19:33:42 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:42 2015] FS: 00007fab54bb5700(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:42 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:42 2015] CR2: 0000000785209d40 CR3: 000000012f1da000 CR4: 00000000001407f0
[Sat Aug 1 19:33:42 2015] Stack:
[Sat Aug 1 19:33:42 2015] 0000000000000001 ffff8805d51f5db0 ffff8805d51f5f20 ffff8805d51f5dd0
[Sat Aug 1 19:33:42 2015] 0000000000000283 0000000000008000 00007fab500298c8 0000000000000000
[Sat Aug 1 19:33:42 2015] 0000000080000000 0000000000000000 ffff8800000000a9 0000000000000006
[Sat Aug 1 19:33:42 2015] Call Trace:
[Sat Aug 1 19:33:42 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:42 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:42 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:42 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:42 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:42 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:42 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:42 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:42 2015] RSP <ffff8805d51f5d98>
[Sat Aug 1 19:33:42 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:42 2015] ---[ end trace b7e84915c2f4b0cc ]---
[Sat Aug 1 19:33:42 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:42 2015] invalid opcode: 0000 [#6] SMP
[Sat Aug 1 19:33:42 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:42 2015] CPU: 21 PID: 14571 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:42 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:42 2015] task: ffff8801fa9a2fe0 ti: ffff8801fb9c8000 task.ti: ffff8801fb9c8000
[Sat Aug 1 19:33:42 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:42 2015] RSP: 0018:ffff8801fb9c9d98 EFLAGS: 00010246
[Sat Aug 1 19:33:42 2015] RAX: 0000000000000100 RBX: 0000000784a08590 RCX: ffff8801fb9c9b18
[Sat Aug 1 19:33:42 2015] RDX: ffff8801fa9a2fe0 RSI: 0000000000000000 RDI: 80000001ea8009e6
[Sat Aug 1 19:33:42 2015] RBP: ffff8801fb9c9e20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:42 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976128
[Sat Aug 1 19:33:42 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:42 2015] FS: 00007fab559c3700(0000) GS:ffff88105ed40000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:42 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:42 2015] CR2: 00007fab59bf2100 CR3: 000000012f1da000 CR4: 00000000001407e0
[Sat Aug 1 19:33:42 2015] Stack:
[Sat Aug 1 19:33:42 2015] 0000000000000001 ffff8801fb9c9db0 ffff8801fb9c9f20 ffff8801fb9c9dd0
[Sat Aug 1 19:33:42 2015] 000000000000003d 0000000078cf04d0 0000000078cf04e0 0000000078cf04f0
[Sat Aug 1 19:33:42 2015] 0000000080000000 0000000000000000 ffff8800000000a9 0000000000000004
[Sat Aug 1 19:33:42 2015] Call Trace:
[Sat Aug 1 19:33:42 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:42 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:42 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:42 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:42 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:42 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:42 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:42 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:42 2015] RSP <ffff8801fb9c9d98>
[Sat Aug 1 19:33:42 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:42 2015] ---[ end trace b7e84915c2f4b0cd ]---
[Sat Aug 1 19:33:42 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:42 2015] invalid opcode: 0000 [#7] SMP
[Sat Aug 1 19:33:42 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:42 2015] CPU: 19 PID: 14582 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:42 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:42 2015] task: ffff8801f9d85fc0 ti: ffff880106a0a000 task.ti: ffff880106a0a000
[Sat Aug 1 19:33:42 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:42 2015] RSP: 0018:ffff880106a0bd98 EFLAGS: 00010246
[Sat Aug 1 19:33:42 2015] RAX: 0000000000000100 RBX: 000000078520bd40 RCX: ffff880106a0bb18
[Sat Aug 1 19:33:42 2015] RDX: ffff8801f9d85fc0 RSI: 0000000000000000 RDI: 80000001ebc009e6
[Sat Aug 1 19:33:42 2015] RBP: ffff880106a0be20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:42 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976148
[Sat Aug 1 19:33:42 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:42 2015] FS: 00007fab54eb8700(0000) GS:ffff88105ed20000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:42 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:42 2015] CR2: 0000000777cdb000 CR3: 000000012f1da000 CR4: 00000000001407e0
[Sat Aug 1 19:33:42 2015] Stack:
[Sat Aug 1 19:33:42 2015] ffff880106a0be20 ffff880106a0bdb0 ffff880106a0bf20 0000000000000000
[Sat Aug 1 19:33:42 2015] 0000000000000283 ffffffffcf31cc78 ffffffffcf31cc88 0000000000000001
[Sat Aug 1 19:33:42 2015] 0000000080000000 0000000000000000 ffff8800000000a9 0000000000000004
[Sat Aug 1 19:33:42 2015] Call Trace:
[Sat Aug 1 19:33:42 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:42 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:42 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:42 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:42 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:42 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:42 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:42 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:42 2015] RSP <ffff880106a0bd98>
[Sat Aug 1 19:33:42 2015] ------------[ cut here ]------------
[Sat Aug 1 19:33:42 2015] ---[ end trace b7e84915c2f4b0ce ]---
[Sat Aug 1 19:33:42 2015] [sched_delayed] sched: RT throttling activated
[Sat Aug 1 19:33:42 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:42 2015] invalid opcode: 0000 [#8] SMP
[Sat Aug 1 19:33:42 2015] Modules linked in: 8021q garp stp mrp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_devintf lrw gf128mul glue_helper ablk_helper dcdbas cryptd acpi_power_meter wmi ipmi_si lpc_ich mei_me mei shpchp mac_hid lp parport tg3 ptp megaraid_sas pps_core
[Sat Aug 1 19:33:42 2015] CPU: 9 PID: 14570 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:42 2015] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 1.2.10 03/09/2015
[Sat Aug 1 19:33:42 2015] task: ffff8801fa9a0000 ti: ffff88017dab8000 task.ti: ffff88017dab8000
[Sat Aug 1 19:33:42 2015] RIP: 0010:[<ffffffff81179051>] [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:42 2015] RSP: 0018:ffff88017dab9d98 EFLAGS: 00010246
[Sat Aug 1 19:33:42 2015] RAX: 0000000000000100 RBX: 0000000784c67490 RCX: ffff88017dab9b18
[Sat Aug 1 19:33:42 2015] RDX: ffff8801fa9a0000 RSI: 0000000000000000 RDI: 80000001254009e6
[Sat Aug 1 19:33:42 2015] RBP: ffff88017dab9e20 R08: 0000000000000000 R09: 00000000000000a9
[Sat Aug 1 19:33:42 2015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880186976130
[Sat Aug 1 19:33:42 2015] R13: ffff8810514fcfc0 R14: ffff88084f649c00 R15: 0000000000000080
[Sat Aug 1 19:33:42 2015] FS: 00007fab55ac4700(0000) GS:ffff88105ec80000(0000) knlGS:0000000000000000
[Sat Aug 1 19:33:42 2015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Aug 1 19:33:42 2015] CR2: 00007f5029923710 CR3: 000000012f1da000 CR4: 00000000001407e0
[Sat Aug 1 19:33:43 2015] Stack:
[Sat Aug 1 19:33:43 2015] ffff8801fa9a2fe0 ffff88017dab9df0 ffff88017dab9f20 ffff88017dab9dd0
[Sat Aug 1 19:33:43 2015] 0000000000000206 0000000000000001 0000000004000001 0000000000000000
[Sat Aug 1 19:33:43 2015] 0000000080000000 0000000000000000 ffff8800000000a9 0000000000000004
[Sat Aug 1 19:33:43 2015] Call Trace:
[Sat Aug 1 19:33:43 2015] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
[Sat Aug 1 19:33:43 2015] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
[Sat Aug 1 19:33:43 2015] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[Sat Aug 1 19:33:43 2015] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[Sat Aug 1 19:33:43 2015] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
[Sat Aug 1 19:33:43 2015] [<ffffffff8171e288>] page_fault+0x28/0x30
[Sat Aug 1 19:33:43 2015] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[Sat Aug 1 19:33:43 2015] RIP [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
[Sat Aug 1 19:33:43 2015] RSP <ffff88017dab9d98>
[Sat Aug 1 19:33:43 2015] ---[ end trace b7e84915c2f4b0cf ]---
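
For triage, the oops headers in the log above can be summarized mechanically. The following is a minimal sketch and not part of the original report: the function name `summarize_oopses` and both regular expressions are illustrative assumptions, derived only from the line formats visible in this paste.

```python
import re
from collections import Counter

# Matches e.g. "invalid opcode: 0000 [#1] SMP"; the number in [#N] is the
# kernel's running oops counter since boot.
OOPS_RE = re.compile(r"invalid opcode: \d+ \[#(\d+)\]")
# Matches e.g. "CPU: 6 PID: 14579 Comm: java Not tainted ..."
TASK_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")


def summarize_oopses(dmesg_text):
    """Return (highest oops counter seen, Counter of faulting command names)."""
    max_oops = 0
    comms = Counter()
    for line in dmesg_text.splitlines():
        oops = OOPS_RE.search(line)
        if oops:
            max_oops = max(max_oops, int(oops.group(1)))
        task = TASK_RE.search(line)
        if task:
            comms[task.group(3)] += 1
    return max_oops, comms


# Sample lines copied from the analytics1044 log above.
sample = """\
[Sat Aug 1 19:33:40 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:40 2015] invalid opcode: 0000 [#1] SMP
[Sat Aug 1 19:33:40 2015] CPU: 6 PID: 14579 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
[Sat Aug 1 19:33:41 2015] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[Sat Aug 1 19:33:41 2015] invalid opcode: 0000 [#2] SMP
[Sat Aug 1 19:33:41 2015] CPU: 20 PID: 14573 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
"""

max_oops, comms = summarize_oopses(sample)
print(max_oops, dict(comms))
```

Run against the full paste, this kind of summary makes two things easy to see: every faulting task is `java`, and the analytics1043 excerpt starts at oops #17, so earlier oopses since boot are not shown here.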

The servers were power-cycled and the issue is gone, so there is not much that is directly actionable, but:

  • Reporting it so that if it is a hardware issue (faulty memory) it can be identified
  • Let's check the kernel version to see whether a known kernel bug means it needs an upgrade (same errors as T99594)
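
The kernel version check in the second bullet could be sketched as follows. This is an illustration only: the helper names are hypothetical, and `FIXED_RELEASE` is an assumption. A later comment in this task names 3.13.0.61.68 as a version carrying the backported fix, and the sketch assumes that corresponds to the `3.13.0-61-generic` kernel release; it does not claim this is the first fixed build.

```python
import re


def ubuntu_abi(release):
    """Parse a release such as '3.13.0-24-generic' into ((3, 13, 0), 24)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch, abi = (int(group) for group in match.groups())
    return (major, minor, patch), abi


def needs_upgrade(running, fixed):
    """True if the running release sorts before the fixed release."""
    return ubuntu_abi(running) < ubuntu_abi(fixed)


# Assumption (see lead-in): release taken to contain the backported fix.
FIXED_RELEASE = "3.13.0-61-generic"

# 3.13.0-24-generic is the release shown in the oops headers above.
print(needs_upgrade("3.13.0-24-generic", FIXED_RELEASE))
```

On a live host, the running release could be taken from `platform.release()` instead of a hard-coded string.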

Event Timeline

jcrespo set the priority of this task to Needs Triage.
jcrespo updated the task description.
jcrespo added a project: acl*sre-team.
jcrespo added subscribers: jcrespo, MoritzMuehlenhoff.

OO, nasty. Googled, found these:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1315736
http://androidspanner.blogspot.com/2014/12/kernel-huge-page-issue-on-ubuntu-1404_23.html
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1323165

The bugfix has been backported into 3.13.0.61.68, which is available in Trusty. I'm upgrading the 8 newer nodes (1042-1049) now. We'll let these run for a while and, if all is well, upgrade the other Hadoop workers too.

@Ottomata: Has that happened?

No, but there is a backlogged task to audit these.

https://phabricator.wikimedia.org/T109834

I will close this one.