cp4021 memory hardware issue - DIMM B1
Closed, Resolved · Public

Description

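cp4021 is new hardware. During its first real production traffic load, it reported a steady stream of correctable single-bit ECC errors on DIMM B1 (19 events so far, all on the same rank and bank; kernel log below). Suggest replacing the DIMM before we eventually get an uncorrectable error.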

[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]: event severity: corrected
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:   section_type: memory error
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:   physical_address: 0x0000005da1e8ae40
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54316 column: 184 
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 01:08:53 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 01:08:53 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 01:08:53 2017] EDAC sbridge MC0: TSC e8999ebe8a92 
[Fri Sep  1 01:08:53 2017] EDAC sbridge MC0: ADDR 5da1e8ae40 
[Fri Sep  1 01:08:53 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 01:08:53 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504228135 SOCKET 0 APIC 0
[Fri Sep  1 01:08:53 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5da1e8a offset:0xe40 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 01:13:50 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]: event severity: corrected
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:   section_type: memory error
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:   physical_address: 0x0000005e278f95c0
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54527 column: 592 
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 01:32:11 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 01:32:11 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 01:32:11 2017] EDAC sbridge MC0: TSC eb651ec97b46 
[Fri Sep  1 01:32:11 2017] EDAC sbridge MC0: ADDR 5e278f95c0 
[Fri Sep  1 01:32:11 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 01:32:11 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504229533 SOCKET 0 APIC 0
[Fri Sep  1 01:32:11 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e278f9 offset:0x5c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]: event severity: corrected
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]:   section_type: memory error
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]:   physical_address: 0x0000005e2c3bba80
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54677 column: 744 
[Fri Sep  1 01:32:16 2017] {3}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 01:32:16 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 01:32:16 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 01:32:16 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 01:32:16 2017] EDAC sbridge MC0: TSC eb678c163b88 
[Fri Sep  1 01:32:16 2017] EDAC sbridge MC0: ADDR 5e2c3bba80 
[Fri Sep  1 01:32:16 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 01:32:16 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504229538 SOCKET 0 APIC 0
[Fri Sep  1 01:32:16 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e2c3bb offset:0xa80 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]: event severity: corrected
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]:   section_type: memory error
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]:   physical_address: 0x0000005e22d6bec0
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54363 column: 248 
[Fri Sep  1 01:32:59 2017] {4}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 01:32:59 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 01:32:59 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 01:32:59 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 01:32:59 2017] EDAC sbridge MC0: TSC eb7d69566d84 
[Fri Sep  1 01:32:59 2017] EDAC sbridge MC0: ADDR 5e22d6bec0 
[Fri Sep  1 01:32:59 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 01:32:59 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504229581 SOCKET 0 APIC 0
[Fri Sep  1 01:32:59 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e22d6b offset:0xec0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 01:34:35 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]: event severity: corrected
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]:   section_type: memory error
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]:   physical_address: 0x0000005e22d3e8c0
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54361 column: 928 
[Fri Sep  1 01:36:29 2017] {5}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 01:36:29 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 01:36:29 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 01:36:29 2017] EDAC sbridge MC0: TSC ebe8e47fd62f 
[Fri Sep  1 01:36:29 2017] EDAC sbridge MC0: ADDR 5e22d3e8c0 
[Fri Sep  1 01:36:29 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 01:36:29 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504229791 SOCKET 0 APIC 0
[Fri Sep  1 01:36:29 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e22d3e offset:0x8c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 01:37:19 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]: event severity: corrected
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]:   section_type: memory error
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]:   physical_address: 0x0000005e2c3bcc80
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54677 column: 816 
[Fri Sep  1 01:40:01 2017] {6}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 01:40:01 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 01:40:01 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 01:40:01 2017] EDAC sbridge MC0: TSC ec5547ab36ae 
[Fri Sep  1 01:40:01 2017] EDAC sbridge MC0: ADDR 5e2c3bcc80 
[Fri Sep  1 01:40:01 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 01:40:01 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504230002 SOCKET 0 APIC 0
[Fri Sep  1 01:40:01 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e2c3bc offset:0xc80 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 01:41:09 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]: event severity: corrected
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]:   section_type: memory error
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]:   physical_address: 0x0000005e22d3a5c0
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54361 column: 656 
[Fri Sep  1 01:46:10 2017] {7}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 01:46:10 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 01:46:10 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 01:46:10 2017] EDAC sbridge MC0: TSC ed1205e33920 
[Fri Sep  1 01:46:10 2017] EDAC sbridge MC0: ADDR 5e22d3a5c0 
[Fri Sep  1 01:46:10 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 01:46:10 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504230371 SOCKET 0 APIC 0
[Fri Sep  1 01:46:10 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e22d3a offset:0x5c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 01:47:42 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]: event severity: corrected
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]:  fru_text: B1
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]:   section_type: memory error
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]:   physical_address: 0x0000005da5a3a5c0
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54441 column: 656 
[Fri Sep  1 02:53:42 2017] {8}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 02:53:42 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 02:53:42 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 02:53:42 2017] EDAC sbridge MC0: TSC f52b251aeffe 
[Fri Sep  1 02:53:42 2017] EDAC sbridge MC0: ADDR 5da5a3a5c0 
[Fri Sep  1 02:53:42 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 02:53:42 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504234423 SOCKET 0 APIC 0
[Fri Sep  1 02:53:42 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5da5a3a offset:0x5c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 02:57:36 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]:   physical_address: 0x0000005e2078b900
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54292 column: 224 
[Fri Sep  1 03:13:20 2017] {9}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:13:20 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:13:20 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:13:20 2017] EDAC sbridge MC0: TSC f785dfa93318 
[Fri Sep  1 03:13:20 2017] EDAC sbridge MC0: ADDR 5e2078b900 
[Fri Sep  1 03:13:20 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:13:20 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504235601 SOCKET 0 APIC 0
[Fri Sep  1 03:13:20 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e2078b offset:0x900 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:13:26 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]:   physical_address: 0x0000005db2d7aec0
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54859 column: 696 
[Fri Sep  1 03:13:40 2017] {10}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:13:40 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:13:40 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:13:40 2017] EDAC sbridge MC0: TSC f79076b5a160 
[Fri Sep  1 03:13:40 2017] EDAC sbridge MC0: ADDR 5db2d7aec0 
[Fri Sep  1 03:13:40 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:13:40 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504235622 SOCKET 0 APIC 0
[Fri Sep  1 03:13:40 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5db2d7a offset:0xec0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]:   physical_address: 0x0000005e2bc4ff40
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54650 column: 504 
[Fri Sep  1 03:14:46 2017] {11}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:14:46 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:14:46 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:14:46 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:14:46 2017] EDAC sbridge MC0: TSC f7b22be53302 
[Fri Sep  1 03:14:46 2017] EDAC sbridge MC0: ADDR 5e2bc4ff40 
[Fri Sep  1 03:14:46 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:14:46 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504235688 SOCKET 0 APIC 0
[Fri Sep  1 03:14:46 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e2bc4f offset:0xf40 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]:   physical_address: 0x0000005dacb99440
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54668 column: 592 
[Fri Sep  1 03:14:50 2017] {12}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:14:50 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:14:50 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:14:50 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:14:50 2017] EDAC sbridge MC0: TSC f7b446af2fca 
[Fri Sep  1 03:14:50 2017] EDAC sbridge MC0: ADDR 5dacb99440 
[Fri Sep  1 03:14:50 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:14:50 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504235692 SOCKET 0 APIC 0
[Fri Sep  1 03:14:50 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5dacb99 offset:0x440 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:15:54 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]:   physical_address: 0x0000005daf8fe9c0
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54767 column: 928 
[Fri Sep  1 03:16:45 2017] {13}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:16:45 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:16:45 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:16:45 2017] EDAC sbridge MC0: TSC f7ef08b4b724 
[Fri Sep  1 03:16:45 2017] EDAC sbridge MC0: ADDR 5daf8fe9c0 
[Fri Sep  1 03:16:45 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:16:45 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504235807 SOCKET 0 APIC 0
[Fri Sep  1 03:16:45 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5daf8fe offset:0x9c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:18:21 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]:   physical_address: 0x0000005da4b7e9c0
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54411 column: 928 
[Fri Sep  1 03:38:43 2017] {14}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:38:43 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:38:43 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:38:43 2017] EDAC sbridge MC0: TSC fa9127b9e829 
[Fri Sep  1 03:38:43 2017] EDAC sbridge MC0: ADDR 5da4b7e9c0 
[Fri Sep  1 03:38:43 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:38:43 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504237125 SOCKET 0 APIC 0
[Fri Sep  1 03:38:43 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5da4b7e offset:0x9c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]:   physical_address: 0x0000005da69eafc0
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54479 column: 184 
[Fri Sep  1 03:38:55 2017] {15}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:38:55 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:38:55 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:38:55 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:38:55 2017] EDAC sbridge MC0: TSC fa974d749ca1 
[Fri Sep  1 03:38:55 2017] EDAC sbridge MC0: ADDR 5da69eafc0 
[Fri Sep  1 03:38:55 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:38:55 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504237137 SOCKET 0 APIC 0
[Fri Sep  1 03:38:55 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5da69ea offset:0xfc0 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:39:06 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]:   physical_address: 0x0000005dab4de900
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54630 column: 928 
[Fri Sep  1 03:39:27 2017] {16}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:39:27 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:39:27 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:39:27 2017] EDAC sbridge MC0: TSC faa7cbc38bc1 
[Fri Sep  1 03:39:27 2017] EDAC sbridge MC0: ADDR 5dab4de900 
[Fri Sep  1 03:39:27 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:39:27 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504237169 SOCKET 0 APIC 0
[Fri Sep  1 03:39:27 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5dab4de offset:0x900 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]: event severity: corrected
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]:  fru_text: B1
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]:   section_type: memory error
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]:   physical_address: 0x0000005e28fccc40
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54558 column: 304 
[Fri Sep  1 03:40:39 2017] {17}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 03:40:39 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 03:40:39 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 03:40:39 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 03:40:39 2017] EDAC sbridge MC0: TSC faccb85cc132 
[Fri Sep  1 03:40:39 2017] EDAC sbridge MC0: ADDR 5e28fccc40 
[Fri Sep  1 03:40:39 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 03:40:39 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504237241 SOCKET 0 APIC 0
[Fri Sep  1 03:40:39 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e28fcc offset:0xc40 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 03:41:50 2017] mce: [Hardware Error]: Machine check events logged
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]: It has been corrected by h/w and requires no further action
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]: event severity: corrected
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]:  Error 0, type: corrected
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]:  fru_text: B1
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]:   section_type: memory error
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]:   error_status: 0x0000000000000400
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]:   physical_address: 0x0000005e2344cc00
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54386 column: 304 
[Fri Sep  1 16:28:24 2017] {18}[Hardware Error]:   error_type: 2, single-bit ECC
[Fri Sep  1 16:28:24 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Fri Sep  1 16:28:24 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Fri Sep  1 16:28:24 2017] EDAC sbridge MC0: TSC 156dd49c84a6e 
[Fri Sep  1 16:28:24 2017] EDAC sbridge MC0: ADDR 5e2344cc00 
[Fri Sep  1 16:28:24 2017] EDAC sbridge MC0: MISC 0 
[Fri Sep  1 16:28:24 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504283307 SOCKET 0 APIC 0
[Fri Sep  1 16:28:24 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e2344c offset:0xc00 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Fri Sep  1 16:29:41 2017] mce: [Hardware Error]: Machine check events logged
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]: It has been corrected by h/w and requires no further action
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]: event severity: corrected
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]:  Error 0, type: corrected
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]:  fru_text: B1
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]:   section_type: memory error
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]:   error_status: 0x0000000000000400
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]:   physical_address: 0x0000005e2e14cc00
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]:   node: 1 card: 0 module: 0 rank: 1 bank: 3 row: 54738 column: 304 
[Sat Sep  2 17:19:24 2017] {19}[Hardware Error]:   error_type: 2, single-bit ECC
[Sat Sep  2 17:19:24 2017] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Sat Sep  2 17:19:24 2017] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[Sat Sep  2 17:19:24 2017] EDAC sbridge MC0: TSC 209a8507d2461 
[Sat Sep  2 17:19:24 2017] EDAC sbridge MC0: ADDR 5e2e14cc00 
[Sat Sep  2 17:19:24 2017] EDAC sbridge MC0: MISC 0 
[Sat Sep  2 17:19:24 2017] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1504372768 SOCKET 0 APIC 0
[Sat Sep  2 17:19:24 2017] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5e2e14c offset:0xc00 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:0 channel_mask:1 rank:1)
[Sat Sep  2 17:21:09 2017] mce: [Hardware Error]: Machine check events logged
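
For triage, a log like the one above can be condensed into per-DIMM counts. The following is a minimal Python sketch, not tooling that was used in this task: the `summarize` helper, the regexes, and the 4 KiB page-size assumption are illustrative only. It counts APEI events by `fru_text` and splits each `physical_address` into the page and offset that the matching EDAC line reports.

```python
import re
from collections import Counter

# Matches the APEI "fru_text" line, e.g. "{1}[Hardware Error]:  fru_text: B1".
FRU_RE = re.compile(r"\{(\d+)\}\[Hardware Error\]:\s+fru_text:\s*(\S+)")
# Matches the APEI "physical_address" line.
ADDR_RE = re.compile(r"physical_address:\s*(0x[0-9a-fA-F]+)")

PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86-64


def summarize(lines):
    """Return (corrected-event count per DIMM label, list of (page, offset))."""
    per_dimm = Counter()
    locations = []
    for line in lines:
        m = FRU_RE.search(line)
        if m:
            per_dimm[m.group(2)] += 1
            continue
        m = ADDR_RE.search(line)
        if m:
            addr = int(m.group(1), 16)
            page = addr >> PAGE_SHIFT
            offset = addr & ((1 << PAGE_SHIFT) - 1)
            locations.append((page, offset))
    return per_dimm, locations


# Four lines copied from the first two events in the log above.
SAMPLE = """\
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:08:53 2017] {1}[Hardware Error]:   physical_address: 0x0000005da1e8ae40
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:  fru_text: B1
[Fri Sep  1 01:32:11 2017] {2}[Hardware Error]:   physical_address: 0x0000005e278f95c0
""".splitlines()

if __name__ == "__main__":
    counts, locs = summarize(SAMPLE)
    for dimm, n in counts.items():
        print(f"DIMM {dimm}: {n} corrected errors")
    for page, offset in locs:
        print(f"page:{page:#x} offset:{offset:#x}")
```

To run it against a full log, pass the lines of saved `dmesg -T` output to `summarize` in place of `SAMPLE`. A count concentrated on a single label, as with B1 here, points at one module rather than a broader platform problem.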

Event Timeline

Restricted Application added a subscriber: Aklapper.

Please note that this host is still pooled and active; it must be depooled before it is taken offline for the DIMM replacement.

Dell service request: SR954179119. The replacement DIMM should arrive on either Friday or Monday. I'll be onsite Monday to replace the defective DIMM (after properly depooling the server).

IMG_20170925_122823156.jpg (3×2 px, 1 MB)

Replaced the bad DIMM; will drop off the shipment in the USPS mailbox.

RobH claimed this task.