Alert manager just reported kernel errors on cloudvirt0147. For once, this is not the result of a recent reboot: the host has been up for about 4 days. The relevant kernel log:
[385127.260440] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[385127.260447] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[385127.260450] {1}[Hardware Error]: event severity: corrected
[385127.260454] {1}[Hardware Error]: Error 0, type: corrected
[385127.260457] {1}[Hardware Error]: fru_text: A11
[385127.260460] {1}[Hardware Error]: section_type: memory error
[385127.260463] {1}[Hardware Error]: error_status: Storage error in DRAM memory (0x0000000000000400)
[385127.260469] {1}[Hardware Error]: physical_address: 0x00000013d001af00
[385127.260478] {1}[Hardware Error]: node:1 card:1 module:1 rank:1 bank:0 device:16 row:38160 column:176
[385127.260482] {1}[Hardware Error]: error_type: 2, single-bit ECC
[385127.260488] {1}[Hardware Error]: DIMM location: not present. DMI handle: 0x0000
[385127.260505] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 65534
[385127.260509] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
[385127.260513] {2}[Hardware Error]: event severity: corrected
[385127.260517] {2}[Hardware Error]: Error 0, type: corrected
[385127.260522] {2}[Hardware Error]: section type: unknown, 330f1140-72a5-11df-9690-0002a5d5c51b
[385127.260527] {2}[Hardware Error]: section length: 0x38
[385127.260536] {2}[Hardware Error]: 00000000: 01010001 00000000 d001a000 00000013 ................
[385127.260543] {2}[Hardware Error]: 00000010: 00001000 00000000 d001afff 00000013 ................
[385127.260549] {2}[Hardware Error]: 00000020: 00000080 00000000 00000000 00000000 ................
[385127.260553] {2}[Hardware Error]: 00000030: 00000000 00000000 ........
[385127.266397] mce: [Hardware Error]: Machine check events logged
[385127.266414] EDAC skx MC1: HANDLING MCE MEMORY ERROR
[385127.266417] EDAC skx MC1: CPU 0: Machine Check Event: 0x0 Bank 255: 0x9c0000000000009f
[385127.266423] EDAC skx MC1: TSC 0x0
[385127.266426] EDAC skx MC1: ADDR 0x13d001af00
[385127.266428] EDAC skx MC1: MISC 0x8c
[385127.266431] EDAC skx MC1: PROCESSOR 0:0x50657 TIME 1739254674 SOCKET 0 APIC 0x0
[385127.266455] EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#1_DIMM#1 (channel:1 slot:1 page:0x13d001a offset:0xf00 grain:32 syndrome:0x0 - err_code:0x0000:0x009f ProcessorSocketId:0x0 MemoryControllerId:0x1 PhysicalRankId:0x1 Row:0x9510 Column:0xb0 Bank:0x0 BankGroup:0x0 retry_rd_err_log[0001a20d 00000000 00000020 042c0150 00009510] correrrcnt[0000 0000 0000 0000 0000 0000 0000 0000])
[385559.196269] Process accounting resumed
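
As a sanity check, the APEI record and the EDAC line in the log above appear to describe the same single corrected error. Here is a minimal Python sketch that cross-checks them. It assumes 4 KiB pages (`PAGE_SHIFT = 12`) and that EDAC's `page`/`offset` fields are the physical address split at the page boundary. The helper names (`edac_fields`, `reconstruct_addr`) are made up for illustration and are not part of any existing tool.

```python
import re

# Values copied verbatim from the log above.
APEI_PHYS_ADDR = 0x00000013d001af00      # "physical_address" in the APEI record
APEI_ROW, APEI_COLUMN = 38160, 176       # decimal row/column in the APEI record
DMESG_TIMESTAMP = 385127.260440          # seconds since boot at the first error line
EDAC_EXCERPT = (
    "page:0x13d001a offset:0xf00 grain:32 syndrome:0x0 "
    "PhysicalRankId:0x1 Row:0x9510 Column:0xb0 Bank:0x0 BankGroup:0x0"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def edac_fields(text):
    """Collect every `name:0x<hex>` pair from an EDAC line into a dict of ints."""
    return {k: int(v, 16) for k, v in re.findall(r"(\w+):0x([0-9a-fA-F]+)", text)}


def reconstruct_addr(page, offset, page_shift=PAGE_SHIFT):
    """Rebuild a physical address from a page frame number and in-page offset."""
    return (page << page_shift) | offset


if __name__ == "__main__":
    f = edac_fields(EDAC_EXCERPT)
    addr = reconstruct_addr(f["page"], f["offset"])
    print(f"EDAC address:  {addr:#x}")
    print(f"APEI address:  {APEI_PHYS_ADDR:#x}")
    print("address match:", addr == APEI_PHYS_ADDR)
    print("row/col match:", (f["Row"], f["Column"]) == (APEI_ROW, APEI_COLUMN))
    print(f"uptime at error: {DMESG_TIMESTAMP / 86400:.2f} days")
```

If the assumptions hold, both records point at physical address 0x13d001af00 (row 38160, column 176), and the timestamp is consistent with the host having been up for a bit over 4 days when the error was logged.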