The Memory RSS by Service graph on grafana shows a spike on varnish memory usage at 11:59 up to 352GiB:
kern.log
Mar 8 11:59:06 cp3053 kernel: [2177718.395042] traffic_manager invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 Mar 8 11:59:06 cp3053 kernel: [2177718.395043] traffic_manager cpuset=/ mems_allowed=0-1 Mar 8 11:59:06 cp3053 kernel: [2177718.395048] CPU: 0 PID: 965 Comm: traffic_manager Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1 Mar 8 11:59:06 cp3053 kernel: [2177718.395050] Hardware name: Dell Inc. PowerEdge R440/08CYF7, BIOS 2.3.10 08/15/2019 Mar 8 11:59:06 cp3053 kernel: [2177718.395052] Call Trace: Mar 8 11:59:06 cp3053 kernel: [2177718.395061] dump_stack+0x66/0x90 Mar 8 11:59:06 cp3053 kernel: [2177718.395065] dump_header+0x6b/0x283 Mar 8 11:59:06 cp3053 kernel: [2177718.395070] oom_kill_process.cold.30+0xb/0x1cf Mar 8 11:59:06 cp3053 kernel: [2177718.395072] ? oom_badness+0x23/0x140 Mar 8 11:59:06 cp3053 kernel: [2177718.395074] out_of_memory+0x1a5/0x450 Mar 8 11:59:06 cp3053 kernel: [2177718.395083] __alloc_pages_slowpath+0xbd8/0xcb0 Mar 8 11:59:06 cp3053 kernel: [2177718.395088] __alloc_pages_nodemask+0x28b/0x2b0 Mar 8 11:59:06 cp3053 kernel: [2177718.395090] filemap_fault+0x3bd/0x780 Mar 8 11:59:06 cp3053 kernel: [2177718.395093] ? alloc_set_pte+0xf2/0x560 Mar 8 11:59:06 cp3053 kernel: [2177718.395098] ? filemap_map_pages+0x1ed/0x3a0 Mar 8 11:59:06 cp3053 kernel: [2177718.395127] ext4_filemap_fault+0x2c/0x40 [ext4] Mar 8 11:59:06 cp3053 kernel: [2177718.395130] __do_fault+0x36/0x130 Mar 8 11:59:06 cp3053 kernel: [2177718.395133] __handle_mm_fault+0xe6c/0x1270 Mar 8 11:59:06 cp3053 kernel: [2177718.395136] ? common_interrupt+0xa/0xf Mar 8 11:59:06 cp3053 kernel: [2177718.395138] handle_mm_fault+0xd6/0x200 Mar 8 11:59:06 cp3053 kernel: [2177718.395140] __do_page_fault+0x249/0x4f0 Mar 8 11:59:06 cp3053 kernel: [2177718.395143] ? page_fault+0x8/0x30 Mar 8 11:59:06 cp3053 kernel: [2177718.395144] page_fault+0x1e/0x30 Mar 8 11:59:06 cp3053 kernel: [2177718.395147] RIP: 0033:0x7f12b896a860 Mar 8 11:59:06 cp3053 kernel: [2177718.395151] Code: Bad RIP value. Mar 8 11:59:06 cp3053 kernel: [2177718.395152] RSP: 002b:00007ffcc90d3c08 EFLAGS: 00010206 Mar 8 11:59:06 cp3053 kernel: [2177718.395153] RAX: 0000000000000000 RBX: 00007ffcc90d3f10 RCX: 000000000000000a Mar 8 11:59:06 cp3053 kernel: [2177718.395154] RDX: 0000000000000e10 RSI: 0000000000000000 RDI: 0000555583097420 Mar 8 11:59:06 cp3053 kernel: [2177718.395155] RBP: 00007ffcc90d4450 R08: 0000000000000000 R09: 0000000000000000 Mar 8 11:59:06 cp3053 kernel: [2177718.395155] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000e10 Mar 8 11:59:06 cp3053 kernel: [2177718.395156] R13: 0000000000000064 R14: 0000000000000e10 R15: 00007ffcc90d4050 Mar 8 11:59:06 cp3053 kernel: [2177718.395158] Mem-Info: Mar 8 11:59:06 cp3053 kernel: [2177718.395165] active_anon:95678997 inactive_anon:165370 isolated_anon:0 Mar 8 11:59:06 cp3053 kernel: [2177718.395165] active_file:0 inactive_file:0 isolated_file:0 Mar 8 11:59:06 cp3053 kernel: [2177718.395165] unevictable:0 dirty:0 writeback:0 unstable:0 Mar 8 11:59:06 cp3053 kernel: [2177718.395165] slab_reclaimable:76401 slab_unreclaimable:241171 Mar 8 11:59:06 cp3053 kernel: [2177718.395165] mapped:53926 shmem:418585 pagetables:339566 bounce:0 Mar 8 11:59:06 cp3053 kernel: [2177718.395165] free:653414 free_pcp:1412 free_cma:0 Mar 8 11:59:06 cp3053 kernel: [2177718.395168] Node 0 active_anon:192437320kB inactive_anon:154152kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:21944kB dirty:0kB writeback:0kB shmem:388536kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2820096kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Mar 8 11:59:06 cp3053 kernel: [2177718.395171] Node 1 active_anon:190278668kB inactive_anon:507328kB active_file:16kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:193760 kB dirty:0kB writeback:0kB shmem:1285804kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6107136kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Mar 8 11:59:06 cp3053 kernel: [2177718.395176] Node 0 DMA free:15900kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB p resent:15984kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB Mar 8 11:59:06 cp3053 kernel: [2177718.395179] lowmem_reserve[]: 0 1393 191851 191851 191851 Mar 8 11:59:06 cp3053 kernel: [2177718.395182] Node 0 DMA32 free:767832kB min:7736kB low:9668kB high:11600kB active_anon:675624kB inactive_anon:76kB active_file:128kB inactive_file:0kB unevictable:0kB wr itepending:0kB present:1521664kB managed:1456096kB mlocked:0kB kernel_stack:0kB pagetables:104kB bounce:0kB free_pcp:3016kB local_pcp:0kB free_cma:0kB Mar 8 11:59:06 cp3053 kernel: [2177718.395188] lowmem_reserve[]: 0 0 190457 190457 190457 Mar 8 11:59:06 cp3053 kernel: [2177718.395191] Node 0 Normal free:777376kB min:1036316kB low:1295392kB high:1554468kB active_anon:191762756kB inactive_anon:154076kB active_file:0kB inactive_file:212kB un evictable:0kB writepending:0kB present:198180864kB managed:195032876kB mlocked:0kB kernel_stack:15800kB pagetables:883132kB bounce:0kB free_pcp:2636kB local_pcp:244kB free_cma:0kB Mar 8 11:59:06 cp3053 kernel: [2177718.395195] lowmem_reserve[]: 0 0 0 0 0 Mar 8 11:59:06 cp3053 kernel: [2177718.395201] Node 1 Normal free:1052548kB min:1053012kB low:1316264kB high:1579516kB active_anon:190278668kB inactive_anon:507328kB active_file:16kB inactive_file:0kB un evictable:0kB writepending:0kB present:201326592kB managed:198175296kB mlocked:0kB kernel_stack:19784kB pagetables:475028kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB Mar 8 11:59:06 cp3053 kernel: [2177718.395205] lowmem_reserve[]: 0 0 0 0 0 Mar 8 11:59:06 cp3053 kernel: [2177718.395214] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB Mar 8 11:59:06 cp3053 kernel: [2177718.395222] Node 0 DMA32: 3445*4kB (UME) 8045*8kB (UME) 5975*16kB (UM) 3841*32kB (UM) 2061*64kB (UME) 892*128kB (UME) 321*256kB (UME) 131*512kB (UM) 75*1024kB (UME) 0*2 048kB 0*4096kB = 768780kB Mar 8 11:59:06 cp3053 kernel: [2177718.395240] Node 0 Normal: 7192*4kB (UME) 1211*8kB (UME) 3312*16kB (UME) 1604*32kB (UME) 3688*64kB (UME) 1852*128kB (UME) 231*256kB (UME) 97*512kB (M) 51*1024kB (M) 0*2 048kB 0*4096kB = 776888kB Mar 8 11:59:06 cp3053 kernel: [2177718.395255] Node 1 Normal: 42795*4kB (UME) 17904*8kB (UME) 4967*16kB (UME) 7748*32kB (UME) 1198*64kB (UME) 250*128kB (UME) 136*256kB (ME) 86*512kB (UME) 38*1024kB (ME) 17*2048kB (E) 38*4096kB (U) = 1058716kB Mar 8 11:59:06 cp3053 kernel: [2177718.395263] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Mar 8 11:59:06 cp3053 kernel: [2177718.395264] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Mar 8 11:59:06 cp3053 kernel: [2177718.395265] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Mar 8 11:59:06 cp3053 kernel: [2177718.395266] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Mar 8 11:59:06 cp3053 kernel: [2177718.395266] 420233 total pagecache pages Mar 8 11:59:06 cp3053 kernel: [2177718.395267] 0 pages in swap cache Mar 8 11:59:06 cp3053 kernel: [2177718.395268] Swap cache stats: add 0, delete 0, find 0/0 Mar 8 11:59:06 cp3053 kernel: [2177718.395269] Free swap = 0kB Mar 8 11:59:06 cp3053 kernel: [2177718.395269] Total swap = 0kB Mar 8 11:59:06 cp3053 kernel: [2177718.395270] 100261276 pages RAM Mar 8 11:59:06 cp3053 kernel: [2177718.395271] 0 pages HighMem/MovableOnly Mar 8 11:59:06 cp3053 kernel: [2177718.395271] 1591234 pages reserved Mar 8 11:59:06 cp3053 kernel: [2177718.395272] 0 pages hwpoisoned Mar 8 11:59:06 cp3053 kernel: [2177718.395272] Tasks state (memory values in pages): Mar 8 11:59:06 cp3053 kernel: [2177718.395273] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name Mar 8 11:59:06 cp3053 kernel: [2177718.395296] [ 738] 0 738 92881 64417 770048 0 0 systemd-journal Mar 8 11:59:06 cp3053 kernel: [2177718.395298] [ 752] 0 752 5614 352 65536 0 -1000 systemd-udevd Mar 8 11:59:06 cp3053 kernel: [2177718.395302] [ 956] 0 956 771 52 49152 0 0 mdadm Mar 8 11:59:06 cp3053 kernel: [2177718.395304] [ 1000] 101 1000 23270 232 81920 0 0 systemd-timesyn Mar 8 11:59:06 cp3053 kernel: [2177718.395306] [ 1029] 108 1029 64413 4343 135168 0 0 prometheus-varn Mar 8 11:59:06 cp3053 kernel: [2177718.395308] [ 1031] 0 1031 2126 67 57344 0 0 cron Mar 8 11:59:06 cp3053 kernel: [2177718.395310] [ 1035] 106 1035 7706 295 90112 0 0 lldpd Mar 8 11:59:06 cp3053 kernel: [2177718.395312] [ 1040] 108 1040 28359 4966 126976 0 0 prometheus-traf Mar 8 11:59:06 cp3053 kernel: [2177718.395314] [ 1046] 106 1046 7707 302 86016 0 0 lldpd Mar 8 11:59:06 cp3053 kernel: [2177718.395316] [ 1051] 105 1051 2331 229 57344 0 -900 dbus-daemon Mar 8 11:59:06 cp3053 kernel: [2177718.395318] [ 1056] 0 1056 4877 265 77824 0 0 systemd-logind Mar 8 11:59:06 cp3053 kernel: [2177718.395320] [ 1079] 108 1079 1322096 10477 745472 0 0 prometheus-node Mar 8 11:59:06 cp3053 kernel: [2177718.395321] [ 1089] 0 1089 2526 369 61440 0 0 smartd Mar 8 11:59:06 cp3053 kernel: [2177718.395323] [ 1090] 0 1090 1359 61 53248 0 0 rasdaemon Mar 8 11:59:06 cp3053 kernel: [2177718.395325] [ 1106] 0 1106 1249684 6365 692224 0 0 confd Mar 8 11:59:06 cp3053 kernel: [2177718.395327] [ 1137] 0 1137 3963 216 69632 0 -1000 sshd Mar 8 11:59:06 cp3053 kernel: [2177718.395331] [ 1203] 0 1203 1346 29 53248 0 0 agetty Mar 8 11:59:06 cp3053 kernel: [2177718.395332] [ 1210] 0 1210 1403 28 45056 0 0 agetty Mar 8 11:59:06 cp3053 kernel: [2177718.395334] [ 1232] 110 1232 5013 211 81920 0 0 exim4 Mar 8 11:59:06 cp3053 kernel: [2177718.395348] [ 1249] 111 1249 1172233 7191 630784 0 0 prometheus-varn Mar 8 11:59:06 cp3053 kernel: [2177718.395349] [ 1293] 108 1293 28358 5001 122880 0 0 prometheus-traf Mar 8 11:59:06 cp3053 kernel: [2177718.395351] [ 3836] 114 3836 7632 1232 90112 0 0 vhtcpd Mar 8 11:59:06 cp3053 kernel: [2177718.395353] [ 23927] 0 23927 1363076 17557 892928 0 0 cadvisor Mar 8 11:59:06 cp3053 kernel: [2177718.395355] [ 36422] 0 36422 280811 1936 221184 0 0 rsyslogd Mar 8 11:59:06 cp3053 kernel: [2177718.395356] [ 43522] 0 43522 62531 1512 114688 0 0 traffic_manager Mar 8 11:59:06 cp3053 kernel: [2177718.395358] [ 43524] 116 43524 118909 929 114688 0 0 fifo-log-demux Mar 8 11:59:06 cp3053 kernel: [2177718.395359] [ 43525] 116 43525 118557 929 122880 0 0 fifo-log-demux Mar 8 11:59:06 cp3053 kernel: [2177718.395361] [ 43526] 0 43526 597 17 45056 0 0 sh Mar 8 11:59:06 cp3053 kernel: [2177718.395364] [ 43527] 0 43527 597 16 49152 0 0 atslog-backend Mar 8 11:59:06 cp3053 kernel: [2177718.395366] [ 43528] 0 43528 1282410 7805 684032 0 0 mtail Mar 8 11:59:06 cp3053 kernel: [2177718.395368] [ 43529] 0 43529 118582 908 114688 0 0 fifo-log-tailer Mar 8 11:59:06 cp3053 kernel: [2177718.395369] [ 43559] 116 43559 3265398 1874027 15798272 0 0 [TS_MAIN] Mar 8 11:59:06 cp3053 kernel: [2177718.395372] [ 48852] 0 48852 25316 1657 73728 0 0 varnishd Mar 8 11:59:06 cp3053 kernel: [2177718.395374] [ 48883] 112 48883 196669898 92556442 1361555456 0 0 cache-main Mar 8 11:59:06 cp3053 kernel: [2177718.395379] [ 567] 0 567 175635 26041 344064 0 0 varnishkafka Mar 8 11:59:06 cp3053 kernel: [2177718.395381] [ 568] 0 568 5049 1506 73728 0 0 python3 Mar 8 11:59:06 cp3053 kernel: [2177718.395383] [ 570] 0 570 1682 56 53248 0 0 varnishmtail Mar 8 11:59:06 cp3053 kernel: [2177718.395385] [ 571] 0 571 5026 1506 73728 0 0 python3 Mar 8 11:59:06 cp3053 kernel: [2177718.395386] [ 572] 0 572 22351 21013 212992 0 0 varnishncsa Mar 8 11:59:06 cp3053 kernel: [2177718.395388] [ 573] 0 573 1319596 8423 700416 0 0 mtail Mar 8 11:59:06 cp3053 kernel: [2177718.395393] [ 619] 0 619 22346 21011 208896 0 0 varnishlog Mar 8 11:59:06 cp3053 kernel: [2177718.395396] [ 621] 0 621 22351 21021 212992 0 0 varnishlog Mar 8 11:59:06 cp3053 kernel: [2177718.395397] [ 965] 0 965 62523 985 110592 0 0 traffic_manager Mar 8 11:59:06 cp3053 kernel: [2177718.395399] [ 967] 116 967 118557 945 114688 0 0 fifo-log-demux Mar 8 11:59:06 cp3053 kernel: [2177718.395401] [ 968] 116 968 118557 1464 122880 0 0 fifo-log-demux Mar 8 11:59:06 cp3053 kernel: [2177718.395403] [ 985] 116 985 1888016 899484 8122368 0 0 [TS_MAIN] Mar 8 11:59:06 cp3053 kernel: [2177718.395405] [ 994] 116 994 3975 255 73728 0 0 traffic_crashlo Mar 8 11:59:06 cp3053 kernel: [2177718.395409] [ 25220] 0 25220 5026 1498 69632 0 0 python3 Mar 8 11:59:06 cp3053 kernel: [2177718.395411] [ 25223] 0 25223 21923 20541 217088 0 0 varnishlog Mar 8 11:59:06 cp3053 kernel: [2177718.395415] [ 43511] 0 43511 1129770 5799 610304 0 0 prometheus-rsys Mar 8 11:59:06 cp3053 kernel: [2177718.395425] Out of memory: Kill process 48883 (cache-main) score 941 or sacrifice child Mar 8 11:59:06 cp3053 kernel: [2177718.406694] Killed process 48883 (cache-main) total-vm:786679592kB, anon-rss:370143496kB, file-rss:0kB, shmem-rss:82304kB Mar 8 11:59:07 cp3053 kernel: [2177718.698121] nvme0n1: p1 Mar 8 11:59:38 cp3053 kernel: [2177750.432291] oom_reaper: reaped process 48883 (cache-main), now anon-rss:0kB, file-rss:0kB, shmem-rss:81996kB