Page MenuHomePhabricator

oom killed varnish on cp4047
Closed, ResolvedPublic

Description

Current working theory
this seems to be related to jemalloc and THP usage and looks quite similar to https://github.com/varnishcache/varnish-cache/issues/3511

We are running the following experiments:

hostTHPjemalloc versionused memory
cp4045disabled5.1.0 (buster)346.767 GB
cp4046disabled5.1.0 (buster)351.353 GB
cp4047disabled5.2.1 (backported from bullseye)339.32 GB
cp4048disabled5.2.1 (backported from bullseye)341.492 GB
cp4049enabled5.2.1 (backported from bullseye)393.756 GB
cp4050enabled5.2.1 (backported from bullseye)442.626 GB
cp4051enabled5.1.0 (unmodified host)417.07 GB
cp4052enabled5.1.0 (unmodified host)N/A
Nov 10 23:05:17 cp4047 kernel: [798350.776001] Mem-Info:
Nov 10 23:05:17 cp4047 kernel: [798350.776011] active_anon:1747161 inactive_anon:128275203 isolated_anon:0
Nov 10 23:05:17 cp4047 kernel: [798350.776011]  active_file:141 inactive_file:385 isolated_file:2
Nov 10 23:05:17 cp4047 kernel: [798350.776011]  unevictable:27 dirty:0 writeback:0
Nov 10 23:05:17 cp4047 kernel: [798350.776011]  slab_reclaimable:34705 slab_unreclaimable:157087
Nov 10 23:05:17 cp4047 kernel: [798350.776011]  mapped:79836 shmem:284420 pagetables:515480 bounce:0
Nov 10 23:05:17 cp4047 kernel: [798350.776011]  free:781027 free_pcp:1846 free_cma:0
Nov 10 23:05:17 cp4047 kernel: [798350.776014] Node 0 active_anon:1803668kB inactive_anon:257732840kB active_file:0kB inactive_file:404kB unevictable:24kB isolated(anon):0kB isolated(file):0kB mapped:30528kB dirty:0kB writeback:0kB shmem:134364kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 19023872kB writeback_tmp:0kB kernel_stack:20960kB all_unreclaimable? no
Nov 10 23:05:17 cp4047 kernel: [798350.776017] Node 1 active_anon:5184976kB inactive_anon:255367972kB active_file:736kB inactive_file:1136kB unevictable:84kB isolated(anon):0kB isolated(file):8kB mapped:288816kB dirty:0kB writeback:0kB shmem:1003316kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 23343104kB writeback_tmp:0kB kernel_stack:21136kB all_unreclaimable? no
Nov 10 23:05:17 cp4047 kernel: [798350.776019] Node 0 DMA free:15896kB min:60kB low:72kB high:84kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15896kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Nov 10 23:05:17 cp4047 kernel: [798350.776023] lowmem_reserve[]: 0 1388 257351 257351 257351
Nov 10 23:05:17 cp4047 kernel: [798350.776028] Node 0 DMA32 free:1029468kB min:5784kB low:7240kB high:8696kB reserved_highatomic:0KB active_anon:3376kB inactive_anon:420640kB active_file:24kB inactive_file:0kB unevictable:0kB writepending:0kB present:1521600kB managed:1456064kB mlocked:0kB pagetables:1112kB bounce:0kB free_pcp:504kB local_pcp:0kB free_cma:0kB
Nov 10 23:05:17 cp4047 kernel: [798350.776031] lowmem_reserve[]: 0 0 255962 255962 255962
Nov 10 23:05:17 cp4047 kernel: [798350.776035] Node 0 Normal free:1028416kB min:1041448kB low:1303560kB high:1565672kB reserved_highatomic:2048KB active_anon:1800292kB inactive_anon:257312200kB active_file:0kB inactive_file:408kB unevictable:24kB writepending:0kB present:266338304kB managed:262112564kB mlocked:24kB pagetables:556408kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Nov 10 23:05:17 cp4047 kernel: [798350.776039] lowmem_reserve[]: 0 0 0 0 0
Nov 10 23:05:17 cp4047 kernel: [798350.776043] Node 1 Normal free:1050328kB min:1049852kB low:1314076kB high:1578300kB reserved_highatomic:0KB active_anon:5184976kB inactive_anon:255367972kB active_file:736kB inactive_file:1136kB unevictable:84kB writepending:0kB present:268435456kB managed:264226884kB mlocked:84kB pagetables:1504400kB bounce:0kB free_pcp:6880kB local_pcp:0kB free_cma:0kB
Nov 10 23:05:17 cp4047 kernel: [798350.776046] lowmem_reserve[]: 0 0 0 0 0
Nov 10 23:05:17 cp4047 kernel: [798350.776049] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
Nov 10 23:05:17 cp4047 kernel: [798350.776062] Node 0 DMA32: 204*4kB (U) 2914*8kB (UM) 5525*16kB (UM) 3885*32kB (UME) 2523*64kB (UME) 1315*128kB (UME) 648*256kB (UME) 302*512kB (UME) 139*1024kB (UME) 0*2048kB 0*4096kB = 1029488kB
Nov 10 23:05:17 cp4047 kernel: [798350.776074] Node 0 Normal: 5148*4kB (ME) 47786*8kB (UMEH) 22964*16kB (UMEH) 3538*32kB (UMEH) 1192*64kB (UMEH) 240*128kB (UME) 70*256kB (UME) 27*512kB (UME) 6*1024kB (U) 0*2048kB 0*4096kB = 1028416kB
Nov 10 23:05:17 cp4047 kernel: [798350.776086] Node 1 Normal: 25231*4kB (UE) 7221*8kB (UME) 2304*16kB (UME) 1387*32kB (UME) 519*64kB (UME) 208*128kB (UME) 80*256kB (UME) 21*512kB (UME) 139*1024kB (UM) 27*2048kB (ME) 127*4096kB (M) = 1048836kB
Nov 10 23:05:17 cp4047 kernel: [798350.776100] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Nov 10 23:05:17 cp4047 kernel: [798350.776102] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 10 23:05:17 cp4047 kernel: [798350.776102] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Nov 10 23:05:17 cp4047 kernel: [798350.776103] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 10 23:05:17 cp4047 kernel: [798350.776104] 285095 total pagecache pages
Nov 10 23:05:17 cp4047 kernel: [798350.776105] 0 pages in swap cache
Nov 10 23:05:17 cp4047 kernel: [798350.776107] Swap cache stats: add 0, delete 0, find 0/0
Nov 10 23:05:17 cp4047 kernel: [798350.776107] Free swap  = 0kB
Nov 10 23:05:17 cp4047 kernel: [798350.776108] Total swap = 0kB
Nov 10 23:05:17 cp4047 kernel: [798350.776109] 134077835 pages RAM
Nov 10 23:05:17 cp4047 kernel: [798350.776109] 0 pages HighMem/MovableOnly
Nov 10 23:05:17 cp4047 kernel: [798350.776110] 2124983 pages reserved
Nov 10 23:05:17 cp4047 kernel: [798350.776111] 0 pages hwpoisoned
Nov 10 23:05:17 cp4047 kernel: [798350.776111] Tasks state (memory values in pages):

Nov 10 23:05:17 cp4047 kernel: [798350.776112] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 10 23:05:17 cp4047 kernel: [798350.776145] [   1071]     0  1071    80287    54869   663552        0             0 systemd-journal
Nov 10 23:05:17 cp4047 kernel: [798350.776147] [   1090]     0  1090     5813      484    73728        0         -1000 systemd-udevd
Nov 10 23:05:17 cp4047 kernel: [798350.776150] [   1442]     0  1442      771       54    45056        0             0 mdadm
Nov 10 23:05:17 cp4047 kernel: [798350.776152] [   1458]   101  1458    23275      214    81920        0             0 systemd-timesyn
Nov 10 23:05:17 cp4047 kernel: [798350.776155] [   1631]     0  1631     4905      288    73728        0             0 systemd-logind
Nov 10 23:05:17 cp4047 kernel: [798350.776156] [   1641]   109  1641    43439     2648   110592        0             0 python3
Nov 10 23:05:17 cp4047 kernel: [798350.776158] [   1648]     0  1648     1359       61    49152        0             0 rasdaemon
Nov 10 23:05:17 cp4047 kernel: [798350.776161] [   1652]   105  1652     2313      203    57344        0          -900 dbus-daemon
Nov 10 23:05:17 cp4047 kernel: [798350.776163] [   1653]     0  1653  2249075    32665  1384448        0             0 cadvisor
Nov 10 23:05:17 cp4047 kernel: [798350.776165] [   1655]     0  1655     2126       65    57344        0             0 cron
Nov 10 23:05:17 cp4047 kernel: [798350.776167] [   1664]   106  1664     7054      273    94208        0             0 lldpd
Nov 10 23:05:17 cp4047 kernel: [798350.776168] [   1665]     0  1665    61504      855    98304        0             0 traffic_manager
Nov 10 23:05:17 cp4047 kernel: [798350.776170] [   1666]     0  1666     2524      365    57344        0             0 smartd
Nov 10 23:05:17 cp4047 kernel: [798350.776172] [   1679]   106  1679     7055      296    90112        0             0 lldpd
Nov 10 23:05:17 cp4047 kernel: [798350.776174] [   1680]   109  1680   880477     8034   479232        0             0 prometheus-ipmi
Nov 10 23:05:17 cp4047 kernel: [798350.776176] [   1685]   109  1685  1710277    17126   946176        0             0 prometheus-node
Nov 10 23:05:17 cp4047 kernel: [798350.776181] [   1697]     0  1697  1287574    12020   745472        0             0 confd
Nov 10 23:05:17 cp4047 kernel: [798350.776184] [   1717]   116  1717  1169878     6535   606208        0             0 fifo-log-demux
Nov 10 23:05:17 cp4047 kernel: [798350.776186] [   1741]     0  1741    21243      322    65536        0             0 ipmiseld
Nov 10 23:05:17 cp4047 kernel: [798350.776188] [   1743]     0  1743   464764     4526   319488        0             0 rsyslogd
Nov 10 23:05:17 cp4047 kernel: [798350.776190] [   1744]     0  1744     4706     1638    65536        0             0 varnishd
Nov 10 23:05:17 cp4047 kernel: [798350.776191] [   1750]     0  1750  1523416    11577   815104        0             0 mtail
Nov 10 23:05:17 cp4047 kernel: [798350.776208] [   1785]     0  1785     3454      217    61440        0         -1000 sshd
Nov 10 23:05:17 cp4047 kernel: [798350.776210] [   1789]     0  1789     1693      159    53248        0          -500 nrpe
Nov 10 23:05:17 cp4047 kernel: [798350.776213] [   1885]     0  1885     1403       28    49152        0             0 agetty
Nov 10 23:05:17 cp4047 kernel: [798350.776215] [   1905]     0  1905     1346       29    49152        0             0 agetty
Nov 10 23:05:17 cp4047 kernel: [798350.776217] [   1973]   116  1973  7811978  5313115 43347968        0             0 [TS_MAIN]
Nov 10 23:05:17 cp4047 kernel: [798350.776227] [   2031]   108  2031     5078      210    77824        0             0 exim4
Nov 10 23:05:17 cp4047 kernel: [798350.776229] [   2052]     0  2052    71496    40370   409600        0             0 haproxy
Nov 10 23:05:17 cp4047 kernel: [798350.776236] [   2215]   116  2215     3014      234    61440        0             0 traffic_crashlo
Nov 10 23:05:17 cp4047 kernel: [798350.776241] [   2421]   112  2421 320525176 124017210 2050834432        0             0 cache-main
Nov 10 23:05:17 cp4047 kernel: [798350.776244] [   2955]   111  2955  1153736     7288   622592        0             0 prometheus-varn
Nov 10 23:05:17 cp4047 kernel: [798350.776246] [   2956]     0  2956     4571     1552    73728        0             0 python3
Nov 10 23:05:17 cp4047 kernel: [798350.776247] [   2957]     0  2957     4539     1512    73728        0             0 python3
Nov 10 23:05:17 cp4047 kernel: [798350.776251] [   2959]     0  2959   227372    23776   327680        0             0 varnishkafka
Nov 10 23:05:17 cp4047 kernel: [798350.776253] [   2964]     0  2964     1682       57    49152        0             0 varnishmtail-de
Nov 10 23:05:17 cp4047 kernel: [798350.776256] [   2971]     0  2971     1682       65    49152        0             0 varnishmtail-wr
Nov 10 23:05:17 cp4047 kernel: [798350.776257] [   2973]     0  2973    22074    20971   212992        0             0 varnishncsa
Nov 10 23:05:17 cp4047 kernel: [798350.776260] [   2975]     0  2975  1707362    11795   892928        0             0 mtail
Nov 10 23:05:17 cp4047 kernel: [798350.776263] [   3014]     0  3014    22303    21206   217088        0             0 varnishlog
Nov 10 23:05:17 cp4047 kernel: [798350.776264] [   3015]     0  3015    22235    21138   221184        0             0 varnishlog
Nov 10 23:05:17 cp4047 kernel: [798350.776267] [   5431] 64368  5431  2962777   150470  2551808        0             0 purged
Nov 10 23:05:17 cp4047 kernel: [798350.776272] [  32789]     0 32789     1682       57    49152        0             0 varnishmtail-in
Nov 10 23:05:17 cp4047 kernel: [798350.776274] [  32790]     0 32790     1682       68    45056        0             0 varnishmtail-wr
Nov 10 23:05:17 cp4047 kernel: [798350.776275] [  32791]     0 32791    21944    20841   212992        0             0 varnishncsa
Nov 10 23:05:17 cp4047 kernel: [798350.776277] [  32792]     0 32792  1486038     8828   798720        0             0 mtail
Nov 10 23:05:17 cp4047 kernel: [798350.776280] [  85660]   109 85660    27923     5036   118784        0             0 prometheus-traf
Nov 10 23:05:17 cp4047 kernel: [798350.776282] [  65867]     0 65867      597       16    45056        0             0 sh
Nov 10 23:05:17 cp4047 kernel: [798350.776285] [  65868]     0 65868      597       17    49152        0             0 atslog-backend
Nov 10 23:05:17 cp4047 kernel: [798350.776291] [  65869]     0 65869  1283339     7899   692224        0             0 mtail
Nov 10 23:05:17 cp4047 kernel: [798350.776293] [  65870]     0 65870   118583      517   118784        0             0 fifo-log-tailer
Nov 10 23:05:17 cp4047 kernel: [798350.776295] [  58134]     0 58134   769134     4423   413696        0             0 prometheus-rsys
Nov 10 23:05:17 cp4047 kernel: [798350.776300] [  72675]     0 72675     4538     1498    69632        0             0 python3
Nov 10 23:05:17 cp4047 kernel: [798350.776302] [  72685]     0 72685    21661    20530   204800        0             0 varnishlog
Nov 10 23:05:17 cp4047 kernel: [798350.776311] [  64646]   114 64646   939674   125872  1585152        0             0 haproxy
Nov 10 23:05:17 cp4047 kernel: [798350.776317] [  10713]   109 10713    45345     4099   118784        0             0 prometheus-varn
Nov 10 23:05:17 cp4047 kernel: [798350.776324] [  26889]     0 26889     1682       67    45056        0             0 varnishmtail-wr
Nov 10 23:05:17 cp4047 kernel: [798350.776326] [  26890]     0 26890     1682       65    45056        0             0 varnishmtail-wr
Nov 10 23:05:17 cp4047 kernel: [798350.776329] [  26891]     0 26891     1693      159    53248        0          -500 nrpe
Nov 10 23:05:17 cp4047 kernel: [798350.776331] [  26892]     0 26892     1693      159    53248        0          -500 nrpe
Nov 10 23:05:17 cp4047 kernel: [798350.776333] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/varnish-frontend.
service,task=cache-main,pid=2421,uid=112
Nov 10 23:05:17 cp4047 kernel: [798350.777680] Out of memory: Killed process 2421 (cache-main) total-vm:1282100704kB, anon-rss:495986376kB, file-rss:0kB, shmem-rss:82464kB, UID:1
12 pgtables:2002768kB oom_score_adj:0
Nov 10 23:05:42 cp4047 kernel: [798376.572445] oom_reaper: reaped process 2421 (cache-main), now anon-rss:0kB, file-rss:0kB, shmem-rss:82100kB

This could be potentially related with how we configure memory thresholds for varnish + the fact that in the new upload hardware featuring 12TB of caching storage ATS requires 3x memory than in the old hardware:
https://grafana.wikimedia.org/goto/uip5tYD4z?orgId=1 (cp4047, 12Tb of storage, ATS requires ~20GB of memory):

image.png (1×1 px, 68 KB)

https://grafana.wikimedia.org/goto/P33TtYv4z?orgId=1 (cp3051, 1.6Tb of storage, ATS requires ~6.5GB of memory:

image.png (1×1 px, 149 KB)

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2022-11-11T06:22:03Z] <vgutierrez> restart varnish on cp4047 to clear VarnishChildRestarted alert - T322903

Free memory on NUMA Node 0 got below the min threshold (1028416 < 1041448):

Node 0 Normal free:1028416kB min:1041448kB low:1303560kB high:1565672kB reserved_highatomic:2048KB active_anon:1800292kB inactive_anon:257312200kB active_file:0kB inactive_file:408kB unevictable:24kB writepending:0kB present:266338304kB managed:262112564kB mlocked:24kB pagetables:556408kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

Change 855965 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] varnish: Increase reserved memory to 120G in upload@ulsfo

https://gerrit.wikimedia.org/r/855965

After further inspection I don't think that ATS memory increase is enough to explain what we are seeing here, text nodes in ulsfo are using around 326G of RAM but upload ones are using ~470G... that 144G gap can't be explained with the extra 13G used by ATS in upload nodes.

In fact it seems like varnish is the one eating the extra memory... in cp4045 (upload) with the following malloc specific config: -s malloc,283G -s Transient=malloc,10G varnish is consuming 458G, in cp4041 (text) with -s malloc,283G -s Transient=malloc,5G varnish is consuming 318G as reported by the system. As reported by varnish, cp4045 is consuming 304G of 304G and cp4041 "just" 151G of 304G

Change 855987 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] Depool ulsfo for resolving varnish issues

https://gerrit.wikimedia.org/r/855987

Change 855992 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] varnish: Disable THP for varnish on upload@ulsfo

https://gerrit.wikimedia.org/r/855992

Change 855992 merged by Vgutierrez:

[operations/puppet@production] varnish: Disable THP for varnish on cp404[5-8]

https://gerrit.wikimedia.org/r/855992

Mentioned in SAL (#wikimedia-operations) [2022-11-11T15:58:24Z] <vgutierrez> rolling restart of varnish in cp4045 - cp4050 - T322903

Vgutierrez lowered the priority of this task from High to Medium.Nov 11 2022, 4:41 PM

Lowing the priority after deploying several experiments in upload@ulsfo that could mitigate the issue, see the task description for more details

I've updated the description of the task to add the used memory by varnish after running the experiment for the whole weekend. Data gathered with systemctl show varnish-frontend -p MemoryCurrent --value | awk '{print $1/1024/1024/1024 " GB "}'

Change 855965 abandoned by Vgutierrez:

[operations/puppet@production] varnish: Increase reserved memory to 120G in upload@ulsfo

Reason:

https://gerrit.wikimedia.org/r/855965

Change 855987 abandoned by Ssingh:

[operations/dns@master] Depool ulsfo for resolving varnish issues

Reason:

no longer required

https://gerrit.wikimedia.org/r/855987

Change 857686 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hieradata: Disable THP for jemalloc/varnish@cp2042

https://gerrit.wikimedia.org/r/857686

Change 857686 merged by Vgutierrez:

[operations/puppet@production] hieradata: Disable THP for jemalloc/varnish globally

https://gerrit.wikimedia.org/r/857686

Mentioned in SAL (#wikimedia-operations) [2022-11-16T17:53:02Z] <sukhe> rolling restart of varnish to pick up changes in T322903

Mentioned in SAL (#wikimedia-operations) [2022-11-16T20:41:27Z] <sukhe> [finished] rolling restart of varnish to pick up changes in T322903

Vgutierrez claimed this task.

THP has been disabled globally as a result of this task with https://gerrit.wikimedia.org/r/857686. A rolling restart has been performed to applied this change:

vgutierrez@cumin1001:~$ sudo cumin A:cp 'grep -c thp:never /proc/$(systemctl show --property MainPID --value varnish-frontend.service)/environ'
96 hosts will be targeted:
cp[2027-2042].codfw.wmnet,cp[6001-6016].drmrs.wmnet,cp[1075-1090].eqiad.wmnet,cp[5002-5016,5032].eqsin.wmnet,cp[3050-3065].esams.wmnet,cp[4037-4052].ulsfo.wmnet
Ok to proceed on 96 hosts? Enter the number of affected hosts to confirm or "q" to quit 96
===== NODE GROUP =====                                                                                                                                                                                                                                                                                                                                                   
(96) cp[2027-2042].codfw.wmnet,cp[6001-6016].drmrs.wmnet,cp[1075-1090].eqiad.wmnet,cp[5002-5016,5032].eqsin.wmnet,cp[3050-3065].esams.wmnet,cp[4037-4052].ulsfo.wmnet                                                                                                                                                                                                    
----- OUTPUT of 'grep -c thp:neve...service)/environ' -----                                                                                                                                                                                                                                                                                                              
1