Raise memory limit for copyvios tool's k8s webservice
Closed, Resolved, Public

Description

I've been doing some work to improve the performance of the copyvios tool, which is now running on k8s after using the old grid for a long time. k8s enforces a memory limit of 2G, despite the tool previously having its limit raised to 6G. I'm not sure I need that much, but 4G would help a lot, I think. Can this limit be raised easily?

Event Timeline

Earwig created this task. Feb 17 2019, 12:30 AM
Restricted Application added a subscriber: Aklapper. Feb 17 2019, 12:30 AM

Is there any evidence that the tool is running out of memory? The grid counts memory by virtual size (which is IMO not sane). I couldn't find any relevant documentation online, but I'm inclined to think k8s counts memory by resident set size (which is IMO much saner), so you are much less likely to run out of it, even with a lower threshold. <rant>This is why people say 'I don't need this much memory when I run it at home, so why do I have to specify such a large number in -mem?'</rant>
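
To see how far apart the two measures can be for a single process, one option on Linux is to compare the VmSize and VmRSS fields of /proc/<pid>/status. The following is a minimal sketch, assuming Python and a Linux /proc filesystem; the helper name parse_vm_fields and the SAMPLE text are hypothetical, and nothing here shows how the grid or k8s actually enforce their limits.

```python
def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value part looks like "   123456 kB"; keep the number only.
            fields[key] = int(rest.split()[0])
    return fields


# Hypothetical sample, used when /proc is not available (e.g. not on Linux).
SAMPLE = "Name:\tuwsgi\nVmSize:\t  900000 kB\nVmRSS:\t  300000 kB\n"

if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            text = f.read()
    except (IOError, OSError):
        text = SAMPLE
    vm = parse_vm_fields(text)
    print("virtual size: %d kB, resident set size: %d kB"
          % (vm["VmSize"], vm["VmRSS"]))
```

Virtual size includes address space that has been mapped but never touched, so it is typically well above the resident set size.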

uWSGI logs the following every several hours, which I assume is the OOM-killer:

DAMN ! worker 1 (pid: 546) died, killed by signal 9 :( trying respawn ...

Admittedly I'm running with more workers than normal (8 instead of 4), but even with 4, I was seeing these messages every so often. The reason I've upped the number of workers is that individual requests can take a long time, so it's possible for all 4 to be occupied, causing requests to back up.

In the past, I've done some work to see whether this is a memory leak in my tool, but I never found anything. However, since you've mentioned that the old grid and k8s count memory differently, I'm thinking I'm just being greedy. I'll do some further research to see if I can finally pinpoint the cause.

zhuyifei1999 added a comment. Edited Feb 17 2019, 6:08 PM

Hmm, the dmesg output is a bit confusing (I redacted all kernel addresses because of KASLR):

1[Feb17 17:22] uwsgi invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=968
2[ +0.000008] uwsgi cpuset=2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73 mems_allowed=0
3[ +0.000015] CPU: 3 PID: 13407 Comm: uwsgi Not tainted 4.9.0-0.bpo.6-amd64 #1 Debian 4.9.88-1+deb9u1~bpo8+1
4[ +0.000001] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1~cloud0 04/01/2014
5[ +0.000002] 0000000000000000 ffffffffREDACTED ffff----REDACTED ffff----REDACTED
6[ +0.000003] ffffffffREDACTED 0000000000000000 00000000000003c8 ffff----REDACTED
7[ +0.000002] ffff----REDACTED 0000000000000000 ffff----REDACTED ffffffffREDACTED
8[ +0.000002] Call Trace:
9[ +0.000034] [<ffffffffREDACTED>] ? dump_stack+0x5a/0x6f
10[ +0.000013] [<ffffffffREDACTED>] ? dump_header+0x85/0x212
11[ +0.000003] [<ffffffffREDACTED>] ? mem_cgroup_scan_tasks+0xc7/0xe0
12[ +0.000011] [<ffffffffREDACTED>] ? oom_kill_process+0x228/0x3e0
13[ +0.000002] [<ffffffffREDACTED>] ? out_of_memory+0x10c/0x4b0
14[ +0.000004] [<ffffffffREDACTED>] ? mem_cgroup_out_of_memory+0x49/0x80
15[ +0.000002] [<ffffffffREDACTED>] ? mem_cgroup_oom_synchronize+0x2f5/0x320
16[ +0.000002] [<ffffffffREDACTED>] ? mem_cgroup_oom_unregister_event+0x80/0x80
17[ +0.000002] [<ffffffffREDACTED>] ? pagefault_out_of_memory+0x2f/0x80
18[ +0.000010] [<ffffffffREDACTED>] ? __do_page_fault+0x4a2/0x500
19[ +0.000017] [<ffffffffREDACTED>] ? async_page_fault+0x28/0x30
20[ +0.000001] Task in /docker/2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73 killed as a result of limit of /docker/2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73
21[ +0.000005] memory: usage 2097152kB, limit 2097152kB, failcnt 33362622
22[ +0.000001] memory+swap: usage 2097152kB, limit 9007199254740988kB, failcnt 0
23[ +0.000001] kmem: usage 14028kB, limit 9007199254740988kB, failcnt 0
24[ +0.000000] Memory cgroup stats for /docker/2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73: cache:6124KB rss:2077000KB rss_huge:0KB mapped_file:3384KB dirty:0KB writeback:0KB swap:0KB inactive_anon:520404KB active_anon:1557796KB inactive_file:2560KB active_file:2364KB unevictable:0KB
25[ +0.000011] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
26[ +0.001012] [13407] 51330 13407 47201 13993 84 3 0 968 uwsgi
27[ +0.000210] [15254] 51330 15254 6050 401 16 3 0 968 bash
28[ +0.001017] [ 1406] 51330 1406 251811 84691 301 4 0 968 uwsgi
29[ +0.000002] [ 7203] 51330 7203 234791 74714 279 4 0 968 uwsgi
30[ +0.000001] [21119] 51330 21119 220137 48255 215 4 0 968 uwsgi
31[ +0.000003] [21208] 51330 21208 219869 50895 253 4 0 968 uwsgi
32[ +0.000002] [17220] 51330 17220 219712 54137 233 4 0 968 uwsgi
33[ +0.000001] [13168] 51330 13168 257712 94628 323 4 0 968 uwsgi
34[ +0.000009] [21822] 51330 21822 235716 67312 252 4 0 968 uwsgi
35[ +0.000002] [ 7984] 51330 7984 228533 52367 219 4 0 968 uwsgi
36[ +0.000012] [ 9625] 51330 9625 1461 21 8 3 0 968 tail
37[ +0.000003] Memory cgroup out of memory: Kill process 13168 (uwsgi) score 1148 or sacrifice child
38[ +0.010237] Killed process 13168 (uwsgi) total-vm:1030848kB, anon-rss:376280kB, file-rss:2108kB, shmem-rss:124kB

Comparing line 38 with line 33: line 38's total-vm (1030848kB) and combined rss (376280 + 2108 + 124 = 378512kB) are exactly 4 times line 33's total_vm (257712) and rss (94628), so the per-process table must be counted in 4kB pages.

The sum of total_vm is 1922993 (pages, = 7691972kB), while the sum of rss is 541414 (pages, = 2165656kB). Line 21 says the cgroup 'memory' (which type?) usage is 2097152kB and then line 24 says rss usage is 2077000kB. So is the total RSS 2077000kB or 2165656kB?
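
The sums above can be re-derived from the per-process table. This is a small sketch, assuming 4kB pages as inferred from lines 33 and 38; the TASKS list is simply lines 26-36 of the dmesg output retyped, and the variable names are illustrative.

```python
PAGE_KB = 4  # assumption: 4kB pages, inferred from lines 33 and 38

# (pid, total_vm, rss) in pages, copied from lines 26-36 of the dmesg output
TASKS = [
    (13407, 47201, 13993),
    (15254, 6050, 401),
    (1406, 251811, 84691),
    (7203, 234791, 74714),
    (21119, 220137, 48255),
    (21208, 219869, 50895),
    (17220, 219712, 54137),
    (13168, 257712, 94628),
    (21822, 235716, 67312),
    (7984, 228533, 52367),
    (9625, 1461, 21),
]

total_vm_pages = sum(vm for _, vm, _ in TASKS)
rss_pages = sum(rss for _, _, rss in TASKS)

# Cross-check the page-size assumption against the killed process (pid 13168).
killed = dict((pid, (vm, rss)) for pid, vm, rss in TASKS)[13168]
assert killed[0] * PAGE_KB == 1030848             # total-vm on line 38
assert killed[1] * PAGE_KB == 376280 + 2108 + 124  # anon + file + shmem rss

print(total_vm_pages, total_vm_pages * PAGE_KB)  # 1922993 7691972
print(rss_pages, rss_pages * PAGE_KB)            # 541414 2165656
```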

Looking at https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt, section 2.2.1 says RSS and page cache are accounted. That gives 2077000 + 6124 = 2083124kB, which is still 14028kB short of 2097152kB; the shortfall is exactly the kernel memory (kmem) usage in line 23.

My best guess from that is that the cgroup does count by RSS and your RSS is indeed pretty big, although the way it totals the RSS is slightly off (2165656kB vs 2077000kB, about 4%) from what I'd expect from simply summing the per-process values.
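
The accounting can be checked with a few lines of arithmetic. This sketch only restates figures from lines 21, 23 and 24 of the dmesg output plus the per-process RSS sum; the variable names are illustrative.

```python
usage_kb = 2097152        # line 21: cgroup memory usage (equal to the limit)
rss_kb = 2077000          # line 24: cgroup rss
cache_kb = 6124           # line 24: cgroup page cache
kmem_kb = 14028           # line 23: kernel memory usage
summed_rss_kb = 2165656   # per-process rss from lines 26-36, summed, in kB

# rss + cache + kmem accounts for the whole usage figure, which is 2 GiB.
accounted_kb = rss_kb + cache_kb + kmem_kb
assert accounted_kb == usage_kb == 2 * 1024 * 1024

# Gap between the summed per-process rss and the cgroup's own rss figure.
gap_kb = summed_rss_kb - rss_kb
gap_pct = round(100.0 * gap_kb / rss_kb, 1)
print(gap_kb, gap_pct)  # 88656 4.3
```

One possible explanation for the gap, not confirmed in this thread, is that per-process RSS counts pages shared between processes once per process, while the cgroup charges each page only once.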

Though, one way you could check for a memory leak is to get a core dump (if you want, tell me a pid and I can get you a core dump via gdb while the process is suspended), but working out what is leaking from the core dump could be tedious.

Another way would be to use a library to collect statistics on which objects are currently allocated. https://stackoverflow.com/q/1435415 has some examples of such libraries. I've personally used https://pypi.org/project/mem_top/ once or twice, but the last time I tried it, it only worked on Python 2. Note, though, that the mem_top library implicitly invokes gc, so if the issue simply disappears after using such a library, it could be that gc is running too infrequently for some obscure reason. (I encountered this issue once outside Toolforge. Solution? Manually invoke gc periodically *facepalm*. Honestly, even after reading Python's gc documentation twice I still don't understand when gc is invoked automatically.)

Did some investigating with my tool of choice, guppy, and found a potential "leak" (it really shouldn't be one, but apparently a stack frame was living longer than intended and keeping a bunch of things alive with it). With that cleaned up, the pure-Python tools no longer seem to be reporting any leak candidates, but memory usage still seems kinda high. I'll follow up.

I've managed to fix a couple more bugs and poor design choices in the tool, and it looks like memory usage has fallen to more reasonable levels, so I'm closing this ticket. Thanks for the help earlier!

Earwig closed this task as Resolved. Feb 24 2019, 5:33 AM
Earwig claimed this task.