
Java jobs run on the Stretch grid seem to require a very large memory reservation
Closed, Declined (Public)

Description

Originally reported at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Strange_Memory_Behavior_on_Toolforge_with_Java by @Fastily:

Hi all, when I run the following command on toolforge:

jsub -once -mem 2g -quiet -j y -o a.txt java -Xmx1G -version

And view the log file:

$ cat a.txt
Error occurred during initialization of VM
Could not allocate metaspace: 1073741824 bytes

I see an error related to (lack of) memory. Changing the memory values to something crazy seems to yield the desired output.

For example:

$ jsub -once -mem 8g -quiet -j y -o a.txt java -Xmx4G -version
$ cat a.txt

results in the desired output

openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment (build 11.0.2+9-Debian-3bpo91)
OpenJDK 64-Bit Server VM (build 11.0.2+9-Debian-3bpo91, mixed mode, sharing)

That leaves me with the following questions

  1. Is this a bug or the expected behavior?
  2. Is the latter example the best practice on toolforge these days (i.e. set crazy memory values)?

Thanks in advance.
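
For anyone who wants to reproduce this outside the grid, here is a minimal Python sketch. It assumes that the `-mem` value ends up as a cap on the job's virtual address space (an assumption; the report itself does not confirm how the grid enforces it), and it models that cap with `RLIMIT_AS`. The names `parse_mem` and `run_with_vm_limit` and the suffix handling are hypothetical, made up for this illustration.

```python
import resource
import subprocess

# Size suffixes assumed to mean binary multiples (k, m, g).
UNITS = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_mem(value: str) -> int:
    """Convert a size such as '2g' or '512m' to bytes (hypothetical helper)."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def run_with_vm_limit(cmd, mem):
    """Run cmd with its virtual address space capped at mem, capturing output."""
    limit = parse_mem(mem)

    def apply_limit():
        # Runs in the child just before exec; Unix only.
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    return subprocess.run(
        cmd,
        preexec_fn=apply_limit,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )


# Example call (needs a JVM on PATH, so it is left commented out):
# print(run_with_vm_limit(["java", "-Xmx1G", "-version"], "2g").stdout)
```

If the assumption holds, the commented-out call should behave like the failing `jsub -mem 2g` job, and raising the second argument should behave like the larger reservation.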

Event Timeline

02:54:06 1 ✗ zhuyifei1999@tools-sgebastion-08: ~$ (ulimit -v $(( (1<<20) * 2 )); strace -e mmap,munmap,brk -f java -Xmx1G -version)
brk(NULL) = 0x561dc983d000
mmap(NULL, 105680, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f12f27e3000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f12f27e1000
mmap(NULL, 2200072, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12f23c0000
mmap(0x7f12f25d8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7f12f25d8000
mmap(NULL, 2212936, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12f21a3000
mmap(0x7f12f23ba000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f12f23ba000
mmap(0x7f12f23bc000, 13384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12f23bc000
mmap(NULL, 2159536, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12f1f93000
mmap(0x7f12f21a1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0x7f12f21a1000
mmap(NULL, 2109680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12f1d8f000
mmap(0x7f12f1f91000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f12f1f91000
mmap(NULL, 3795296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12f19f0000
mmap(0x7f12f1d85000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f12f1d85000
mmap(0x7f12f1d8b000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12f1d8b000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f12f27df000
munmap(0x7f12f27e3000, 105680) = 0
brk(NULL) = 0x561dc983d000
brk(0x561dc985e000) = 0x561dc985e000
mmap(NULL, 20741976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12f0628000
mmap(0x7f12f18a4000, 1011712, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x107c000) = 0x7f12f18a4000
mmap(0x7f12f199b000, 347992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12f199b000
mmap(NULL, 105680, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f12f27e3000
mmap(NULL, 3674720, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12f02a6000
mmap(0x7f12f0618000, 49152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x172000) = 0x7f12f0618000
mmap(0x7f12f0624000, 12896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12f0624000
mmap(NULL, 3158248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12effa2000
mmap(0x7f12f02a4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x102000) = 0x7f12f02a4000
mmap(NULL, 2188336, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12efd8b000
mmap(0x7f12effa0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f12effa0000
munmap(0x7f12f27e3000, 105680) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f12f26de000
strace: Process 17316 attached
[pid 17316] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12e7d8b000
[pid 17316] munmap(0x7f12e7d8b000, 2576384) = 0
[pid 17316] munmap(0x7f12ec000000, 64532480) = 0
[pid 17316] mmap(NULL, 105680, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f12f27e3000
[pid 17316] mmap(NULL, 2128832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12efb83000
[pid 17316] mmap(0x7f12efd89000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f12efd89000
[pid 17316] munmap(0x7f12f27e3000, 105680) = 0
[pid 17316] mmap(NULL, 2154912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12ef974000
[pid 17316] mmap(0x7f12efb80000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc000) = 0x7f12efb80000
[pid 17316] mmap(NULL, 2270984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12ef749000
[pid 17316] mmap(0x7f12ef971000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f12ef971000
[pid 17316] mmap(NULL, 8192, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12f27fb000
[pid 17316] mmap(0x7f12f27fb000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12f27fb000
[pid 17316] mmap(NULL, 102803208, PROT_READ, MAP_SHARED, 4, 0) = 0x7f12e1df5000
[pid 17316] mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x7f12f27f3000
[pid 17316] mmap(0x7f12f26de000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12f26de000
[pid 17316] mmap(NULL, 2126264, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12ef541000
[pid 17316] mmap(0x7f12ef747000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f12ef747000
[pid 17316] mmap(NULL, 2118048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f12ef33b000
[pid 17316] mmap(0x7f12ef53f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7f12ef53f000
[pid 17316] mmap(NULL, 141001128, PROT_READ, MAP_SHARED, 3, 0) = 0x7f12d977c000
[pid 17316] mmap(NULL, 251658240, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12ca77c000
[pid 17316] mmap(0x7f12ca77c000, 2555904, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12ca77c000
[pid 17316] mmap(NULL, 49152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12f27e7000
[pid 17316] mmap(0x7f12f27e7000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12f27e7000
[pid 17316] mmap(0x7f12cad0b000, 2555904, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12cad0b000
[pid 17316] mmap(NULL, 962560, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12ef250000
[pid 17316] mmap(0x7f12ef250000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12ef250000
[pid 17316] mmap(0x7f12d2243000, 2555904, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12d2243000
[pid 17316] mmap(NULL, 962560, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12ef165000
[pid 17316] mmap(0x7f12ef165000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12ef165000
[pid 17316] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
[pid 17316] mmap(0xc0000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xc0000000
[pid 17316] mmap(NULL, 2101248, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12eef64000
[pid 17316] mmap(0x7f12ef164000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12ef164000
[pid 17316] mmap(0xc0000000, 5570560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc0000000
[pid 17316] mmap(0x7f12eef64000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12eef64000
[pid 17316] mmap(0xd5550000, 11206656, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xd5550000
[pid 17316] mmap(NULL, 1400832, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f12eee0e000
[pid 17316] mmap(0x7f12eee0e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12eee0e000
[pid 17316] mmap(0x7f12ef00e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f12ef00e000
[pid 17316] mmap(0x800000000, 17489920, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x800000000
[pid 17316] mmap(0x800000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0x1000) = 0x800000000
[pid 17316] mmap(0x800002000, 3915776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x3000) = 0x800002000
[pid 17316] mmap(0x8003be000, 7188480, PROT_READ, MAP_PRIVATE|MAP_FIXED, 4, 0x3bf000) = 0x8003be000
[pid 17316] mmap(0x800a99000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0xa9a000) = 0x800a99000
[pid 17316] mmap(0x800a9a000, 6373376, PROT_READ, MAP_PRIVATE|MAP_FIXED, 4, 0xa9b000) = 0x800a9a000
82[pid 17316] mmap(0x800000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
83[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
84[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
85[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
86[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
87[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
88[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
89[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
90[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
91[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
92[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
93[pid 17316] mmap(0x840000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
94[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
95[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
96[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
97[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
98[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
99[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
100[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
101[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
102[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
103[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
104[pid 17316] mmap(0x880000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
105[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
106[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
107[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
108[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
109[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
110[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
111[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
112[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
113[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
114[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
115[pid 17316] mmap(0x8c0000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
116[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
117[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
118[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
119[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
120[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
121[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
122[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
123[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
124[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
125[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
126[pid 17316] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
127Error occurred during initialization of VM
128Could not allocate metaspace: 1073741824 bytes
129[pid 17316] munmap(0x7f12d977c000, 1731944) = 0
130[pid 17316] +++ exited with 1 +++
131+++ exited with 1 +++
02:54:22 1 ✗ zhuyifei1999@tools-sgebastion-08: ~$ (ulimit -v unlimited; strace -e mmap,munmap,brk -f java -Xmx1G -version)
brk(NULL) = 0x55c91eea1000
mmap(NULL, 105680, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f78355b4000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f78355b2000
mmap(NULL, 2200072, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7835191000
mmap(0x7f78353a9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7f78353a9000
mmap(NULL, 2212936, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7834f74000
mmap(0x7f783518b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f783518b000
mmap(0x7f783518d000, 13384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f783518d000
mmap(NULL, 2159536, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7834d64000
mmap(0x7f7834f72000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0x7f7834f72000
mmap(NULL, 2109680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7834b60000
mmap(0x7f7834d62000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f7834d62000
mmap(NULL, 3795296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f78347c1000
mmap(0x7f7834b56000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f7834b56000
mmap(0x7f7834b5c000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7834b5c000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f78355b0000
munmap(0x7f78355b4000, 105680) = 0
brk(NULL) = 0x55c91eea1000
brk(0x55c91eec2000) = 0x55c91eec2000
mmap(NULL, 20741976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f78333f9000
mmap(0x7f7834675000, 1011712, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x107c000) = 0x7f7834675000
mmap(0x7f783476c000, 347992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f783476c000
mmap(NULL, 105680, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f78355b4000
mmap(NULL, 3674720, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7833077000
mmap(0x7f78333e9000, 49152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x172000) = 0x7f78333e9000
mmap(0x7f78333f5000, 12896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f78333f5000
mmap(NULL, 3158248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7832d73000
mmap(0x7f7833075000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x102000) = 0x7f7833075000
mmap(NULL, 2188336, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7832b5c000
mmap(0x7f7832d71000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f7832d71000
munmap(0x7f78355b4000, 105680) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f78354af000
strace: Process 17334 attached
[pid 17334] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f782ab5c000
[pid 17334] munmap(0x7f782ab5c000, 21643264) = 0
[pid 17334] munmap(0x7f7830000000, 45465600) = 0
[pid 17334] mmap(NULL, 105680, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f78355b4000
[pid 17334] mmap(NULL, 2128832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7832954000
[pid 17334] mmap(0x7f7832b5a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f7832b5a000
[pid 17334] munmap(0x7f78355b4000, 105680) = 0
[pid 17334] mmap(NULL, 2154912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7832745000
[pid 17334] mmap(0x7f7832951000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc000) = 0x7f7832951000
[pid 17334] mmap(NULL, 2270984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f783251a000
[pid 17334] mmap(0x7f7832742000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f7832742000
[pid 17334] mmap(NULL, 8192, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f78355cc000
[pid 17334] mmap(0x7f78355cc000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f78355cc000
[pid 17334] mmap(NULL, 102803208, PROT_READ, MAP_SHARED, 4, 0) = 0x7f7825df5000
[pid 17334] mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x7f78355c4000
[pid 17334] mmap(0x7f78354af000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f78354af000
[pid 17334] mmap(NULL, 2126264, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7832312000
[pid 17334] mmap(0x7f7832518000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f7832518000
[pid 17334] mmap(NULL, 2118048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f783210c000
[pid 17334] mmap(0x7f7832310000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7f7832310000
[pid 17334] mmap(NULL, 141001128, PROT_READ, MAP_SHARED, 3, 0) = 0x7f781d77c000
[pid 17334] mmap(NULL, 251658240, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f780e77c000
[pid 17334] mmap(0x7f780e77c000, 2555904, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f780e77c000
[pid 17334] mmap(NULL, 49152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f78355b8000
[pid 17334] mmap(0x7f78355b8000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f78355b8000
[pid 17334] mmap(0x7f780ed0b000, 2555904, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f780ed0b000
[pid 17334] mmap(NULL, 962560, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f7832021000
[pid 17334] mmap(0x7f7832021000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7832021000
[pid 17334] mmap(0x7f7816243000, 2555904, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7816243000
[pid 17334] mmap(NULL, 962560, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f7831f36000
[pid 17334] mmap(0x7f7831f36000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7831f36000
[pid 17334] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
[pid 17334] mmap(0xc0000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xc0000000
[pid 17334] mmap(NULL, 2101248, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f7831d35000
[pid 17334] mmap(0x7f7831f35000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7831f35000
[pid 17334] mmap(0xc0000000, 5570560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc0000000
[pid 17334] mmap(0x7f7831d35000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7831d35000
[pid 17334] mmap(0xd5550000, 11206656, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xd5550000
[pid 17334] mmap(NULL, 1400832, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f7831bdf000
[pid 17334] mmap(0x7f7831bdf000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7831bdf000
[pid 17334] mmap(0x7f7831ddf000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7831ddf000
[pid 17334] mmap(0x800000000, 17489920, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x800000000
[pid 17334] mmap(0x800000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0x1000) = 0x800000000
[pid 17334] mmap(0x800002000, 3915776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x3000) = 0x800002000
[pid 17334] mmap(0x8003be000, 7188480, PROT_READ, MAP_PRIVATE|MAP_FIXED, 4, 0x3bf000) = 0x8003be000
[pid 17334] mmap(0x800a99000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0xa9a000) = 0x800a99000
[pid 17334] mmap(0x800a9a000, 6373376, PROT_READ, MAP_PRIVATE|MAP_FIXED, 4, 0xa9b000) = 0x800a9a000
[pid 17334] mmap(0x800000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77ce77c000
[pid 17334] munmap(0x7f77ce77c000, 1073741824) = 0
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77ce77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f778e77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f774e77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f770e77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f76ce77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f768e77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f764e77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f760e77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f75ce77c000
[pid 17334] mmap(NULL, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f758e77c000
[pid 17334] munmap(0x7f77ce77c000, 1073741824) = 0
[pid 17334] munmap(0x7f778e77c000, 1073741824) = 0
[pid 17334] munmap(0x7f774e77c000, 1073741824) = 0
[pid 17334] munmap(0x7f770e77c000, 1073741824) = 0
[pid 17334] munmap(0x7f76ce77c000, 1073741824) = 0
[pid 17334] munmap(0x7f768e77c000, 1073741824) = 0
[pid 17334] munmap(0x7f764e77c000, 1073741824) = 0
[pid 17334] munmap(0x7f760e77c000, 1073741824) = 0
[pid 17334] munmap(0x7f75ce77c000, 1073741824) = 0
[pid 17334] munmap(0x7f758e77c000, 1073741824) = 0
[pid 17334] mmap(0x840000000, 1073741824, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x840000000
[pid 17334] mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f783548e000
[pid 17334] mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f783546d000
[pid 17334] mmap(NULL, 8388608, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f78313df000
[pid 17334] mmap(NULL, 163840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7835445000
[pid 17334] mmap(NULL, 372736, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f78353ea000
[pid 17334] mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f783135e000
[pid 17334] mmap(0x7f78313df000, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f78313df000
[pid 17334] mmap(0x840000000, 393216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x840000000
[pid 17334] mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f783125c000
strace: Process 17335 attached
[pid 17335] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f780677c000
[pid 17335] munmap(0x7f780677c000, 25706496) = 0
[pid 17335] munmap(0x7f780c000000, 41402368) = 0
[pid 17334] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f783115b000
strace: Process 17336 attached
[pid 17336] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f7800000000
[pid 17336] munmap(0x7f7804000000, 67108864) = 0
[pid 17334] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0 <unfinished ...>
[pid 17336] mmap(0x7f783115b000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 17334] <... mmap resumed> ) = 0x7f783105a000
[pid 17336] <... mmap resumed> ) = 0x7f783115b000
strace: Process 17337 attached
[pid 17337] mmap(0x7f7804000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f7804000000
[pid 17337] mmap(0x7f783105a000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f783105a000
[pid 17334] mmap(NULL, 20301776, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f780d41f000
[pid 17334] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f7830f59000
strace: Process 17338 attached
[pid 17338] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77f8000000
[pid 17338] munmap(0x7f77fc000000, 67108864) = 0
[pid 17338] mmap(0x7f7830f59000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7830f59000
[pid 17334] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f7830e58000
strace: Process 17339 attached
[pid 17339] mmap(0x7f77fc000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77fc000000
[pid 17334] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f7830d57000
[pid 17339] mmap(0x7f7830e58000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0strace: Process 17340 attached
) = 0x7f7830e58000
[pid 17340] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77f0000000
[pid 17340] munmap(0x7f77f4000000, 67108864) = 0
[pid 17334] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f7830c56000
[pid 17340] mmap(0x7f7830d57000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7830d57000
strace: Process 17341 attached
[pid 17341] mmap(0x7f77f4000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77f4000000
[pid 17341] mmap(0x7f7830c56000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7830c56000
[pid 17334] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f7830b55000
strace: Process 17342 attached
[pid 17342] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77e8000000
[pid 17342] munmap(0x7f77ec000000, 67108864) = 0
[pid 17342] mmap(0x7f7830b55000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7830b55000
[pid 17334] mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f7830a53000
strace: Process 17343 attached
[pid 17343] mmap(0x7f77ec000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f77ec000000
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment (build 11.0.2+9-Debian-3bpo91)
OpenJDK 64-Bit Server VM (build 11.0.2+9-Debian-3bpo91, mixed mode, sharing)
[pid 17334] mmap(0x7f78354af000, 16384, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f78354af000
[pid 17334] mmap(0x7f78354af000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f78354af000
[pid 17334] mmap(0x7f78354af000, 16384, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f78354af000
[pid 17338] mmap(0x7f7830f59000, 16384, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
[pid 17343] +++ exited with 0 +++
[pid 17338] <... mmap resumed> ) = 0x7f7830f59000
[pid 17338] +++ exited with 0 +++
[pid 17335] +++ exited with 0 +++
[pid 17334] +++ exited with 0 +++
[pid 17333] munmap(0x7f781d77c000, 1731944) = 0
[pid 17342] +++ exited with 0 +++
[pid 17341] +++ exited with 0 +++
[pid 17340] +++ exited with 0 +++
[pid 17339] +++ exited with 0 +++
[pid 17337] +++ exited with 0 +++
[pid 17336] +++ exited with 0 +++
+++ exited with 0 +++
02:54:50 0 ✓ zhuyifei1999@tools-sgebastion-08: ~$
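
A trace this long is easier to compare with a small script than by eye. The following is a rough Python sketch, not a full strace parser: it only recognises `mmap` and `munmap` lines in the format shown above, tracks whole mappings by start address (partial unmaps and `MAP_FIXED` overlays are not modelled), and reports the peak number of bytes held plus the count of failed `mmap` calls. The name `peak_reserved` and the `SAMPLE` excerpt are made up for this illustration; the sample mixes a few lines from both runs purely to exercise both code paths.

```python
import re

# Matches mmap/munmap lines as printed by strace, with or without "[pid N]".
CALL = re.compile(
    r"(?P<name>mmap|munmap)\((?P<addr>NULL|0x[0-9a-f]+), (?P<size>\d+)"
    r".*\) = (?P<ret>-1|0x[0-9a-f]+|0)\b"
)


def peak_reserved(lines):
    """Return (peak_bytes, failed_mmaps) for the calls found in lines."""
    live = {}  # start address -> size; whole-mapping bookkeeping only
    peak = 0
    failed = 0
    for line in lines:
        m = CALL.search(line)
        if not m:
            continue
        if m["name"] == "mmap":
            if m["ret"] == "-1":
                failed += 1
            else:
                live[int(m["ret"], 16)] = int(m["size"])
        elif m["addr"] != "NULL":
            # munmap: only releases that start at a tracked address count.
            live.pop(int(m["addr"], 16), None)
        peak = max(peak, sum(live.values()))
    return peak, failed


RESERVE = "PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0"
SAMPLE = [
    f"[pid 17334] mmap(NULL, 1073741824, {RESERVE}) = 0x7f77ce77c000",
    f"[pid 17334] mmap(NULL, 1073741824, {RESERVE}) = 0x7f778e77c000",
    f"[pid 17334] mmap(NULL, 1073741824, {RESERVE}) = 0x7f774e77c000",
    "[pid 17334] munmap(0x7f77ce77c000, 1073741824) = 0",
    "[pid 17334] munmap(0x7f778e77c000, 1073741824) = 0",
    "[pid 17334] munmap(0x7f774e77c000, 1073741824) = 0",
    f"[pid 17316] mmap(NULL, 1073741824, {RESERVE}) = -1 ENOMEM (Cannot allocate memory)",
]

peak, failed = peak_reserved(SAMPLE)
print(f"peak held: {peak / (1 << 30):.0f} GiB, failed mmaps: {failed}")
```

Feeding it the complete output of each run separately (for example, saved with strace's `-o` option) would give per-run figures, subject to the simplifications noted above.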

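It wants 10 GiB of address space (in the unlimited run it holds ten 1 GiB PROT_NONE, MAP_NORESERVE reservations at once before releasing them)... but what's wrong with giving it just that? It's not as if address space were a consumable, given that every process has its own.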
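
To see why the limited run fails at the first 1 GiB request, a back-of-the-envelope tally helps. The Python sketch below adds up only the largest mappings visible in the limited trace before the first ENOMEM; the labels are my reading of the trace, not something the JVM reports, and smaller mappings (shared libraries, stacks) are left out, so the real headroom is even lower than computed here. The helper name `headroom` is hypothetical.

```python
GIB = 1 << 30

# ulimit -v takes KiB, so (1<<20) * 2 KiB is 2 GiB.
LIMIT = ((1 << 20) * 2) * 1024

# Largest mappings still held when the first 1 GiB request fails (bytes).
# Labels are interpretations of the trace, not JVM output.
HELD = {
    "134217728 reservation minus its two munmaps": 134217728 - 2576384 - 64532480,
    "read-only MAP_SHARED mapping of fd 4": 102803208,
    "read-only MAP_SHARED mapping of fd 3": 141001128,
    "240 MiB PROT_NONE reservation": 251658240,
    "1 GiB PROT_NONE reservation at 0xc0000000 (size matches -Xmx1G)": 1073741824,
    "PROT_NONE reservation at 0x800000000": 17489920,
}


def headroom(limit_bytes, held_sizes):
    """Address space left under the limit after the given mappings."""
    return limit_bytes - sum(held_sizes)


left = headroom(LIMIT, HELD.values())
print(f"held: {sum(HELD.values())} bytes, left under limit: {left} bytes")
print("next 1 GiB reservation fits:", left >= GIB)
```

With these figures less than half a GiB of address space remains under the 2 GiB limit, so a further 1 GiB reservation cannot succeed no matter where the kernel tries to place it.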

Is the latter example the best practice on toolforge these days (i.e. set crazy memory values)?

The latter example demonstrates exactly how grid engine is nuts.

Oh and, gdb doesn't help on why it needs 10GiB of heap space.

02:54:50 0 ✓ zhuyifei1999@tools-sgebastion-08: ~$ gdb java -ex 'catch syscall write' -ex 'r -Xmx1G -version'
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
[...]
Type "apropos word" to search for commands related to "word"...
Reading symbols from java...(no debugging symbols found)...done.
Catchpoint 1 (syscall 'write' [1])
Starting program: /usr/bin/java -Xmx1G -version
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7fd9700 (LWP 17480)]
[Switching to Thread 0x7ffff7fd9700 (LWP 17480)]

Thread 2 "java" hit Catchpoint 1 (call to syscall write), 0x00007ffff79b21cd in write () at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007ffff79b21cd in write () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff6a067e0 in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
#2  0x00007ffff6a058e3 in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
#3  0x00007ffff6628ee2 in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
#4  0x00007ffff6b5568e in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
#5  0x00007ffff66d8aa6 in JNI_CreateJavaVM () from /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
#6  0x00007ffff77966c4 in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/bin/../lib/jli/libjli.so
#7  0x00007ffff779ac2d in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/bin/../lib/jli/libjli.so
#8  0x00007ffff79a94a4 in start_thread (arg=0x7ffff7fd9700) at pthread_create.c:456
#9  0x00007ffff72d7d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
(gdb) up
#1  0x00007ffff6a067e0 in ?? () from /usr/lib/jvm/java-11-openjdk-amd64/lib/server/libjvm.so
(gdb) disas
No function contains program counter for selected frame.

I spent quite a while playing with OpenJDK CLI arguments and various memory limits on the grid. I really wasn't able to find any magic combination that allowed a JVM to start up with less than a 4G -mem setting (technically a -l h_vmem=4G argument to qsub). It seems that even when given lots of "extra" arguments like -XX:MaxRAM=<size>, -Xms<size>, -Xmx<size>, and -Xss<size>, OpenJDK 11.0.2 still tries to allocate an amount of virtual memory that will trigger a crash with less than 4G allowed by the Son of Grid Engine access controller.

I did find some articles online that seemed to indicate that OpenJDK would probably be a bit smarter about how it allocates its initial heaps and edens if it detected that it was running under cgroup limits (for example in a Docker container). The core issue seems to be in part that the OpenJDK code is trying to be smarter than the operator and interrogating the system to decide how much RAM and CPU are available globally on the host. In a shared environment these guesses end up higher than the limits actually applied to the individual application at runtime.
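To make the mismatch concrete, here is a minimal sketch (Linux only) that compares what a process sees when it asks about the whole host with the limit actually applied to it. It assumes the grid's h_vmem limit reaches the process as an address-space rlimit, which is the same assumption the ulimit -v reproduction earlier in this task relies on. The function names are made up for illustration.

```python
import os
import resource


def host_view():
    """What a process learns when it asks about the whole host."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return {
        "physical_ram_bytes": page_size * phys_pages,
        "cpus": os.cpu_count(),
    }


def process_limit():
    """Soft address-space limit for this process, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return None if soft == resource.RLIM_INFINITY else soft


def limit_is_tighter_than_host():
    """True when sizing from host RAM would overshoot the per-process limit."""
    limit = process_limit()
    return limit is not None and limit < host_view()["physical_ram_bytes"]


if __name__ == "__main__":
    print("host view:    ", host_view())
    print("process limit:", process_limit())
    print("limit tighter than host RAM:", limit_is_tighter_than_host())
```

Run under something like `ulimit -v`, the last line should report True, which is the situation a runtime that sizes itself from host-wide numbers does not notice.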

To try and address the original questions:

  1. Is this a bug or the expected behavior?

I think there are potentially some bugs in play here, but they seem to be upstream in OpenJDK rather than in Son of Grid Engine or the configured Toolforge deployment of it. Specifically

  2. Is the latter example the best practice on toolforge these days (i.e. set crazy memory values)?

I hesitate to call it a "best practice", but for OpenJDK jobs it may be necessary to ask for a fairly large reservation. Certainly the -mem granted to a job will need to be significantly larger than any -Xmx value that is used. -Xmx sets the maximum heap size, but this is only a part of the memory allocated by the JVM at startup. Other memory includes the per-thread stack space, "native" memory (off-heap allocations), Eden size, shared class space, etc:

$ java -Xms32M -Xmx128M -Xmn2M -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=summary -XX:+PrintNMTStatistics -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment (build 11.0.2+9-Debian-3bpo91)
OpenJDK 64-Bit Server VM (build 11.0.2+9-Debian-3bpo91, mixed mode, sharing)

Native Memory Tracking:

Total: reserved=1517235KB, committed=110115KB
-                 Java Heap (reserved=131072KB, committed=34816KB)
                            (mmap: reserved=131072KB, committed=34816KB)

-                     Class (reserved=1056864KB, committed=4576KB)
                            (classes #430)
                            (  instance classes #364, array classes #66)
                            (malloc=96KB #451)
                            (mmap: reserved=1056768KB, committed=4480KB)
                            (  Metadata:   )
                            (    reserved=8192KB, committed=4096KB)
                            (    used=136KB)
                            (    free=3960KB)
                            (    waste=0KB =0.00%)
                            (  Class space:)
                            (    reserved=1048576KB, committed=384KB)
                            (    used=3KB)
                            (    free=381KB)
                            (    waste=0KB =0.00%)

-                    Thread (reserved=15461KB, committed=597KB)
                            (thread #15)
                            (stack: reserved=15392KB, committed=528KB)
                            (malloc=52KB #84)
                            (arena=18KB #28)

-                      Code (reserved=247725KB, committed=7585KB)
                            (malloc=37KB #372)
                            (mmap: reserved=247688KB, committed=7548KB)

-                        GC (reserved=46059KB, committed=42487KB)
                            (malloc=8407KB #506)
                            (mmap: reserved=37652KB, committed=34080KB)

-                  Compiler (reserved=135KB, committed=135KB)
                            (malloc=4KB #58)
                            (arena=131KB #5)

-                  Internal (reserved=543KB, committed=543KB)
                            (malloc=503KB #1089)
                            (mmap: reserved=40KB, committed=40KB)

-                    Symbol (reserved=1038KB, committed=1038KB)
                            (malloc=678KB #11)
                            (arena=360KB #1)

-    Native Memory Tracking (reserved=74KB, committed=74KB)
                            (malloc=5KB #66)
                            (tracking overhead=69KB)

-        Shared class space (reserved=17080KB, committed=17080KB)
                            (mmap: reserved=17080KB, committed=17080KB)

-               Arena Chunk (reserved=1104KB, committed=1104KB)
                            (malloc=1104KB)

-                   Logging (reserved=4KB, committed=4KB)
                            (malloc=4KB #185)

-                 Arguments (reserved=17KB, committed=17KB)
                            (malloc=17KB #465)

-                    Module (reserved=58KB, committed=58KB)
                            (malloc=58KB #1024)
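To compare categories in a report like the one above without reading it by eye, a small parser is enough. This is only a sketch: the regular expression is written against the line layout shown here, not against any documented NMT format, and the SAMPLE text is a shortened copy of the output above.

```python
import re

# Matches category lines such as:
# "-                 Java Heap (reserved=131072KB, committed=34816KB)"
NMT_LINE = re.compile(
    r"^-\s+(?P<name>.+?) \(reserved=(?P<reserved>\d+)KB, committed=(?P<committed>\d+)KB\)"
)


def parse_nmt_summary(text):
    """Return {category: (reserved_kb, committed_kb)} from NMT summary text."""
    categories = {}
    for line in text.splitlines():
        match = NMT_LINE.match(line)
        if match:
            categories[match["name"].strip()] = (
                int(match["reserved"]),
                int(match["committed"]),
            )
    return categories


SAMPLE = """\
Total: reserved=1517235KB, committed=110115KB
-                 Java Heap (reserved=131072KB, committed=34816KB)
                            (mmap: reserved=131072KB, committed=34816KB)
-                     Class (reserved=1056864KB, committed=4576KB)
-                      Code (reserved=247725KB, committed=7585KB)
"""

if __name__ == "__main__":
    rows = sorted(parse_nmt_summary(SAMPLE).items(), key=lambda kv: -kv[1][0])
    for name, (reserved, committed) in rows:
        print(f"{name:>10}: reserved {reserved / 1024:.0f} MiB, committed {committed / 1024:.0f} MiB")
```

Sorted this way, Class comes out on top with roughly 1 GiB reserved against a 128 MiB heap. The "Class space" reservation of 1048576KB is exactly the 1073741824 bytes named in the "Could not allocate metaspace" error from the original report.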

Sorry if this isn't directly related to the reported bug, but is there any way to find out why a process got sigkilled on the grid? I have a Java process getting terminated without traces of hserr*.log, and I suspect it is related to some native libraries allocating too much off-heap memory. I'd like to confirm that it's due to the OOM killer (or perhaps quota related?)

https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Returning_the_status_of_a_particular_job

Thanks, with qacct -j jobid I get:

...
failed       37  : qmaster enforced h_rt, h_cpu, or h_vmem limit
maxvmem      16.558GB
category     -q task -l h_vmem=16777216k

maxvmem is greater than h_vmem, so the process was probably killed because of that. The strange thing is that the JVM is limited to -Xmx8G, so this must be off-heap memory.
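The comparison can be checked mechanically. The sketch below assumes that qacct's "k" and "GB" suffixes are both binary (1024-based); I have not confirmed that against the Grid Engine documentation, and the helper names are made up for illustration.

```python
import re

# Assumption: Grid Engine memory suffixes are 1024-based.
UNIT = {"k": 1024, "m": 1024**2, "g": 1024**3}


def h_vmem_bytes(category):
    """Extract the h_vmem request, in bytes, from a qacct 'category' string."""
    match = re.search(r"h_vmem=(\d+)([kmg]?)", category, re.IGNORECASE)
    if match is None:
        raise ValueError("no h_vmem in category string")
    return int(match[1]) * UNIT.get(match[2].lower(), 1)


def maxvmem_bytes(value):
    """Convert a qacct maxvmem value such as '16.558GB' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMG])B", value.strip())
    if match is None:
        raise ValueError("unrecognised maxvmem value")
    return float(match[1]) * UNIT[match[2].lower()]


if __name__ == "__main__":
    limit = h_vmem_bytes("-q task -l h_vmem=16777216k")
    peak = maxvmem_bytes("16.558GB")
    print(f"limit {limit / 1024**3:.3f} GiB, peak {peak / 1024**3:.3f} GiB")
    print("limit exceeded:", peak > limit)
```

Under that assumption the request works out to exactly 16 GiB, so a peak of 16.558 is a little over half a GiB past the limit.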

I'm running into more problems with Java on the grid: the JVM gets killed (exit code 137 = SIGKILL), but if I do a qacct -j jobid I don't see any enforced limits:

* What went wrong:
> Process 'command '/usr/lib/jvm/java-11-openjdk-amd64/bin/java'' finished with non-zero exit value 137
$ qacct -j 863802
==============================================================
qname        task
hostname     tools-sgeexec-0919.tools.eqiad.wmflabs
group        tools.digero
owner        tools.digero
project      NONE
department   defaultdepartment
jobname      gradlew
jobnumber    863802
taskid       undefined
account      sge
priority     0
qsub_time    Sat Apr 10 23:01:29 2021
start_time   Sat Apr 10 23:02:16 2021
end_time     Sat Apr 10 23:04:57 2021
granted_pe   NONE
slots        1
failed       0
exit_status  1
ru_wallclock 161s
ru_utime     2.822s
ru_stime     0.660s
ru_maxrss    88.660KB
ru_ixrss     0.000B
ru_ismrss    0.000B
ru_idrss     0.000B
ru_isrss     0.000B
ru_minflt    14316
ru_majflt    2686
ru_nswap     0
ru_inblock   569368
ru_oublock   1176
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     15914
ru_nivcsw    3545
cpu          325.870s
mem          1.025TBs
io           715.528MB
iow          0.000s
maxvmem      6.118GB
arid         undefined
ar_sub_time  undefined
category     -q task -l h_vmem=18874368k,release=stretch

How can I find out why the process got killed?
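For reference, the "137 = SIGKILL" reading of the exit value follows the usual shell convention that a status above 128 means "terminated by signal (status - 128)". A minimal sketch, with a made-up helper name:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status; values above 128 mean 'killed by a signal'."""
    if status > 128:
        return f"killed by {signal.Signals(status - 128).name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))  # killed by SIGKILL
```

SIGKILL cannot be caught or handled by the process, which would also explain why the JVM leaves no crash log behind in this case.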

tools-sgeexec-0919 seems to be getting lots of OOM kills lately. This one in particular:

[Apr10 23:05] G1 Young RemSet invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[ +0.000001] G1 Young RemSet cpuset=/ mems_allowed=0
[ +0.000008] CPU: 2 PID: 29196 Comm: G1 Young RemSet Not tainted 4.19.0-0.bpo.14-amd64 #1 Debian 4.19.171-2~deb9u1
[ +0.000001] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.12.0-1 04/01/2014
[ +0.000001] Call Trace:
[ +0.000047] dump_stack+0x66/0x81
[ +0.000005] dump_header+0x6b/0x28d
[ +0.000002] oom_kill_process+0x272/0x280
[ +0.000006] out_of_memory+0x10c/0x490
[ +0.000001] __alloc_pages_slowpath+0x9d8/0xdb0
[ +0.000002] __alloc_pages_nodemask+0x258/0x2a0
[ +0.000015] filemap_fault+0x3b8/0x780
[ +0.000003] ? alloc_set_pte+0x3f8/0x5b0
[ +0.000001] ? filemap_map_pages+0x18a/0x390
[ +0.000070] ext4_filemap_fault+0x2c/0x40 [ext4]
[ +0.000013] __do_fault+0x57/0x10c
[ +0.000001] __handle_mm_fault+0xc46/0x1210
[ +0.000003] ? __switch_to_asm+0x35/0x70
[ +0.000001] ? __switch_to_asm+0x41/0x70
[ +0.000001] handle_mm_fault+0xfc/0x210
[ +0.000003] __do_page_fault+0x255/0x4f0
[ +0.000003] ? exit_to_usermode_loop+0x6a/0xf0
[ +0.000001] ? async_page_fault+0x8/0x30
[ +0.000002] async_page_fault+0x1e/0x30
[ +0.000008] RIP: 0033:0x15[ASLR]1820
[ +0.000006] Code: Bad RIP value.
[ +0.000001] RSP: 002b:000015[ASLR]ec98 EFLAGS: 00010206
[ +0.000001] RAX: 000015[ASLR]a9f0 RBX: 000015[ASLR]1470 RCX: 0000000000000000
[ +0.000001] RDX: 0000000000000000 RSI: 00000000000072f0 RDI: 000015[ASLR]1470
[ +0.000000] RBP: 000015[ASLR]ecd0 R08: 0000000000000000 R09: 0000000000000178
[ +0.000001] R10: 000015[ASLR]ed00 R11: 0000000000000206 R12: 000015[ASLR]0a18
[ +0.000001] R13: 000015[ASLR]0a10 R14: 0000000000000000 R15: 000015[ASLR]e8b8
[ +0.000001] Mem-Info:
[ +0.000003] active_anon:1746150 inactive_anon:229760 isolated_anon:0
 active_file:212 inactive_file:762 isolated_file:44
 unevictable:0 dirty:3 writeback:0 unstable:0
 slab_reclaimable:6991 slab_unreclaimable:9928
 mapped:455 shmem:21138 pagetables:6710 bounce:0
 free:25808 free_pcp:201 free_cma:0
[ +0.000003] Node 0 active_anon:6984600kB inactive_anon:919040kB active_file:848kB inactive_file:3048kB unevictable:0kB isolated(anon):0kB isolated(file):176kB mapped:1820kB dirty:12kB
[ +0.000000] Node 0 DMA free:15908kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB
[ +0.000002] lowmem_reserve[]: 0 2945 7936 7936 7936
[ +0.000002] Node 0 DMA32 free:44992kB min:25032kB low:31288kB high:37544kB active_anon:2362136kB inactive_anon:625712kB active_file:128kB inactive_file:332kB unevictable:0kB writepend
[ +0.000002] lowmem_reserve[]: 0 0 4990 4990 4990
[ +0.000001] Node 0 Normal free:42332kB min:42416kB low:53020kB high:63624kB active_anon:4622464kB inactive_anon:293328kB active_file:632kB inactive_file:3576kB unevictable:0kB writepe
[ +0.000002] lowmem_reserve[]: 0 0 0 0 0
[ +0.000001] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[ +0.000004] Node 0 DMA32: 480*4kB (UME) 304*8kB (UE) 193*16kB (UME) 117*32kB (UME) 68*64kB (UME) 53*128kB (UME) 29*256kB (UME) 13*512kB (UE) 9*1024kB (UME) 0*2048kB 0*4096kB = 45616kB
[ +0.000004] Node 0 Normal: 2877*4kB (UME) 840*8kB (UME) 450*16kB (UME) 216*32kB (ME) 122*64kB (ME) 27*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43604kB
[ +0.000004] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ +0.000000] 22355 total pagecache pages
[ +0.000020] 0 pages in swap cache
[ +0.000001] Swap cache stats: add 0, delete 0, find 0/0
[ +0.000001] Free swap = 25062392kB
[ +0.000000] Total swap = 25062392kB
[ +0.000000] 2097018 pages RAM
[ +0.000001] 0 pages HighMem/MovableOnly
[ +0.000000] 54542 pages reserved
[ +0.000000] 0 pages hwpoisoned
[ +0.000001] Tasks state (memory values in pages):
[ +0.000000] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ +0.000006] [ 259] 0 259 15621 469 135168 0 0 systemd-journal
[ +0.000001] [ 277] 0 277 26799 112 98304 0 0 lvmetad
[ +0.000001] [ 291] 0 291 10023 312 110592 0 -1000 systemd-udevd
[ +0.000001] [ 460] 0 460 7661 280 90112 0 0 dhclient
[ +0.000002] [ 663] 100 663 30241 125 139264 0 0 systemd-timesyn
[ +0.000002] [ 665] 0 665 3005 770 69632 0 0 haveged
[ +0.000001] [ 679] 0 679 82610 553 163840 0 0 rsyslogd
[ +0.000001] [ 680] 110 680 10424 332 126976 0 -900 dbus-daemon
[ +0.000001] [ 692] 0 692 51190 350 393216 0 0 sssd
[ +0.000001] [ 693] 108 693 215961 8135 331776 0 0 prometheus-node
[ +0.000001] [ 694] 0 694 45038 13853 389120 0 0 python
[ +0.000001] [ 704] 117 704 6664 48 81920 0 0 oidentd
[ +0.000001] [ 708] 0 708 13541 164 147456 0 0 lldpd
[ +0.000001] [ 724] 0 724 5820 70 90112 0 0 cron
[ +0.000001] [ 744] 107 744 13541 148 131072 0 0 lldpd
[ +0.000001] [ 753] 0 753 27754 3133 180224 0 0 sge_execd
[ +0.000001] [ 761] 106 761 16578 167 151552 0 0 exim4
[ +0.000002] [ 770] 0 770 46754 317 372736 0 0 sssd_pam
[ +0.000001] [ 771] 0 771 45147 293 360448 0 0 sssd_ssh
[ +0.000001] [ 774] 0 774 10048 137 114688 0 0 systemd-logind
[ +0.000001] [ 776] 0 776 1987 30 57344 0 0 agetty
[ +0.000001] [ 777] 0 777 1641 27 57344 0 0 agetty
[ +0.000001] [ 795] 0 795 178619 4586 208896 0 0 python
[ +0.000001] [ 857] 0 857 44974 13800 376832 0 0 python
[ +0.000001] [ 860] 0 860 44974 13818 376832 0 0 python
[ +0.000001] [ 1131] 0 1131 44974 13821 380928 0 0 python
[ +0.000013] [ 1265] 0 1265 44974 13831 380928 0 0 python
[ +0.000001] [ 1267] 0 1267 44974 13831 376832 0 0 python
[ +0.000001] [ 1270] 0 1270 44974 13837 376832 0 0 python
[ +0.000001] [ 1274] 0 1274 44974 13842 376832 0 0 python
[ +0.000001] [ 1280] 0 1280 45038 13850 376832 0 0 python
[ +0.000001] [ 1288] 0 1288 45038 13860 380928 0 0 python
[ +0.000001] [ 1360] 0 1360 12866 1136 122880 0 0 sge_shepherd
[ +0.000001] [ 1364] 0 1364 12866 1134 135168 0 0 sge_shepherd
[ +0.000002] [ 1369] 52871 1369 1070 24 57344 0 0 1144731
[ +0.000001] [ 1370] 51206 1370 1070 17 57344 0 0 876023
[ +0.000001] [ 10797] 0 10797 51883 359 417792 0 0 sssd_nss
[ +0.000001] [ 10860] 0 10860 45300 436 360448 0 0 sssd_sudo
[ +0.000002] [ 4954] 0 4954 7997 125 94208 0 -500 nrpe
[ +0.000001] [ 22135] 0 22135 12866 1135 131072 0 0 sge_shepherd
[ +0.000001] [ 22137] 51203 22137 346869 254900 2703360 0 0 bg_create_catal
[ +0.000013] [ 17012] 0 17012 58909 1836 405504 0 0 sssd_be
[ +0.000001] [ 19576] 51206 19576 57286 9468 245760 0 0 ircbot.py
[ +0.000001] [ 15947] 0 15947 16445 189 172032 0 -1000 sshd
[ +0.000008] [ 28044] 0 28044 131784 2126 163840 0 0 prometheus-rsys
[ +0.000001] [ 24548] 0 24548 12866 1135 122880 0 0 sge_shepherd
[ +0.000001] [ 24550] 51306 24550 36730 14003 335872 0 0 python
[ +0.000002] [ 25551] 0 25551 12866 1135 122880 0 0 sge_shepherd
[ +0.000001] [ 25553] 53768 25553 21072 4675 217088 0 0 python
[ +0.000002] [ 28985] 0 28985 12866 1135 118784 0 0 sge_shepherd
[ +0.000001] [ 28987] 54015 28987 1309466 16048 573440 0 0 java
[ +0.000003] [ 29028] 54015 29028 869081 121911 1593344 0 0 java
[ +0.000001] [ 29187] 54015 29187 3114856 1464697 12558336 0 0 java
[ +0.000005] [ 29334] 52871 29334 1070 17 53248 0 0 sh
[ +0.000001] [ 29335] 52871 29335 26921 8593 262144 0 0 python3
[ +0.000002] [ 29341] 52871 29341 6544 286 98304 0 0 git
[ +0.000002] [ 29349] 0 29349 14837 69 143360 0 0 sshd
[ +0.000002] [ 29350] 0 29350 15372 67 143360 0 0 sshd
[ +0.000002] Out of memory: Kill process 29187 (java) score 176 or sacrifice child
[ +0.010659] Killed process 29187 (java) total-vm:12459424kB, anon-rss:5858788kB, file-rss:0kB, shmem-rss:0kB
[ +0.254970] oom_reaper: reaped process 29187 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

[  +0.000001] Tasks state (memory values in pages):
[  +0.000000] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  +0.000001] [  29187] 54015 29187  3114856  1464697 12558336        0             0 java

1464697 pages * 4 KiB per page = 5.587 GiB of resident memory. This isn't even a memory reservation issue: 5.587 GiB of memory is resident in physical RAM. What are you doing?

I should clarify this. The SIGKILL logged above has nothing to do with any quota or with any configuration you set on the job submission. The log above indicates that the grid exec node you ran the job on literally ran out of physical memory. The trace shows the OS kernel receiving a page fault exception and then failing to find physical memory to back a file-mapped (mmap-ed) virtual memory area. Exceptions like these are not recoverable by returning an error code, so either the system has to crash, or the OS must free some physical memory by killing processes immediately.

The OS uses some heuristics to determine which process to kill given such a circumstance, and in this case it killed the process using by far the most resident physical memory, your java process.
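As a rough illustration of that heuristic, the "score 176" in the log can be reproduced from the numbers in the same log. This is a sketch under assumptions: I am going from memory that kernels of this era (4.19) count resident pages plus swap entries plus page-table pages, then scale to parts per thousand of usable RAM plus swap. Subtracting the "pages reserved" figure from "pages RAM" is also my assumption. The function name is made up.

```python
PAGE_KB = 4  # page size in KiB, matching the 4 KiB pages used in the arithmetic above


def oom_score(rss_pages, swapents, pgtables_bytes, ram_pages, reserved_pages, swap_kb):
    """Approximate the 'score' printed by the OOM killer (assumed 4.19-era formula)."""
    # Memory attributed to the process, in pages.
    points = rss_pages + swapents + pgtables_bytes // (PAGE_KB * 1024)
    # Everything the kernel could hand out: usable RAM plus swap, in pages.
    total_pages = (ram_pages - reserved_pages) + swap_kb // PAGE_KB
    return points * 1000 // total_pages


if __name__ == "__main__":
    # Values copied from the log: the row for pid 29187, "pages RAM",
    # "pages reserved" and "Total swap".
    print(oom_score(
        rss_pages=1464697,
        swapents=0,
        pgtables_bytes=12558336,
        ram_pages=2097018,
        reserved_pages=54542,
        swap_kb=25062392,
    ))  # 176
```

The score is low in absolute terms only because the large swap area inflates the denominator. Among the listed tasks this java process still has by far the largest resident set.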

Each grid exec host has 7.8 GiB of physical memory in total. Your three java processes account for 6.11 GiB = 78.5%. Hence I'm asking, what are you doing that needs so much physical memory?
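Those figures can be recomputed from the task table in the OOM log. The sketch below sums the rss column per process name and compares it with usable RAM, taken here as "pages RAM" minus "pages reserved" (my assumption for where 7.8 GiB comes from). The regular expression is written against the row layout shown in the log, and LOG holds just the three java rows.

```python
import re

# Row layout: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
TASK_ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables>\d+)\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)
PAGE_BYTES = 4096


def rss_pages_by_name(log_text):
    """Sum the rss column (in pages) per process name."""
    totals = {}
    for line in log_text.splitlines():
        row = TASK_ROW.search(line)
        if row:
            totals[row["name"]] = totals.get(row["name"], 0) + int(row["rss"])
    return totals


LOG = """\
[ +0.000001] [ 28987] 54015 28987 1309466 16048 573440 0 0 java
[ +0.000003] [ 29028] 54015 29028 869081 121911 1593344 0 0 java
[ +0.000001] [ 29187] 54015 29187 3114856 1464697 12558336 0 0 java
"""

if __name__ == "__main__":
    java_pages = rss_pages_by_name(LOG)["java"]
    usable_pages = 2097018 - 54542  # "pages RAM" minus "pages reserved"
    gib = java_pages * PAGE_BYTES / 2**30
    print(f"{gib:.2f} GiB, {java_pages / usable_pages:.1%}")  # 6.11 GiB, 78.5%
```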

Thanks for investigating. I'm running a Spark job to parse some CBOR files (~ 100MB) to generate a list of missing words on Wiktionary. It really should not take so much memory but I noticed in the logs that Spark seems to aggressively request a lot of memory up front as some sort of buffer space/working memory. I'll see if I can tweak this. Update: solved.

We no longer have the Grid Engine backend in Toolforge.