mono-based bot hangs after mono version upgrade
Closed, ResolvedPublic

Description

T194665 resulted in my bot sometimes getting stuck in an HTTPS request to the WP API:

"<unnamed thread>" tid=0x0x2b0219952640 this=0x0x2b0219a70130 , thread handle : 0x1322210, state : waiting
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.Monitor.Monitor_wait (object,int) [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Monitor.ObjWait (bool,int,object) [0x0002f] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Monitor.Wait (object,int,bool) [0x0000e] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Monitor.Wait (object,int) [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ManualResetEventSlim.Wait (int,System.Threading.CancellationToken) [0x00141] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task.SpinThenBlockingWait (int,System.Threading.CancellationToken) [0x0002d] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task.InternalWait (int,System.Threading.CancellationToken) [0x00030] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task`1<TResult_REF>.GetResultCore (bool) [0x00008] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task`1<TResult_REF>.get_Result () [0x0000f] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Net.HttpWebRequest.GetRequestStream () [0x00006] in <fc308f916aec4e4283e0c1d4b761760a>:0
  at Browser.Post (string,System.Collections.Generic.IDictionary`2<string, string>) [0x00029] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.DoExec (System.Collections.Generic.IDictionary`2<string, string>) [0x00023] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.Exec (System.Collections.Generic.IDictionary`2<string, string>) [0x00013] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.DoLogin (string,string,string) [0x0003e] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.Login (string,string) [0x00008] in <13064408b38543998806c4f907e2d3cf>:0
  at ChieBot.Stabilization.ITNSModule.Execute (MediaWiki,string[],ChieBot.Credentials) [0x0000e] in <13064408b38543998806c4f907e2d3cf>:0
  at ChieBot.Modules.Modules/<>c__DisplayClass5_0.<Bind>b__0 (MediaWiki,string[],ChieBot.Credentials) [0x00012] in <13064408b38543998806c4f907e2d3cf>:0
  at ChieBot.Program.Main (string[]) [0x00101] in <13064408b38543998806c4f907e2d3cf>:0
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) [0x0004e] in <13064408b38543998806c4f907e2d3cf>:0

The bot eats CPU (spins) and consumes all the provided memory.

This never happens when I run the bot manually; it only gets stuck when I run it on the grid via jsub.

Event Timeline

Using the method documented (by @aborrero, thanks!) at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Mono#Debugging_a_Mono_tool, here are the C and Mono stack traces of all threads on a different host: P7172 (raw output)

Thread 8 (Thread 0x2ab9ac2b3700 (LWP 3980)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000006bfef3 in mono_os_cond_wait (mutex=0xa2bba0 <lock>, cond=0xa2bb60 <work_cond>) at ../../mono/utils/mono-os-mutex.h:173
#2 get_work (job=<synthetic pointer>, do_idle=<synthetic pointer>, work_context=<synthetic pointer>, worker_index=0)
    at sgen-thread-pool.c:165
#3 thread_func (data=<optimized out>) at sgen-thread-pool.c:196
#4 0x00002ab9a9c2c184 in start_thread (arg=0x2ab9ac2b3700) at pthread_create.c:312
#5 0x00002ab9aa15603d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

* Assertion at threads.c:1809, condition `internal' not met
Program received signal SIGABRT, Aborted.
0x00002ab9aa08ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 in ../nptl/sysdeps/unix/sysv/linux/raise.c
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off".
Evaluation of the expression containing the function
(mono_thread_current) will be abandoned.

Thread 7 (Thread 0x2ab9ae563700 (LWP 3981)):
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x000000000065bf3c in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xa1cc40 <finalizer_sem>)
    at ../../mono/utils/mono-os-semaphore.h:209
#2 mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xa1cc40 <finalizer_sem>) at ../../mono/utils/mono-coop-semaphore.h:43
#3 finalizer_thread (unused=unused@entry=0x0) at gc.c:893
#4 0x0000000000615a33 in start_wrapper_internal (stack_ptr=<optimized out>, start_info=0x0) at threads.c:1071
#5 start_wrapper (data=0x1ff5f00) at threads.c:1131
#6 0x00002ab9a9c2c184 in start_thread (arg=0x2ab9ae563700) at pthread_create.c:312
#7 0x00002ab9aa15603d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

"Finalizer" tid=0x0x2ab9ae563700 this=0x0x2ab9a9444278 , thread handle : 0x2ab9b0000f40, state : not waiting

Thread 6 (Thread 0x2ab9aef00700 (LWP 3996)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00002ab9a9c2e649 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00002ab9a9c2e470 in __GI___pthread_mutex_lock (mutex=0xa1d638 <worker+120>) at ../nptl/pthread_mutex_lock.c:79
#3 0x000000000066ae91 in mono_os_mutex_lock (mutex=0xa1d638 <worker+120>) at ../../mono/utils/mono-os-mutex.h:99
#4 mono_coop_mutex_lock (mutex=0xa1d638 <worker+120>) at ../../mono/utils/mono-coop-mutex.h:56
#5 worker_try_create () at threadpool-worker-default.c:520
#6 0x000000000066b045 in worker_request () at threadpool-worker-default.c:598
#7 0x000000000066b9e8 in mono_threadpool_worker_request () at threadpool-worker-default.c:354
#8 0x0000000000618275 in ves_icall_System_Threading_ThreadPool_RequestWorkerThread () at threadpool.c:802
#9 0x00000000413bb219 in ?? ()
#10 0x00002ab9ac4db9e0 in ?? ()
#11 0x0000000000000000 in ?? ()

"Timer-Scheduler" tid=0x0x2ab9aef00700 this=0x0x2ab9a94443c0 , thread handle : 0x2ab9b4000f40, state : not waiting
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.ThreadPool.RequestWorkerThread () [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadPoolWorkQueue.EnsureThreadRequested () [0x0001f] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadPoolWorkQueue.Enqueue (System.Threading.IThreadPoolWorkItem,bool) [0x00071] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadPool.QueueUserWorkItemHelper (System.Threading.WaitCallback,object,System.Threading.StackCrawlMark&,bool) [0x00016] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadPool.UnsafeQueueUserWorkItem (System.Threading.WaitCallback,object) [0x00002] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Timer/Scheduler.SchedulerThread () [0x0008a] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadHelper.ThreadStart_Context (object) [0x00014] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00071] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) [0x0002b] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadHelper.ThreadStart () [0x00008] in <71d8ad678db34313b7f718a414dfcb25>:0
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) [0x0004d] in <71d8ad678db34313b7f718a414dfcb25>:0

Thread 5 (Thread 0x2ab9af302700 (LWP 3998)):
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x0000000000613a2c in mono_os_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x2ab9bc0bfb68) at ../../mono/utils/mono-os-semaphore.h:209
#2 mono_coop_sem_wait (flags=MONO_SEM_FLAGS_NONE, sem=0x2ab9bc0bfb68) at ../../mono/utils/mono-coop-semaphore.h:43
#3 create_thread (thread=thread@entry=0x2ab9a94404a0, internal=internal@entry=0x2ab9a9445090, start_delegate=start_delegate@entry=0x0,
    start_func=start_func@entry=0x66a270 <worker_thread>, start_func_arg=start_func_arg@entry=0x0,
    flags=flags@entry=MONO_THREAD_CREATE_FLAGS_THREADPOOL, error=error@entry=0x2ab9af301710) at threads.c:1231
#4 0x0000000000613f0f in mono_thread_create_internal (domain=<optimized out>, func=func@entry=0x66a270 <worker_thread>, arg=arg@entry=0x0,
    flags=flags@entry=MONO_THREAD_CREATE_FLAGS_THREADPOOL, error=error@entry=0x2ab9af301710) at threads.c:1295
#5 0x000000000066aceb in worker_try_create () at threadpool-worker-default.c:554
#6 0x000000000066b045 in worker_request () at threadpool-worker-default.c:598
#7 0x000000000066b9e8 in mono_threadpool_worker_request () at threadpool-worker-default.c:354
#8 0x0000000000618275 in ves_icall_System_Threading_ThreadPool_RequestWorkerThread () at threadpool.c:802
#9 0x00000000413bb219 in ?? ()
#10 0x00002ab9af301cb8 in ?? ()
#11 0x00002ab9bc002610 in ?? ()
#12 0x0000000000000003 in ?? ()
#13 0x0000000000000003 in ?? ()
#14 0x00002ab9ac4d4148 in ?? ()
#15 0x00002ab9bc002580 in ?? ()
#16 0x00002ab9af301a20 in ?? ()
#17 0x00002ab9af3018d0 in ?? ()
#18 0x0000000001fbf250 in ?? ()
#19 0x00002ab9ade40edd in System_Threading_ThreadPoolWorkQueue_EnsureThreadRequested (this=...)
    from /usr/lib/mono/aot-cache/amd64/mscorlib.dll.so
#20 0x00002ab9ade414e0 in System_Threading_ThreadPoolWorkQueue_Dispatch () from /usr/lib/mono/aot-cache/amd64/mscorlib.dll.so
#21 0x00002ab9ade43039 in System_Threading__ThreadPoolWaitCallback_PerformWaitCallback () from /usr/lib/mono/aot-cache/amd64/mscorlib.dll.so
#22 0x000000000001ea00 in ?? ()
#23 0x00000000413bb63b in ?? ()
#24 0x0000000000000001 in ?? ()
#25 0x00002ab9af301cb8 in ?? ()
#26 0x00002ab9a9327678 in ?? ()
#27 0x0000000000000000 in ?? ()

"Thread Pool Worker" tid=0x0x2ab9af302700 this=0x0x2ab9a9444650 , thread handle : 0x2ab9bc000f40, state : not waiting
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.ThreadPool.RequestWorkerThread () [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadPoolWorkQueue.EnsureThreadRequested () [0x0001f] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ThreadPoolWorkQueue.Dispatch () [0x0003a] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at (wrapper runtime-invoke) <Module>.runtime_invoke_bool (object,intptr,intptr,intptr) [0x0001e] in <71d8ad678db34313b7f718a414dfcb25>:0

Thread 4 (Thread 0x2ab9c4200700 (LWP 4000)):
#0 0x00002ab9aa148c9d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00000000006ce777 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2 mono_poll (ufds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at mono-poll.c:30
#3 0x0000000000618b52 in poll_event_wait (callback=0x619230 <wait_callback>, user_data=0x2ab9c8002610) at threadpool-io-poll.c:146
#4 0x0000000000619ed1 in selector_thread (data=data@entry=0x0) at threadpool-io.c:451
#5 0x0000000000615a33 in start_wrapper_internal (stack_ptr=<optimized out>, start_info=0x0) at threads.c:1071
#6 start_wrapper (data=0x2ab9bc0610a0) at threads.c:1131
#7 0x00002ab9a9c2c184 in start_thread (arg=0x2ab9c4200700) at pthread_create.c:312
#8 0x00002ab9aa15603d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

"Thread Pool I/O Selector" tid=0x0x2ab9c4200700 this=0x0x2ab9a94448e0 , thread handle : 0x2ab9c8000f40, state : waiting

Thread 3 (Thread 0x2ab9c517d700 (LWP 4006)):
#0 0x00000000006de5e0 in monoeg_g_calloc (n=n@entry=1, x=1656) at gmem.c:117
#1 0x00000000006de5fd in monoeg_malloc0 (x=<optimized out>) at gmem.c:121
#2 0x00000000006d54b4 in mono_thread_info_attach () at mono-threads.c:649
#3 0x0000000000615942 in start_wrapper (data=0x2ab9bc0bfb30) at threads.c:1127
#4 0x00002ab9a9c2c184 in start_thread (arg=0x2ab9c517d700) at pthread_create.c:312
#5 0x00002ab9aa15603d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

* Assertion at threads.c:1809, condition `internal' not met
Program received signal SIGABRT, Aborted.
0x00002ab9aa08ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off".
Evaluation of the expression containing the function
(mono_thread_current) will be abandoned

Thread 2 (Thread 0x2ab9af101700 (LWP 4134)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00002ab9a9c2e649 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00002ab9a9c2e470 in __GI___pthread_mutex_lock (mutex=0xa1d638 <worker+120>) at ../nptl/pthread_mutex_lock.c:79
#3 0x000000000066ae91 in mono_os_mutex_lock (mutex=0xa1d638 <worker+120>) at ../../mono/utils/mono-os-mutex.h:99
#4 mono_coop_mutex_lock (mutex=0xa1d638 <worker+120>) at ../../mono/utils/mono-coop-mutex.h:56
#5 worker_try_create () at threadpool-worker-default.c:520
#6 0x000000000066b387 in monitor_thread (unused=unused@entry=0x0) at threadpool-worker-default.c:749
#7 0x0000000000615a33 in start_wrapper_internal (stack_ptr=<optimized out>, start_info=0x0) at threads.c:1071
#8 start_wrapper (data=0x2ab9b4004130) at threads.c:1131
#9 0x00002ab9a9c2c184 in start_thread (arg=0x2ab9af101700) at pthread_create.c:312
#10 0x00002ab9aa15603d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

"<threadpool thread>" tid=0x0x2ab9af101700 this=0x0x2ab9a94451d8 , thread handle : 0x2ab9b8000f90, state : not waiting

Thread 1 (Thread 0x2ab9a9325640 (LWP 3979)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000006cc5f5 in mono_os_cond_wait (mutex=0x1f996b8, cond=0x1f996e0) at mono-os-mutex.h:173
#2 mono_os_cond_timedwait (cond=cond@entry=0x1f996e0, mutex=mutex@entry=0x1f996b8, timeout_ms=timeout_ms@entry=4294967295)
    at mono-os-mutex.c:32
#3 0x000000000062c875 in mono_coop_cond_timedwait (timeout_ms=4294967295, mutex=0x1f996b8, cond=0x1f996e0)
    at ../../mono/utils/mono-coop-mutex.h:102
#4 mono_w32handle_timedwait_signal_naked (alerted=0x7ffd2bda6108, poll=0, timeout=4294967295, mutex=0x1f996b8, cond=0x1f996e0)
    at w32handle.c:646
#5 mono_w32handle_timedwait_signal_handle (handle_data=0x1f996a8, timeout=timeout@entry=4294967295, alerted=alerted@entry=0x7ffd2bda6108,
    poll=0) at w32handle.c:761
#6 0x000000000062ce55 in mono_w32handle_wait_one (handle=handle@entry=0x1f996a8, timeout=timeout@entry=4294967295,
    alertable=alertable@entry=1) at w32handle.c:869
#7 0x000000000065fc2f in ves_icall_System_Threading_Monitor_Monitor_wait (obj=0x2ab9ac508df0, ms=4294967295) at monitor.c:1394
#8 0x00000000413bb577 in ?? ()
#9 0x00007ffd2bda6320 in ?? ()
#10 0x0000000000000000 in ?? ()

"<unnamed thread>" tid=0x0x2ab9a9325640 this=0x0x2ab9a9444130 , thread handle : 0x1fa3210, state : waiting
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.Monitor.Monitor_wait (object,int) [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Monitor.ObjWait (bool,int,object) [0x0002f] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Monitor.Wait (object,int,bool) [0x0000e] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Monitor.Wait (object,int) [0x00000] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.ManualResetEventSlim.Wait (int,System.Threading.CancellationToken) [0x00141] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task.SpinThenBlockingWait (int,System.Threading.CancellationToken) [0x0002d] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task.InternalWait (int,System.Threading.CancellationToken) [0x00030] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task`1<int>.GetResultCore (bool) [0x00008] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Threading.Tasks.Task`1<int>.get_Result () [0x0000f] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.Net.WebConnectionStream.Read (byte[],int,int) [0x00067] in <fc308f916aec4e4283e0c1d4b761760a>:0
  at System.IO.StreamReader.ReadBuffer () [0x000b3] in <71d8ad678db34313b7f718a414dfcb25>:0
  at System.IO.StreamReader.ReadToEnd () [0x00052] in <71d8ad678db34313b7f718a414dfcb25>:0
  at Browser.GetStringResponse (System.Net.HttpWebRequest) [0x0001e] in <13064408b38543998806c4f907e2d3cf>:0
  at Browser.Post (string,System.Collections.Generic.IDictionary`2<string, string>) [0x00057] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.DoExec (System.Collections.Generic.IDictionary`2<string, string>) [0x00023] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.Exec (System.Collections.Generic.IDictionary`2<string, string>) [0x00013] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.DoLogin (string,string,string) [0x0003e] in <13064408b38543998806c4f907e2d3cf>:0
  at MediaWiki.Login (string,string) [0x00008] in <13064408b38543998806c4f907e2d3cf>:0
  at ChieBot.DYK.DYKCheckerModule.Execute (MediaWiki,string[],ChieBot.Credentials) [0x0000e] in <13064408b38543998806c4f907e2d3cf>:0
  at ChieBot.Modules.Modules/<>c__DisplayClass5_0.<Bind>b__0 (MediaWiki,string[],ChieBot.Credentials) [0x00012] in <13064408b38543998806c4f907e2d3cf>:0
  at ChieBot.Program.Main (string[]) [0x00101] in <13064408b38543998806c4f907e2d3cf>:0
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) [0x0004e] in <13064408b38543998806c4f907e2d3cf>:0


The aborting bt is similar to (done without `set unwindonsignal on` on a different host on a different process):
Thread 8 (Thread 0x2b021ce00700 (LWP 21730)):
#0 0x00002b021a6bbc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00002b021a6bf028 in __GI_abort () at abort.c:89
#2 0x00000000006ca159 in mono_log_write_logfile (log_domain=<optimized out>, level=<optimized out>, hdr=<optimized out>,
    message=0x2b023c010ae0 "* Assertion at threads.c:1809, condition `internal' not met\n") at mono-log-common.c:135
#3 0x00000000006de8e0 in monoeg_g_logv (log_domain=log_domain@entry=0x0, log_level=log_level@entry=G_LOG_LEVEL_ERROR,
    format=format@entry=0x6e7c58 "* Assertion at %s:%d, condition `%s' not met\n", args=args@entry=0x2b021cdffcb8) at goutput.c:115
#4 0x00000000006dea36 in monoeg_assertion_message (format=format@entry=0x6e7c58 "* Assertion at %s:%d, condition `%s' not met\n")
    at goutput.c:135
#5 0x000000000060f3f7 in mono_thread_current () at threads.c:1809
#6 <function called from gdb>
#7 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#8 0x00000000006bfef3 in mono_os_cond_wait (mutex=0xa2bba0 <lock>, cond=0xa2bb60 <work_cond>) at ../../mono/utils/mono-os-mutex.h:173
#9 get_work (job=<synthetic pointer>, do_idle=<synthetic pointer>, work_context=<synthetic pointer>, worker_index=0)
    at sgen-thread-pool.c:165
#10 thread_func (data=<optimized out>) at sgen-thread-pool.c:196
#11 0x00002b021a259184 in start_thread (arg=0x2b021ce00700) at pthread_create.c:312
#12 0x00002b021a78303d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

On PID 4006, I ran `break monoeg_g_calloc` and then `c`; the breakpoint was not hit after 10+ seconds. So yes, it is very likely that the thread is spinning inside calloc due to insufficient memory.

But why do I feel that the assembly means while (1) {}? See P7174.

It seems to me there is a high likelihood this is just T150099 striking again.

Yep, confirming insufficient memory after comparing the disassembled code with the C source code.

monoeg_g_calloc maps to [[https://github.com/mono/mono/blob/9be68f8952ea0e1aad582bfe2f47bad71aee2cc7/mono/eglib/gmem.c#L109|g_calloc]]:

gpointer g_calloc (gsize n, gsize x)
{
	gpointer ptr;
	if (!x || !n)
		return 0;
	ptr = G_CALLOC_INTERNAL (n, x);
	if (ptr)
		return ptr;
	g_error ("Could not allocate %i (%i * %i) bytes", x*n, n, x);
}

[[https://github.com/mono/mono/blob/d8af77551de1f59803a46946786e0f9a5eb5cb9d/mono/eglib/glib.h#L608|g_error]] is defined as a macro:

#define g_error(...) do { g_log (G_LOG_DOMAIN, G_LOG_LEVEL_ERROR, __VA_ARGS__); for (;;); } while (0)

This corresponds to the following two instructions in the assembly (the call to monoeg_g_log, followed by a jump to itself):

   0x00000000006de5db <+91>:	callq  0x6de910 <monoeg_g_log>
=> 0x00000000006de5e0 <+96>:	jmp    0x6de5e0 <monoeg_g_calloc+96>

The error message should have been "Could not allocate ... bytes".

Thanks @Chicocvenancio for linking to that ticket (and reminding me of it).

@Kf8: since grid engine calculates memory usage by allocated size (mmap) instead of resident size, feel free to supply a high value for -mem.
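To illustrate the difference between allocated and resident size, here is a minimal, Linux-only sketch in C. It maps a block of anonymous memory without touching it and reports how much the process's virtual and resident sizes grew, as read from /proc/self/statm. The helper names `read_statm` and `map_untouched` are made up for this example, and whether grid engine's accounting corresponds exactly to the statm "size" field is an assumption.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: read the total (virtual) and resident size of this
 * process, in pages, from /proc/self/statm. Returns 0 on success. */
static int read_statm(long *virt_pages, long *resident_pages)
{
	assert(virt_pages != NULL && resident_pages != NULL);
	FILE *f = fopen("/proc/self/statm", "r");
	if (!f)
		return -1;
	int fields = fscanf(f, "%ld %ld", virt_pages, resident_pages);
	fclose(f);
	return fields == 2 ? 0 : -1;
}

/* Hypothetical helper: map `bytes` of anonymous memory without touching it
 * and report how much the virtual and resident sizes grew, in pages.
 * Returns 0 on success. */
static int map_untouched(size_t bytes, long *virt_delta, long *resident_delta)
{
	long v0, r0, v1, r1;
	if (read_statm(&v0, &r0) != 0)
		return -1;
	void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	int rc = read_statm(&v1, &r1);
	munmap(p, bytes);
	if (rc != 0)
		return -1;
	*virt_delta = v1 - v0;
	*resident_delta = r1 - r0;
	return 0;
}
```

Calling `map_untouched(64UL * 1024 * 1024, &dv, &dr)` should show the virtual size growing by roughly the whole mapping while the resident size barely moves. This is why a limit on allocated size can be hit long before the process actually uses that much RAM.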

The use of g_error on calloc failure started at least 8 years ago, and g_error having an infinite loop also started 8 years ago, with 2.8 being the earliest version that includes it. No idea why this specific program likes 3.2.8 but not 5.12.

Let's see if this is fixed by supplying a higher -mem. If so, good (hopefully it's just that we used to have thinner libraries). If not, there's a memory leak and we can debug further.

Waiting for @Kf8 to confirm whether increasing the memory works or not.

No issues since I've increased the memory limit. My guess is that the new mono runtime requires a little bit more memory than the previous version, that's why 900m wasn't enough anymore. Thanks to everybody involved!

P.S. I see a lot of mono-sgen processes on grid machines consuming CPU. Somebody should probably have a look at them.

> P.S. I see a lot of mono-sgen processes on grid machines consuming CPU. Somebody should probably have a look at them.

Basically the same issue. PID 15885 on tools-exec-1441:

(gdb) bt
#0  0x00000000006de5e0 in monoeg_g_calloc (n=n@entry=1, x=1656) at gmem.c:117
#1  0x00000000006de5fd in monoeg_malloc0 (x=<optimized out>) at gmem.c:121
#2  0x00000000006d54b4 in mono_thread_info_attach () at mono-threads.c:649
#3  0x0000000000615942 in start_wrapper (data=0x2b5e6c0050e0) at threads.c:1127
#4  0x00002b5e492d3184 in start_thread (arg=0x2b5e52893700) at pthread_create.c:312
#5  0x00002b5e497fd03d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I'm thinking of replacing that infinite loop with a call to abort().
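A minimal sketch of what such a change might look like, written as standalone C rather than as a patch to eglib. The names `fatal_error` and `checked_calloc` are hypothetical and exist only for this illustration; the real g_error also goes through g_log, which is replaced here by a plain fprintf.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for g_error: log the message, then abort() so the
 * process dies with SIGABRT instead of spinning in for (;;). */
#define fatal_error(...) \
	do { fprintf(stderr, __VA_ARGS__); fputc('\n', stderr); abort(); } while (0)

/* Same shape as g_calloc quoted earlier, but a failed allocation terminates
 * the process instead of busy-looping. */
static void *checked_calloc(size_t n, size_t x)
{
	void *ptr;
	if (!x || !n)
		return NULL;
	ptr = calloc(n, x);
	if (ptr)
		return ptr;
	fatal_error("Could not allocate %zu (%zu * %zu) bytes", x * n, n, x);
}
```

With this shape, a job that runs out of memory would exit with an error message and a SIGABRT rather than staying on the grid consuming CPU.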

Can someone mail the cloud mailing list about this since it affects quite a lot of tools, or shall I try to figure out a list of affected tools?

Affected tools: P7180 (raw output)

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~/linux$ cat T195834 | sed '/Not found/ { N; d }' | grep 'tools\.' | awk '{ print $1 }' | sort | uniq -c
    138 tools.dibot
     23 tools.mbh

CC @Dmitry89 @MaxBioHazard

Mentioned in SAL (#wikimedia-cloud) [2018-05-30T05:20:59Z] <zhuyifei1999_> qdel all grid jobs except inc_remindbot & lighttpd-dibot because they are just stuck in an infinite busy loop and consume grid CPU doing nothing T195834

tools.dibot@tools-bastion-05:~$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 207071 0.39983 inc_remind tools.dibot r 01/23/2018 03:25:09 task@tools-exec-1415.tools.eqi 1
 248025 0.39905 inc_remind tools.dibot r 01/24/2018 03:25:06 task@tools-exec-1420.tools.eqi 1
 287975 0.39826 inc_remind tools.dibot r 01/25/2018 03:25:06 task@tools-exec-1426.tools.eqi 1
 329094 0.39747 inc_remind tools.dibot r 01/26/2018 03:25:06 task@tools-exec-1436.tools.eqi 1
 369230 0.39669 inc_remind tools.dibot r 01/27/2018 03:25:10 task@tools-exec-1429.tools.eqi 1
 410400 0.39590 inc_remind tools.dibot r 01/28/2018 03:25:05 task@tools-exec-1402.eqiad.wmf 1
 450866 0.39512 inc_remind tools.dibot r 01/29/2018 03:25:02 task@tools-exec-1412.tools.eqi 1
2518218 0.35869 lighttpd-d tools.dibot r 03/16/2018 12:15:06 webgrid-lighttpd@tools-webgrid 1
6488557 0.30133 pats-gadge tools.dibot r 05/28/2018 12:40:13 task@tools-exec-1411.tools.eqi 1
6489562 0.30132 inc_check tools.dibot r 05/28/2018 13:05:08 task@tools-exec-1424.tools.eqi 1
6489867 0.30131 inc_redire tools.dibot r 05/28/2018 13:15:10 task@tools-exec-1406.eqiad.wmf 1
6490154 0.30131 filemoves_ tools.dibot r 05/28/2018 13:21:03 task@tools-exec-1406.eqiad.wmf 1
6490713 0.30130 pats-gadge tools.dibot r 05/28/2018 13:40:11 task@tools-exec-1423.tools.eqi 1
6491759 0.30128 inc_check tools.dibot r 05/28/2018 14:05:11 task@tools-exec-1411.tools.eqi 1
6491760 0.30128 inc_main tools.dibot r 05/28/2018 14:05:11 task@tools-exec-1404.eqiad.wmf 1
6492339 0.30127 filemoves_ tools.dibot r 05/28/2018 14:21:03 task@tools-exec-1419.tools.eqi 1
6492881 0.30126 pats-gadge tools.dibot r 05/28/2018 14:40:12 task@tools-exec-1431.tools.eqi 1
6493938 0.30125 inc_check tools.dibot r 05/28/2018 15:05:11 task@tools-exec-1407.eqiad.wmf 1
6494514 0.30124 filemoves_ tools.dibot r 05/28/2018 15:21:07 task@tools-exec-1424.tools.eqi 1
6495065 0.30123 pats-gadge tools.dibot r 05/28/2018 15:40:12 task@tools-exec-1415.tools.eqi 1
6496094 0.30122 inc_check tools.dibot r 05/28/2018 16:05:09 task@tools-exec-1432.tools.eqi 1
6496710 0.30121 filemoves_ tools.dibot r 05/28/2018 16:21:04 task@tools-exec-1429.tools.eqi 1
6497262 0.30120 pats-gadge tools.dibot r 05/28/2018 16:40:13 task@tools-exec-1434.tools.eqi 1
6498283 0.30119 inc_check tools.dibot r 05/28/2018 17:05:11 task@tools-exec-1429.tools.eqi 1
6498871 0.30118 filemoves_ tools.dibot r 05/28/2018 17:21:03 task@tools-exec-1402.eqiad.wmf 1
6499422 0.30117 pats-gadge tools.dibot r 05/28/2018 17:40:13 task@tools-exec-1405.eqiad.wmf 1
6500447 0.30115 inc_mritog tools.dibot r 05/28/2018 18:05:10 task@tools-exec-1422.tools.eqi 1
6500457 0.30115 inc_check tools.dibot r 05/28/2018 18:05:10 task@tools-exec-1426.tools.eqi 1
6501041 0.30114 filemoves_ tools.dibot r 05/28/2018 18:21:06 task@tools-exec-1417.tools.eqi 1
6501602 0.30113 pats-gadge tools.dibot r 05/28/2018 18:40:13 task@tools-exec-1442.tools.eqi 1
6502610 0.30112 inc_check tools.dibot r 05/28/2018 19:05:13 task@tools-exec-1423.tools.eqi 1
6503199 0.30111 filemoves_ tools.dibot r 05/28/2018 19:21:04 task@tools-exec-1412.tools.eqi 1
6503751 0.30110 pats-gadge tools.dibot r 05/28/2018 19:40:14 task@tools-exec-1424.tools.eqi 1
6504780 0.30109 inc_check tools.dibot r 05/28/2018 20:05:12 task@tools-exec-1407.eqiad.wmf 1
6505360 0.30108 filemoves_ tools.dibot r 05/28/2018 20:21:04 task@tools-exec-1419.tools.eqi 1
6505923 0.30107 pats-gadge tools.dibot r 05/28/2018 20:40:13 task@tools-exec-1409.eqiad.wmf 1
6506950 0.30105 inc_check tools.dibot r 05/28/2018 21:05:11 task@tools-exec-1432.tools.eqi 1
6507529 0.30105 filemoves_ tools.dibot r 05/28/2018 21:21:04 task@tools-exec-1415.tools.eqi 1
6508080 0.30104 pats-gadge tools.dibot r 05/28/2018 21:40:12 task@tools-exec-1410.eqiad.wmf 1
6509106 0.30102 inc_check tools.dibot r 05/28/2018 22:05:11 task@tools-exec-1418.tools.eqi 1
6509690 0.30101 filemoves_ tools.dibot r 05/28/2018 22:21:05 task@tools-exec-1439.tools.eqi 1
6510226 0.30100 pats-gadge tools.dibot r 05/28/2018 22:40:12 task@tools-exec-1433.tools.eqi 1
6511251 0.30099 inc_check tools.dibot r 05/28/2018 23:05:11 task@tools-exec-1422.tools.eqi 1
6511827 0.30098 filemoves_ tools.dibot r 05/28/2018 23:21:04 task@tools-exec-1429.tools.eqi 1
6512363 0.30097 pats-gadge tools.dibot r 05/28/2018 23:40:13 task@tools-exec-1401.eqiad.wmf 1
6513535 0.30096 inc_check tools.dibot r 05/29/2018 00:05:09 task@tools-exec-1438.tools.eqi 1
6513536 0.30096 inc_mritog tools.dibot r 05/29/2018 00:05:09 task@tools-exec-1427.tools.eqi 1
6513957 0.30095 inc_redire tools.dibot r 05/29/2018 00:15:14 task@tools-exec-1402.eqiad.wmf 1
6514237 0.30095 filemoves_ tools.dibot r 05/29/2018 00:21:03 task@tools-exec-1409.eqiad.wmf 1
6514832 0.30094 pats-gadge tools.dibot r 05/29/2018 00:40:13 task@tools-exec-1403.eqiad.wmf 1
6515837 0.30093 statbot tools.dibot r 05/29/2018 01:01:11 task@tools-exec-1427.tools.eqi 1
6515921 0.30092 inc_check tools.dibot r 05/29/2018 01:05:08 task@tools-exec-1440.tools.eqi 1
6516563 0.30091 filemoves_ tools.dibot r 05/29/2018 01:21:05 task@tools-exec-1422.tools.eqi 1
6517167 0.30090 pats-gadge tools.dibot r 05/29/2018 01:40:11 task@tools-exec-1440.tools.eqi 1
6518219 0.30089 inc_check tools.dibot r 05/29/2018 02:05:11 task@tools-exec-1420.tools.eqi 1
6518713 0.30088 inc_image tools.dibot r 05/29/2018 02:20:11 task@tools-exec-1414.tools.eqi 1
6518826 0.30088 filemoves_ tools.dibot r 05/29/2018 02:21:04 task@tools-exec-1404.eqiad.wmf 1
6519426 0.30087 pats-gadge tools.dibot r 05/29/2018 02:40:13 task@tools-exec-1410.eqiad.wmf 1
6520526 0.30086 inc_check tools.dibot r 05/29/2018 03:05:11 task@tools-exec-1439.tools.eqi 1
6520530 0.30086 inc_main tools.dibot r 05/29/2018 03:05:12 task@tools-exec-1418.tools.eqi 1
6521145 0.30085 filemoves_ tools.dibot r 05/29/2018 03:21:03 task@tools-exec-1414.tools.eqi 1
6521221 0.30085 inc_remind tools.dibot r 05/29/2018 03:25:07 task@tools-exec-1408.eqiad.wmf 1
6521710 0.30084 pats-gadge tools.dibot r 05/29/2018 03:40:13 task@tools-exec-1406.eqiad.wmf 1
6522792 0.30083 inc_check tools.dibot r 05/29/2018 04:05:10 task@tools-exec-1412.tools.eqi 1
6523406 0.30082 filemoves_ tools.dibot r 05/29/2018 04:21:04 task@tools-exec-1439.tools.eqi 1
6523968 0.30081 pats-gadge tools.dibot r 05/29/2018 04:40:12 task@tools-exec-1428.tools.eqi 1
6525639 0.30078 filemoves_ tools.dibot r 05/29/2018 05:21:05 task@tools-exec-1403.eqiad.wmf 1
6526223 0.30077 pats-gadge tools.dibot r 05/29/2018 05:40:16 task@tools-exec-1435.tools.eqi 1
6527283 0.30076 inc_check tools.dibot r 05/29/2018 06:05:12 task@tools-exec-1430.tools.eqi 1
6527284 0.30076 inc_mritog tools.dibot r 05/29/2018 06:05:12 task@tools-exec-1412.tools.eqi 1
6527889 0.30075 filemoves_ tools.dibot r 05/29/2018 06:21:05 task@tools-exec-1410.eqiad.wmf 1
6528454 0.30074 pats-gadge tools.dibot r 05/29/2018 06:40:13 task@tools-exec-1426.tools.eqi 1
6529540 0.30073 inc_check tools.dibot r 05/29/2018 07:05:12 task@tools-exec-1402.eqiad.wmf 1
6530136 0.30072 filemoves_ tools.dibot r 05/29/2018 07:21:03 task@tools-exec-1418.tools.eqi 1
6530704 0.30071 pats-gadge tools.dibot r 05/29/2018 07:40:13 task@tools-exec-1434.tools.eqi 1
6531753 0.30069 inc_check tools.dibot r 05/29/2018 08:05:09 task@tools-exec-1403.eqiad.wmf 1
6532391 0.30069 filemoves_ tools.dibot r 05/29/2018 08:21:03 task@tools-exec-1417.tools.eqi 1
6532989 0.30068 pats-gadge tools.dibot r 05/29/2018 08:40:13 task@tools-exec-1434.tools.eqi 1
6534046 0.30066 inc_check tools.dibot r 05/29/2018 09:05:12 task@tools-exec-1407.eqiad.wmf 1
6534686 0.30065 filemoves_ tools.dibot r 05/29/2018 09:21:05 task@tools-exec-1426.tools.eqi 1
6535277 0.30064 pats-gadge tools.dibot r 05/29/2018 09:40:13 task@tools-exec-1433.tools.eqi 1
6536356 0.30063 inc_check tools.dibot r 05/29/2018 10:05:11 task@tools-exec-1409.eqiad.wmf 1
6536991 0.30062 filemoves_ tools.dibot r 05/29/2018 10:21:05 task@tools-exec-1423.tools.eqi 1
6537554 0.30061 pats-gadge tools.dibot r 05/29/2018 10:40:13 task@tools-exec-1413.tools.eqi 1
6538599 0.30060 inc_check tools.dibot r 05/29/2018 11:05:11 task@tools-exec-1404.eqiad.wmf 1
6539215 0.30059 filemoves_ tools.dibot r 05/29/2018 11:21:05 task@tools-exec-1432.tools.eqi 1
6539813 0.30058 pats-gadge tools.dibot r 05/29/2018 11:40:13 task@tools-exec-1417.tools.eqi 1
6540916 0.30056 inc_mritog tools.dibot r 05/29/2018 12:05:12 task@tools-exec-1420.tools.eqi 1
6540925 0.30056 inc_check tools.dibot r 05/29/2018 12:05:13 task@tools-exec-1429.tools.eqi 1
6541540 0.30055 filemoves_ tools.dibot r 05/29/2018 12:21:05 task@tools-exec-1406.eqiad.wmf 1
6542143 0.30054 pats-gadge tools.dibot r 05/29/2018 12:40:13 task@tools-exec-1441.tools.eqi 1
6543204 0.30053 inc_check tools.dibot r 05/29/2018 13:05:13 task@tools-exec-1415.tools.eqi 1
6543812 0.30052 filemoves_ tools.dibot r 05/29/2018 13:21:05 task@tools-exec-1428.tools.eqi 1
6544374 0.30051 pats-gadge tools.dibot r 05/29/2018 13:40:13 task@tools-exec-1416.tools.eqi 1
6545438 0.30050 inc_check tools.dibot r 05/29/2018 14:05:12 task@tools-exec-1439.tools.eqi 1
6545443 0.30050 inc_main tools.dibot r 05/29/2018 14:05:12 task@tools-exec-1442.tools.eqi 1
6546048 0.30049 filemoves_ tools.dibot r 05/29/2018 14:21:03 task@tools-exec-1408.eqiad.wmf 1
6546666 0.30048 pats-gadge tools.dibot r 05/29/2018 14:40:15 task@tools-exec-1428.tools.eqi 1
6547683 0.30047 inc_check tools.dibot r 05/29/2018 15:05:11 task@tools-exec-1409.eqiad.wmf 1
6548262 0.30046 filemoves_ tools.dibot r 05/29/2018 15:21:05 task@tools-exec-1412.tools.eqi 1
6548817 0.30045 pats-gadge tools.dibot r 05/29/2018 15:40:14 task@tools-exec-1404.eqiad.wmf 1
6549891 0.30043 inc_check tools.dibot r 05/29/2018 16:05:11 task@tools-exec-1406.eqiad.wmf 1
6550498 0.30042 filemoves_ tools.dibot r 05/29/2018 16:21:06 task@tools-exec-1427.tools.eqi 1
6551088 0.30041 pats-gadge tools.dibot r 05/29/2018 16:40:14 task@tools-exec-1430.tools.eqi 1
1086552121 0.30040 inc_check tools.dibot r 05/29/2018 17:05:11 task@tools-exec-1411.tools.eqi 1
1096552710 0.30039 filemoves_ tools.dibot r 05/29/2018 17:21:08 task@tools-exec-1416.tools.eqi 1
1106553311 0.30038 pats-gadge tools.dibot r 05/29/2018 17:40:14 task@tools-exec-1440.tools.eqi 1
1116554363 0.30037 inc_check tools.dibot r 05/29/2018 18:05:11 task@tools-exec-1422.tools.eqi 1
1126554365 0.30037 inc_mritog tools.dibot r 05/29/2018 18:05:11 task@tools-exec-1419.tools.eqi 1
1136554974 0.30036 filemoves_ tools.dibot r 05/29/2018 18:21:05 task@tools-exec-1434.tools.eqi 1
1146555556 0.30035 pats-gadge tools.dibot r 05/29/2018 18:40:14 task@tools-exec-1407.eqiad.wmf 1
1156556620 0.30033 inc_check tools.dibot r 05/29/2018 19:05:11 task@tools-exec-1414.tools.eqi 1
1166557203 0.30033 filemoves_ tools.dibot r 05/29/2018 19:21:07 task@tools-exec-1441.tools.eqi 1
1176557773 0.30032 pats-gadge tools.dibot r 05/29/2018 19:40:13 task@tools-exec-1436.tools.eqi 1
1186558830 0.30030 inc_check tools.dibot r 05/29/2018 20:05:13 task@tools-exec-1426.tools.eqi 1
1196559449 0.30029 filemoves_ tools.dibot r 05/29/2018 20:21:05 task@tools-exec-1428.tools.eqi 1
1206560032 0.30028 pats-gadge tools.dibot r 05/29/2018 20:40:18 task@tools-exec-1433.tools.eqi 1
1216561115 0.30027 inc_check tools.dibot r 05/29/2018 21:05:11 task@tools-exec-1405.eqiad.wmf 1
1226561716 0.30026 filemoves_ tools.dibot r 05/29/2018 21:21:04 task@tools-exec-1435.tools.eqi 1
1236562283 0.30025 pats-gadge tools.dibot r 05/29/2018 21:40:14 task@tools-exec-1413.tools.eqi 1
1246563354 0.30024 inc_check tools.dibot r 05/29/2018 22:05:12 task@tools-exec-1435.tools.eqi 1
1256563948 0.30023 filemoves_ tools.dibot r 05/29/2018 22:21:05 task@tools-exec-1415.tools.eqi 1
1266564480 0.30022 pats-gadge tools.dibot r 05/29/2018 22:40:14 task@tools-exec-1418.tools.eqi 1
1276565510 0.30020 inc_check tools.dibot r 05/29/2018 23:05:43 task@tools-exec-1403.eqiad.wmf 1
1286566085 0.30019 filemoves_ tools.dibot r 05/29/2018 23:21:04 task@tools-exec-1411.tools.eqi 1
1296566664 0.30018 pats-gadge tools.dibot r 05/29/2018 23:40:12 task@tools-exec-1421.tools.eqi 1
1306567754 0.30017 inc_check tools.dibot r 05/30/2018 00:05:10 task@tools-exec-1421.tools.eqi 1
1316567755 0.30017 inc_mritog tools.dibot r 05/30/2018 00:05:10 task@tools-exec-1405.eqiad.wmf 1
1326568132 0.30017 inc_redire tools.dibot r 05/30/2018 00:15:16 task@tools-exec-1417.tools.eqi 1
1336568412 0.30016 filemoves_ tools.dibot r 05/30/2018 00:21:05 task@tools-exec-1431.tools.eqi 1
1346569020 0.30015 pats-gadge tools.dibot r 05/30/2018 00:40:13 task@tools-exec-1403.eqiad.wmf 1
1356569986 0.30014 statbot tools.dibot r 05/30/2018 01:01:22 task@tools-exec-1420.tools.eqi 1
1366570075 0.30014 inc_check tools.dibot r 05/30/2018 01:05:08 task@tools-exec-1436.tools.eqi 1
1376570694 0.30013 filemoves_ tools.dibot r 05/30/2018 01:21:04 task@tools-exec-1405.eqiad.wmf 1
1386571314 0.30012 pats-gadge tools.dibot r 05/30/2018 01:40:15 task@tools-exec-1411.tools.eqi 1
1396572291 0.30011 inc_check tools.dibot r 05/30/2018 02:05:11 task@tools-exec-1430.tools.eqi 1
1406572785 0.30010 inc_image tools.dibot r 05/30/2018 02:20:14 task@tools-exec-1412.tools.eqi 1
1416572885 0.30010 filemoves_ tools.dibot r 05/30/2018 02:21:04 task@tools-exec-1435.tools.eqi 1
1426573458 0.30009 pats-gadge tools.dibot r 05/30/2018 02:40:13 task@tools-exec-1410.eqiad.wmf 1
1436574545 0.30007 inc_main tools.dibot r 05/30/2018 03:05:09 task@tools-exec-1428.tools.eqi 1
1446574548 0.30007 inc_check tools.dibot r 05/30/2018 03:05:10 task@tools-exec-1433.tools.eqi 1
1456575170 0.30006 filemoves_ tools.dibot r 05/30/2018 03:21:04 task@tools-exec-1439.tools.eqi 1
1466575717 0.30005 pats-gadge tools.dibot r 05/30/2018 03:40:13 task@tools-exec-1421.tools.eqi 1
1476576754 0.30004 inc_check tools.dibot r 05/30/2018 04:05:11 task@tools-exec-1442.tools.eqi 1
1486577342 0.30003 filemoves_ tools.dibot r 05/30/2018 04:21:04 task@tools-exec-1413.tools.eqi 1
1496577906 0.30002 pats-gadge tools.dibot r 05/30/2018 04:40:13 task@tools-exec-1408.eqiad.wmf 1
1506578890 0.30001 inc_check tools.dibot r 05/30/2018 05:05:09 task@tools-exec-1441.tools.eqi 1
tools.dibot@tools-bastion-05:~$ qstat | tail -n +3 | grep -v inc_remind | grep -v lighttpd-d | awk '{ print $1 }' | xargs -n 1 qdel
tools.dibot has registered the job 6488557 for deletion
tools.dibot has registered the job 6489562 for deletion
tools.dibot has registered the job 6489867 for deletion
tools.dibot has registered the job 6490154 for deletion
tools.dibot has registered the job 6490713 for deletion
tools.dibot has registered the job 6491759 for deletion
tools.dibot has registered the job 6491760 for deletion
tools.dibot has registered the job 6492339 for deletion
tools.dibot has registered the job 6492881 for deletion
tools.dibot has registered the job 6493938 for deletion
tools.dibot has registered the job 6494514 for deletion
tools.dibot has registered the job 6495065 for deletion
tools.dibot has registered the job 6496094 for deletion
tools.dibot has registered the job 6496710 for deletion
tools.dibot has registered the job 6497262 for deletion
tools.dibot has registered the job 6498283 for deletion
tools.dibot has registered the job 6498871 for deletion
tools.dibot has registered the job 6499422 for deletion
tools.dibot has registered the job 6500447 for deletion
tools.dibot has registered the job 6500457 for deletion
tools.dibot has registered the job 6501041 for deletion
tools.dibot has registered the job 6501602 for deletion
tools.dibot has registered the job 6502610 for deletion
tools.dibot has registered the job 6503199 for deletion
tools.dibot has registered the job 6503751 for deletion
tools.dibot has registered the job 6504780 for deletion
tools.dibot has registered the job 6505360 for deletion
tools.dibot has registered the job 6505923 for deletion
tools.dibot has registered the job 6506950 for deletion
tools.dibot has registered the job 6507529 for deletion
tools.dibot has registered the job 6508080 for deletion
tools.dibot has registered the job 6509106 for deletion
tools.dibot has registered the job 6509690 for deletion
tools.dibot has registered the job 6510226 for deletion
tools.dibot has registered the job 6511251 for deletion
tools.dibot has registered the job 6511827 for deletion
tools.dibot has registered the job 6512363 for deletion
tools.dibot has registered the job 6513535 for deletion
tools.dibot has registered the job 6513536 for deletion
tools.dibot has registered the job 6513957 for deletion
tools.dibot has registered the job 6514237 for deletion
tools.dibot has registered the job 6514832 for deletion
tools.dibot has registered the job 6515837 for deletion
tools.dibot has registered the job 6515921 for deletion
tools.dibot has registered the job 6516563 for deletion
tools.dibot has registered the job 6517167 for deletion
tools.dibot has registered the job 6518219 for deletion
tools.dibot has registered the job 6518713 for deletion
tools.dibot has registered the job 6518826 for deletion
tools.dibot has registered the job 6519426 for deletion
tools.dibot has registered the job 6520526 for deletion
tools.dibot has registered the job 6520530 for deletion
tools.dibot has registered the job 6521145 for deletion
tools.dibot has registered the job 6521710 for deletion
tools.dibot has registered the job 6522792 for deletion
tools.dibot has registered the job 6523406 for deletion
tools.dibot has registered the job 6523968 for deletion
tools.dibot has registered the job 6525639 for deletion
tools.dibot has registered the job 6526223 for deletion
tools.dibot has registered the job 6527283 for deletion
tools.dibot has registered the job 6527284 for deletion
tools.dibot has registered the job 6527889 for deletion
tools.dibot has registered the job 6528454 for deletion
tools.dibot has registered the job 6529540 for deletion
tools.dibot has registered the job 6530136 for deletion
tools.dibot has registered the job 6530704 for deletion
tools.dibot has registered the job 6531753 for deletion
tools.dibot has registered the job 6532391 for deletion
tools.dibot has registered the job 6532989 for deletion
tools.dibot has registered the job 6534046 for deletion
tools.dibot has registered the job 6534686 for deletion
tools.dibot has registered the job 6535277 for deletion
tools.dibot has registered the job 6536356 for deletion
tools.dibot has registered the job 6536991 for deletion
tools.dibot has registered the job 6537554 for deletion
tools.dibot has registered the job 6538599 for deletion
tools.dibot has registered the job 6539215 for deletion
tools.dibot has registered the job 6539813 for deletion
tools.dibot has registered the job 6540916 for deletion
tools.dibot has registered the job 6540925 for deletion
tools.dibot has registered the job 6541540 for deletion
tools.dibot has registered the job 6542143 for deletion
tools.dibot has registered the job 6543204 for deletion
tools.dibot has registered the job 6543812 for deletion
tools.dibot has registered the job 6544374 for deletion
tools.dibot has registered the job 6545438 for deletion
tools.dibot has registered the job 6545443 for deletion
tools.dibot has registered the job 6546048 for deletion
tools.dibot has registered the job 6546666 for deletion
tools.dibot has registered the job 6547683 for deletion
tools.dibot has registered the job 6548262 for deletion
tools.dibot has registered the job 6548817 for deletion
tools.dibot has registered the job 6549891 for deletion
tools.dibot has registered the job 6550498 for deletion
tools.dibot has registered the job 6551088 for deletion
tools.dibot has registered the job 6552121 for deletion
tools.dibot has registered the job 6552710 for deletion
tools.dibot has registered the job 6553311 for deletion
tools.dibot has registered the job 6554363 for deletion
tools.dibot has registered the job 6554365 for deletion
tools.dibot has registered the job 6554974 for deletion
tools.dibot has registered the job 6555556 for deletion
tools.dibot has registered the job 6556620 for deletion
tools.dibot has registered the job 6557203 for deletion
tools.dibot has registered the job 6557773 for deletion
tools.dibot has registered the job 6558830 for deletion
tools.dibot has registered the job 6559449 for deletion
tools.dibot has registered the job 6560032 for deletion
tools.dibot has registered the job 6561115 for deletion
tools.dibot has registered the job 6561716 for deletion
tools.dibot has registered the job 6562283 for deletion
tools.dibot has registered the job 6563354 for deletion
tools.dibot has registered the job 6563948 for deletion
tools.dibot has registered the job 6564480 for deletion
tools.dibot has registered the job 6565510 for deletion
tools.dibot has registered the job 6566085 for deletion
tools.dibot has registered the job 6566664 for deletion
tools.dibot has registered the job 6567754 for deletion
tools.dibot has registered the job 6567755 for deletion
tools.dibot has registered the job 6568132 for deletion
tools.dibot has registered the job 6568412 for deletion
tools.dibot has registered the job 6569020 for deletion
tools.dibot has registered the job 6569986 for deletion
tools.dibot has registered the job 6570075 for deletion
tools.dibot has registered the job 6570694 for deletion
tools.dibot has registered the job 6571314 for deletion
tools.dibot has registered the job 6572291 for deletion
tools.dibot has registered the job 6572785 for deletion
tools.dibot has registered the job 6572885 for deletion
tools.dibot has registered the job 6573458 for deletion
tools.dibot has registered the job 6574545 for deletion
tools.dibot has registered the job 6574548 for deletion
tools.dibot has registered the job 6575170 for deletion
tools.dibot has registered the job 6575717 for deletion
tools.dibot has registered the job 6576754 for deletion
tools.dibot has registered the job 6577342 for deletion
tools.dibot has registered the job 6577906 for deletion
tools.dibot has registered the job 6578890 for deletion
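
The cleanup command above filters with `grep -v` against the whole line, so a job would also be kept if the excluded string appeared in any other column. A minimal sketch of a stricter variant, matching only the name column, is shown below. It runs against a few sample rows taken from the qstat listings in this task rather than calling `qstat` or `qdel`; the function names `sample_qstat` and `select_job_ids` are made up for illustration.

```shell
#!/bin/sh
# Sample rows copied from the qstat listings in this task:
# two header lines followed by one line per job.
sample_qstat() {
cat <<'EOF'
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6521221 0.30412 inc_remind tools.dibot r 05/29/2018 03:25:07 task@tools-exec-1408.eqiad.wmf 1
2518218 0.36159 lighttpd-d tools.dibot r 03/16/2018 12:15:06 webgrid-lighttpd@tools-webgrid 1
6578890 0.30001 inc_check tools.dibot r 05/30/2018 05:05:09 task@tools-exec-1441.tools.eqi 1
6577906 0.30002 pats-gadge tools.dibot r 05/30/2018 04:40:13 task@tools-exec-1408.eqiad.wmf 1
EOF
}

# Hypothetical helper: skip the two header lines, drop jobs whose name
# column (field 3) is inc_remind or lighttpd-d, print the job ID (field 1).
select_job_ids() {
    awk 'NR > 2 && $3 != "inc_remind" && $3 != "lighttpd-d" { print $1 }'
}

# Dry run: only print the IDs that would be selected.
sample_qstat | select_job_ids
```

Printing the IDs first gives a chance to review the selection; the reviewed list could then be piped to `xargs -n 1 qdel` as in the original command.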

Mentioned in SAL (#wikimedia-cloud) [2018-05-30T10:45:10Z] <zhuyifei1999_> installing mono-runtime-dbg on tools-bastion-05 to produce debugging information; was previously installed on tools-exec-1413 & 1441. Might be a good idea to uninstall them once we can close T195834

aborrero triaged this task as Medium priority. May 31 2018, 11:22 AM

Mentioned in SAL (#wikimedia-cloud) [2018-06-03T10:19:24Z] <zhuyifei1999_> Grid is full. qdel'ed all jobs belonging to tools.dibot except lighttpd, and tools.mbh that has a job name starting 'comm_delin', 'delfilexcl' T195834

graphite-labs.wikimedia.org_2.png (250×800 px, 37 KB)

graphite-labs.wikimedia.org.png (250×800 px, 37 KB)

(1 2)
22:07:50 <icinga-wm> PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time
22:10:01 <icinga-wm> RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.003 second response time
00:20:30 <icinga-wm> PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time
00:43:31 <icinga-wm> RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.008 second response time
01:55:40 <icinga-wm> PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.004 second response time
01:30:40 <icinga-wm> RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time
01:42:01 <icinga-wm> PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time
01:55:51 <icinga-wm> PROBLEM - toolschecker: Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 356 bytes in 60.014 second response time
02:32:40 <icinga-wm> RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time
02:44:50 <icinga-wm> PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.016 second response time
03:06:30 <icinga-wm> RECOVERY - toolschecker: Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.373 second response time
04:07:51 <icinga-wm> RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.012 second response time
04:20:01 <icinga-wm> PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time
06:52:04 <icinga-wm> RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time
07:04:05 <icinga-wm> PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.006 second response time
10:13:11 <icinga-wm> RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.010 second response time

CC @Paladox

root@tools-bastion-05:~# qstat -u tools.dibot
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 207071 0.40246 inc_remind tools.dibot r 01/23/2018 03:25:09 task@tools-exec-1415.tools.eqi 1
 248025 0.40168 inc_remind tools.dibot r 01/24/2018 03:25:06 task@tools-exec-1420.tools.eqi 1
 287975 0.40089 inc_remind tools.dibot r 01/25/2018 03:25:06 task@tools-exec-1426.tools.eqi 1
 329094 0.40011 inc_remind tools.dibot r 01/26/2018 03:25:06 task@tools-exec-1436.tools.eqi 1
 369230 0.39933 inc_remind tools.dibot r 01/27/2018 03:25:10 task@tools-exec-1429.tools.eqi 1
 410400 0.39855 inc_remind tools.dibot r 01/28/2018 03:25:05 task@tools-exec-1402.eqiad.wmf 1
 450866 0.39777 inc_remind tools.dibot r 01/29/2018 03:25:02 task@tools-exec-1412.tools.eqi 1
2518218 0.36159 lighttpd-d tools.dibot r 03/16/2018 12:15:06 webgrid-lighttpd@tools-webgrid 1
6521221 0.30412 inc_remind tools.dibot r 05/29/2018 03:25:07 task@tools-exec-1408.eqiad.wmf 1
6627889 0.30256 inc_remind tools.dibot r 05/31/2018 03:25:08 task@tools-exec-1405.eqiad.wmf 1
6659313 0.30211 filemoves_ tools.dibot r 05/31/2018 17:21:07 task@tools-exec-1418.tools.eqi 1
6659872 0.30210 pats-gadge tools.dibot r 05/31/2018 17:40:13 task@tools-exec-1412.tools.eqi 1
6660935 0.30208 inc_mritog tools.dibot r 05/31/2018 18:05:12 task@tools-exec-1435.tools.eqi 1
6660937 0.30208 inc_check tools.dibot r 05/31/2018 18:05:13 task@tools-exec-1406.eqiad.wmf 1
6661533 0.30207 filemoves_ tools.dibot r 05/31/2018 18:21:04 task@tools-exec-1415.tools.eqi 1
6662112 0.30206 pats-gadge tools.dibot r 05/31/2018 18:40:15 task@tools-exec-1407.eqiad.wmf 1
6663157 0.30205 inc_check tools.dibot r 05/31/2018 19:05:12 task@tools-exec-1421.tools.eqi 1
6663756 0.30204 filemoves_ tools.dibot r 05/31/2018 19:21:06 task@tools-exec-1412.tools.eqi 1
6664366 0.30203 pats-gadge tools.dibot r 05/31/2018 19:40:18 task@tools-exec-1434.tools.eqi 1
6665385 0.30202 inc_check tools.dibot r 05/31/2018 20:05:13 task@tools-exec-1439.tools.eqi 1
6666007 0.30201 filemoves_ tools.dibot r 05/31/2018 20:21:06 task@tools-exec-1406.eqiad.wmf 1
6666568 0.30200 pats-gadge tools.dibot r 05/31/2018 20:40:13 task@tools-exec-1405.eqiad.wmf 1
6667620 0.30199 inc_check tools.dibot r 05/31/2018 21:05:13 task@tools-exec-1406.eqiad.wmf 1
6668190 0.30198 filemoves_ tools.dibot r 05/31/2018 21:21:54 task@tools-exec-1404.eqiad.wmf 1
6668719 0.30197 pats-gadge tools.dibot r 05/31/2018 21:40:13 task@tools-exec-1427.tools.eqi 1
6669696 0.30195 inc_check tools.dibot r 05/31/2018 22:05:14 task@tools-exec-1403.eqiad.wmf 1
6670261 0.30194 filemoves_ tools.dibot r 05/31/2018 22:21:05 task@tools-exec-1407.eqiad.wmf 1
6670782 0.30193 pats-gadge tools.dibot r 05/31/2018 22:40:12 task@tools-exec-1423.tools.eqi 1
6671767 0.30192 inc_check tools.dibot r 05/31/2018 23:05:11 task@tools-exec-1432.tools.eqi 1
6672330 0.30191 filemoves_ tools.dibot r 05/31/2018 23:21:05 task@tools-exec-1415.tools.eqi 1
6672887 0.30190 pats-gadge tools.dibot r 05/31/2018 23:40:15 task@tools-exec-1420.tools.eqi 1
6674007 0.30189 inc_mritog tools.dibot r 06/01/2018 00:05:09 task@tools-exec-1431.tools.eqi 1
6674008 0.30189 inc_check tools.dibot r 06/01/2018 00:05:09 task@tools-exec-1433.tools.eqi 1
6674393 0.30188 inc_redire tools.dibot r 06/01/2018 00:15:17 task@tools-exec-1425.tools.eqi 1
6674676 0.30188 filemoves_ tools.dibot r 06/01/2018 00:21:04 task@tools-exec-1433.tools.eqi 1
6675257 0.30187 pats-gadge tools.dibot r 06/01/2018 00:40:13 task@tools-exec-1405.eqiad.wmf 1
6676201 0.30186 statbot tools.dibot r 06/01/2018 01:01:07 task@tools-exec-1406.eqiad.wmf 1
6676285 0.30186 nullbot tools.dibot r 06/01/2018 01:05:09 task@tools-exec-1404.eqiad.wmf 1
6676286 0.30186 inc_check tools.dibot r 06/01/2018 01:05:09 task@tools-exec-1421.tools.eqi 1
6676881 0.30185 filemoves_ tools.dibot r 06/01/2018 01:21:05 task@tools-exec-1422.tools.eqi 1
6677452 0.30184 pats-gadge tools.dibot r 06/01/2018 01:40:15 task@tools-exec-1426.tools.eqi 1
6678432 0.30182 inc_check tools.dibot r 06/01/2018 02:05:12 task@tools-exec-1417.tools.eqi 1
6678897 0.30181 inc_image tools.dibot r 06/01/2018 02:20:14 task@tools-exec-1422.tools.eqi 1
6678993 0.30181 filemoves_ tools.dibot r 06/01/2018 02:21:04 task@tools-exec-1403.eqiad.wmf 1
6679594 0.30180 pats-gadge tools.dibot r 06/01/2018 02:40:17 task@tools-exec-1424.tools.eqi 1
6680554 0.30179 inc_main tools.dibot r 06/01/2018 03:05:11 task@tools-exec-1404.eqiad.wmf 1
6680581 0.30179 inc_check tools.dibot r 06/01/2018 03:05:12 task@tools-exec-1410.eqiad.wmf 1
6681145 0.30178 filemoves_ tools.dibot r 06/01/2018 03:21:04 task@tools-exec-1432.tools.eqi 1
6681206 0.30178 inc_remind tools.dibot r 06/01/2018 03:25:07 task@tools-exec-1428.tools.eqi 1
6681651 0.30177 pats-gadge tools.dibot r 06/01/2018 03:40:13 task@tools-exec-1418.tools.eqi 1
6682667 0.30176 inc_check tools.dibot r 06/01/2018 04:05:11 task@tools-exec-1416.tools.eqi 1
6683217 0.30175 filemoves_ tools.dibot r 06/01/2018 04:21:03 task@tools-exec-1418.tools.eqi 1
6683751 0.30174 pats-gadge tools.dibot r 06/01/2018 04:40:14 task@tools-exec-1411.tools.eqi 1
6684765 0.30173 inc_check tools.dibot r 06/01/2018 05:05:11 task@tools-exec-1429.tools.eqi 1
6685332 0.30172 filemoves_ tools.dibot r 06/01/2018 05:21:05 task@tools-exec-1409.eqiad.wmf 1
6685870 0.30171 pats-gadge tools.dibot r 06/01/2018 05:40:14 task@tools-exec-1441.tools.eqi 1
6686878 0.30169 inc_mritog tools.dibot r 06/01/2018 06:05:09 task@tools-exec-1431.tools.eqi 1
6686893 0.30169 inc_check tools.dibot r 06/01/2018 06:05:10 task@tools-exec-1402.eqiad.wmf 1
6687477 0.30168 filemoves_ tools.dibot r 06/01/2018 06:21:05 task@tools-exec-1419.tools.eqi 1
6688024 0.30167 pats-gadge tools.dibot r 06/01/2018 06:40:14 task@tools-exec-1441.tools.eqi 1
6689029 0.30166 inc_check tools.dibot r 06/01/2018 07:05:11 task@tools-exec-1415.tools.eqi 1
6689620 0.30165 filemoves_ tools.dibot r 06/01/2018 07:21:05 task@tools-exec-1426.tools.eqi 1
6690150 0.30164 pats-gadge tools.dibot r 06/01/2018 07:40:12 task@tools-exec-1431.tools.eqi 1
6691173 0.30163 inc_check tools.dibot r 06/01/2018 08:05:17 task@tools-exec-1441.tools.eqi 1
6691752 0.30162 filemoves_ tools.dibot r 06/01/2018 08:21:05 task@tools-exec-1434.tools.eqi 1
6692298 0.30161 pats-gadge tools.dibot r 06/01/2018 08:40:13 task@tools-exec-1425.tools.eqi 1
6693311 0.30160 inc_check tools.dibot r 06/01/2018 09:05:13 task@tools-exec-1411.tools.eqi 1
6693891 0.30159 filemoves_ tools.dibot r 06/01/2018 09:21:06 task@tools-exec-1403.eqiad.wmf 1
6694435 0.30158 pats-gadge tools.dibot r 06/01/2018 09:40:12 task@tools-exec-1416.tools.eqi 1
6695446 0.30156 inc_check tools.dibot r 06/01/2018 10:05:12 task@tools-exec-1405.eqiad.wmf 1
6696009 0.30155 filemoves_ tools.dibot r 06/01/2018 10:21:06 task@tools-exec-1429.tools.eqi 1
6696527 0.30154 pats-gadge tools.dibot r 06/01/2018 10:40:13 task@tools-exec-1436.tools.eqi 1
6697493 0.30153 inc_check tools.dibot r 06/01/2018 11:05:11 task@tools-exec-1418.tools.eqi 1
6698062 0.30152 filemoves_ tools.dibot r 06/01/2018 11:21:04 task@tools-exec-1430.tools.eqi 1
6698573 0.30151 pats-gadge tools.dibot r 06/01/2018 11:40:13 task@tools-exec-1402.eqiad.wmf 1
6699622 0.30150 inc_check tools.dibot r 06/01/2018 12:05:11 task@tools-exec-1427.tools.eqi 1
6699623 0.30150 inc_mritog tools.dibot r 06/01/2018 12:05:11 task@tools-exec-1412.tools.eqi 1
6700194 0.30149 filemoves_ tools.dibot r 06/01/2018 12:21:05 task@tools-exec-1421.tools.eqi 1
6700737 0.30148 pats-gadge tools.dibot r 06/01/2018 12:40:12 task@tools-exec-1410.eqiad.wmf 1
6701746 0.30146 inc_check tools.dibot r 06/01/2018 13:05:12 task@tools-exec-1433.tools.eqi 1
6702050 0.30146 inc_redire tools.dibot r 06/01/2018 13:15:15 task@tools-exec-1436.tools.eqi 1
6702297 0.30146 filemoves_ tools.dibot r 06/01/2018 13:21:04 task@tools-exec-1435.tools.eqi 1
6702830 0.30145 pats-gadge tools.dibot r 06/01/2018 13:40:13 task@tools-exec-1438.tools.eqi 1
6703824 0.30143 inc_check tools.dibot r 06/01/2018 14:05:12 task@tools-exec-1426.tools.eqi 1
6703827 0.30143 inc_main tools.dibot r 06/01/2018 14:05:12 task@tools-exec-1432.tools.eqi 1
6704382 0.30142 filemoves_ tools.dibot r 06/01/2018 14:21:04 task@tools-exec-1423.tools.eqi 1
6704925 0.30141 pats-gadge tools.dibot r 06/01/2018 14:40:15 task@tools-exec-1441.tools.eqi 1
6705912 0.30140 inc_check tools.dibot r 06/01/2018 15:05:11 task@tools-exec-1441.tools.eqi 1
6706465 0.30139 filemoves_ tools.dibot r 06/01/2018 15:21:05 task@tools-exec-1415.tools.eqi 1
6706983 0.30138 pats-gadge tools.dibot r 06/01/2018 15:40:12 task@tools-exec-1407.eqiad.wmf 1
6707995 0.30137 inc_check tools.dibot r 06/01/2018 16:05:11 task@tools-exec-1411.tools.eqi 1
6708550 0.30136 filemoves_ tools.dibot r 06/01/2018 16:21:03 task@tools-exec-1423.tools.eqi 1
6709085 0.30135 pats-gadge tools.dibot r 06/01/2018 16:40:13 task@tools-exec-1419.tools.eqi 1
6710073 0.30133 inc_check tools.dibot r 06/01/2018 17:05:11 task@tools-exec-1435.tools.eqi 1
6710638 0.30133 filemoves_ tools.dibot r 06/01/2018 17:21:03 task@tools-exec-1439.tools.eqi 1
6711152 0.30132 pats-gadge tools.dibot r 06/01/2018 17:40:13 task@tools-exec-1440.tools.eqi 1
6712155 0.30130 inc_mritog tools.dibot r 06/01/2018 18:05:12 task@tools-exec-1409.eqiad.wmf 1
6712157 0.30130 inc_check tools.dibot r 06/01/2018 18:05:12 task@tools-exec-1419.tools.eqi 1
6712736 0.30129 filemoves_ tools.dibot r 06/01/2018 18:21:05 task@tools-exec-1413.tools.eqi 1
6713258 0.30128 pats-gadge tools.dibot r 06/01/2018 18:40:13 task@tools-exec-1424.tools.eqi 1
6714247 0.30127 inc_check tools.dibot r 06/01/2018 19:05:11 task@tools-exec-1412.tools.eqi 1
6714797 0.30126 filemoves_ tools.dibot r 06/01/2018 19:21:05 task@tools-exec-1439.tools.eqi 1
6715329 0.30125 pats-gadge tools.dibot r 06/01/2018 19:40:13 task@tools-exec-1410.eqiad.wmf 1
6716326 0.30124 inc_check tools.dibot r 06/01/2018 20:05:13 task@tools-exec-1427.tools.eqi 1
6716877 0.30123 filemoves_ tools.dibot r 06/01/2018 20:21:04 task@tools-exec-1414.tools.eqi 1
6717436 0.30122 pats-gadge tools.dibot r 06/01/2018 20:40:13 task@tools-exec-1404.eqiad.wmf 1
6718419 0.30120 inc_check tools.dibot r 06/01/2018 21:05:11 task@tools-exec-1442.tools.eqi 1
6718969 0.30120 filemoves_ tools.dibot r 06/01/2018 21:21:03 task@tools-exec-1435.tools.eqi 1
6719487 0.30119 pats-gadge tools.dibot r 06/01/2018 21:40:13 task@tools-exec-1407.eqiad.wmf 1
6720492 0.30117 inc_check tools.dibot r 06/01/2018 22:05:14 task@tools-exec-1415.tools.eqi 1
6721046 0.30116 filemoves_ tools.dibot r 06/01/2018 22:21:04 task@tools-exec-1424.tools.eqi 1
6721583 0.30115 pats-gadge tools.dibot r 06/01/2018 22:40:13 task@tools-exec-1427.tools.eqi 1
6722548 0.30114 inc_check tools.dibot r 06/01/2018 23:05:11 task@tools-exec-1421.tools.eqi 1
6723103 0.30113 filemoves_ tools.dibot r 06/01/2018 23:21:04 task@tools-exec-1421.tools.eqi 1
6723607 0.30112 pats-gadge tools.dibot r 06/01/2018 23:40:13 task@tools-exec-1410.eqiad.wmf 1
6724721 0.30111 inc_mritog tools.dibot r 06/02/2018 00:05:09 task@tools-exec-1412.tools.eqi 1
6724722 0.30111 inc_check tools.dibot r 06/02/2018 00:05:09 task@tools-exec-1432.tools.eqi 1
6725355 0.30110 filemoves_ tools.dibot r 06/02/2018 00:21:05 task@tools-exec-1417.tools.eqi 1
6725894 0.30109 pats-gadge tools.dibot r 06/02/2018 00:40:13 task@tools-exec-1406.eqiad.wmf 1
6726848 0.30108 statbot tools.dibot r 06/02/2018 01:01:13 task@tools-exec-1436.tools.eqi 1
6726932 0.30107 inc_check tools.dibot r 06/02/2018 01:05:09 task@tools-exec-1422.tools.eqi 1
6727522 0.30107 filemoves_ tools.dibot r 06/02/2018 01:21:03 task@tools-exec-1422.tools.eqi 1
6728042 0.30106 pats-gadge tools.dibot r 06/02/2018 01:40:14 task@tools-exec-1440.tools.eqi 1
6729034 0.30104 inc_check tools.dibot r 06/02/2018 02:05:12 task@tools-exec-1438.tools.eqi 1
6729482 0.30103 inc_image tools.dibot r 06/02/2018 02:20:13 task@tools-exec-1409.eqiad.wmf 1
6729593 0.30103 filemoves_ tools.dibot r 06/02/2018 02:21:04 task@tools-exec-1429.tools.eqi 1
6730114 0.30102 pats-gadge tools.dibot r 06/02/2018 02:40:13 task@tools-exec-1435.tools.eqi 1
6731136 0.30101 inc_main tools.dibot r 06/02/2018 03:05:12 task@tools-exec-1430.tools.eqi 1
6731137 0.30101 inc_check tools.dibot r 06/02/2018 03:05:12 task@tools-exec-1402.eqiad.wmf 1
6731695 0.30100 filemoves_ tools.dibot r 06/02/2018 03:21:04 task@tools-exec-1409.eqiad.wmf 1
6732208 0.30099 pats-gadge tools.dibot r 06/02/2018 03:40:13 task@tools-exec-1420.tools.eqi 1
6733210 0.30098 inc_check tools.dibot r 06/02/2018 04:05:12 task@tools-exec-1410.eqiad.wmf 1
6733764 0.30097 filemoves_ tools.dibot r 06/02/2018 04:21:05 task@tools-exec-1404.eqiad.wmf 1
6734280 0.30096 pats-gadge tools.dibot r 06/02/2018 04:40:14 task@tools-exec-1403.eqiad.wmf 1
6735261 0.30094 inc_check tools.dibot r 06/02/2018 05:05:11 task@tools-exec-1412.tools.eqi 1
6735820 0.30094 filemoves_ tools.dibot r 06/02/2018 05:21:05 task@tools-exec-1418.tools.eqi 1
6736362 0.30093 pats-gadge tools.dibot r 06/02/2018 05:40:16 task@tools-exec-1411.tools.eqi 1
6737330 0.30091 inc_check tools.dibot r 06/02/2018 06:05:12 task@tools-exec-1424.tools.eqi 1
6737331 0.30091 inc_mritog tools.dibot r 06/02/2018 06:05:12 task@tools-exec-1424.tools.eqi 1
6737877 0.30090 filemoves_ tools.dibot r 06/02/2018 06:21:05 task@tools-exec-1432.tools.eqi 1
6738382 0.30089 pats-gadge tools.dibot r 06/02/2018 06:40:14 task@tools-exec-1434.tools.eqi 1
6739377 0.30088 inc_check tools.dibot r 06/02/2018 07:05:12 task@tools-exec-1434.tools.eqi 1
6739918 0.30087 filemoves_ tools.dibot r 06/02/2018 07:21:03 task@tools-exec-1433.tools.eqi 1
6740412 0.30086 pats-gadge tools.dibot r 06/02/2018 07:40:45 task@tools-exec-1430.tools.eqi 1
6741404 0.30085 inc_check tools.dibot r 06/02/2018 08:05:12 task@tools-exec-1408.eqiad.wmf 1
6741952 0.30084 filemoves_ tools.dibot r 06/02/2018 08:21:05 task@tools-exec-1411.tools.eqi 1
6742474 0.30083 pats-gadge tools.dibot r 06/02/2018 08:41:04 task@tools-exec-1414.tools.eqi 1
6743425 0.30081 inc_check tools.dibot r 06/02/2018 09:05:11 task@tools-exec-1442.tools.eqi 1
6743978 0.30081 filemoves_ tools.dibot r 06/02/2018 09:21:06 task@tools-exec-1409.eqiad.wmf 1
6745475 0.30078 inc_check tools.dibot r 06/02/2018 10:05:12 task@tools-exec-1413.tools.eqi 1
6746040 0.30077 filemoves_ tools.dibot r 06/02/2018 10:21:04 task@tools-exec-1417.tools.eqi 1
6746560 0.30076 pats-gadge tools.dibot r 06/02/2018 10:40:21 task@tools-exec-1403.eqiad.wmf 1
6747493 0.30075 inc_check tools.dibot r 06/02/2018 11:05:11 task@tools-exec-1418.tools.eqi 1
6748037 0.30074 filemoves_ tools.dibot r 06/02/2018 11:21:08 task@tools-exec-1416.tools.eqi 1
6748545 0.30073 pats-gadge tools.dibot r 06/02/2018 11:40:13 task@tools-exec-1410.eqiad.wmf 1
6749582 0.30072 inc_check tools.dibot r 06/02/2018 12:06:18 task@tools-exec-1406.eqiad.wmf 1
1606750142 0.30071 filemoves_ tools.dibot r 06/02/2018 12:21:04 task@tools-exec-1417.tools.eqi 1
1616750705 0.30070 pats-gadge tools.dibot r 06/02/2018 12:40:16 task@tools-exec-1428.tools.eqi 1
1626752196 0.30068 filemoves_ tools.dibot r 06/02/2018 13:21:05 task@tools-exec-1423.tools.eqi 1
1636752741 0.30067 pats-gadge tools.dibot r 06/02/2018 13:40:16 task@tools-exec-1438.tools.eqi 1
1646753694 0.30065 inc_main tools.dibot r 06/02/2018 14:05:12 task@tools-exec-1408.eqiad.wmf 1
1656753703 0.30065 inc_check tools.dibot r 06/02/2018 14:05:12 task@tools-exec-1405.eqiad.wmf 1
1666754261 0.30064 filemoves_ tools.dibot r 06/02/2018 14:21:03 task@tools-exec-1429.tools.eqi 1
1676754764 0.30063 pats-gadge tools.dibot r 06/02/2018 14:40:14 task@tools-exec-1434.tools.eqi 1
1686755747 0.30062 inc_check tools.dibot r 06/02/2018 15:05:13 task@tools-exec-1429.tools.eqi 1
1696756289 0.30061 filemoves_ tools.dibot r 06/02/2018 15:21:04 task@tools-exec-1433.tools.eqi 1
1706756795 0.30060 pats-gadge tools.dibot r 06/02/2018 15:40:14 task@tools-exec-1407.eqiad.wmf 1
1716757750 0.30059 inc_check tools.dibot r 06/02/2018 16:05:20 task@tools-exec-1442.tools.eqi 1
1726758302 0.30058 filemoves_ tools.dibot r 06/02/2018 16:21:04 task@tools-exec-1404.eqiad.wmf 1
1736758811 0.30057 pats-gadge tools.dibot r 06/02/2018 16:40:14 task@tools-exec-1402.eqiad.wmf 1
1746759773 0.30055 inc_check tools.dibot r 06/02/2018 17:06:00 task@tools-exec-1406.eqiad.wmf 1
1756760311 0.30055 filemoves_ tools.dibot r 06/02/2018 17:21:04 task@tools-exec-1434.tools.eqi 1
1766760836 0.30054 pats-gadge tools.dibot r 06/02/2018 17:40:14 task@tools-exec-1411.tools.eqi 1
1776761811 0.30052 inc_mritog tools.dibot r 06/02/2018 18:07:16 task@tools-exec-1405.eqiad.wmf 1
1786761813 0.30052 inc_check tools.dibot r 06/02/2018 18:07:23 task@tools-exec-1409.eqiad.wmf 1
1796762355 0.30051 filemoves_ tools.dibot r 06/02/2018 18:21:17 task@tools-exec-1428.tools.eqi 1
1806762870 0.30050 pats-gadge tools.dibot r 06/02/2018 18:40:14 task@tools-exec-1401.eqiad.wmf 1
1816763813 0.30049 inc_check tools.dibot r 06/02/2018 19:11:50 task@tools-exec-1423.tools.eqi 1
1826764291 0.30048 filemoves_ tools.dibot r 06/02/2018 19:23:42 task@tools-exec-1419.tools.eqi 1
1836764786 0.30047 pats-gadge tools.dibot r 06/02/2018 19:40:13 task@tools-exec-1415.tools.eqi 1
1846765740 0.30046 inc_check tools.dibot r 06/02/2018 20:11:26 task@tools-exec-1435.tools.eqi 1
1856766215 0.30045 filemoves_ tools.dibot r 06/02/2018 20:23:25 task@tools-exec-1440.tools.eqi 1
1866766731 0.30044 pats-gadge tools.dibot r 06/02/2018 20:40:24 task@tools-exec-1436.tools.eqi 1
1876767672 0.30042 inc_check tools.dibot r 06/02/2018 21:12:09 task@tools-exec-1414.tools.eqi 1
1886768122 0.30042 filemoves_ tools.dibot r 06/02/2018 21:25:43 task@tools-exec-1420.tools.eqi 1
1896768618 0.30041 pats-gadge tools.dibot r 06/02/2018 21:40:14 task@tools-exec-1429.tools.eqi 1
1906769592 0.30039 inc_check tools.dibot r 06/02/2018 22:11:24 task@tools-exec-1438.tools.eqi 1
1916770067 0.30038 filemoves_ tools.dibot r 06/02/2018 22:22:26 task@tools-exec-1435.tools.eqi 1
1926770552 0.30037 pats-gadge tools.dibot r 06/02/2018 22:40:14 task@tools-exec-1425.tools.eqi 1
1936771512 0.30036 inc_check tools.dibot r 06/02/2018 23:09:13 task@tools-exec-1419.tools.eqi 1
1946772027 0.30035 filemoves_ tools.dibot r 06/02/2018 23:22:19 task@tools-exec-1414.tools.eqi 1
1956772541 0.30034 pats-gadge tools.dibot r 06/02/2018 23:40:15 task@tools-exec-1426.tools.eqi 1
1966773613 0.30033 inc_check tools.dibot r 06/03/2018 00:30:05 task@tools-exec-1422.tools.eqi 1
1976773614 0.30033 inc_mritog tools.dibot r 06/03/2018 00:30:06 task@tools-exec-1413.tools.eqi 1
1986773866 0.30032 inc_redire tools.dibot r 06/03/2018 00:55:31 task@tools-exec-1420.tools.eqi 1
1996774013 0.30032 filemoves_ tools.dibot r 06/03/2018 01:04:22 task@tools-exec-1432.tools.eqi 1
2006774348 0.30031 pats-gadge tools.dibot r 06/03/2018 01:23:51 task@tools-exec-1421.tools.eqi 1
2016774858 0.30030 statbot tools.dibot r 06/03/2018 01:55:46 task@tools-exec-1434.tools.eqi 1
2026774917 0.30029 nullbot tools.dibot r 06/03/2018 01:59:47 task@tools-exec-1428.tools.eqi 1
2036774918 0.30029 inc_check tools.dibot r 06/03/2018 01:59:47 task@tools-exec-1413.tools.eqi 1
2046775241 0.30029 filemoves_ tools.dibot r 06/03/2018 02:19:52 task@tools-exec-1442.tools.eqi 1
2056775566 0.30028 pats-gadge tools.dibot r 06/03/2018 02:44:50 task@tools-exec-1428.tools.eqi 1
2066776010 0.30026 inc_check tools.dibot r 06/03/2018 03:30:33 task@tools-exec-1430.tools.eqi 1
2076776219 0.30025 inc_image tools.dibot r 06/03/2018 03:47:28 task@tools-exec-1421.tools.eqi 1
2086776262 0.30025 filemoves_ tools.dibot r 06/03/2018 03:50:36 task@tools-exec-1407.eqiad.wmf 1
2096776485 0.30024 pats-gadge tools.dibot r 06/03/2018 04:08:51 task@tools-exec-1439.tools.eqi 1
2106776961 0.30023 inc_check tools.dibot r 06/03/2018 04:56:57 task@tools-exec-1407.eqiad.wmf 1
2116776963 0.30023 inc_main tools.dibot r 06/03/2018 04:57:04 task@tools-exec-1414.tools.eqi 1
2126777191 0.30022 filemoves_ tools.dibot r 06/03/2018 05:23:22 task@tools-exec-1426.tools.eqi 1
2136777224 0.30022 inc_remind tools.dibot r 06/03/2018 05:24:35 task@tools-exec-1403.eqiad.wmf 1
2146777430 0.30021 pats-gadge tools.dibot r 06/03/2018 05:43:15 task@tools-exec-1410.eqiad.wmf 1
2156777990 0.30020 inc_check tools.dibot r 06/03/2018 06:43:48 task@tools-exec-1430.tools.eqi 1
2166778261 0.30019 filemoves_ tools.dibot r 06/03/2018 07:11:35 task@tools-exec-1413.tools.eqi 1
2176778512 0.30018 pats-gadge tools.dibot r 06/03/2018 07:32:18 task@tools-exec-1436.tools.eqi 1
2186778926 0.30016 inc_check tools.dibot r 06/03/2018 08:21:32 task@tools-exec-1440.tools.eqi 1
2196779139 0.30016 filemoves_ tools.dibot r 06/03/2018 08:37:35 task@tools-exec-1441.tools.eqi 1
2206779341 0.30015 pats-gadge tools.dibot r 06/03/2018 09:02:45 task@tools-exec-1427.tools.eqi 1
2216779787 0.30013 inc_check tools.dibot qw 06/03/2018 06:05:10 1
2226779790 0.30013 inc_mritog tools.dibot qw 06/03/2018 06:05:11 1
2236780013 0.30012 filemoves_ tools.dibot qw 06/03/2018 06:21:53 1
2246780222 0.30011 pats-gadge tools.dibot qw 06/03/2018 06:40:12 1
2256780730 0.30010 inc_check tools.dibot qw 06/03/2018 07:05:11 1
2266780976 0.30009 filemoves_ tools.dibot qw 06/03/2018 07:21:06 1
2276781204 0.30008 pats-gadge tools.dibot qw 06/03/2018 07:40:13 1
2286781633 0.30007 inc_check tools.dibot qw 06/03/2018 08:05:12 1
2296781859 0.30006 filemoves_ tools.dibot qw 06/03/2018 08:21:04 1
2306782099 0.30005 pats-gadge tools.dibot qw 06/03/2018 08:40:13 1
2316782493 0.30003 inc_check tools.dibot qw 06/03/2018 09:05:43 1
2326782704 0.30003 filemoves_ tools.dibot qw 06/03/2018 09:21:04 1
2336782945 0.30002 pats-gadge tools.dibot qw 06/03/2018 09:40:13 1
2346783340 0.30000 inc_check tools.dibot qw 06/03/2018 10:05:10 1
2356783341 0.30000 nullbot tools.dibot qw 06/03/2018 10:05:10 1
root@tools-bastion-05:~# qstat -u tools.mbh
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2129287 0.36815 lighttpd-m tools.mbh r 03/08/2018 02:19:06 webgrid-lighttpd@tools-webgrid 1
4877285 0.32686 nfnmns tools.mbh r 04/30/2018 00:00:37 task@tools-exec-1419.tools.eqi 1
5644191 0.31392 comm_delin tools.mbh r 05/16/2018 14:00:18 task@tools-exec-1417.tools.eqi 1
5646235 0.31389 comm_delin tools.mbh r 05/16/2018 15:00:19 task@tools-exec-1408.eqiad.wmf 1
5646261 0.31389 delfilexcl tools.mbh r 05/16/2018 15:00:30 task@tools-exec-1430.tools.eqi 1
5648132 0.31386 delfilexcl tools.mbh r 05/16/2018 16:00:26 task@tools-exec-1433.tools.eqi 1
5648207 0.31386 comm_delin tools.mbh r 05/16/2018 16:00:36 task@tools-exec-1442.tools.eqi 1
5650156 0.31382 delfilexcl tools.mbh r 05/16/2018 17:00:35 task@tools-exec-1408.eqiad.wmf 1
5650157 0.31382 comm_delin tools.mbh r 05/16/2018 17:00:35 task@tools-exec-1420.tools.eqi 1
5652081 0.31379 delfilexcl tools.mbh r 05/16/2018 18:00:37 task@tools-exec-1413.tools.eqi 1
5654176 0.31376 comm_delin tools.mbh r 05/16/2018 19:00:28 task@tools-exec-1438.tools.eqi 1
5654202 0.31376 delfilexcl tools.mbh r 05/16/2018 19:00:28 task@tools-exec-1436.tools.eqi 1
5656265 0.31373 delfilexcl tools.mbh r 05/16/2018 20:00:40 task@tools-exec-1402.eqiad.wmf 1
5656299 0.31373 comm_delin tools.mbh r 05/16/2018 20:00:42 task@tools-exec-1440.tools.eqi 1
5658252 0.31369 retir_user tools.mbh r 05/16/2018 21:00:30 task@tools-exec-1432.tools.eqi 1
5658268 0.31369 comm_delin tools.mbh r 05/16/2018 21:00:31 task@tools-exec-1414.tools.eqi 1
5660284 0.31366 delfilexcl tools.mbh r 05/16/2018 22:00:32 task@tools-exec-1430.tools.eqi 1
5660285 0.31366 comm_delin tools.mbh r 05/16/2018 22:00:32 task@tools-exec-1423.tools.eqi 1
6513213 0.30423 catmoves tools.mbh r 05/29/2018 00:01:00 task@tools-exec-1424.tools.eqi 1
6567466 0.30345 catmoves tools.mbh r 05/30/2018 00:01:04 task@tools-exec-1426.tools.eqi 1
6671253 0.30193 rlbck tools.mbh r 05/31/2018 22:53:41 continuous@tools-exec-1441.too 1
6780523 0.30010 comm_delin tools.mbh qw 06/03/2018 07:00:30 1
6780554 0.30010 delfilexcl tools.mbh qw 06/03/2018 07:00:33 1
6781486 0.30007 comm_delin tools.mbh qw 06/03/2018 08:00:32 1
6781536 0.30007 delfilexcl tools.mbh qw 06/03/2018 08:00:36 1
6782360 0.30004 delfilexcl tools.mbh qw 06/03/2018 09:00:37 1
6782370 0.30004 comm_delin tools.mbh qw 06/03/2018 09:00:38 1
6783244 0.30000 comm_delin tools.mbh qw 06/03/2018 10:00:39 1
6783280 0.30000 delfilexcl tools.mbh qw 06/03/2018 10:00:42 1
[...]
root@tools-bastion-05:~# qstat -u tools.dibot
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2518218 0.36160 lighttpd-d tools.dibot r 03/16/2018 12:15:06 webgrid-lighttpd@tools-webgrid 1
6783640 0.30000 filemoves_ tools.dibot r 06/03/2018 10:21:54 task@tools-exec-1433.tools.eqi 1
root@tools-bastion-05:~# qstat -u tools.mbh
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2129287 0.36816 lighttpd-m tools.mbh r 03/08/2018 02:19:06 webgrid-lighttpd@tools-webgrid 1
4877285 0.32688 nfnmns tools.mbh r 04/30/2018 00:00:37 task@tools-exec-1419.tools.eqi 1
5658252 0.31371 retir_user tools.mbh r 05/16/2018 21:00:30 task@tools-exec-1432.tools.eqi 1
6513213 0.30424 catmoves tools.mbh r 05/29/2018 00:01:00 task@tools-exec-1424.tools.eqi 1
6567466 0.30346 catmoves tools.mbh r 05/30/2018 00:01:04 task@tools-exec-1426.tools.eqi 1
6671253 0.30194 rlbck tools.mbh r 05/31/2018 22:53:41 continuous@tools-exec-1441.too 1

@Dmitry89 @MaxBioHazard Please fix your tools/bots. Add -once, increase -mem, autokill on infinite loop, or disable the entries in crontab. Filling the entire grid engine is unacceptable and I will have to do the last option if that remains the case. Of course I'd really prefer the tool/bot maintainers themselves 'fixing' it without sacrificing functionality.
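
One way to get the "autokill" behaviour is to wrap the bot in a small watchdog that enforces a wall-clock limit. The sketch below is only an illustration in Python, not something Toolforge provides: `run_with_timeout` is a hypothetical helper name and the limit is whatever the maintainer considers a sane upper bound for one run. It relies on `subprocess.run` killing the child process when the timeout expires.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and kill it if it is still running after timeout_seconds.

    Returns the process exit code, or None if the process had to be killed.
    """
    try:
        # When the timeout expires, subprocess.run kills the child,
        # waits for it, and then raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Hypothetical usage (paths are placeholders, not taken from any real crontab):
# run_with_timeout(["mono", "/path/to/bot.exe"], 3600)
```

With a wrapper like this, a request that hangs as in the task description ends the job after the limit instead of leaving it on the grid indefinitely. Only the direct child is killed, so it would not clean up any processes that child started itself.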

All of my bots are still using my own build of mono 4.8.0, compiled into my personal folder, not the updated Toolforge version 5.12.0.

Does the "-once" flag mean that Toolforge will not try to re-run the task if it failed the first time?

I changed my crontab; it is now:

0 5 * * * jsub -N transnamespace_moves -mem 4G -quiet -once -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/trans.ns.moves.exe
50 23 * * * jsub -N orphan_files -mem 4G -once -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/orphane.files.exe
55 23 * * * jsub -N files_without_license -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/nolicensed_files.exe
0 5 * * * jsub -N transnamespace_redirects -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/illegal_redirects.exe
0 0 * * * jsub -N remove_curr_events -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/currevents_remove.exe
0 * * * * jsub -N exclude_deleted_files -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/exclude_deleted_files.exe
0 0 * * * jsub -N nonfree_files_in_nonmain_ns -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/nffiles_in_nmns.exe
0 0 1 * * jsub -N pats_awarding -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/pats_awarding.exe
0 * * * * jsub -N comm_delinker -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/mbh_delinker.exe
0 21 * * * jsub -N retired_users -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls -quiet /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/retired_counter.exe
0 0 * * * jsub -N flag_removing_arch -mem 4G -once -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/zsf_archiving.exe
0 0 1 * * jsub -N adminstats -once -mem 4G -quiet -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/adminstats.exe
0 0 * * * jsub -N category_moves -quiet -once -mem 4G -v MONO_TLS_PROVIDER=btls /data/project/mbh/mono48/bin/mono /data/project/mbh/bots/catmoves.exe

Is it enough?

In T195834#4252158, @MaxBioHazard wrote:

Does the "-once" flag mean that Toolforge will not try to re-run the task if it failed the first time?

The -once flag is for keeping duplicate jobs from starting via cron. It tells jsub to compare the job name (-N ...) with the currently running job list for the tool. If there is a match, it will log to the job's .err log file and not attempt to start another copy. This flag is added automatically if you use qcronsub or jstart rather than jsub to start your job.

So yes, it should keep your crontab from launching many copies of the same job. It won't fix whatever infinite loop/soft crash condition the job is experiencing that was keeping the duplicates from exiting. You will need to figure that out separately.
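
The check described above can be sketched roughly as follows. This is not jsub's actual code; it is a minimal Python illustration that assumes plain-text `qstat` output in the column layout pasted earlier in this task, where job names appear cut to 10 characters. `running_job_names` and `should_submit` are hypothetical helper names.

```python
def running_job_names(qstat_text):
    """Collect the values of the name column from plain qstat output."""
    names = set()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric job ID; this skips the header
        # line and the dashed separator.
        if len(fields) >= 5 and fields[0].isdigit():
            names.add(fields[2])
    return names


def should_submit(job_name, qstat_text, name_width=10):
    """Return False when a job with the same (truncated) name is listed."""
    # Assumption: the listing truncates names to name_width characters,
    # so only that prefix of the requested name can be compared.
    return job_name[:name_width] not in running_job_names(qstat_text)
```

Because only the truncated prefix is compared here, two different jobs whose names share their first 10 characters would be treated as duplicates; that is a limitation of this sketch, not a statement about how jsub behaves.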

In T195834#4252158, @MaxBioHazard wrote:

Is it enough?

Looks good to me. Thanks. :) Let's hope upstream fixes the infinite loop ASAP. The bug affects 4.8.0 as well apparently.

Mentioned in SAL (#wikimedia-cloud) [2018-06-05T07:39:27Z] <zhuyifei1999_> qdel-ed all jobs except lighttpd of dibot and disabled crontab (sed 's/^/#/') T195834
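
For reference, `sed 's/^/#/'` puts a `#` in front of every line, which turns every crontab entry into a comment. A Python equivalent, as a sketch (the function name `comment_out` is made up here):

```python
def comment_out(crontab_text):
    """Prefix every line with '#', mirroring sed 's/^/#/'."""
    # keepends=True preserves the original line endings, including blank lines.
    return "".join("#" + line for line in crontab_text.splitlines(keepends=True))
```

Entries disabled this way stay in the file, so the maintainer can re-enable them by removing the leading `#` once the job is fixed.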

Resolving this task now. Feel free to reopen if required.

20:15:37 <Nemo_bis> This host is too busy with mono bots anyway ;) Reminded me to check back. Using T195834#4241876, there are 110 such processes from dibot and one from mbh (@MaxBioHazard yours is tools.mbh /mnt/nfs/labstore-secondary-tools-project/mbh/mono48/bin/mono-sgen /data/project/mbh/bots/retired_counter.exe).

Looking at dibot's crontab, it now contains -once and -mem 4G, so I'm assuming the maintainer added the args after the re-enable, and forgot to kill the affected jobs. I'll kill them.
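
A check like this can be automated. The following is a small sketch, assuming crontab entries of the form shown earlier in this task (`jsub` followed by its options and then the command). `entries_missing_flags` is a hypothetical helper, and it only tests for the presence of the flags, not their values.

```python
import shlex


def entries_missing_flags(crontab_text, required=("-once", "-mem")):
    """Return (line_number, missing_flags) for active jsub entries lacking a flag."""
    problems = []
    for number, line in enumerate(crontab_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or disabled entry
        tokens = shlex.split(stripped)
        if "jsub" not in tokens:
            continue  # not a jsub submission
        # Everything after "jsub" is treated as its argument list.
        args = tokens[tokens.index("jsub") + 1:]
        missing = [flag for flag in required if flag not in args]
        if missing:
            problems.append((number, missing))
    return problems
```

An empty result means every active jsub line carries both flags. It says nothing about jobs that were already running before the crontab was changed, which still have to be killed separately.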

Mentioned in SAL (#wikimedia-cloud) [2018-06-26T12:33:30Z] <zhuyifei1999_> killed all jobs besides lighttpd T195834
