1 | #################### job 9796961 |
---|---|
2 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:40|worker|tools-sgegrid-master|W|job 9796961.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:31:39 [600:25914]: exit_status of epilog = 1 |
3 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:40|worker|tools-sgegrid-master|W|rescheduling job 9796961.1 |
4 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:40|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796961's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud |
5 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796961.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:09 [600:26073]: exit_status of epilog = 1 |
6 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796961.1 |
7 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796961's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud |
8 | |
9 | #################### job 9796963 |
10 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796963.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:09 [600:26081]: exit_status of epilog = 1 |
11 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796963.1 |
12 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796963's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud |
13 | |
14 | #################### job 9796997 |
15 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796997.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:09 [600:26095]: exit_status of epilog = 1 |
16 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796997.1 |
17 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796997's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud |
18 | |
19 | #################### job 9797067 |
20 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9797067.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:10 [600:26118]: exit_status of epilog = 1 |
21 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9797067.1 |
22 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797067's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud |
23 | |
24 | #################### job 9796800 |
25 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:26|worker|tools-sgegrid-master|W|job 9796800.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:28:25 [600:9320]: exit_status of epilog = 1 |
26 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:26|worker|tools-sgegrid-master|W|rescheduling job 9796800.1 |
27 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:26|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796800's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs |
28 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|job 9796800.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:28:40 [600:26305]: exit_status of epilog = 1 |
29 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|rescheduling job 9796800.1 |
30 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796800's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs |
31 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796800.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9450]: exit_status of epilog = 1 |
32 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796800.1 |
33 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796800's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs |
34 | |
35 | #################### job 9796977 |
36 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796977.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9452]: exit_status of epilog = 1 |
37 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796977.1 |
38 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796977's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs |
39 | |
40 | #################### job 9797008 |
41 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9797008.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9454]: exit_status of epilog = 1 |
42 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9797008.1 |
43 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797008's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs |
44 | |
45 | #################### job 9797071 |
46 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9797071.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9455]: exit_status of epilog = 1 |
47 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9797071.1 |
48 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797071's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs |
49 | |
50 | #################### job 9796801 |
51 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|job 9796801.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:28:40 [600:26307]: exit_status of epilog = 1 |
52 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|rescheduling job 9796801.1 |
53 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796801's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs |
54 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796801.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26472]: exit_status of epilog = 1 |
55 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796801.1 |
56 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796801's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs |
57 | |
58 | #################### job 9796814 |
59 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796814.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26474]: exit_status of epilog = 1 |
60 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796814.1 |
61 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796814's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs |
62 | |
63 | #################### job 9797013 |
64 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9797013.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26477]: exit_status of epilog = 1 |
65 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9797013.1 |
66 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797013's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs |
67 | |
68 | #################### job 9797014 |
69 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9797014.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26480]: exit_status of epilog = 1 |
70 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9797014.1 |
71 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797014's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs |
72 | |
73 | #################### job 9796949 |
74 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13589]: exit_status of epilog = 1 |
75 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
76 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
77 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30920]: exit_status of epilog = 1 |
78 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
79 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
80 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26307]: exit_status of epilog = 1 |
81 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
82 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs |
83 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:8875]: exit_status of epilog = 1 |
84 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
85 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs |
86 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27234]: exit_status of epilog = 1 |
87 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
88 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
89 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:922]: exit_status of epilog = 1 |
90 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
91 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
92 | |
93 | #################### job 9796962 |
94 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:12407]: exit_status of epilog = 1 |
95 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
96 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs |
97 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23171]: exit_status of epilog = 1 |
98 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
99 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
100 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:17404]: exit_status of epilog = 1 |
101 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
102 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
103 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:925]: exit_status of epilog = 1 |
104 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
105 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
106 | |
121 | #################### job 9796803 |
122 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:10 [600:20725]: exit_status of epilog = 1 |
123 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
124 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
125 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:25 [600:8514]: exit_status of epilog = 1 |
126 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
127 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs |
128 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:40 [600:30834]: exit_status of epilog = 1 |
129 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
130 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs |
131 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:55 [600:26190]: exit_status of epilog = 1 |
132 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
133 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs |
134 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:10 [600:12247]: exit_status of epilog = 1 |
135 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
136 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs |
137 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:25 [600:8423]: exit_status of epilog = 1 |
138 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
139 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs |
140 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:40 [600:21518]: exit_status of epilog = 1 |
141 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
142 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs |
143 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:55 [600:25624]: exit_status of epilog = 1 |
144 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
145 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
146 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13587]: exit_status of epilog = 1 |
147 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
148 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
149 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30918]: exit_status of epilog = 1 |
150 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
151 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
152 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:8652]: exit_status of epilog = 1 |
153 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
154 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs |
155 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23169]: exit_status of epilog = 1 |
156 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
157 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
158 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27232]: exit_status of epilog = 1 |
159 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
160 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
161 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:920]: exit_status of epilog = 1 |
162 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
163 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
164 | |
179 | #################### job 9796978 |
180 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:30956]: exit_status of epilog = 1 |
181 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
182 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs |
183 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:22298]: exit_status of epilog = 1 |
184 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
185 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs |
186 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27237]: exit_status of epilog = 1 |
187 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
188 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
189 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:56 [600:927]: exit_status of epilog = 1 |
190 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
191 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
192 | |
193 | #################### job 9796803 |
194 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:10 [600:20725]: exit_status of epilog = 1 |
195 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
196 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
197 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:25 [600:8514]: exit_status of epilog = 1 |
198 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
199 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs |
200 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:40 [600:30834]: exit_status of epilog = 1 |
201 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
202 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs |
203 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:55 [600:26190]: exit_status of epilog = 1 |
204 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
205 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs |
206 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:10 [600:12247]: exit_status of epilog = 1 |
207 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
208 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs |
209 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:25 [600:8423]: exit_status of epilog = 1 |
210 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
211 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs |
212 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:40 [600:21518]: exit_status of epilog = 1 |
213 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
214 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs |
215 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:55 [600:25624]: exit_status of epilog = 1 |
216 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
217 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
218 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13587]: exit_status of epilog = 1 |
219 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
220 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
221 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30918]: exit_status of epilog = 1 |
222 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
223 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
224 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:8652]: exit_status of epilog = 1 |
225 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
226 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs |
227 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23169]: exit_status of epilog = 1 |
228 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
229 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
230 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27232]: exit_status of epilog = 1 |
231 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
232 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
233 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:920]: exit_status of epilog = 1 |
234 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
235 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
236 | |
237 | #################### job 9796949 |
238 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13589]: exit_status of epilog = 1 |
239 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
240 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
241 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30920]: exit_status of epilog = 1 |
242 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
243 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
244 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26307]: exit_status of epilog = 1 |
245 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
246 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs |
247 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:8875]: exit_status of epilog = 1 |
248 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
249 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs |
250 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27234]: exit_status of epilog = 1 |
251 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
252 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
253 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:922]: exit_status of epilog = 1 |
254 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
255 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
256 | |
335 | #################### job 9796949 |
336 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13589]: exit_status of epilog = 1 |
337 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
338 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
339 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30920]: exit_status of epilog = 1 |
340 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
341 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
342 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26307]: exit_status of epilog = 1 |
343 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
344 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs |
345 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:8875]: exit_status of epilog = 1 |
346 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
347 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs |
348 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27234]: exit_status of epilog = 1 |
349 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
350 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
351 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:922]: exit_status of epilog = 1 |
352 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796949.1 |
353 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
354 | |
355 | #################### job 9796962 |
356 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:12407]: exit_status of epilog = 1 |
357 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
358 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs |
359 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23171]: exit_status of epilog = 1 |
360 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
361 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
362 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:17404]: exit_status of epilog = 1 |
363 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
364 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
365 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:925]: exit_status of epilog = 1 |
366 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796962.1 |
367 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
368 | |
369 | #################### job 9796978 |
370 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:30956]: exit_status of epilog = 1 |
371 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
372 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs |
373 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:22298]: exit_status of epilog = 1 |
374 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
375 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs |
376 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27237]: exit_status of epilog = 1 |
377 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
378 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
379 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:56 [600:927]: exit_status of epilog = 1 |
380 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
381 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
382 | |
383 | #################### job 9796978 |
384 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:30956]: exit_status of epilog = 1 |
385 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
386 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs |
387 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:22298]: exit_status of epilog = 1 |
388 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
389 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs |
390 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27237]: exit_status of epilog = 1 |
391 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
392 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
393 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:56 [600:927]: exit_status of epilog = 1 |
394 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|rescheduling job 9796978.1 |
395 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
396 | |
397 | #################### job 9796803 |
398 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:10 [600:20725]: exit_status of epilog = 1 |
399 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
400 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
401 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:25 [600:8514]: exit_status of epilog = 1 |
402 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
403 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs |
404 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:40 [600:30834]: exit_status of epilog = 1 |
405 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
406 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs |
407 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:55 [600:26190]: exit_status of epilog = 1 |
408 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
409 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs |
410 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:10 [600:12247]: exit_status of epilog = 1 |
411 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
412 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs |
413 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:25 [600:8423]: exit_status of epilog = 1 |
414 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
415 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs |
416 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:40 [600:21518]: exit_status of epilog = 1 |
417 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
418 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs |
419 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:55 [600:25624]: exit_status of epilog = 1 |
420 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
421 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
422 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13587]: exit_status of epilog = 1 |
423 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
424 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs |
425 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30918]: exit_status of epilog = 1 |
426 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
427 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
428 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:8652]: exit_status of epilog = 1 |
429 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
430 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs |
431 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23169]: exit_status of epilog = 1 |
432 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
433 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs |
434 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27232]: exit_status of epilog = 1 |
435 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
436 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs |
437 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:920]: exit_status of epilog = 1 |
438 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1 |
439 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs |
440 | |
441 | #################### job 9791739 |
442 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:44:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
443 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:44:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
444 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:06|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-5.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
445 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:06|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
446 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
447 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
448 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
449 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
450 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
451 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
452 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
453 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
454 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
455 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
456 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
457 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
458 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
459 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
460 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
461 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
462 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
463 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
464 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
465 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
466 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
467 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
468 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
469 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1 |
470 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:51:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory |
471 | |
472 | #################### job 9792998 |
473 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|job 9792998.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15032]: exit_status of epilog = 1 |
474 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|rescheduling job 9792998.1 |
475 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9792998's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud |
476 | |
477 | #################### job 9793030 |
478 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|job 9793030.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15039]: exit_status of epilog = 1 |
479 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|rescheduling job 9793030.1 |
480 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9793030's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud |
481 | |
482 | #################### job 9793086 |
483 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|job 9793086.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15064]: exit_status of epilog = 1 |
484 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|rescheduling job 9793086.1 |
485 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9793086's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud |
486 | |
487 | #################### job 9793113 |
488 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:13|worker|tools-sgegrid-master|W|job 9793113.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15084]: exit_status of epilog = 1 |
489 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:13|worker|tools-sgegrid-master|W|rescheduling job 9793113.1 |
490 | /data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:13|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9793113's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud |
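The qmaster messages above all follow the same pattern, so the per-host failure count can be pulled out mechanically. The following is a minimal sketch, assuming the `|W|job N.T failed on host ... in epilog` line format shown in this task; the function name `epilog_failures_by_host` is hypothetical and not part of any grid tooling.

```python
import re
from collections import Counter

# Matches the "failed on host ... in epilog" warning lines quoted in this task.
FAILED_RE = re.compile(
    r"\|W\|job (?P<job>\d+)\.(?P<task>\d+) failed on host (?P<host>\S+) "
    r"general in epilog because: .*exit_status of epilog = (?P<status>\d+)"
)


def epilog_failures_by_host(lines):
    """Count epilog failure warnings per exec host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

Feeding it the output of a `grep` over the qmaster `messages` file would show whether failures cluster on a few exec nodes or are spread across the whole web grid.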
Event Timeline
Comment Actions
Initial investigation started because of a report by @-jem- that their jembot webservice was misbehaving.
[20:53] < bd808> lots of confusing state in this tool right now jem. $HOME/service.manifest is in a state that would indicate that the tool is down. But it is running. qstat shows the running version to be in 'dr' deleted state (but it is obviously running). The job that is in dr state was started on 2022-04-09 and is running on a Buster node at least.
[20:55] < bd808> !log tools.jembot Force deleted stuck webservice job
[20:55] < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.jembot/SAL
[20:56] < bd808> ummm.... and yet it is still running?
[21:00] < bd808> very confused right now about where the webservice is actually running. the grid has lost track, but the front proxy apparently has not.
[21:13] < bd808> this is so weird. usually if the front proxy gets confused like this it is possible to fix by starting and then stopping a grid webservice. The start should register a new backend ip/port and then stopping removes it again. I've done that twice now and the front proxy is still pretty obviously pointed at the "lost" job
[21:13] bd808 will keep poking
[21:14] jem supports mentally
[21:33] < bd808> !log tools.jembot Found orphan grid job by fetching host and port from front proxy redis. Killed related processes on tools-sgeweblight-10-4
[21:41] < bd808> !log tools Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for active_redis key
[21:41] < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:48] < bd808> puppet is taking approximately forever to run on tools-sgebastion-11 :/
[21:55] < bd808> !log tools Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for profile::toolforge::active_proxy_host key
[21:55] < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:56] < bd808> !log tools Removed legacy "active_proxy_host" hiera setting
[21:56] < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:06] jem is back from dinner
[22:09] < bd808> jem: I haven't given up, but things are still mysterious. I found and killed the rogue webservice. I have not yet been able to get a new webservice to start however. The jobs submits but gets stuck in queue wait state.
The "active_redis key" mentioned was a typo and should have been "active_proxy_host" (the one I deleted later).
Comment Actions
[22:24] bd808 figures out that tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud is sick somehow
[22:26] < bd808> !log tools Rebooting tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud. Node is full of jobs that are not tracked by grid master and failing to spawn new jobs sent by the scheduler
Not long after this I did the things from https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Clearing_error_state. When the queues flipped right back to E state I cried for help in -cloud-admin and got @nskaggs involved. Then I started this task.
Comment Actions
Noted that dpkg was broken, and that tools-sgeweblight-10-9 and tools-sgeweblight-10-10 were missing the grid service, etc. Fixed dpkg and re-ran puppet to bring them back online.
Comment Actions
Emails to root@tools.wmflabs.org show that most (all?) of the failed jobs are variations of execvlp(/bin/bash, "-bash" "-c" "/usr/bin/webservice-runner --register-proxy --type lighttpd ") failing, which then fires the epilog to clean up the proxy registration, and that fails as well. Watching on the front proxy with tail -f /var/log/proxylistener, I think I see both the registration and the removal happening, but apparently the Python processes back on the exec nodes are not seeing the expected acks for those events?
Comment Actions
Example log events on the front proxy:
$ grep cobain /var/log/proxylistener |tail -10
2022-06-03 00:08:26,308 Received request from cobain for .* to http://172.16.1.75:60743
2022-06-03 00:08:26,865 Cleaning up request from cobain for .*
2022-06-03 00:08:27,081 Received request from cobain for .* to http://172.16.1.75:33763
2022-06-03 00:08:27,449 Cleaning up request from cobain for .*
2022-06-03 00:08:27,752 Received request from cobain for .* to http://172.16.1.69:56789
2022-06-03 00:08:28,260 Cleaning up request from cobain for .*
2022-06-03 00:08:28,865 Received request from cobain for .* to http://172.16.1.69:34515
2022-06-03 00:08:29,349 Cleaning up request from cobain for .*
2022-06-03 00:08:29,637 Received request from cobain for .* to http://172.16.1.72:58491
2022-06-03 00:08:29,986 Cleaning up request from cobain for .*
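The register/cleanup pairs in that log excerpt arrive well under a second apart, which is easier to see when the events are paired up per tool. Below is a rough sketch, assuming only the two proxylistener line shapes quoted above ("Received request from ... to ..." and "Cleaning up request from ..."); `registration_lifetimes` is a hypothetical helper name, not something that exists on the proxy.

```python
import re
from datetime import datetime

# Line shapes are taken from the /var/log/proxylistener excerpt in this task.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<kind>Received|Cleaning up) request from (?P<tool>\S+) for \S+"
    r"(?: to (?P<backend>\S+))?$"
)


def registration_lifetimes(lines):
    """Pair each registration with the next cleanup for the same tool.

    Returns a list of (tool, backend_url, seconds_registered) tuples.
    """
    pending = {}  # tool name -> (registration time, backend URL)
    lifetimes = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        tool = match.group("tool")
        if match.group("kind") == "Received":
            pending[tool] = (ts, match.group("backend"))
        elif tool in pending:
            started, backend = pending.pop(tool)
            lifetimes.append((tool, backend, (ts - started).total_seconds()))
    return lifetimes
```

A registration that only lives for a few hundred milliseconds before being cleaned up is consistent with the job failing at startup and the epilog running immediately afterwards.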
Comment Actions
tools.bd808-test2@tools-sgewebgen-10-1:~$ /usr/bin/webservice-runner --register-proxy --type lighttpd
Traceback (most recent call last):
  File "/usr/bin/webservice-runner", line 46, in <module>
    proxy.register(port)
  File "/usr/lib/python3/dist-packages/toolsws/proxy.py", line 114, in register
    register_dynamicproxy(port)
  File "/usr/lib/python3/dist-packages/toolsws/proxy.py", line 58, in register_dynamicproxy
    raise ProxyException("Port registration failed!")
toolsws.proxy.ProxyException: Port registration failed!
But on tools-proxy-06:
2022-06-03 00:28:40,686 Received request from bd808-test2 for .* to http://172.16.7.61:38033
Is the ack not getting back to the caller?
Comment Actions
Change 802683 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/software/tools-webservice@master] proxy: horrible hack for T309821
Comment Actions
Change 802684 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/software/tools-webservice@master] d/changelog: Prepare for 0.85 release
Comment Actions
Change 802683 merged by jenkins-bot:
[operations/software/tools-webservice@master] proxy: horrible hack for T309821
Comment Actions
Change 802684 merged by jenkins-bot:
[operations/software/tools-webservice@master] d/changelog: Prepare for 0.85 release
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T03:10:37Z] <bd808> publish tools-webservice 0.85 with hack for T309821
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T03:21:45Z] <bd808> Cleared queue error states after deploying new toolforge-webservice package (T309821)
Comment Actions
Things are a little better after deploying the hacked version of tools-webservice. Many more webservices are running, but certainly not all. Some nodes are also still being overloaded, with jobs failing due to "fork failed: Cannot allocate memory" and similar errors.
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T05:05:21Z] <taavi> removing duplicate (there should be only one per tool) web service jobs from the grid T309821
Comment Actions
I suspect webservicemonitor somehow caused a lot of tools to have multiple instances running/queued. Removing those duplicates with a simple script:
import requests
import subprocess
import sys
import time

jobs = requests.get("https://sge-status.toolforge.org/api/v1").json()["data"]["attributes"]
tools = []

for host, host_data in jobs.items():
    if "web" not in host:
        continue
    if host_data["jobs"] == []:
        continue
    for job in host_data["jobs"].values():
        if job["job_owner"] in tools:
            print("removing duplicate job for", job["job_owner"])
            subprocess.check_call(
                [
                    "/usr/bin/qdel", str(job["num"])
                ],
                stdout=sys.stdout,
                stderr=sys.stderr,
            )
            time.sleep(1)
            continue
        tools.append(job["job_owner"])
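For anyone wanting to preview what that script would delete before running qdel, the duplicate detection can be split out into a function with no side effects. This is only a sketch, assuming the same response shape the script above relies on (hosts mapping to a "jobs" dict whose values carry "job_owner" and "num"); `find_duplicate_jobs` is a hypothetical name.

```python
def find_duplicate_jobs(attributes):
    """Return (owner, job_number, host) for every web job after a tool's first.

    `attributes` is assumed to be the ["data"]["attributes"] mapping from the
    sge-status API, in the shape used by the script above.
    """
    seen = set()
    duplicates = []
    for host, host_data in attributes.items():
        if "web" not in host:
            continue
        # An empty job list shows up as [] in the script above; treat it as {}.
        jobs = host_data.get("jobs") or {}
        for job in jobs.values():
            owner = job["job_owner"]
            if owner in seen:
                duplicates.append((owner, job["num"], host))
            else:
                seen.add(owner)
    return duplicates
```

Printing the returned list first gives a dry run; the same list can then be fed to qdel one job number at a time.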
Comment Actions
08:34:51 <taavi> so I think what happened is that something got stuck, and then webservicemonitor (now disabled) made the problem worse by spawning tons of duplicate jobs
08:35:37 <taavi> I'm trying to make it better by deleting those duplicates, but the grid isn't actually deleting jobs marked as to be deleted
08:36:10 <taavi> we might need to do a rolling reboot + force delete for the entire web grid
Comment Actions
I did a rolling reboot of all the Buster webgrid nodes, and deleted the jobs stuck in 'deleting' state. I /think/ the grid itself is now working properly again, but individual web services might need restarts to start working again.
webservicemonitor is still disabled on tools-sgecron-01. Not yet sure what to do about it.
I believe the root cause is that some of the Buster webgrid nodes got overloaded, causing the sge_execd on those nodes to freeze. The grid for some reason stopped scheduling new jobs on the webgrid queues after that, and webservicemonitor made the problem worse since it thought that the jobs in qw didn't exist and spawned new ones for web services it thought needed to be restarted.
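To illustrate the failure mode described here, the sketch below shows a restart decision that counts queued (qw) jobs as present, so a tool waiting in the queue does not get a second job submitted. This is not webservicemonitor's actual code; the function name, the input shape (pairs of job owner and qstat state), and the choice to treat only plain "r" and "qw" states as present are all assumptions made for illustration.

```python
# States treated as "a job for this tool already exists" (assumption: plain
# qstat state strings; compound states such as "Eqw" are not handled here).
PRESENT_STATES = {"r", "qw"}


def tools_needing_start(expected_tools, grid_jobs):
    """Return the expected tools that have no running or queued job.

    `grid_jobs` is an iterable of (job_owner, state) pairs.
    """
    present = {owner for owner, state in grid_jobs if state in PRESENT_STATES}
    return sorted(set(expected_tools) - present)
```

With this rule, a tool whose only job sits in qw is left alone, while a tool whose only job is stuck in dr would still be reported as needing a start.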
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T12:46:50Z] <taavi> start webservicemonitor on tools-sgecron-01 T309821
Comment Actions
Change 802620 had a related patch set uploaded (by BryanDavis; author: BryanDavis):
[operations/software/tools-webservice@master] Revert "proxy: horrible hack for T309821"
Comment Actions
Change 802620 merged by jenkins-bot:
[operations/software/tools-webservice@master] Revert "proxy: horrible hack for T309821"
Comment Actions
Change 802773 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[operations/software/tools-webservice@master] d/changelog: Prepare for 0.86 release
Comment Actions
Change 802773 merged by jenkins-bot:
[operations/software/tools-webservice@master] d/changelog: Prepare for 0.86 release
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T13:17:52Z] <bd808> publish tools-webservice 0.86 (T309821)
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T13:20:33Z] <bd808> publish tools-webservice 0.86 (T309821)
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T13:25:14Z] <bd808> Upgrading fleet to tools-webservice 0.86 (T309821)
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T15:50:47Z] <balloons> temp add 1.0G swap to sgeweblight hosts T309821
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T15:50:53Z] <balloons> fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor T309821
Comment Actions
We took another look after the new Buster hosts began acting up again with OOM errors, even though free -m showed available memory. After investigation, dcaro noted that there was only a 24 MB swap partition on each. Adding temporary 1G swap space seems to have removed the errors.
The 24 MB swap was a typo for the more typical 24G swap found on other hosts. The flavor was updated to use 8G swap and 8G RAM. Nodes will be updated to use the new flavor.
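A swap size that is off by three orders of magnitude is easy to miss by eye but simple to flag from /proc/meminfo. The following is a minimal sketch, assuming the standard `SwapTotal:  <n> kB` line; the helper names and the 1024 MiB threshold are hypothetical choices for illustration, not values taken from this task.

```python
def swap_total_mib(meminfo_text):
    """Extract SwapTotal from /proc/meminfo contents and convert kB to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       24572 kB"
            return int(line.split()[1]) / 1024
    raise ValueError("SwapTotal not found in meminfo text")


def swap_looks_undersized(meminfo_text, min_mib=1024):
    """Flag hosts whose swap is below an assumed minimum (default 1 GiB)."""
    return swap_total_mib(meminfo_text) < min_mib
```

On a host, the input would come from reading /proc/meminfo as text, for example `swap_looks_undersized(open("/proc/meminfo").read())`.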
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T18:59:59Z] <balloons> depooled old nodes, bringing entirely new grid of nodes online T309821
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2022-06-03T19:51:35Z] <balloons> Scaling webservice nodes to 20, using new 8G swap flavor T309821
Comment Actions
We have rebuilt 21 new lighttpd nodes, each with 8G of swap space (via a new flavor).
Assuming this was an overprovisioning problem and not an invisible-memory-leak problem, things should be better. After a period of peace we can consider removing some of those nodes.
Comment Actions
I'm going to try and set expectations here and say this part of the incident is closed / resolved. Hopefully we don't have to re-open!