Buster webservice grid went BOOM!
Closed, Resolved; Public

Description
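
The qmaster records pasted in this description share one pipe-delimited layout, so the QERROR events can be tallied per queue and host instead of being read line by line. The following is a minimal sketch, not part of the original report: the function name `summarize_qerrors` is hypothetical, and the record layout is inferred from the pasted lines rather than from Grid Engine documentation. Records with the same timestamp, queue, job and host are counted once.

```python
import re
from collections import Counter

# Layout inferred from the pasted records (an assumption, not a documented format):
# <timestamp>|<thread>|<master host>|E|queue <queue> marked QERROR as result of
# job <id>'s failure at host <host>
QERROR_RE = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|[^|]+\|[^|]+\|E\|"
    r"queue (?P<queue>\S+) marked QERROR as result of "
    r"job (?P<job>\d+)'s failure at host (?P<host>\S+)"
)


def summarize_qerrors(lines):
    """Count distinct QERROR events per (queue, host); collect job ids per queue."""
    seen = set()
    per_host = Counter()
    jobs_per_queue = {}
    for line in lines:
        match = QERROR_RE.search(line)
        if match is None:
            continue  # "W" records, separators and blank lines are skipped
        key = (match["ts"], match["queue"], match["job"], match["host"])
        if key in seen:
            continue  # a record that was pasted more than once counts once
        seen.add(key)
        per_host[(match["queue"], match["host"])] += 1
        jobs_per_queue.setdefault(match["queue"], set()).add(match["job"])
    return per_host, jobs_per_queue


if __name__ == "__main__":
    # Three records copied from the paste, including the paste's own line numbers.
    sample = [
        "4/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:40|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796961's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud",
        "7/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796961's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud",
        "76/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs",
    ]
    per_host, jobs_per_queue = summarize_qerrors(sample)
    for (queue, host), count in per_host.most_common():
        print(f"{queue}\t{host}\t{count}")
    for queue, jobs in sorted(jobs_per_queue.items()):
        print(f"{queue}: {len(jobs)} distinct job(s)")
```

Feeding the whole paste to `summarize_qerrors` shows which hosts were pushed into the error state and how many distinct jobs were involved per queue. The pattern uses `search`, so the leading paste line numbers and the log file path do not need to be stripped first.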

1#################### job 9796961
2/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:40|worker|tools-sgegrid-master|W|job 9796961.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:31:39 [600:25914]: exit_status of epilog = 1
3/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:40|worker|tools-sgegrid-master|W|rescheduling job 9796961.1
4/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:40|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796961's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud
5/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796961.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:09 [600:26073]: exit_status of epilog = 1
6/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796961.1
7/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796961's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud
8
9#################### job 9796963
10/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796963.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:09 [600:26081]: exit_status of epilog = 1
11/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796963.1
12/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796963's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud
13
14#################### job 9796997
15/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796997.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:09 [600:26095]: exit_status of epilog = 1
16/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796997.1
17/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796997's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud
18
19#################### job 9797067
20/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9797067.1 failed on host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:10 [600:26118]: exit_status of epilog = 1
21/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9797067.1
22/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797067's failure at host tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud
23
24#################### job 9796800
25/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:26|worker|tools-sgegrid-master|W|job 9796800.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:28:25 [600:9320]: exit_status of epilog = 1
26/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:26|worker|tools-sgegrid-master|W|rescheduling job 9796800.1
27/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:26|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796800's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs
28/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|job 9796800.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:28:40 [600:26305]: exit_status of epilog = 1
29/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|rescheduling job 9796800.1
30/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796800's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs
31/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796800.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9450]: exit_status of epilog = 1
32/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796800.1
33/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796800's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs
34
35#################### job 9796977
36/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796977.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9452]: exit_status of epilog = 1
37/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796977.1
38/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796977's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs
39
40#################### job 9797008
41/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9797008.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9454]: exit_status of epilog = 1
42/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9797008.1
43/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797008's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs
44
45#################### job 9797071
46/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9797071.1 failed on host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:9455]: exit_status of epilog = 1
47/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9797071.1
48/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797071's failure at host tools-sgewebgrid-generic-0901.tools.eqiad.wmflabs
49
50#################### job 9796801
51/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|job 9796801.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:28:40 [600:26307]: exit_status of epilog = 1
52/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|W|rescheduling job 9796801.1
53/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:28:41|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796801's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs
54/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796801.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26472]: exit_status of epilog = 1
55/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796801.1
56/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796801's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs
57
58#################### job 9796814
59/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796814.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26474]: exit_status of epilog = 1
60/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796814.1
61/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9796814's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs
62
63#################### job 9797013
64/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9797013.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26477]: exit_status of epilog = 1
65/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9797013.1
66/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797013's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs
67
68#################### job 9797014
69/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9797014.1 failed on host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26480]: exit_status of epilog = 1
70/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9797014.1
71/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-generic marked QERROR as result of job 9797014's failure at host tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs
72
73#################### job 9796949
74/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13589]: exit_status of epilog = 1
75/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
76/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
77/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30920]: exit_status of epilog = 1
78/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
79/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
80/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26307]: exit_status of epilog = 1
81/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
82/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs
83/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:8875]: exit_status of epilog = 1
84/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
85/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs
86/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27234]: exit_status of epilog = 1
87/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
88/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
89/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:922]: exit_status of epilog = 1
90/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
91/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
92
93#################### job 9796962
94/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:12407]: exit_status of epilog = 1
95/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
96/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs
97/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23171]: exit_status of epilog = 1
98/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
99/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
100/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:17404]: exit_status of epilog = 1
101/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
102/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
103/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:925]: exit_status of epilog = 1
104/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
105/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
106
121#################### job 9796803
122/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:10 [600:20725]: exit_status of epilog = 1
123/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
124/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
125/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:25 [600:8514]: exit_status of epilog = 1
126/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
127/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs
128/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:40 [600:30834]: exit_status of epilog = 1
129/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
130/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs
131/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:55 [600:26190]: exit_status of epilog = 1
132/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
133/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs
134/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:10 [600:12247]: exit_status of epilog = 1
135/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
136/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs
137/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:25 [600:8423]: exit_status of epilog = 1
138/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
139/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs
140/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:40 [600:21518]: exit_status of epilog = 1
141/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
142/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs
143/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:55 [600:25624]: exit_status of epilog = 1
144/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
145/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
146/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13587]: exit_status of epilog = 1
147/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
148/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
149/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30918]: exit_status of epilog = 1
150/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
151/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
152/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:8652]: exit_status of epilog = 1
153/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
154/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs
155/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23169]: exit_status of epilog = 1
156/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
157/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
158/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27232]: exit_status of epilog = 1
159/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
160/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
161/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:920]: exit_status of epilog = 1
162/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
163/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
164
179#################### job 9796978
180/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:30956]: exit_status of epilog = 1
181/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
182/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs
183/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:22298]: exit_status of epilog = 1
184/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
185/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs
186/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27237]: exit_status of epilog = 1
187/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
188/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
189/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:56 [600:927]: exit_status of epilog = 1
190/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
191/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
192
193#################### job 9796803
194/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:10 [600:20725]: exit_status of epilog = 1
195/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
196/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
197/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:25 [600:8514]: exit_status of epilog = 1
198/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
199/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs
200/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:40 [600:30834]: exit_status of epilog = 1
201/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
202/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs
203/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:55 [600:26190]: exit_status of epilog = 1
204/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
205/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs
206/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:10 [600:12247]: exit_status of epilog = 1
207/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
208/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs
209/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:25 [600:8423]: exit_status of epilog = 1
210/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
211/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs
212/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:40 [600:21518]: exit_status of epilog = 1
213/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
214/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs
215/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:55 [600:25624]: exit_status of epilog = 1
216/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
217/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
218/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13587]: exit_status of epilog = 1
219/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
220/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
221/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30918]: exit_status of epilog = 1
222/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
223/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
224/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:8652]: exit_status of epilog = 1
225/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
226/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs
227/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23169]: exit_status of epilog = 1
228/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
229/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
230/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27232]: exit_status of epilog = 1
231/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
232/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
233/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:920]: exit_status of epilog = 1
234/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
235/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
236
237#################### job 9796949
238/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13589]: exit_status of epilog = 1
239/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
240/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
241/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30920]: exit_status of epilog = 1
242/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
243/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
244/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26307]: exit_status of epilog = 1
245/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
246/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs
247/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:8875]: exit_status of epilog = 1
248/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
249/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs
250/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27234]: exit_status of epilog = 1
251/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
252/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
253/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:922]: exit_status of epilog = 1
254/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
255/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
256
335#################### job 9796949
336/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13589]: exit_status of epilog = 1
337/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
338/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
339/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30920]: exit_status of epilog = 1
340/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
341/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
342/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:26307]: exit_status of epilog = 1
343/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
344/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs
345/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:8875]: exit_status of epilog = 1
346/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
347/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs
348/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27234]: exit_status of epilog = 1
349/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
350/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
351/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796949.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:922]: exit_status of epilog = 1
352/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796949.1
353/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796949's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
354
355#################### job 9796962
356/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:12407]: exit_status of epilog = 1
357/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
358/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs
359/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23171]: exit_status of epilog = 1
360/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
361/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
362/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:17404]: exit_status of epilog = 1
363/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
364/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
365/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796962.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:925]: exit_status of epilog = 1
366/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796962.1
367/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796962's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
368
369#################### job 9796978
370/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:30956]: exit_status of epilog = 1
371/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
372/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs
373/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:22298]: exit_status of epilog = 1
374/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
375/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs
376/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27237]: exit_status of epilog = 1
377/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
378/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:41|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
379/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|job 9796978.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:56 [600:927]: exit_status of epilog = 1
380/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|W|rescheduling job 9796978.1
381/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:57|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796978's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
382
397#################### job 9796803
398/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:10 [600:20725]: exit_status of epilog = 1
399/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
400/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
401/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:25 [600:8514]: exit_status of epilog = 1
402/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
403/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs
404/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:40 [600:30834]: exit_status of epilog = 1
405/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
406/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs
407/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:29:55 [600:26190]: exit_status of epilog = 1
408/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
409/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:29:55|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs
410/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:10 [600:12247]: exit_status of epilog = 1
411/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
412/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs
413/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:25 [600:8423]: exit_status of epilog = 1
414/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
415/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs
416/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:40 [600:21518]: exit_status of epilog = 1
417/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
418/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs
419/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:30:55 [600:25624]: exit_status of epilog = 1
420/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
421/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:30:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
422/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:10 [600:13587]: exit_status of epilog = 1
423/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
424/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:11|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs
425/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:31:25 [600:30918]: exit_status of epilog = 1
426/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
427/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:31:26|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
428/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:10 [600:8652]: exit_status of epilog = 1
429/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
430/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:10|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs
431/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:25 [600:23169]: exit_status of epilog = 1
432/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
433/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:25|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs
434/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:40 [600:27232]: exit_status of epilog = 1
435/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
436/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:40|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs
437/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|job 9796803.1 failed on host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs general in epilog because: 06/02/2022 22:36:55 [600:920]: exit_status of epilog = 1
438/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|W|rescheduling job 9796803.1
439/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:56|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9796803's failure at host tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
440
441#################### job 9791739
442/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:44:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
443/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:44:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
444/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:06|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-5.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
445/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:06|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
446/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
447/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
448/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
449/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:45:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
450/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
451/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
452/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
453/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:46:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
454/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
455/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
456/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
457/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:47:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
458/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
459/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
460/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
461/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:48:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
462/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
463/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
464/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
465/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:49:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
466/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
467/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:21|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
468/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
469/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:50:51|worker|tools-sgegrid-master|W|rescheduling job 9791739.1
470/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 19:51:21|worker|tools-sgegrid-master|W|job 9791739.1 failed on host tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: fork failed: Cannot allocate memory
471
472#################### job 9792998
473/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|job 9792998.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15032]: exit_status of epilog = 1
474/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|rescheduling job 9792998.1
475/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9792998's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud
476
477#################### job 9793030
478/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|job 9793030.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15039]: exit_status of epilog = 1
479/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|rescheduling job 9793030.1
480/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9793030's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud
481
482#################### job 9793086
483/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|job 9793086.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15064]: exit_status of epilog = 1
484/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|W|rescheduling job 9793086.1
485/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:12|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9793086's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud
486
487#################### job 9793113
488/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:13|worker|tools-sgegrid-master|W|job 9793113.1 failed on host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: 06/02/2022 22:36:11 [600:15084]: exit_status of epilog = 1
489/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:13|worker|tools-sgegrid-master|W|rescheduling job 9793113.1
490/data/project/.system_sge/gridengine/spool/qmaster/messages:06/02/2022 22:36:13|worker|tools-sgegrid-master|E|queue webgrid-lighttpd marked QERROR as result of job 9793113's failure at host tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud
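
A paste this long is easier to triage once the failures are tallied per host and failure phase. The following is a minimal sketch, not a tool used during the incident: the function name `tally_failures` and the regular expression are illustrative assumptions based only on the line format visible above, and the sample lines are copied from this paste with the file path prefix and paste line numbers removed.

```python
import re
from collections import Counter

# Assumed pattern for the qmaster "job N.T failed on host H <phase> because: <reason>"
# warning lines shown in the paste above. "rescheduling" and "QERROR" lines do not match.
FAILED_RE = re.compile(
    r"\|W\|job (?P<job>\d+)\.\d+ failed on host (?P<host>\S+) "
    r"(?P<phase>.+?) because: (?P<reason>.*)$"
)


def tally_failures(lines):
    """Count failure warnings per (host, phase) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("phase"))] += 1
    return counts


# Sample lines taken from the paste above.
sample = [
    "06/02/2022 22:36:12|worker|tools-sgegrid-master|W|job 9792998.1 failed on host "
    "tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud general in epilog because: "
    "06/02/2022 22:36:11 [600:15032]: exit_status of epilog = 1",
    "06/02/2022 22:36:12|worker|tools-sgegrid-master|W|rescheduling job 9792998.1",
    "06/02/2022 19:44:51|worker|tools-sgegrid-master|W|job 9791739.1 failed on host "
    "tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud assumedly before job because: "
    "fork failed: Cannot allocate memory",
]

for (host, phase), count in tally_failures(sample).most_common():
    print(f"{count:4d}  {host}  ({phase})")
```

Run against the full messages file, a tally like this should make it easier to separate the "fork failed: Cannot allocate memory" failures on tools-sgeweblight-10-1 from the epilog failures that hit many hosts.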

Event Timeline

Initial investigation started because of a report by @-jem- that their jembot webservice was misbehaving.

[20:53]  <    bd808> lots of confusing state in this tool right now jem. $HOME/service.manifest is in a state that would indicate that the tool is down. But it is running. qstat shows the running version to be in 'dr' deleted state (but it is obviously running). The job that is in dr state was started on 2022-04-09 and is running on a Buster node at least.
[20:55]  <    bd808> !log tools.jembot Force deleted stuck webservice job
[20:55]  < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.jembot/SAL
[20:56]  <    bd808> ummm.... and yet it is still running?
[21:00]  <    bd808> very confused right now about where the webservice is actually running. the grid has lost track, but the front proxy apparently has not.
[21:13]  <    bd808> this is so weird. usually if the front proxy gets confused like this it is possible to fix by starting and then stopping a grid webservice. The start should register a new backend ip/port and then stopping removes it again. I've done that twice now and the front proxy is still pretty obviously pointed at the "lost" job
[21:13] bd808 will keep poking
[21:14] jem supports mentally
[21:33]  <    bd808> !log tools.jembot Found orphan grid job by fetching host and port from front proxy redis. Killed related processes on tools-sgeweblight-10-4
[21:41]  <    bd808> !log tools Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for active_redis key
[21:41]  < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:48]  <    bd808> puppet is taking approximately forever to run on tools-sgebastion-11 :/
[21:55]  <    bd808> !log tools Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for profile::toolforge::active_proxy_host key
[21:55]  < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:56]  <    bd808> !log tools Removed legacy "active_proxy_host" hiera setting
[21:56]  < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:06] jem is back from dinner
[22:09]  <    bd808> jem: I haven't given up, but things are still mysterious. I found and killed the rogue webservice. I have not yet been able to get a new webservice to start however. The jobs submits but gets stuck in queue wait state.

The "active_redis key" mentioned was a typo and should have been "active_proxy_host" (the one I deleted later).

[22:24] bd808 figures out that tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud is sick somehow
[22:26]  <    bd808> !log tools Rebooting tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud. Node is full of jobs that are not tracked by grid master and failing to spawn new jobs sent by the scheduler

Not long after this I did the things from https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Clearing_error_state. When the queues flipped right back to E state I cried for help in -cloud-admin and got @nskaggs involved. Then I started this task.

Noted that dpkg was broken and that tools-sgeweblight-10-9 and tools-sgeweblight-10-10 were missing the grid service, among other things. Fixed dpkg and re-ran puppet to bring them back online.

Emails to root@tools.wmflabs.org show that most (all?) of the failed jobs are variations of execvlp(/bin/bash, "-bash" "-c" "/usr/bin/webservice-runner --register-proxy --type lighttpd ") failing, which then fires the epilog to clean up the proxy registration, and that fails as well. Watching on the front proxy with tail -f /var/log/proxylistener, I think I see both the registration and the removal happening, but apparently the python processes back on the exec nodes are not seeing the expected acks for those events?

Example log events on the front proxy:

$ grep cobain /var/log/proxylistener |tail -10
2022-06-03 00:08:26,308 Received request from cobain for .* to http://172.16.1.75:60743
2022-06-03 00:08:26,865 Cleaning up request from cobain for .*
2022-06-03 00:08:27,081 Received request from cobain for .* to http://172.16.1.75:33763
2022-06-03 00:08:27,449 Cleaning up request from cobain for .*
2022-06-03 00:08:27,752 Received request from cobain for .* to http://172.16.1.69:56789
2022-06-03 00:08:28,260 Cleaning up request from cobain for .*
2022-06-03 00:08:28,865 Received request from cobain for .* to http://172.16.1.69:34515
2022-06-03 00:08:29,349 Cleaning up request from cobain for .*
2022-06-03 00:08:29,637 Received request from cobain for .* to http://172.16.1.72:58491
2022-06-03 00:08:29,986 Cleaning up request from cobain for .*
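
The log excerpt above shows each registration being cleaned up almost immediately. A minimal sketch for measuring that gap, assuming only the two line shapes visible in the excerpt ("Received request from ..." and "Cleaning up request from ..."); the function name and the pairing rule (latest registration per tool) are illustrative choices, not part of proxylistener:

```python
import re
from datetime import datetime

# Matches the two event shapes seen in /var/log/proxylistener above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<event>Received request|Cleaning up request) from (?P<tool>\S+) for"
)


def registration_lifetimes(lines):
    """Return (tool, seconds) for each register/cleanup pair, in log order."""
    pending = {}  # tool name -> timestamp of its most recent registration
    lifetimes = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        tool = match["tool"]
        if match["event"] == "Received request":
            pending[tool] = ts
        elif tool in pending:
            delta = ts - pending.pop(tool)
            lifetimes.append((tool, delta.total_seconds()))
    return lifetimes


sample = [
    "2022-06-03 00:08:26,308 Received request from cobain for .* to http://172.16.1.75:60743",
    "2022-06-03 00:08:26,865 Cleaning up request from cobain for .*",
    "2022-06-03 00:08:27,081 Received request from cobain for .* to http://172.16.1.75:33763",
    "2022-06-03 00:08:27,449 Cleaning up request from cobain for .*",
]
for tool, seconds in registration_lifetimes(sample):
    print(f"{tool}: registered for {seconds:.3f}s")
```

On the first four lines of the excerpt this reports lifetimes of roughly 0.56s and 0.37s, which is consistent with the job failing right after it registers and the epilog removing the entry.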

tools.bd808-test2@tools-sgewebgen-10-1:~$ /usr/bin/webservice-runner --register-proxy --type lighttpd
Traceback (most recent call last):
  File "/usr/bin/webservice-runner", line 46, in <module>
    proxy.register(port)
  File "/usr/lib/python3/dist-packages/toolsws/proxy.py", line 114, in register
    register_dynamicproxy(port)
  File "/usr/lib/python3/dist-packages/toolsws/proxy.py", line 58, in register_dynamicproxy
    raise ProxyException("Port registration failed!")
toolsws.proxy.ProxyException: Port registration failed!

But on tools-proxy-06:

2022-06-03 00:28:40,686 Received request from bd808-test2 for .* to http://172.16.7.61:38033

Is the ack not getting back to the caller?
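
To illustrate the suspected failure mode (the listener logs the request, but the caller never sees an acknowledgement), here is a self-contained sketch. It is not the toolsws code: the payload, the "ok" reply, and the helper names register and fake_listener are all hypothetical, and the local listener only stands in for the front proxy.

```python
import socket
import threading


class ProxyException(Exception):
    pass


def register(host, port, payload, timeout=5.0):
    """Send a registration request and insist on an explicit 'ok' ack."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            ack = sock.recv(1024)
    except OSError as exc:  # also covers socket timeouts
        raise ProxyException(f"no ack from proxy: {exc}") from exc
    if ack.strip() != b"ok":
        raise ProxyException(f"unexpected ack: {ack!r}")


def fake_listener(reply):
    """Start a one-shot local listener that records the request, then replies."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    seen = []

    def serve():
        conn, _ = server.accept()
        with conn:
            seen.append(conn.recv(1024))  # the listener "logs" the request
            if reply:
                conn.sendall(reply)
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()[1], seen


# Hypothetical payload; the real wire format may differ.
PAYLOAD = b".*\nhttp://172.16.7.61:38033\n"

port, seen = fake_listener(reply=b"ok")
register("127.0.0.1", port, PAYLOAD)
print("acked; listener saw", seen)

port, seen = fake_listener(reply=b"")
try:
    register("127.0.0.1", port, PAYLOAD)
except ProxyException as exc:
    print("listener saw", seen, "but the client failed:", exc)
```

In the second run the listener has the request on record while the client raises, which is the same split picture as the proxylistener log versus the traceback on the exec node.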

Change 802683 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/software/tools-webservice@master] proxy: horrible hack for T309821

https://gerrit.wikimedia.org/r/802683

Change 802684 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/software/tools-webservice@master] d/changelog: Prepare for 0.85 release

https://gerrit.wikimedia.org/r/802684

Change 802683 merged by jenkins-bot:

[operations/software/tools-webservice@master] proxy: horrible hack for T309821

https://gerrit.wikimedia.org/r/802683

Change 802684 merged by jenkins-bot:

[operations/software/tools-webservice@master] d/changelog: Prepare for 0.85 release

https://gerrit.wikimedia.org/r/802684

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T03:10:37Z] <bd808> publish tools-webservice 0.85 with hack for T309821

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T03:21:45Z] <bd808> Cleared queue error states after deploying new toolforge-webservice package (T309821)

Things are a little better after deploying the hacked version of tools-webservice. Many more webservices are running, but certainly not all. Some nodes are also still overloaded, with jobs failing due to "fork failed: Cannot allocate memory" and similar errors.

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T05:05:21Z] <taavi> removing duplicate (there should be only one per tool) web service jobs from the grid T309821

I suspect webservicemonitor somehow caused a lot of tools to have multiple instances running/queued. Removing those duplicates with a simple script:

import subprocess
import sys
import time

import requests

# Per-host job listing from the grid status tool.
jobs = requests.get("https://sge-status.toolforge.org/api/v1").json()["data"]["attributes"]

# Tools for which one web service job has already been seen.
seen_tools = set()

for host, host_data in jobs.items():
    # Only web grid hosts are of interest.
    if "web" not in host:
        continue
    # Skip hosts with no jobs.
    if not host_data["jobs"]:
        continue
    for job in host_data["jobs"].values():
        owner = job["job_owner"]
        if owner in seen_tools:
            # Keep the first job seen for each tool and delete the rest.
            print("removing duplicate job for", owner)
            subprocess.check_call(
                ["/usr/bin/qdel", str(job["num"])],
                stdout=sys.stdout,
                stderr=sys.stderr,
            )
            time.sleep(1)
            continue

        seen_tools.add(owner)

08:34:51 <taavi> so I think what happened is that something got stuck, and then webservicemonitor (now disabled) made the problem worse by spawning tons of duplicate jobs
08:35:37 <taavi> I'm trying to make it better by deleting those duplicates, but the grid isn't actually deleting jobs marked as to be deleted
08:36:10 <taavi> we might need to do a rolling reboot + force delete for the entire web grid

I did a rolling reboot of all the Buster webgrid nodes, and deleted the jobs stuck in 'deleting' state. I /think/ the grid itself is now working properly again, though individual web services might need restarts to start working again.
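
For the cleanup of jobs stuck in the deleting state, a rough sketch of how they could be picked out of plain qstat output. It assumes the default column layout (job ID first, state fifth) and treats any state containing "d" as stuck; the sample rows, job IDs, and host name are made up for illustration, and the commands are only built, not run.

```python
def stuck_deleting_jobs(qstat_output):
    """Return job IDs whose state contains 'd' (for example 'dr')."""
    job_ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric job ID; skip the header and ruler.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        state = fields[4]
        if "d" in state:
            job_ids.append(fields[0])
    return job_ids


def force_delete_commands(job_ids):
    """Build one forced qdel invocation per job, without executing anything."""
    return [["qdel", "-f", job_id] for job_id in job_ids]


# Illustrative sample only; not taken from the incident.
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1000001 0.25000 lighttpd-a tools.alpha  dr    06/02/2022 20:00:00 webgrid-lighttpd@example-host      1
1000002 0.25000 lighttpd-b tools.beta   r     06/02/2022 20:05:00 webgrid-lighttpd@example-host      1
1000003 0.00000 lighttpd-c tools.gamma  qw    06/02/2022 22:10:00                                    1
"""

for command in force_delete_commands(stuck_deleting_jobs(SAMPLE)):
    print(" ".join(command))
```

Printing the commands first leaves room to review the list before anything is force deleted.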

webservicemonitor is still disabled on tools-sgecron-01. Not yet sure what to do about it.

I believe the root cause is that some of the Buster webgrid nodes got overloaded, causing the sge_execd on those nodes to freeze. The grid for some reason stopped scheduling new jobs on the webgrid queues after that, and webservicemonitor made the problem worse since it thought that the jobs in qw didn't exist and spawned new ones for web services it thought needed to be restarted.
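
The duplicate-spawning behaviour described above can be sketched as follows. This is not webservicemonitor's code: the job dictionaries, state strings, and function names are hypothetical and only contrast "count running jobs" with "count jobs in any state, including qw".

```python
def naive_tools_needing_restart(wanted_tools, grid_jobs):
    """Failure mode: only running jobs count, so queued jobs look missing."""
    running = {job["owner"] for job in grid_jobs if job["state"] == "r"}
    return sorted(set(wanted_tools) - running)


def tools_needing_restart(wanted_tools, grid_jobs):
    """Safer rule: a job in any state (qw included) means the tool is handled."""
    present = {job["owner"] for job in grid_jobs}
    return sorted(set(wanted_tools) - present)


# Hypothetical snapshot: alpha is running, beta is waiting in the queue,
# gamma has no job at all.
wanted = ["alpha", "beta", "gamma"]
jobs = [
    {"owner": "alpha", "state": "r"},
    {"owner": "beta", "state": "qw"},
]

print("naive:", naive_tools_needing_restart(wanted, jobs))
print("safer:", tools_needing_restart(wanted, jobs))
```

With the naive rule, every pass over a stalled queue resubmits beta again, which is how a scheduling stall can turn into a pile of duplicate jobs.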

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T12:46:50Z] <taavi> start webservicemonitor on tools-sgecron-01 T309821

Change 802620 had a related patch set uploaded (by BryanDavis; author: BryanDavis):

[operations/software/tools-webservice@master] Revert "proxy: horrible hack for T309821"

https://gerrit.wikimedia.org/r/802620

Change 802620 merged by jenkins-bot:

[operations/software/tools-webservice@master] Revert "proxy: horrible hack for T309821"

https://gerrit.wikimedia.org/r/802620

Change 802773 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/software/tools-webservice@master] d/changelog: Prepare for 0.86 release

https://gerrit.wikimedia.org/r/802773

Change 802773 merged by jenkins-bot:

[operations/software/tools-webservice@master] d/changelog: Prepare for 0.86 release

https://gerrit.wikimedia.org/r/802773

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T13:17:52Z] <bd808> publish tools-webservice 0.86 (T309821)

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T13:20:33Z] <bd808> publish tools-webservice 0.86 (T309821)

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T13:25:14Z] <bd808> Upgrading fleet to tools-webservice 0.86 (T309821)

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T15:50:47Z] <balloons> temp add 1.0G swap to sgeweblight hosts T309821

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T15:50:53Z] <balloons> fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor T309821

Upon further review: the new Buster hosts began acting up again with OOM errors, even though free -m showed free memory. After investigation, dcaro noted that there was a 24MB swap partition on each. Adding temporary 1G swap space seems to have removed the errors.

The 24MB swap was a typo for the more typical 24G swap found on other hosts. The flavor was updated to use 8G swap and 8G RAM. Nodes will be updated to use the new flavor.
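
A unit typo like this is easy to miss by eye. A small sanity check, sketched under the assumption that swap far below RAM is suspicious on these hosts; the 25% threshold and the function names are hypothetical, and the sample values simply approximate an 8G RAM host with 24MB versus 8G of swap.

```python
def meminfo_kib(text):
    """Parse /proc/meminfo style text into a {field: KiB} mapping."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_looks_mistyped(text, min_ratio=0.25):
    """Flag hosts whose swap is below min_ratio of RAM (hypothetical rule)."""
    info = meminfo_kib(text)
    return info.get("SwapTotal", 0) < info["MemTotal"] * min_ratio


# Illustrative values only: 8 GiB RAM with 24 MiB swap, then with 8 GiB swap.
TINY_SWAP = "MemTotal:        8388608 kB\nSwapTotal:         24576 kB\n"
FIXED_SWAP = "MemTotal:        8388608 kB\nSwapTotal:       8388608 kB\n"

print("24MB swap flagged:", swap_looks_mistyped(TINY_SWAP))
print("8G swap flagged:", swap_looks_mistyped(FIXED_SWAP))
```

On a live host the same functions could be fed the contents of /proc/meminfo instead of the sample strings.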

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T18:59:59Z] <balloons> depooled old nodes, bringing entirely new grid of nodes online T309821

Mentioned in SAL (#wikimedia-cloud) [2022-06-03T19:51:35Z] <balloons> Scaling webservice nodes to 20, using new 8G swap flavor T309821

We have rebuilt 21 new lighttpd nodes, each with 8G of swap space (via a new flavor).

Assuming this was an overprovisioning problem and not an invisible-memory-leak problem, things should be better. After a period of peace we can consider removing some of those nodes.

nskaggs claimed this task.

I'm going to try and set expectations here and say this part of the incident is closed / resolved. Hopefully we don't have to re-open!