
Zuul is no longer adding jobs to any jenkins pipelines
Closed, Resolved · Public

Description

At least the following changes were +2ed, are not based on any other unmerged changes, and don’t have a Depends-On trailer blocking them, but didn’t get added to the gate-and-submit pipeline:

This must have happened relatively recently, two hours ago it was definitely still working.
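
For readers unfamiliar with the Depends-On trailer mentioned above, here is a minimal sketch of how such trailers could be pulled out of a commit message. The regular expression, the `depends_on` helper and the placeholder Change-Ids are assumptions made for illustration; they are not taken from this task or from the deployed Zuul code.

```python
import re

# Assumed trailer format: "Depends-On: I<40 hex digits>" on a line of its own.
DEPENDS_ON_RE = re.compile(
    r"^Depends-On: (I[0-9a-f]{40})\s*$", re.MULTILINE | re.IGNORECASE
)


def depends_on(commit_message):
    """Return the Change-Ids named in Depends-On trailers (hypothetical helper)."""
    return DEPENDS_ON_RE.findall(commit_message)


# Placeholder Change-Ids, not real changes.
msg = (
    "Fix filter evaluation\n"
    "\n"
    "Depends-On: I" + "a" * 40 + "\n"
    "Depends-On: I" + "b" * 40 + "\n"
    "Change-Id: I" + "c" * 40 + "\n"
)

print(depends_on(msg))
```

A change with no such trailer yields an empty list, which is the situation described for the stuck changes above.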

Event Timeline

Restricted Application added a subscriber: Aklapper. Jul 2 2019, 4:28 PM
Lucas_Werkmeister_WMDE triaged this task as High priority. Jul 2 2019, 4:29 PM
Jdforrester-WMF raised the priority of this task from High to Unbreak Now!. Jul 2 2019, 4:29 PM
Restricted Application added a subscriber: Liuxinyu970226. Jul 2 2019, 4:29 PM

Mentioned in SAL (#wikimedia-operations) [2019-07-02T16:30:37Z] <hashar> CI code-review +2 changes are not quite processed for some unknown reason T227111

hashar added a comment. Jul 2 2019, 4:33 PM

I am looking into it but I don't know yet what is happening. At least some jobs are running in Jenkins and Zuul does process some events. So it is not entirely broken.

Other pipelines are also not getting filled – this probably isn’t specific to gate-and-submit, that’s just where it was more obvious.

Let’s add “shared build failure” since that’s where people might look for “why are my changes not merging”, even though strictly speaking I suppose it’s not a “failure” as such.

Paladox added a subscriber: Paladox. Jul 2 2019, 4:39 PM

I am seeing this:

com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) Cannot open ReviewDb
at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:69) (via modules: com.google.gerrit.server.config.GerritGlobalModule -> com.google.gerrit.server.util.ThreadLocalRequestContext$1)
while locating com.google.gerrit.reviewdb.server.ReviewDb
for the 1st parameter of com.google.gerrit.server.query.change.OutputStreamQuery.<init>(OutputStreamQuery.java:106)
while locating com.google.gerrit.server.query.change.OutputStreamQuery
for field at com.google.gerrit.sshd.commands.Query.processor(Query.java:27)
while locating com.google.gerrit.sshd.commands.Query
while locating org.apache.sshd.server.Command annotated with CommandName[gerrit query]
1 error
      at com.google.inject.internal.InternalProvisionException.toProvisionException(InternalProvisionException.java:226)
      at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1053)
      at com.google.gerrit.sshd.DispatchCommand.start(DispatchCommand.java:95)
      at com.google.gerrit.sshd.DispatchCommand.start(DispatchCommand.java:122)
      at com.google.gerrit.sshd.CommandFactoryProvider$Trampoline.onStart(CommandFactoryProvider.java:208)
      at com.google.gerrit.sshd.CommandFactoryProvider$Trampoline.access$300(CommandFactoryProvider.java:111)
      at com.google.gerrit.sshd.CommandFactoryProvider$Trampoline$1.run(CommandFactoryProvider.java:167)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:558)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.gwtorm.server.OrmException: Cannot open database connection
      at com.google.gwtorm.jdbc.Database.newConnection(Database.java:130)
      at com.google.gwtorm.jdbc.JdbcSchema.<init>(JdbcSchema.java:43)
      at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$13.<init>(Unknown Source)
      at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$13_Factory_GwtOrm$$14.open(Unknown Source)
      at com.google.gwtorm.jdbc.Database.open(Database.java:122)
      at com.google.gerrit.server.schema.NotesMigrationSchemaFactory.open(NotesMigrationSchemaFactory.java:39)
      at com.google.gerrit.server.schema.NotesMigrationSchemaFactory.open(NotesMigrationSchemaFactory.java:25)
      at com.google.gerrit.server.config.RequestScopedReviewDbProvider.get(RequestScopedReviewDbProvider.java:46)
      at com.google.gerrit.server.config.RequestScopedReviewDbProvider.get(RequestScopedReviewDbProvider.java:27)
      at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:69)
      at com.google.gerrit.server.util.ThreadLocalRequestContext$1$$FastClassByGuice$$75e0eb90.invoke(<generated>)
      at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
      at com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:173)
      at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.provision(InternalProviderInstanceBindingImpl.java:185)
      at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.get(InternalProviderInstanceBindingImpl.java:162)
      at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:42)
      at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:65)
      at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:113)
      at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
      at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:306)
      at com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:52)
      at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:147)
      at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:124)
      at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
      at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:306)
      at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:62)
      at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1050)
      ... 13 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 40 milliseconds ago. The last packet sent successfully to the server was 40 milliseconds ago.
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
      at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
      at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
      at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
      at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
      at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:919)
      at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1694)
      at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1244)
      at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2397)
      at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2430)
      at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2215)
      at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:813)
      at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
      at sun.reflect.GeneratedConstructorAccessor71.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
      at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:399)
      at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:334)
      at com.google.gwtorm.jdbc.SimpleDataSource.getConnection(SimpleDataSource.java:104)
      at com.google.gwtorm.jdbc.Database.newConnection(Database.java:128)
      ... 39 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
      at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3017)
      at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
      ... 57 more
WARN  [com.google.gerrit.sshd.CommandFactoryProvider] Cannot start command "gerrit query --format json --commit-message --current-patch-set change:I13f476d8126f81b0417e7509784c83d4f21cf348" for user jenkins-bot
02/07/19 16:01:53
hashar added a comment. Jul 2 2019, 4:42 PM

gerrit stream-events works locally.

The Zuul scheduler and the couple of Zuul mergers show some kind of activity.

Jdforrester-WMF renamed this task from "Zuul is no longer adding +2ed changes to the gate-and-submit pipeline" to "Zuul is no longer adding jobs to any jenkins pipelines". Jul 2 2019, 4:43 PM
hashar added a comment. Jul 2 2019, 4:47 PM

Only thing I can imagine is that, hmm, Zuul lost its connection to Gerrit somehow :-\

hashar added a comment. Jul 2 2019, 4:49 PM

Thread dump

2019-07-02 16:48:26,748 DEBUG zuul.stack_dump: Thread: 139978193487616
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978185094912
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139977681794816
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 128, in run
    self._handleEvent()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 116, in _handleEvent
    event.change_number, event.patch_number, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 305, in _updateChange
    change):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/source/gerrit.py", line 210, in _getDependsOnFromCommit
    records.extend(self.connection.simpleQuery(query))
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 348, in simpleQuery
    chunk, more_changes = _query_chunk(query)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 320, in _query_chunk
    out, err = self._ssh(cmd)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 388, in _ssh
    out = stdout.read()
  File "/usr/lib/python2.7/dist-packages/paramiko/file.py", line 150, in read
    new_data = self._read(self._DEFAULT_BUFSIZE)
  File "/usr/lib/python2.7/dist-packages/paramiko/channel.py", line 1217, in _read
    return self.channel.recv(size)
  File "/usr/lib/python2.7/dist-packages/paramiko/channel.py", line 596, in recv
    out = self.in_buffer.read(nbytes, self.timeout)
  File "/usr/lib/python2.7/dist-packages/paramiko/buffered_pipe.py", line 147, in read
    self._cv.wait(timeout)
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978755536640
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 830, in _doPollLoop
    self._pollLoop()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 856, in _pollLoop
    ret = poll.poll()
Thread: 139979045463808
  File "/usr/bin/zuul-server", line 10, in <module>
    sys.exit(main())
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/server.py", line 239, in main
    server.main()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/server.py", line 211, in main
    signal.pause()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/__init__.py", line 40, in stack_dump_handler
    log_str += "".join(traceback.format_stack(stack_frame))
Thread: 139978705180416
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978161956608
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 204, in run
    self._run()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 180, in _run
    self._listen(stdout, stderr)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 160, in _listen
    ret = poll.poll(self.poll_timeout)
Thread: 139978713573120
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978721965824
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978210273024
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978201880320
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139977631590144
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 830, in _doPollLoop
    self._pollLoop()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 856, in _pollLoop
    ret = poll.poll()
Thread: 139977639982848
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/webapp.py", line 61, in run
    self.server.serve_forever()
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 1084, in serve_forever
    self.handle_request()
  File "/usr/lib/python2.7/SocketServer.py", line 276, in handle_request
    fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
  File "/usr/lib/python2.7/SocketServer.py", line 155, in _eintr_retry
    return func(*args)
Thread: 139977669089024
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/dist-packages/paramiko/transport.py", line 1627, in run
    ptype, m = self.packetizer.read_message()
  File "/usr/lib/python2.7/dist-packages/paramiko/packet.py", line 341, in read_message
    header = self.read_all(self.__block_size_in, check_rekey=True)
  File "/usr/lib/python2.7/dist-packages/paramiko/packet.py", line 204, in read_all
    x = self.__socket.recv(n)
Thread: 139978218665728
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978747143936
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 744, in _doConnectLoop
    self.connections_condition.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139977228015360
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139977279141632
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/rpclistener.py", line 68, in run
    job = self.worker.getJob()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 1898, in getJob
    job = self.job_queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978738751232
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/dist-packages/paramiko/transport.py", line 1627, in run
    ptype, m = self.packetizer.read_message()
  File "/usr/lib/python2.7/dist-packages/paramiko/packet.py", line 341, in read_message
    header = self.read_all(self.__block_size_in, check_rekey=True)
  File "/usr/lib/python2.7/dist-packages/paramiko/packet.py", line 204, in read_all
    x = self.__socket.recv(n)
Thread: 139977623197440
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 744, in _doConnectLoop
    self.connections_condition.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978979469056
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 744, in _doConnectLoop
    self.connections_condition.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978170349312
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/scheduler.py", line 968, in run
    self.wake_event.wait()
  File "/usr/lib/python2.7/threading.py", line 621, in wait
    self.__cond.wait(timeout)
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978971076352
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/launcher/gearman.py", line 47, in run
    self.wake_event.wait(300)
  File "/usr/lib/python2.7/threading.py", line 621, in wait
    self.__cond.wait(timeout)
  File "/usr/lib/python2.7/threading.py", line 359, in wait
    _sleep(delay)
Thread: 139977270748928
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/paste/httpserver.py", line 866, in worker_thread_callback
    runnable = self.queue.get()
  File "/usr/lib/python2.7/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 139978987861760
  File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 830, in _doPollLoop
    self._pollLoop()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gear/__init__.py", line 856, in _pollLoop
    ret = poll.poll()
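
A dump of this size is easier to read once it is grouped per thread. The following is a rough sketch of how that could be done; the helper names (`summarize_stack_dump`, `deepest_thread`) and the shortened sample text are made up for illustration and are not part of Zuul. It only relies on the `Thread: <id>` and `File "...", line N, in func` patterns visible in the paste above.

```python
import re
from collections import Counter

# Patterns as they appear in the paste; searched anywhere in the line so that
# leading paste line numbers or indentation do not matter.
THREAD_RE = re.compile(r"Thread: (?P<tid>\d+)")
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def summarize_stack_dump(text):
    """Return {thread_id: Counter of (function, line) frames} (hypothetical helper)."""
    threads = {}
    current = None
    for raw in text.splitlines():
        thread = THREAD_RE.search(raw)
        if thread:
            current = threads.setdefault(thread.group("tid"), Counter())
            continue
        frame = FRAME_RE.search(raw)
        if frame and current is not None:
            current[(frame.group("func"), int(frame.group("line")))] += 1
    return threads


def deepest_thread(threads):
    """Return (thread_id, frame_count) for the thread with the most frames."""
    tid = max(threads, key=lambda t: sum(threads[t].values()))
    return tid, sum(threads[tid].values())


# Shortened, made-up sample in the same shape as the dump above.
sample = '''\
2019-07-02 16:48:26,748 DEBUG zuul.stack_dump: Thread: 1
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
Thread: 2
  File "zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
  File "zuul/source/gerrit.py", line 345, in _updateChange
    dep = self._getChange(dep_num, dep_ps, refresh=True)
  File "zuul/source/gerrit.py", line 173, in _getChange
    self._updateChange(change, history)
'''

threads = summarize_stack_dump(sample)
print(deepest_thread(threads))
print(threads["2"].most_common(1))
```

Run against the full dump, a summary like this would presumably single out thread 139977681794816, whose stack consists mostly of alternating `_getChange` and `_updateChange` frames ending in a blocking SSH read.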

Mentioned in SAL (#wikimedia-operations) [2019-07-02T16:52:15Z] <hashar> Stopping Jenkins and Zuul T227111

> Only thing I can imagine is that, hmm, Zuul lost its connection to Gerrit somehow :-\

I see two connections currently from jenkins-bot, which seems normal.

(/^ヮ^)/*:・゚✧ ssh -p 29418 gerrit.wikimedia.org -- gerrit show-connections -w
Session    User            Remote Host
--------------------------------------------------------------
f90a2155   jenkins-bot     contint1001.wikimedia.org
e6d062c5   jenkins-bot     contint1001.wikimedia.org

Mentioned in SAL (#wikimedia-operations) [2019-07-02T16:55:06Z] <hashar> Starting Jenkins and Zuul T227111

greg added a subscriber: greg. Jul 2 2019, 4:57 PM
09:47:13 <James_F> I just got a large number of e-mails from gerrit all at once.
09:47:19 <James_F> Possibly a deadlock got resolved?
Lucas_Werkmeister_WMDE lowered the priority of this task from Unbreak Now! to High. Jul 2 2019, 4:57 PM

Seems to be working again on https://gerrit.wikimedia.org/r/520252.

hashar added a comment. Jul 2 2019, 4:59 PM

So something somehow sent a huge dependency loop to TimedMediaHandler and Zuul got confused. I noticed a spike of Gearman jobs at 14:50, immediately followed by a huge one from 15:00 to 15:20.

https://grafana.wikimedia.org/d/000000322/zuul-gearman

I stopped Jenkins first, then Zuul. The Zuul log was showing a lot of errors submitting merger:update Gearman jobs for mediawiki/extensions/TimedMediaHandler.

CI is no longer blocked; solved by hard restarting Zuul.

> 09:47:13 <James_F> I just got a large number of e-mails from gerrit all at once.
> 09:47:19 <James_F> Possibly a deadlock got resolved?

Looking at @Paladox's paste, there was a hiccup in the SQL connection somewhere:

Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
      at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3017)
      at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)

It is noteworthy that this happened during a command Zuul was trying to execute.

Mentioned in SAL (#wikimedia-operations) [2019-07-02T16:59:56Z] <hashar> CI is back, I had to restart Zuul :-\ T227111

> 09:47:13 <James_F> I just got a large number of e-mails from gerrit all at once.
> 09:47:19 <James_F> Possibly a deadlock got resolved?

There were no blocking threads or deadlocks in Gerrit at 9:40: https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDcvMi8tLWpzdGFjay0xOS0wNy0wMi0xNi00MC0wMy5kdW1wLS0xNy0xMC00Mw==

> Looking at @Paladox's paste, there was a hiccup in the SQL connection somewhere:
>
> Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
>       at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3017)
>       at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
>
> It is noteworthy that this happened during a command Zuul was trying to execute.

This happened at 2019-07-02 16:00:11,313.

hashar closed this task as Resolved. Jul 2 2019, 5:15 PM
hashar claimed this task.

The root cause is a set of changes for AbuseFilter. There is a series of patches with a lot of Depends-On headers across a lot of repositories, and that apparently ends up overflowing. I think Zuul was just fine; it was just in a long busy loop updating all the repositories involved and retrieving all the changes from Gerrit :-\
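
To illustrate why a dense web of Depends-On headers can keep a scheduler busy for a long time, here is a small sketch comparing a walk that re-fetches every dependency each time it is reached with one that remembers what it has already visited. This is an illustration only, not a reproduction of Zuul's code; the graph shape, the names `layered_graph`, `walk_naive` and `walk_cached`, and the idea that each visit costs one Gerrit query are all assumptions.

```python
def layered_graph(layers):
    """Build a hypothetical dependency graph: two changes per layer, each
    depending on both changes of the next layer."""
    deps = {"root": ["L0a", "L0b"]}
    for i in range(layers - 1):
        nxt = [f"L{i + 1}a", f"L{i + 1}b"]
        deps[f"L{i}a"] = nxt
        deps[f"L{i}b"] = nxt
    return deps


def walk_naive(change, deps, fetch_log):
    """Visit every dependency every time it is reached (no cache)."""
    fetch_log.append(change)  # stands in for one remote query
    for dep in deps.get(change, []):
        walk_naive(dep, deps, fetch_log)


def walk_cached(change, deps, fetch_log, seen=None):
    """Visit each change at most once by remembering what was seen."""
    seen = set() if seen is None else seen
    if change in seen:
        return
    seen.add(change)
    fetch_log.append(change)
    for dep in deps.get(change, []):
        walk_cached(dep, deps, fetch_log, seen)


deps = layered_graph(10)
naive, cached = [], []
walk_naive("root", deps, naive)
walk_cached("root", deps, cached)
print(len(naive), len(cached))
```

With only 21 distinct changes in this made-up graph, the uncached walk performs 2047 visits, so the cost grows exponentially with the depth of the chain while the cached walk stays linear in the number of changes.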

Solved by hard stopping Zuul.

It is partly related to the lack of an upstream patch. I did try to backport it, but without success, since a lot of code was refactored in the meantime (the faulty attempt was https://gerrit.wikimedia.org/r/#/c/integration/zuul/+/508390/ ). See also private task T140297.
