
Remote call on integration-agent-docker-1001 failed
Closed, Resolved · Public

Description

Multiple Jenkins failures with the mwext-php72-phan-docker job, e.g. https://integration.wikimedia.org/ci/job/mwext-php72-phan-docker/35233/console

11:53:36 <?xml version="1.0" encoding="ISO-8859-15"?>
11:53:36 <checkstyle version="6.5"/>
11:53:37 FATAL: Remote call on integration-agent-docker-1001 failed
11:53:37 Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to integration-agent-docker-1001
11:53:37 		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1737)
11:53:37 		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
11:53:37 		at hudson.remoting.Channel.call(Channel.java:951)
11:53:37 		at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1083)
11:53:37 		at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:510)
11:53:37 		at hudson.model.Run.execute(Run.java:1853)
11:53:37 		at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
11:53:37 		at hudson.model.ResourceController.execute(ResourceController.java:97)
11:53:37 		at hudson.model.Executor.run(Executor.java:428)
11:53:37 java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer
11:53:37 	at hudson.util.ProcessTree.get(ProcessTree.java:432)
11:53:37 	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1100)
11:53:37 	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1091)
11:53:37 	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
11:53:37 	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
11:53:37 	at hudson.remoting.Request$2.run(Request.java:369)
11:53:37 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
11:53:37 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
11:53:37 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
11:53:37 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
11:53:37 	at java.lang.Thread.run(Thread.java:748)
11:53:37 Caused: java.io.IOException: Remote call on integration-agent-docker-1001 failed
11:53:37 	at hudson.remoting.Channel.call(Channel.java:957)
11:53:37 	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1083)
11:53:37 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:510)
11:53:37 	at hudson.model.Run.execute(Run.java:1853)
11:53:37 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
11:53:37 	at hudson.model.ResourceController.execute(ResourceController.java:97)
11:53:37 	at hudson.model.Executor.run(Executor.java:428)
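
For triage, it can help to pull just the exception headers out of a console log like the one above. In this trace the first exception printed (java.lang.NoClassDefFoundError) appears to be the underlying failure, and the later "Caused:" line is the IOException wrapping it. The following is a minimal Python sketch, not part of the original report; the function name exception_headers and both regular expressions are hypothetical and assume lines prefixed with an HH:MM:SS timestamp as shown here.

```python
import re

# Assumption: every console line starts with an HH:MM:SS timestamp.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2} ?")
# Assumption: exception headers are a dotted class name ending in Error or
# Exception, optionally prefixed with "Caused: " and followed by ": message".
EXCEPTION = re.compile(
    r"^(?:Caused: )?((?:[\w$]+\.)+[\w$]*(?:Error|Exception))\b(?:: (.*))?$"
)


def exception_headers(console_text):
    """Return (class_name, message) pairs in the order they appear in the log."""
    headers = []
    for raw in console_text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if line.startswith("at "):
            continue  # stack frame, not an exception header
        match = EXCEPTION.match(line)
        if match:
            headers.append((match.group(1), match.group(2) or ""))
    return headers


if __name__ == "__main__":
    sample = "\n".join([
        "11:53:37 FATAL: Remote call on integration-agent-docker-1001 failed",
        "11:53:37 java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer",
        "11:53:37 \tat hudson.util.ProcessTree.get(ProcessTree.java:432)",
        "11:53:37 Caused: java.io.IOException: Remote call on integration-agent-docker-1001 failed",
        "11:53:37 \tat hudson.remoting.Channel.call(Channel.java:957)",
    ])
    for class_name, message in exception_headers(sample):
        print(class_name, "-", message)
```

Run against the log above, this should list the NoClassDefFoundError first and the wrapping IOException second, which points at class initialization on the agent rather than at the job itself.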

Event Timeline

Restricted Application added a project: User-DannyS712. Wed, Feb 12, 7:58 PM
Restricted Application added a subscriber: Aklapper.
DannyS712 moved this task from Unsorted to Reports on the User-DannyS712 board.

Mentioned in SAL (#wikimedia-releng) [2020-02-12T20:22:48Z] <James_F> Depooling integration-agent-docker-1001; it's being odd T245054

Jdforrester-WMF triaged this task as High priority. Wed, Feb 12, 8:23 PM
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

I've de-pooled it for now.

Proposed: Re-pool, re-start, re-build, give up?

thcipriani added a subscriber: thcipriani.

Optimistically, could just be the agent on that machine having worked itself into a strange state.

Looking at htop, the biggest memory user was java -jar slave.jar which is the agent process. Will restart and monitor.

I brought the agent back online, and it ran at least one job OK: https://integration.wikimedia.org/ci/job/mwext-php72-phan-seccheck-docker/35253/console

Continuing to monitor.

thcipriani closed this task as Resolved. Wed, Feb 19, 9:30 PM

Calling this resolved. I killed the java -jar slave.jar process on the agent and repooled it. The problem seems to have gone away.