
Failed to create a temp file in beta-code-update-eqiad (Full deployment-bastion:/tmp)
Closed, ResolvedPublic

Description

https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/53401/console

21:13:00 FATAL: Unable to produce a script file
21:13:00 java.io.IOException: Failed to create a temp file on /mnt/home/jenkins-deploy/workspace/beta-code-update-eqiad
21:13:00 	at hudson.FilePath.createTextTempFile(FilePath.java:1373)
21:13:00 	at hudson.tasks.CommandInterpreter.createScriptFile(CommandInterpreter.java:144)
21:13:00 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
21:13:00 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
21:13:00 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
21:13:00 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:770)
21:13:00 	at hudson.model.Build$BuildExecution.build(Build.java:199)
21:13:00 	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
21:13:00 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:533)
21:13:00 	at hudson.model.Run.execute(Run.java:1759)
21:13:00 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
21:13:00 	at hudson.model.ResourceController.execute(ResourceController.java:89)
21:13:00 	at hudson.model.Executor.run(Executor.java:240)
21:13:00 Caused by: java.io.IOException: remote file operation failed: /mnt/home/jenkins-deploy/workspace/beta-code-update-eqiad at hudson.remoting.Channel@70b1a410:deployment-bastion.eqiad: java.io.IOException: No space left on device
21:13:00 	at hudson.FilePath.act(FilePath.java:977)
21:13:00 	at hudson.FilePath.act(FilePath.java:959)
21:13:00 	at hudson.FilePath.createTextTempFile(FilePath.java:1347)
21:13:00 	... 12 more
21:13:00 Caused by: java.io.IOException: No space left on device
21:13:00 	at java.io.FileOutputStream.writeBytes(Native Method)
21:13:00 	at java.io.FileOutputStream.write(FileOutputStream.java:345)
21:13:00 	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
21:13:00 	at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:316)
21:13:00 	at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
21:13:00 	at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
21:13:00 	at hudson.FilePath$17.invoke(FilePath.java:1366)
21:13:00 	at hudson.FilePath$17.invoke(FilePath.java:1347)
21:13:00 	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2677)
21:13:00 	at hudson.remoting.UserRequest.perform(UserRequest.java:121)
21:13:00 	at hudson.remoting.UserRequest.perform(UserRequest.java:49)
21:13:00 	at hudson.remoting.Request$2.run(Request.java:324)
21:13:00 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
21:13:00 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
21:13:00 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
21:13:00 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
21:13:00 	at java.lang.Thread.run(Thread.java:745)
21:13:00 	at ......remote call to deployment-bastion.eqiad(Native Method)
21:13:00 	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1356)
21:13:00 	at hudson.remoting.UserResponse.retrieve(UserRequest.java:221)
21:13:00 	at hudson.remoting.Channel.call(Channel.java:752)
21:13:00 	at hudson.FilePath.act(FilePath.java:970)
21:13:00 	... 14 more
21:13:00 Build step 'Execute shell' marked build as failure

Event Timeline

greg created this task. Apr 26 2015, 9:17 PM
greg raised the priority of this task to High.
greg updated the task description.
greg added subscribers: greg, mmodell, thcipriani and 2 others.
Restricted Application added a subscriber: Aklapper. Apr 26 2015, 9:17 PM

From what I've been seeing all weekend, this job keeps failing because deployment-bastion keeps running out of space on the root filesystem.

Scap creates temporary files for l10nupdate at /tmp/scap_l10n_[some_random_string]. It seems like this line hasn't been getting called: https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap/tasks.py#L441-L444

I've been deleting multi-gigabyte directories out of /tmp, but it seemingly keeps re-filling :(
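A minimal sketch for locating the leftovers described above, assuming they all sit directly under /tmp and share the scap_l10n_ prefix. The function names are hypothetical (not part of scap), and the sizes are apparent file sizes, so they may differ slightly from what du reports.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum apparent file sizes under path (du counts allocated blocks instead)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            # Skip symlinks so targets outside the tree are not counted.
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def find_stale_l10n_dirs(root: str = "/tmp", prefix: str = "scap_l10n_"):
    """Return (path, size_in_bytes) for each matching directory, largest first."""
    results = []
    for entry in Path(root).iterdir():
        if entry.is_dir() and entry.name.startswith(prefix):
            results.append((entry, dir_size_bytes(entry)))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Print a du-like listing of the leftover scap_l10n_* directories.
    for path, size in find_stale_l10n_dirs():
        print(f"{size / 1024**3:.1f}G\t{path}")
```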

greg added a comment.Apr 27 2015, 1:13 PM

Could be core dumps from T93194?

Krinkle renamed this task from Failed to create a temp file on /mnt/home/jenkins-deploy/workspace/beta-code-update-eqiad to Failed to create a temp file in beta-code-update-eqiad (Full deployment-bastion:/tmp). Apr 27 2015, 5:46 PM
Krinkle updated the task description.
Krinkle set Security to None.
Krinkle updated the task description.
$ du -sh /tmp
1.9M	mw-cache-master
1.2G	scap_l10n_1283037312
1.2G	scap_l10n_1909596178

Scap does remove these directories after a successful run, but if a run fails or is aborted they are probably left behind.
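One way to make the cleanup survive a failed run is to wrap the build in try/finally. This is a sketch under assumptions, not scap's actual code: build_step is a hypothetical callable standing in for the real localization-cache build.

```python
import shutil
import tempfile


def build_l10n_cache(build_step, prefix="scap_l10n_"):
    """Run build_step(tmp_dir) in a scratch directory that is always removed.

    build_step is a hypothetical stand-in for the real l10n cache build.
    """
    # mkdtemp() with no dir= argument uses tempfile.gettempdir(), which
    # honours $TMPDIR before falling back to a default such as /tmp.
    tmp_dir = tempfile.mkdtemp(prefix=prefix)
    try:
        return build_step(tmp_dir)
    finally:
        # Runs on success, on exceptions and on KeyboardInterrupt (Ctrl-C).
        # It does NOT run if the process is killed by SIGKILL, or by SIGTERM
        # when no handler is installed.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Because the finally clause cannot cover a process that is killed outright, a per-job $TMPDIR that is wiped afterwards remains a useful backstop.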

Proposal

  • Update the job to make use of standard global-setup.sh and global-teardown.sh (from slave-scripts). This will give the job its own $TMPDIR and also clear it after each job (in case anything is left).
  • Ensure scap itself honours the Linux convention of using $TMPDIR (rather than hardcoding /tmp).
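Both bullets can be illustrated together in Python. This illustrates the idea only and is not the contents of global-setup.sh/global-teardown.sh: the helper name run_with_private_tmpdir and the workspace/tmp location are assumptions. Python's tempfile module consults $TMPDIR (then $TEMP and $TMP) before falling back to a platform default such as /tmp, so code that calls tempfile.mkdtemp() without a hard-coded dir= picks up a per-job TMPDIR automatically.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_with_private_tmpdir(cmd, workspace):
    """Hypothetical setup/teardown pair: give one job its own TMPDIR inside
    its workspace, run the command, then delete whatever it left behind."""
    job_tmp = os.path.join(workspace, "tmp")
    os.makedirs(job_tmp, exist_ok=True)
    # Child processes inherit TMPDIR; their tempfile calls land in job_tmp.
    env = dict(os.environ, TMPDIR=job_tmp)
    try:
        return subprocess.run(cmd, env=env, check=True,
                              capture_output=True, text=True)
    finally:
        # Teardown runs even if the command fails (CalledProcessError).
        shutil.rmtree(job_tmp, ignore_errors=True)


if __name__ == "__main__":
    # The child creates a scap_l10n_* style directory without passing dir=.
    child = [sys.executable, "-c",
             "import tempfile; print(tempfile.mkdtemp(prefix='scap_l10n_'))"]
    with tempfile.TemporaryDirectory() as ws:
        result = run_with_private_tmpdir(child, ws)
        print(result.stdout.strip())
```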

Change 206850 had a related patch set uploaded (by Krinkle):
Add global-setup and global-teardown to beta-code-update

https://gerrit.wikimedia.org/r/206850

Change 206853 had a related patch set uploaded (by Krinkle):
contint: Move tmpfs and slave-scripts from slave::labs to slave::labs::common

https://gerrit.wikimedia.org/r/206853

Change 206856 had a related patch set uploaded (by Thcipriani):
Make scap localization cache build $TMPDIR aware

https://gerrit.wikimedia.org/r/206856

Change 206853 merged by Ori.livneh:
contint: Move jenkins/tmpfs from slave::labs to slave::labs::common

https://gerrit.wikimedia.org/r/206853

Change 206850 merged by jenkins-bot:
Add global-setup and global-teardown to beta-code-update

https://gerrit.wikimedia.org/r/206850

Change 206856 merged by jenkins-bot:
Make scap localization cache build $TMPDIR aware

https://gerrit.wikimedia.org/r/206856

thcipriani closed this task as Resolved. May 19 2015, 2:48 PM

Change 361883 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] beta-update-databases: invoke global-teardown directly

https://gerrit.wikimedia.org/r/361883

Change 361883 merged by jenkins-bot:
[integration/config@master] beta-update-databases: invoke global-teardown directly

https://gerrit.wikimedia.org/r/361883