
Failed docker build leaves dangling container
Closed, Resolved, Public

Description

On October 9th, I manually removed 47 GB of dangling Docker images (docker image prune -f), which freed 47 GB of disk space.

I suspect they are related to the service pipeline (Release Pipeline).

Event Timeline

hashar created this task. Oct 16 2019, 5:17 PM
Restricted Application added a subscriber: Aklapper. Oct 16 2019, 5:17 PM
hashar triaged this task as Medium priority. Oct 16 2019, 5:19 PM
thcipriani renamed this task from "contint1001 has lot of dangling Docker images" to "Release pipeline is creating/not cleaning intermediate dangling images". Apr 15 2020, 4:44 PM

The cause seems to be that the pipeline sometimes leaves containers behind. I pruned all images and containers on contint1001 a few days ago, and right now there is one stopped container:

contint1001$ docker ps -a

CONTAINER ID        IMAGE               COMMAND                   CREATED             STATUS                    PORTS               NAMES
dc74b315089f        87764dba840c        "/bin/sh -c 'npm \"ru…"   42 hours ago        Exited (1) 42 hours ago                       elastic_mayer

That one had the command /bin/sh -c 'npm \"run-script\" \"build-all-portals\"'

Once I got rid of it (docker container prune -f), we were able to remove all the dangling images (docker image prune -f).

So I guess the pipeline is missing a docker stop somewhere. The other jobs do it thanks to @dduvall's patch 29ce25b9736e9f0faf01c6f08a132396ec73e376 (T198517: Quibble docker instance running on CI instance for 6 hours).
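The cleanup order described above can be sketched as a small shell function. This is only an illustration: the function name prune_docker_leftovers is made up, and it assumes nothing beyond the two prune commands already quoted in this comment. Containers go first because a dangling image that a stopped container still references is not removed by docker image prune.

```shell
# Sketch: remove stopped containers first, then dangling images.
# prune_docker_leftovers is a hypothetical name, not part of any CI tooling.
prune_docker_leftovers() {
    # Stopped containers keep their (dangling) images referenced,
    # so they have to go before the image prune can reclaim space.
    docker container prune -f || return 1
    docker image prune -f
}
```

Calling prune_docker_leftovers on a host such as contint1001 runs both steps without prompting, since -f skips the confirmation question.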

dancy added a subscriber: dancy. Jul 20 2020, 3:59 PM

Looking at contint1001 today (Mon 20 Jul 2020 03:47:28 PM UTC) I see:

dancy@contint1001:~$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                   CREATED             STATUS                   PORTS               NAMES
2f928c43c0bc        770544497f3e        "/bin/sh -c 'npm \"ru…"   2 weeks ago         Exited (1) 2 weeks ago                       lucid_curie
b086bcd450f6        68cdfaa397a6        "/bin/sh -c 'npm ins…"    3 weeks ago         Exited (1) 3 weeks ago                       zealous_dijkstra
30917e88e8ee        f90cf678284d        "/bin/sh -c 'npm ins…"    4 weeks ago         Exited (1) 4 weeks ago                       sad_albattani
2d4effb4ec86        c7a86cc3aa6e        "/bin/sh -c 'go \"get…"   4 weeks ago         Exited (1) 4 weeks ago                       pedantic_feistel
2631dbfd74ba        fe03896877e6        "/bin/sh -c 'go \"get…"   4 weeks ago         Exited (1) 4 weeks ago                       stoic_pasteur
e06c982e07cc        53987eaf4ff3        "/bin/sh -c 'npm \"ru…"   6 weeks ago         Exited (1) 6 weeks ago                       modest_hofstadter
739caaac7bdd        02bd3a73827a        "/bin/sh -c 'npm \"ru…"   6 weeks ago         Exited (1) 6 weeks ago                       unruffled_meitner

Inspection of the images indicates that they were each part of a (probably interrupted) docker build operation:

dancy@contint1001:~$ for image in $(docker ps -a | grep -v IMAGE | awk '{print $2}'); do echo Image $image; docker inspect $image | jq .[0].Created ; docker inspect $image | jq .[0].ContainerConfig.Cmd; done
Image 770544497f3e
"2020-07-06T09:44:26.620301663Z"
[
  "/bin/sh",
  "-c",
  "#(nop) COPY --chown=65533:65533dir:5ea207200e087d7318f8efff05d89bb7fa313c8a3a913863fedded6d597fae20 in ./ "
]
Image 68cdfaa397a6
"2020-06-29T12:13:01.439650007Z"
[
  "/bin/sh",
  "-c",
  "#(nop) COPY --chown=65533:65533multi:bae03ba76a65f1269348b8451b412614914c919f41e18aa3a29ec4dab0746ce7 in ./ "
]
Image f90cf678284d
"2020-06-19T19:50:29.881786592Z"
[
  "/bin/sh",
  "-c",
  "#(nop) COPY --chown=65533:65533file:87b7e8743cda1f9e03f10d86dbb248793a507acecb1150fc429c260d08c5436d in ./ "
]
Image c7a86cc3aa6e
"2020-06-17T20:52:45.345288444Z"
[
  "/bin/sh",
  "-c",
  "#(nop) WORKDIR /srv/app"
]
Image fe03896877e6
"2020-06-17T20:45:13.302081095Z"
[
  "/bin/sh",
  "-c",
  "#(nop) ",
  "ENV GOPATH=/usr/share/gocode"
]
Image 53987eaf4ff3
"2020-06-08T15:59:25.314708296Z"
[
  "/bin/sh",
  "-c",
  "#(nop) COPY --chown=65533:65533dir:d96ca28d76f5004e0f04b000277fb659b1b16e2540211dab613331175a87bb54 in src/ "
]
Image 02bd3a73827a
"2020-06-08T15:16:41.395938264Z"
[
  "/bin/sh",
  "-c",
  "#(nop) COPY --chown=65533:65533dir:ca88c1862f6b9285c26eae2ac8492cff5fc564fa98413d7795da94cc606601b9 in src/ "
]
dancy added a comment. Jul 20 2020, 7:50 PM

I have confirmed that docker build will leave a container around if a build step fails.
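A minimal way to reproduce this could look like the sketch below. It is not the exact test that was run for this task: the function name reproduce_build_leak and the busybox base image are assumptions chosen for illustration, and it assumes the classic (non-BuildKit) builder, which runs each build step in an intermediate container.

```shell
# Sketch: build a Dockerfile whose only RUN step fails, then list exited
# containers. reproduce_build_leak and the busybox base image are
# illustrative assumptions, not taken from the pipeline.
reproduce_build_leak() {
    leak_workdir="$(mktemp -d)" || return 1
    printf 'FROM busybox\nRUN false\n' > "$leak_workdir/Dockerfile"
    # The build is expected to fail at the RUN step.
    docker build "$leak_workdir" || echo "build failed, as intended"
    # With the classic builder, the container of the failed step shows up here.
    docker ps -a --filter status=exited
    rm -rf "$leak_workdir"
}
```

After running it, docker ps -a should show an exited container whose command is the failed step, similar to the entries listed earlier in this task.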

dancy added a comment. Jul 20 2020, 7:54 PM

From docker build docs:

--force-rm=true|false
   Always remove intermediate containers, even after unsuccessful builds. The default is false.
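As an illustration of how the flag would be used, here is a hedged sketch of a build wrapper. It is not taken from pipelinelib, and the name docker_build_clean is made up; the only thing it relies on is the --force-rm option quoted above.

```shell
# Sketch: always pass --force-rm so that a failed build step does not leave
# its intermediate container behind. docker_build_clean is a hypothetical
# name; all other arguments are handed to docker build unchanged.
docker_build_clean() {
    docker build --force-rm "$@"
}
```

For example, docker_build_clean -t example . builds the current directory and removes intermediate containers whether or not the build succeeds.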
dancy claimed this task. Jul 20 2020, 7:55 PM
dancy updated the task description.

Change 614851 had a related patch set uploaded (by Ahmon Dancy; owner: Ahmon Dancy):
[integration/pipelinelib@master] Prevent container leak if docker build fails

https://gerrit.wikimedia.org/r/614851

I ran this today:

dancy@contint1001:~$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
2f928c43c0bccc9c23065d24c4c8413fec9b0f69c9e18e9a0522f2abc5309562
b086bcd450f6f0bb55df60cecb9bbc31af7ceb3f95121ab2aac00bd33c8009fa
30917e88e8ee97fcdc238aa401cfc294064743afa6aab2091f63d0b7bf14f1b5
2d4effb4ec867abf9a040164e5f4d3dc4599a5f4b2c7a399e8843987daaf9abb
2631dbfd74babed7d61b194074d5075295593af7e61ec7efd094aeaffa98da41
e06c982e07cc529dd36a3beaa14564d06bd0829b7562e2f9c48be3073f8176f8
739caaac7bddd2d305bbd13878379b7363c160c0770389b490e47f9aa2858af2

Total reclaimed space: 173.8MB
dancy renamed this task from "Release pipeline is creating/not cleaning intermediate dangling images" to "Failed docker build leaves dangling container". Jul 21 2020, 9:04 PM
dancy added a project: User-dancy.
dancy moved this task from Backlog to Awaiting review/merge on the User-dancy board.

Change 614851 merged by jenkins-bot:
[integration/pipelinelib@master] Prevent container leak if docker build fails

https://gerrit.wikimedia.org/r/614851

dancy closed this task as Resolved. Jul 22 2020, 2:42 AM
dancy removed a project: User-dancy.

The shell one-liner caught all the dangling containers. This task was originally about images, and they do indeed come from builds. Pipeline builds can run on both contint1001 and contint2001, so I pruned all dangling containers and images on both hosts.

Well done Dancy!

contint1001$ docker image prune -f
Deleted Images:
...
Total reclaimed space: 36.68GB
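
For future checks, the one-liner used earlier in this task to inspect leftover containers could probably be written without grep, awk and jq by using docker's own --format templates. The sketch below is untested on the CI hosts; the function name is made up, and it assumes the image metadata still exposes ContainerConfig.Cmd, as it did in the output above.

```shell
# Sketch: for every exited container, print its image together with the
# image's creation time and the build step recorded for it.
# inspect_exited_container_images is a hypothetical name.
inspect_exited_container_images() {
    docker ps -a --filter status=exited --format '{{.Image}}' | sort -u |
    while read -r image; do
        echo "Image $image"
        docker inspect --format '{{.Created}}' "$image"
        # ContainerConfig.Cmd is assumed to be present, as in the jq output above.
        docker inspect --format '{{json .ContainerConfig.Cmd}}' "$image"
    done
}
```

The sort -u step only avoids inspecting the same image twice when several exited containers share it.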