
CI job fails docker: open /var/lib/docker/tmp/GetImageBlob990376239: no such file or directory
Closed, Resolved | Public | Production Error

Description

mediawiki-quibble-composertest-php70-docker

04:40:19 Started by user unknown or anonymous
04:40:19 Building remotely on integration-slave-docker-1055 (blubber instance-type-bigram DebianJessieDocker m4executor) in workspace /srv/jenkins-workspace/workspace/mediawiki-quibble-composertest-php70-docker
04:40:20 [mediawiki-quibble-composertest-php70-docker] $ /bin/bash -xe /tmp/jenkins445418393964771885.sh
04:40:20 + mkdir -m 2777 -p cache
04:40:20 [mediawiki-quibble-composertest-php70-docker] $ /bin/bash /tmp/jenkins5153419857677147712.sh
04:40:20 ++ pwd
04:40:20 + exec docker run --volume /srv/jenkins-workspace/workspace/mediawiki-quibble-composertest-php70-docker/cache:/cache --init --rm --label jenkins.job=mediawiki-quibble-composertest-php70-docker --label jenkins.build=17924 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/castor:0.2.0 load
04:40:20 ++ /usr/bin/env
04:40:20 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)='
04:40:20 Unable to find image 'docker-registry.wikimedia.org/releng/castor:0.2.0' locally
04:40:21 0.2.0: Pulling from releng/castor
04:40:21 956dfecac10b: Pulling fs layer
04:40:21 7102cea7e37c: Pulling fs layer
04:40:21 f32f5848fd00: Pulling fs layer
04:40:21 4688ba5289c8: Pulling fs layer
04:40:21 f16efc186678: Pulling fs layer
04:40:21 4688ba5289c8: Waiting
04:40:21 f16efc186678: Waiting
04:40:21 docker: open /var/lib/docker/tmp/GetImageBlob682696395: no such file or directory.
04:40:21 See 'docker run --help'.
04:40:21 Build step 'Execute shell' marked build as failure
04:40:23 Archiving artifacts
04:40:24 [PostBuildScript] - Execution post build scripts.
04:40:24 [PostBuildScript] Build is not success : do not execute script
04:40:24 [PostBuildScript] - Execution post build scripts.
04:40:24 [mediawiki-quibble-composertest-php70-docker] $ /bin/bash -xe /tmp/jenkins722703681368070046.sh
04:40:24 + echo 'Clearing /srv/jenkins-workspace/workspace/mediawiki-quibble-composertest-php70-docker/cache'
04:40:24 Clearing /srv/jenkins-workspace/workspace/mediawiki-quibble-composertest-php70-docker/cache
04:40:24 [mediawiki-quibble-composertest-php70-docker] $ /bin/bash /tmp/jenkins1120397283963711162.sh
04:40:24 ++ pwd
04:40:24 + exec docker run --volume /srv/jenkins-workspace/workspace/mediawiki-quibble-composertest-php70-docker/cache:/cache --init --rm --label jenkins.job=mediawiki-quibble-composertest-php70-docker --label jenkins.build=17924 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/castor:0.2.0 clear
04:40:24 ++ /usr/bin/env
04:40:24 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)='
04:40:24 Unable to find image 'docker-registry.wikimedia.org/releng/castor:0.2.0' locally
04:40:25 0.2.0: Pulling from releng/castor
04:40:25 956dfecac10b: Pulling fs layer
04:40:25 7102cea7e37c: Pulling fs layer
04:40:25 f32f5848fd00: Pulling fs layer
04:40:25 4688ba5289c8: Pulling fs layer
04:40:25 f16efc186678: Pulling fs layer
04:40:25 4688ba5289c8: Waiting
04:40:25 f16efc186678: Waiting
04:40:25 docker: open /var/lib/docker/tmp/GetImageBlob783615918: no such file or directory.
04:40:25 See 'docker run --help'.
04:40:25 Build step 'Execute a set of scripts' marked build as failure
04:40:25 [PostBuildScript] - Execution post build scripts.
04:40:25 [mediawiki-quibble-composertest-php70-docker] $ /bin/bash -xe /tmp/jenkins6034760054668372983.sh
04:40:25 + set -euxo pipefail
04:40:25 + docker ps -q --filter label=jenkins.job=mediawiki-quibble-composertest-php70-docker --filter label=jenkins.build=17924
04:40:25 + xargs --no-run-if-empty docker stop
04:40:25 [PostBuildScript] - Execution post build scripts.
04:40:25 [mediawiki-quibble-composertest-php70-docker] $ /bin/bash /tmp/jenkins6958657851077379558.sh
04:40:25 + exec docker run --user=root --volume /srv/jenkins-workspace/workspace/mediawiki-quibble-composertest-php70-docker:/workspace --entrypoint=/usr/bin/find --init --rm --label jenkins.job=mediawiki-quibble-composertest-php70-docker --label jenkins.build=17924 --env-file /dev/fd/63 docker-registry.wikimedia.org/wikimedia-stretch:latest /workspace -mindepth 1 -delete
04:40:25 ++ /usr/bin/env
04:40:25 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)='
04:40:25 Unable to find image 'docker-registry.wikimedia.org/wikimedia-stretch:latest' locally
04:40:26 latest: Pulling from wikimedia-stretch
04:40:26 06d0c44411ac: Pulling fs layer
04:40:26 docker: open /var/lib/docker/tmp/GetImageBlob990376239: no such file or directory.
04:40:26 See 'docker run --help'.
04:40:26 Build step 'Execute a set of scripts' marked build as failure
04:40:26 Finished: FAILURE

Event Timeline

Kizule triaged this task as High priority. Apr 30 2019, 3:05 AM
Ladsgroup raised the priority of this task from High to Unbreak Now!. Apr 30 2019, 5:59 AM
Ladsgroup subscribed.

And https://gerrit.wikimedia.org/r/c/mediawiki/core/+/506945
It's almost impossible to merge anything with this.

It looks like this might be blocking config merges as well – operations-mw-config-typos-docker#4346 for Ied0ada0ec0:

13:01:51 Started by user unknown or anonymous
13:01:51 Building remotely on integration-slave-docker-1055 (blubber instance-type-bigram DebianJessieDocker m4executor) in workspace /srv/jenkins-workspace/workspace/operations-mw-config-typos-docker
13:01:51 [operations-mw-config-typos-docker] $ /bin/bash /tmp/jenkins1768537320233754565.sh
13:01:51 + exec docker run --init --rm --label jenkins.job=operations-mw-config-typos-docker --label jenkins.build=4346 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/typos:0.0.2
13:01:51 ++ /usr/bin/env
13:01:51 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)='
13:01:52 Unable to find image 'docker-registry.wikimedia.org/releng/typos:0.0.2' locally
13:01:53 0.0.2: Pulling from releng/typos
13:01:53 d397e275c51e: Pulling fs layer
13:01:53 a041ea3cae5b: Pulling fs layer
13:01:53 25a836845950: Pulling fs layer
13:01:53 docker: open /var/lib/docker/tmp/GetImageBlob571189260: no such file or directory.
13:01:53 See 'docker run --help'.
13:01:53 Build step 'Execute shell' marked build as failure
13:01:53 Finished: FAILURE

On the other hand, https://gerrit.wikimedia.org/r/507242 was still merged, over two hours after this task was first reported (unless I’m confused by timezones).

Okay, my config changes were now merged as well, so either this doesn’t happen all the time or it was just an unrelated random thing.

hashar renamed this task from "mediawiki-quibble-composertest-php70-docker failure: Unable to find image 'docker-registry.wikimedia.org/releng/castor:0.2.0' locally" to "CI job fails docker: open /var/lib/docker/tmp/GetImageBlob990376239: no such file or directory". Apr 30 2019, 11:59 AM
hashar updated the task description.
hashar claimed this task.
hashar subscribed.

The actual issue is Docker failing to open some blob file under /var/lib/docker/tmp/.

I think the issue only affected integration-slave-docker-1055, which had the wrong disk partition layout. I therefore had to get Puppet to recreate the /var/lib/docker partition, but Docker was already running at that point. So I guess it was pointing to files that no longer exist?

The Docker daemon on the instance had no images.

So I restarted the Docker daemon and managed to docker pull a few images.

Root cause: I forgot to restart the Docker daemon after repartitioning the disk :(
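
A minimal sketch of a preflight check for the failure mode described above, assuming the symptom is that /var/lib/docker/tmp is missing while the daemon is still running. The function name `docker_tmp_ok` is hypothetical and not part of the CI configuration, and the suggested `systemctl restart docker` assumes the host manages Docker through systemd.

```shell
#!/bin/bash
# Hypothetical preflight check (illustration only, not the actual CI setup).
# Reports whether the directory Docker uses for temporary image blobs exists.
# The default path is taken from the error message in the build log.
docker_tmp_ok() {
    local tmp_dir="${1:-/var/lib/docker/tmp}"
    if [ -d "$tmp_dir" ]; then
        echo "ok: $tmp_dir exists"
        return 0
    fi
    # Assumption: restarting the daemon makes it set up its data directory
    # again, which is what resolved the incident in this task.
    echo "missing: $tmp_dir (try 'systemctl restart docker')" >&2
    return 1
}
```

An agent could run `docker_tmp_ok || exit 1` before its first `docker run`, so a stale data directory shows up as one clear message instead of a failed image pull in every job.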

And I have mailed wikitech-l to announce it to everyone. I apologize, I should really have tested the instance with a dummy job :/
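
The "dummy job" mentioned above could be as small as pulling one image and running a no-op command in it. This is a sketch under stated assumptions, not the job that was actually used: the function name `smoke_test_docker` and the `DOCKER` override variable are hypothetical, and it assumes the image ships `/bin/true`.

```shell
#!/bin/bash
# Hypothetical smoke test for a freshly provisioned CI agent (illustration only).
# DOCKER can be overridden to point at another binary; it defaults to "docker".
smoke_test_docker() {
    local image="$1"
    local docker_bin="${DOCKER:-docker}"

    # The failure in this task happened while pulling layers, so pull first.
    if ! "$docker_bin" pull "$image"; then
        echo "FAIL: cannot pull $image" >&2
        return 1
    fi

    # Run a no-op command to confirm that containers can actually start.
    if ! "$docker_bin" run --rm --entrypoint=/bin/true "$image"; then
        echo "FAIL: cannot run $image" >&2
        return 1
    fi

    echo "PASS: $image"
}
```

For example, `smoke_test_docker docker-registry.wikimedia.org/wikimedia-stretch:latest` uses an image that already appears in the build log above. Running it before putting the instance back into the pool would have surfaced the GetImageBlob error without failing real jobs.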

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM