
scap train-presync failed to push image: blob upload unknown
Closed, Duplicate (Public)

Description

The train-presync task failed overnight with:

03:14:48 [mediawiki-publish] Pushing docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-030352-publish
03:43:21 [mediawiki-publish-81] blob upload unknown

This is most probably the same issue as T390251 (docker-registry.wikimedia.org keeps serving bad blobs), which is ongoing. I am filing an independent task so that it can block the train.

Logs

03:03:52 Started build-and-push-container-images
...
03:14:37 [mediawiki-publish] Building publish cli image
03:14:37 [mediawiki-publish] Running docker build -f mediawiki-cli/Dockerfile --iidfile /tmp/tmpvu5cagut --build-arg http_proxy=http://webproxy:8080 --build-arg https_proxy=http://webproxy:8080 --build-arg BASE=docker-registry.discovery.wmnet/restricted/mediawiki-multiversion:2025-04-15-030352-publish --build-arg PHP_VERSION=7.4 --label vnd.wikimedia.builder.name=scap --label vnd.wikimedia.builder.version=4.153.0 --label vnd.wikimedia.mediawiki.versions=1.44.0-wmf.24,1.44.0-wmf.25 --label vnd.wikimedia.scap.stage_dir=/srv/mediawiki-staging --label vnd.wikimedia.scap.build_state_dir=/srv/mediawiki-staging/scap/image-build empty
...
03:14:48 [mediawiki-publish]  ---> 471e4c0c54e7
03:14:48 [mediawiki-publish] Successfully built 471e4c0c54e7
03:14:48 [mediawiki-publish] Running docker tag sha256:471e4c0c54e7575d4d0c2e6b927fac64f9bf0d9986d647405a6918075415d578 docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-030352-publish
03:14:48 [mediawiki-publish] Pushing docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-030352-publish
03:14:48 [mediawiki-publish] Running sudo /usr/local/bin/docker-pusher -q docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-030352-publish
03:15:02 [mediawiki-publish] docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-030352-publish
03:43:21 [mediawiki-publish-81] blob upload unknown
03:43:21 [mediawiki-publish-81] Traceback (most recent call last):
  File "/srv/mwbuilder/release/make-container-image/app.py", line 145, in join
    future.result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 433, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/srv/mwbuilder/release/make-container-image/build-images.py", line 434, in build
    mw_mv_image, mw_mv_debug_image, mw_mv_cli_image = app_instance.build_mediawiki_images(
  File "/srv/mwbuilder/release/make-container-image/build-images.py", line 238, in build_mediawiki_images
    image = build_image_incr.App(
  File "/srv/mwbuilder/release/make-container-image/build_image_incr.py", line 167, in run
    self.push_image(report["image"])
  File "/srv/mwbuilder/release/make-container-image/app.py", line 91, in push_image
    self.check_call(["sudo", "/usr/local/bin/docker-pusher", "-q", image_ref])
  File "/srv/mwbuilder/release/make-container-image/app.py", line 72, in check_call
    return subprocess.check_call(
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '/usr/local/bin/docker-pusher', '-q', 'docker-registry.discovery.wmnet/restricted/mediawiki-multiversion:2025-04-15-030352-publish-81']' returned non-zero exit status 1.

03:43:21 Finished build-and-push-container-images (duration: 39m 28s)
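A note on reading the traceback above: it starts in `join` at `future.result()` even though the failing call is `subprocess.check_call` in a worker thread. That is how `concurrent.futures` behaves: an exception raised in the worker is stored on the future and re-raised in whichever thread calls `result()`. The following is a minimal sketch of that behaviour only, not the build script's actual code; `fake_push` and `push_all` are hypothetical names, and the child process is a stand-in that always exits with status 1.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def fake_push(image_ref):
    # Hypothetical stand-in for an image push: a child process that always
    # exits with status 1, so check_call raises CalledProcessError.
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(1)"])


def push_all(image_refs):
    """Run one fake push per image in worker threads; return the failures."""
    failed = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(fake_push, ref): ref for ref in image_refs}
        for future, ref in futures.items():
            try:
                # The worker's exception is re-raised here, in the caller.
                future.result()
            except subprocess.CalledProcessError as error:
                failed.append((ref, error.returncode))
    return failed
```

Because the exception only surfaces when `result()` is called, the timestamp of the traceback reflects when the failed push returned, not when the push started.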

Event Timeline

hashar triaged this task as Unbreak Now! priority. Apr 15 2025, 8:30 AM
hashar updated the task description.

This is blocking the train, hence setting the priority to Unbreak Now!

I tried running the presync again earlier today. The same problem seemed to occur, with the image push stuck for around 14 minutes before I aborted it manually:

09:01:25 Started build-and-push-container-images
09:01:25 K8s images build/push output redirected to /home/jnuche/scap-image-build-and-push-log
^C09:15:35 Finished build-and-push-container-images (duration: 14m 10s)
09:15:35 sync-world aborted: testwikis to 1.44.0-wmf.25  refs T386220 (duration: 14m 36s)
09:02:00 [mediawiki-publish] Running docker tag sha256:bdb79f7fdbac30891c42f54e859b28592acf20fdf46d50bad7bac03588096ca8 docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-090125-publish
09:02:00 [mediawiki-publish] Pushing docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-090125-publish
09:02:00 [mediawiki-publish] Running sudo /usr/local/bin/docker-pusher -q docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-090125-publish
09:02:15 [mediawiki-publish] docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:2025-04-15-090125-publish

Closed as a duplicate of an issue that, while ongoing, is not strictly a train blocker.

If folks run into this issue during a deployment, they should retry the scap command.
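For anyone who wants to script the retry rather than re-run by hand, here is a minimal sketch of a retry wrapper. It is not part of scap or of the image build scripts; `run_with_retries` and its parameters (`attempts`, `delay_seconds`, `timeout_seconds`) are hypothetical names chosen for illustration. The optional timeout is there because the pushes above hung for a long time before failing or being aborted.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay_seconds=60.0, timeout_seconds=None):
    """Run cmd until it exits 0; return the number of attempts used.

    Retries on a non-zero exit status or on a timeout. After the last
    attempt the original exception is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # check=True raises CalledProcessError on a non-zero exit;
            # timeout kills the child and raises TimeoutExpired.
            subprocess.run(cmd, check=True, timeout=timeout_seconds)
            return attempt
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as error:
            print(f"attempt {attempt}/{attempts} failed: {error}")
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

Pass the same command that failed as an argument list. Keeping `attempts` small seems sensible here, since each failed push in the logs above took many minutes, and a retry does nothing about the underlying registry problem.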