Intermittent catalyst build failures for wikilambda with ERROR: Environment logs are still not ready.
Closed, Resolved | Public | Bug Report

Description

Catalyst builds for wikilambda fail intermittently with the error "ERROR: Environment logs are still not ready. Failing". Here is the latest example:

Started by user unknown or anonymous
Running as SYSTEM
Building remotely on integration-agent-docker-1048 (pipelinelib Docker blubber) in workspace /srv/jenkins/workspace/wikilambda-catalyst-end-to-end
No emails were triggered.
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins1218348587985693863.sh
+ set +x
+ exec docker run --entrypoint=/usr/bin/install --user=root --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end:/workspace --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/bookworm:latest --directory --owner=nobody --group=nogroup /workspace/cache
++ set +x
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins4215399433067969887.sh
+ set +x
+ exec docker run --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end/cache:/cache --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/castor:0.4.0 load
++ set +x
Defined: CASTOR_NAMESPACE="castor-mw-ext-and-skins/master/wikilambda-catalyst-end-to-end"
Syncing...

Done
[wikilambda-catalyst-end-to-end] $ /bin/bash -xe /tmp/jenkins12523748578464653678.sh
+ set -eux
+ mkdir -m 2777 -p log
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins15471069555128104626.sh
+ set +x
+ exec docker run --entrypoint=/usr/bin/find --user=nobody --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end:/workspace --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/bookworm:latest /workspace/log -mindepth 1 -delete
++ set +x
[wikilambda-catalyst-end-to-end] $ /bin/bash -xe /tmp/jenkins7753240871058165073.sh
+ set -eux
+ mkdir -m 2777 -p src
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins3157959982366783137.sh
+ set +x
+ exec docker run --entrypoint=/usr/bin/find --user=nobody --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end:/workspace --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/bookworm:latest /workspace/src -mindepth 1 -delete
++ set +x
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins12790119042572168353.sh
+ set +x
+ exec docker run --entrypoint=/usr/bin/install --user=root --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end:/workspace --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/bookworm:latest --directory --owner=nobody --group=nogroup /workspace/cache
++ set +x
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins9153318106067106503.sh
+ set +x
+ exec docker run --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end/src:/src --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end/cache:/cache --volume /srv/git:/srv/git:ro --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/ci-src-setup-simple:0.7.0-s1
++ set +x
+ [[ https://integration.wikimedia.org/ci/ == '' ]]
+ git init --initial-branch=master
Initialized empty Git repository in /src/.git/
+ git remote add origin https://gerrit.wikimedia.org/r/mediawiki/extensions/WikiLambda
+ git fetch --quiet --update-head-ok --depth 2 git://contint1002.wikimedia.org/mediawiki/extensions/WikiLambda +refs/zuul/master/Z9b5e13296fdd403cabdea2c9dee4dfbc:refs/zuul/master/Z9b5e13296fdd403cabdea2c9dee4dfbc
+ [[ master == '' ]]
+ git checkout -B master FETCH_HEAD
Reset branch 'master'
+ set +x
+ git submodule --quiet update --init --recursive
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins10977799128336890146.sh
+ set +x
+ exec docker run --entrypoint=/usr/bin/install --user=root --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end:/workspace --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/bookworm:latest --directory --owner=nobody --group=nogroup /workspace/cache
++ set +x
[wikilambda-catalyst-end-to-end] $ /bin/bash -eu /tmp/jenkins6306863623606305438.sh
+ chmod 2777 src
+ mkdir -m 2777 -p log
[wikilambda-catalyst-end-to-end] $ /bin/bash /tmp/jenkins8119691903288530095.sh
+ set +x
+ exec docker run --entrypoint=/deploy_env.py -e WIKILAMBDA_REF=81/1235081/1 -e ZUUL_CHANGE=1235081 -e ENV_API_PATH=https://api.catalyst.wmcloud.org/api/environments -e NPM_ARGS=selenium-test -e MEDIAWIKI_USER=Admin -e MEDIAWIKI_PASSWORD=dockerpass -e MW_SCRIPT_PATH=/w --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end/src:/src --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end/cache:/cache --volume /srv/jenkins/workspace/wikilambda-catalyst-end-to-end/log:/log --security-opt seccomp=unconfined --init --rm --label jenkins.job=wikilambda-catalyst-end-to-end --label jenkins.build=1877 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/catalyst:1.3.0-s1
++ set +x
INFO: Environment creation started. Streaming logs a soon as they are available
ERROR: Environment logs are still not ready. Failing
Build step 'Execute shell' marked build as failure

Event Timeline

vaughnwalters changed the subtype of this task from "Task" to "Bug Report". Jan 29 2026, 8:49 PM

Per some discussion, we're monitoring this to see if it's still a problem. We had a few incidents related to traffic volume recently.

One consequence of this is that we're not cleaning up environments that are successfully created after this timeout fails the GitLab job. That is also something we'll monitor and find a workaround for.
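
One possible shape for such a workaround, as a minimal sketch only: wrap the readiness wait and the test run in a `try`/`finally` so the environment is deleted even when the wait times out. All names here (`run_with_cleanup`, `EnvironmentNotReady`, and the three callables) are hypothetical and are not taken from `deploy_env.py` or the Catalyst API.

```python
class EnvironmentNotReady(Exception):
    """Raised by the (hypothetical) readiness wait when it times out."""


def run_with_cleanup(env_id, wait_until_ready, run_tests, delete_environment):
    """Wait for env_id, run the tests, and always delete the environment.

    The three callables are assumptions standing in for whatever the real
    job uses to poll, test, and delete an environment.
    """
    try:
        wait_until_ready(env_id)
        return run_tests(env_id)
    finally:
        # Runs on success, on test failure, and on a readiness timeout, so an
        # environment that finishes starting later is not left behind.
        delete_environment(env_id)
```

The delete call has to tolerate an environment that is still starting; whether the real API allows that is not something this sketch can confirm.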

Thanks Tyler, here is the tail of yesterday's daily run:
https://integration.wikimedia.org/ci/job/wikilambda-catalyst-end-to-end-daily/193/console

INFO: 2026-02-03T05:50:19.086703719Z: 	Successfully created Z20448.
INFO: 2026-02-03T05:50:19.302407801Z: 	Successfully created Z20739.
INFO: 2026-02-03T05:50:19.308918724Z: Updating secondary tables for 102 ZObjects of type Z8
INFO: 2026-02-03T05:50:24.948359845Z: ...site_stats is populated...done.
INFO: 2026-02-03T05:50:24.985776835Z: Checking existence of old default messages...done.
INFO: 2026-02-03T05:50:24.994848778Z: Adding empty categories with description pages...
INFO: 2026-02-03T05:50:24.996970607Z: Removing empty categories without description pages...
INFO: 2026-02-03T05:50:24.998391555Z: Category cleanup complete.
INFO: 2026-02-03T05:50:25.006259192Z: Fixing log entries with log_title starting with 'User:#'
INFO: 2026-02-03T05:50:25.007959098Z: done.
INFO: 2026-02-03T05:50:25.012036243Z: Skipped 34 updates that were already applied.
INFO: 2026-02-03T05:50:25.062724947Z: Purging caches...
INFO: 2026-02-03T05:50:25.062781255Z: Done in 4 min 41 s.
INFO: Creation logs completed. Will check for environment availability now
ERROR: Environment is still not ready. Last seen status is 'starting'
Build step 'Execute shell' marked build as failure

And here is another one from yesterday:
https://integration.wikimedia.org/ci/job/wikilambda-catalyst-end-to-end/1885/console

with the tail (basically the same as above):

INFO: 2026-02-02T10:36:54.059159548Z: 	Successfully created Z20443.
INFO: 2026-02-02T10:36:54.230653679Z: 	Successfully created Z20448.
INFO: 2026-02-02T10:36:54.368398099Z: 	Successfully created Z20739.
INFO: 2026-02-02T10:36:54.373021564Z: Updating secondary tables for 102 ZObjects of type Z8
INFO: 2026-02-02T10:36:57.324572696Z: ...site_stats is populated...done.
INFO: 2026-02-02T10:36:57.345122822Z: Checking existence of old default messages...done.
INFO: 2026-02-02T10:36:57.349256070Z: Adding empty categories with description pages...
INFO: 2026-02-02T10:36:57.350011709Z: Removing empty categories without description pages...
INFO: 2026-02-02T10:36:57.350327341Z: Category cleanup complete.
INFO: 2026-02-02T10:36:57.354048498Z: Fixing log entries with log_title starting with 'User:#'
INFO: 2026-02-02T10:36:57.354614944Z: done.
INFO: 2026-02-02T10:36:57.359609599Z: Skipped 34 updates that were already applied.
INFO: 2026-02-02T10:36:57.394207955Z: Purging caches...
INFO: 2026-02-02T10:36:57.394277496Z: Done in 3 min 6 s.
INFO: Creation logs completed. Will check for environment availability now
ERROR: Environment is still not ready. Last seen status is 'starting'
Build step 'Execute shell' marked build as failure
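
Both runs above finish the creation logs and then fail while the environment still reports 'starting'. For illustration, a deadline-bound status poll of that general kind might look like the sketch below. This is not the code in `deploy_env.py`: the function name, the default timeout and interval, and the ready status `"running"` are all assumptions.

```python
import time


def wait_for_status(get_status, ready="running", timeout_s=600.0, interval_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns `ready`, or raise TimeoutError.

    `ready`, `timeout_s` and `interval_s` are illustrative defaults, not
    values taken from the real job. `clock` and `sleep` are injectable so
    the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == ready:
            return status
        if clock() >= deadline:
            # Mirrors the wording of the error seen in the logs above.
            raise TimeoutError(
                f"Environment is still not ready. Last seen status is {status!r}"
            )
        sleep(interval_s)
```

With a loop like this, a longer `timeout_s` only hides the symptom if the environment is slow because the cluster is short on resources.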

Here's one from the daily selenium-test run last night:

https://integration.wikimedia.org/ci/job/wikilambda-catalyst-end-to-end-daily/197/console

the tail:

INFO: 2026-02-06T05:49:25.653784833Z: 	Successfully created Z20438.
INFO: 2026-02-06T05:49:25.859885784Z: 	Successfully created Z20443.
INFO: 2026-02-06T05:49:26.073265055Z: 	Successfully created Z20448.
INFO: 2026-02-06T05:49:26.308816533Z: 	Successfully created Z20739.
INFO: 2026-02-06T05:49:26.315603595Z: Updating secondary tables for 102 ZObjects of type Z8
INFO: 2026-02-06T05:49:31.820464962Z: ...site_stats is populated...done.
INFO: 2026-02-06T05:49:31.852357088Z: Checking existence of old default messages...done.
INFO: 2026-02-06T05:49:31.861121688Z: Adding empty categories with description pages...
INFO: 2026-02-06T05:49:31.863045275Z: Removing empty categories without description pages...
INFO: 2026-02-06T05:49:31.864129904Z: Category cleanup complete.
INFO: 2026-02-06T05:49:31.871938346Z: Fixing log entries with log_title starting with 'User:#'
INFO: 2026-02-06T05:49:31.873665650Z: done.
INFO: 2026-02-06T05:49:31.876859184Z: Skipped 34 updates that were already applied.
INFO: 2026-02-06T05:49:31.933521247Z: Purging caches...
INFO: 2026-02-06T05:49:31.933558712Z: Done in 4 min 35 s.
INFO: Creation logs completed. Will check for environment availability now
ERROR: Environment is still not ready. Last seen status is 'starting'
Build step 'Execute shell' marked build as failure

Same error on the daily run today:

https://integration.wikimedia.org/ci/job/wikilambda-catalyst-end-to-end-daily/203/console

the tail:

INFO: 2026-02-11T05:48:30.054779091Z: 	Successfully created Z20424.
INFO: 2026-02-11T05:48:30.184582656Z: 	Successfully created Z20430.
INFO: 2026-02-11T05:48:30.313087829Z: 	Successfully created Z20438.
INFO: 2026-02-11T05:48:30.431027762Z: 	Successfully created Z20443.
INFO: 2026-02-11T05:48:30.581933160Z: 	Successfully created Z20448.
INFO: 2026-02-11T05:48:30.788853992Z: 	Successfully created Z20739.
INFO: 2026-02-11T05:48:30.793440046Z: Updating secondary tables for 102 ZObjects of type Z8
INFO: 2026-02-11T05:48:34.146880420Z: ...site_stats is populated...done.
INFO: 2026-02-11T05:48:34.174032054Z: Checking existence of old default messages...done.
INFO: 2026-02-11T05:48:34.180191520Z: Adding empty categories with description pages...
INFO: 2026-02-11T05:48:34.181169343Z: Removing empty categories without description pages...
INFO: 2026-02-11T05:48:34.181994684Z: Category cleanup complete.
INFO: 2026-02-11T05:48:34.187480924Z: Fixing log entries with log_title starting with 'User:#'
INFO: 2026-02-11T05:48:34.188371223Z: done.
INFO: 2026-02-11T05:48:34.191025623Z: Skipped 34 updates that were already applied.
INFO: 2026-02-11T05:48:34.251748581Z: Purging caches...
INFO: 2026-02-11T05:48:34.251800946Z: Done in 2 min 46 s.
INFO: Creation logs completed. Will check for environment availability now
ERROR: Environment is still not ready. Last seen status is 'starting'
Build step 'Execute shell' marked build as failure

Today's daily run failed with the same error:

https://integration.wikimedia.org/ci/job/wikilambda-catalyst-end-to-end-daily/209/console

INFO: 2026-02-17T05:48:21.022315013Z: Successfully created Z20443.
INFO: 2026-02-17T05:48:21.164523904Z: Successfully created Z20448.
INFO: 2026-02-17T05:48:21.265172786Z: Successfully created Z20739.
INFO: 2026-02-17T05:48:21.267193947Z: Updating secondary tables for 103 ZObjects of type Z8
INFO: 2026-02-17T05:48:24.990848262Z: ...site_stats is populated...done.
INFO: 2026-02-17T05:48:25.013418958Z: Checking existence of old default messages...done.
INFO: 2026-02-17T05:48:25.020234205Z: Adding empty categories with description pages...
INFO: 2026-02-17T05:48:25.021265732Z: Removing empty categories without description pages...
INFO: 2026-02-17T05:48:25.021937891Z: Category cleanup complete.
INFO: 2026-02-17T05:48:25.028408030Z: Fixing log entries with log_title starting with 'User:#'
INFO: 2026-02-17T05:48:25.029157358Z: done.
INFO: 2026-02-17T05:48:25.031608215Z: Skipped 34 updates that were already applied.
INFO: 2026-02-17T05:48:25.083197664Z: Purging caches...
INFO: 2026-02-17T05:48:25.083261949Z: Done in 2 min 38 s.
INFO: Creation logs completed. Will check for environment availability now
ERROR: Environment is still not ready. Last seen status is 'starting'
Build step 'Execute shell' marked build as failure

Hi Abstract Wikipedia team, we're going to set some tighter limits on when your environments go away: instead of 3 days(!), we're going to set it to an hour.

Why?

  • Resource limits on our cluster are affecting stability of your tests and Patch Demo

We think this will help with the timeouts you're seeing. Details on T416391.
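
As a rough illustration of the change, the sketch below selects environments for removal by age. It assumes expiry is measured from creation time and that each environment is a `(name, created_at)` pair; both are assumptions, as is the `expired_environments` name. Only the three-day and one-hour values come from the comment above.

```python
from datetime import datetime, timedelta, timezone

OLD_TTL = timedelta(days=3)   # previous lifetime mentioned above
NEW_TTL = timedelta(hours=1)  # new lifetime mentioned above


def expired_environments(environments, now, ttl=NEW_TTL):
    """Return the names of environments whose age has reached `ttl`.

    `environments` is an iterable of (name, created_at) pairs, a
    hypothetical record shape used only for this example.
    """
    return [name for name, created_at in environments if now - created_at >= ttl]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [("fresh", now - timedelta(minutes=30)), ("stale", now - timedelta(hours=2))]
    print(expired_environments(sample, now))
```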

DSantamaria changed the task status from Open to In Progress. Mar 23 2026, 10:16 AM

Hey! With Vaughn's role in QS-Test-Automation we're trying to tighten down ownership of various spaces. It's still a bit imperfect, but this ticket seems more likely to be fixable by either Test Platform or Release Engineering (they are working on Catalyst).

@thcipriani / @AMarkossyan-WMF How do you want to intake this?

Jdforrester-WMF claimed this task.

To the extent that this comes up, I think it's fixed.