
CI jobs failing with various timeouts (March 2025)
Open, High, Public

Description

So, what follows is entirely speculative and purely anecdotal, but I believe it is still worth looking into. Over the last few days, I've seen CI fail due to various timeouts at a rate far higher than normal, and I'm wondering if there's something going on in the CI config/infra that needs to be looked into. Here are the examples I saw in my Gerrit bubble. They're not necessarily all related, but to me it does look like we're timing out a lot.

Issues seen thus far:

Some examples (more in comments)

  • 2025-03-06 23:25:04Z: r1124889 (CampaignEvents), build timeout (1h) while running selenium:
23:23:36 [0-0] RUNNING in chrome - /tests/selenium/specs/editEventRegistration.js
23:25:04 [0-0] Error in "Edit Event Registration.can allow organizer to add an additional organizer"
23:25:04 Error: Timeout of 60000ms exceeded. The execution in the test "Edit Event Registration can allow organizer to add an additional organizer" took too long. Try to reduce the run time or increase your timeout for test specs (https://webdriver.io/docs/timeouts). (/workspace/src/extensions/CampaignEvents/tests/selenium/specs/editEventRegistration.js)
23:25:04     at createTimeoutError (/workspace/src/extensions/CampaignEvents/node_modules/mocha/lib/errors.js:498:15)
23:25:04     at Runnable._timeoutError (/workspace/src/extensions/CampaignEvents/node_modules/mocha/lib/runnable.js:429:10)
23:25:04     at Timeout.<anonymous> (/workspace/src/extensions/CampaignEvents/node_modules/mocha/lib/runnable.js:244:24)
23:25:04     at listOnTimeout (node:internal/timers:581:17)
23:25:04     at process.processTimers (node:internal/timers:519:7)
00:22:13 Build timed out (after 60 minutes). Marking the build as failed.
18:52:31 07 03 2025 18:52:31.432:DEBUG [Firefox 128.0 (Linux x86_64)]: Disconnected during run, waiting 2000ms for reconnecting.
18:52:31 07 03 2025 18:52:31.433:DEBUG [Firefox 128.0 (Linux x86_64)]: EXECUTING -> EXECUTING_DISCONNECTED
18:52:33 07 03 2025 18:52:33.433:WARN [Firefox 128.0 (Linux x86_64)]: Disconnected (0 times) reconnect failed before timeout of 2000ms (ping timeout)
19:24:10 [0-6] Error in "Page.should be protectable"
19:24:10 Error: Failed to wait for mediawiki.base to be ready after 5000 ms.
19:24:10     at async Object.waitForModuleState (/workspace/src/tests/selenium/wdio-mediawiki/Util.js:50:3)
19:24:10     at async EditPage.openForEditing (/workspace/src/tests/selenium/pageobjects/edit.page.js:38:3)
19:24:10     at async Context.<anonymous> (/workspace/src/tests/selenium/specs/page.js:142:3)
19:24:16 [0-6] RETRYING in chrome - /tests/selenium/specs/page.js
19:24:17 [0-6] RUNNING in chrome - /tests/selenium/specs/page.js
19:25:42 [0-6] Error in "Page.should be protectable"
19:25:42 Error: Failed to wait for mediawiki.base to be ready after 5000 ms.
19:25:42     at async Object.waitForModuleState (/workspace/src/tests/selenium/wdio-mediawiki/Util.js:50:3)
19:25:42     at async EditPage.openForEditing (/workspace/src/tests/selenium/pageobjects/edit.page.js:38:3)
19:25:42     at async Context.<anonymous> (/workspace/src/tests/selenium/specs/page.js:142:3)
19:25:48 [0-6] FAILED in chrome - /tests/selenium/specs/page.js (1 retries)
09:46:03 10 03 2025 09:46:03.147:DEBUG [Firefox 115.0 (Linux x86_64)]: Disconnected during run, waiting 2000ms for reconnecting.
09:46:03 10 03 2025 09:46:03.147:DEBUG [Firefox 115.0 (Linux x86_64)]: EXECUTING -> EXECUTING_DISCONNECTED
09:46:04 10 03 2025 09:46:04.129:DEBUG [middleware:source-files]: Requesting /null
09:46:04 10 03 2025 09:46:04.130:DEBUG [middleware:source-files]: Fetching /null
09:46:04 10 03 2025 09:46:04.130:DEBUG [proxy]: proxying request - /null to 127.0.0.1:9413
09:46:05 10 03 2025 09:46:05.148:WARN [Firefox 115.0 (Linux x86_64)]: Disconnected (0 times) reconnect failed before timeout of 2000ms (ping timeout)
10:50:03   POST /campaignevents/v0/event_registration
10:50:08     1) "before all" hook in "POST /campaignevents/v0/event_registration"
[...]
10:50:11   1) POST /campaignevents/v0/event_registration
10:50:11        "before all" hook in "POST /campaignevents/v0/event_registration":
10:50:11      Error: Timeout of 5000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/workspace/src/extensions/CampaignEvents/tests/api-testing/EnableRegistration.js)
10:50:11       at listOnTimeout (node:internal/timers:581:17)
10:50:11       at process.processTimers (node:internal/timers:519:7)

Event Timeline

2025-03-12 14:25:35Z: r1127021 (CampaignEvents), mediawiki-quibble-apitests-vendor-php74, timeout in API tests:

14:25:35   POST /campaignevents/v0/event_registration
14:25:40     1) "before all" hook in "POST /campaignevents/v0/event_registration"
[...]
14:25:42   1) POST /campaignevents/v0/event_registration
14:25:42        "before all" hook in "POST /campaignevents/v0/event_registration":
14:25:42      Error: Timeout of 5000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/workspace/src/extensions/CampaignEvents/tests/api-testing/EnableRegistration.js)
14:25:42       at listOnTimeout (node:internal/timers:581:17)
14:25:42       at process.processTimers (node:internal/timers:519:7)

2025-03-17 14:45:29Z: r1128434 (CampaignEvents), mediawiki-quibble-apitests-vendor-php74, same timeout as T388416#10629519 and the task description.

This is quickly ceasing to be speculative and at the same time becoming quite irritating.

3 out of 3 merged patches in CampaignEvents today failed due to timeouts:

14:25:08   mediawiki.deflate
14:25:08     ✔ deflate [foobar]
14:25:08     ✔ deflate [Unicode]
14:25:08     ✔ deflate [Non BMP unicode]
14:25:13 18 03 2025 14:25:13.247:DEBUG [Firefox 128.0 (Linux x86_64)]: Disconnected during run, waiting 2000ms for reconnecting.
14:25:13 18 03 2025 14:25:13.247:DEBUG [Firefox 128.0 (Linux x86_64)]: EXECUTING -> EXECUTING_DISCONNECTED
14:25:15 18 03 2025 14:25:15.248:WARN [Firefox 128.0 (Linux x86_64)]: Disconnected (0 times) reconnect failed before timeout of 2000ms (ping timeout)
14:25:15 Firefox 128.0 (Linux x86_64) ERROR
14:25:15   Disconnected reconnect failed before timeout of 2000ms (ping timeout)
14:57:38 Execution of 3 workers started at 2025-03-18T14:57:38.189Z
14:57:38 
14:57:39 [0-2] RUNNING in chrome - /tests/selenium/specs/watchstar.js
14:57:39 [0-0] RUNNING in chrome - /tests/selenium/specs/mainmenu_loggedin.js
14:57:39 [0-1] RUNNING in chrome - /tests/selenium/specs/references.js
14:57:52 [0-2] PASSED in chrome - /tests/selenium/specs/watchstar.js
14:57:54 [0-0] Nearby item will only appear in main menu if $wgMFNearby is configured
14:57:54 [0-0] PASSED in chrome - /tests/selenium/specs/mainmenu_loggedin.js
15:30:11 Build timed out (after 60 minutes). Marking the build as failed.
15:30:11 Build was aborted

Just happened again in gate-and-submit for r1127021, https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/3807/console:

18:51:54 [0-6] RETRYING in chrome - /tests/selenium/specs/page.js
18:51:55 [0-6] RUNNING in chrome - /tests/selenium/specs/page.js
18:52:12 [0-8] PASSED in chrome - /tests/selenium/specs/user.js
18:52:42 [0-6] FAILED in chrome - /tests/selenium/specs/page.js (1 retries)
[...]
18:52:42 [Chrome 120.0.0.0 linux #0-6] » /tests/selenium/specs/page.js
18:52:42 [Chrome 120.0.0.0 linux #0-6] Page
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ✓ should be previewable @daily
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ✓ should be creatable
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ? should be re-creatable
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ✖ "before each" hook for Page
18:52:42 [Chrome 120.0.0.0 linux #0-6]
18:52:42 [Chrome 120.0.0.0 linux #0-6] 2 passing (26.9s)
18:52:42 [Chrome 120.0.0.0 linux #0-6] 1 failing
18:52:42 [Chrome 120.0.0.0 linux #0-6]
18:52:42 [Chrome 120.0.0.0 linux #0-6] 1) Page "before each" hook for Page
18:52:42 [Chrome 120.0.0.0 linux #0-6] Failed to wait for mediawiki.base to be ready after 5000 ms.
[...]
18:52:42 [Chrome 120.0.0.0 linux #0-6] » /tests/selenium/specs/page.js
18:52:42 [Chrome 120.0.0.0 linux #0-6] Page
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ✓ should be previewable @daily
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ✓ should be creatable
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ✓ should be re-creatable
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ? should be editable @daily
18:52:42 [Chrome 120.0.0.0 linux #0-6]    ✖ "before each" hook for Page
18:52:42 [Chrome 120.0.0.0 linux #0-6]
18:52:42 [Chrome 120.0.0.0 linux #0-6] 3 passing (45.4s)
18:52:42 [Chrome 120.0.0.0 linux #0-6] 1 failing
18:52:42 [Chrome 120.0.0.0 linux #0-6]
18:52:42 [Chrome 120.0.0.0 linux #0-6] 1) Page "before each" hook for Page
18:52:42 [Chrome 120.0.0.0 linux #0-6] Failed to wait for mediawiki.base to be ready after 5000 ms.

So, the "Failed to wait for mediawiki.base" failure seems to be known and has been seen before. However, the api-testing one is the one I've seen the most often, and possibly never before this bug report. There's also the question of why selenium sometimes seems to get stuck until the build times out after 1 hour.
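As an aside, the "Failed to wait for mediawiki.base to be ready after 5000 ms" error comes from a poll-until-deadline helper (`waitForModuleState` in wdio-mediawiki, which polls the browser for the module's load state). The general shape of such a helper is sketched below; this is an illustrative Python analogy, not the actual wdio-mediawiki implementation:

```python
import time

def wait_for(condition, timeout=5.0, interval=0.1, what="condition"):
    """Poll condition() until it returns truthy or `timeout` seconds elapse.

    Mirrors the structure of a waitForModuleState-style helper: repeatedly
    check some external state, sleep between checks, and raise a descriptive
    error when the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # This produces the kind of message seen in the logs above.
    raise TimeoutError(f"Failed to wait for {what} after {int(timeout * 1000)} ms.")

# Example: a condition that is immediately true succeeds at once.
ok = wait_for(lambda: True, what="mediawiki.base")
```

The relevant point for this task: a 5000 ms budget like this is only exceeded when the polled state genuinely never becomes ready in time, which is why repeated failures here suggest a slow or wedged backend rather than a flaky assertion.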

I briefly looked at the MW logs for the affected jobs but didn't see anything obvious; I'm not even sure where to look, though. At this point it's not even clear whether MW performance got worse, CI resource availability dropped, or it's something else entirely.

I spot-checked the VM load for T388416#10648479 (integration-agent-docker-1051) and the 1h timeout in T388416#10648413 (-1057), but didn't see anything obviously out of place.

Daimona triaged this task as Unbreak Now! priority. Mar 20 2025, 2:17 PM

Okay, current status: there are 53 patches in gate-and-submit. The first patch in the queue, r1129578 (for MinervaNeue), has been enqueued for 1h37m and is still running tests; not ideal, but we're used to it. Behind it are 8 patches whose jobs have ALL completed successfully, so those will be merged as soon as the Minerva one finishes. And then we have the rest of the queue. Alright, let's see why the Minerva patch is taking this long. Its only remaining job is wmf-quibble-selenium-php81. Checking the console:

13:38:46 Execution of 2 workers started at 2025-03-20T13:38:46.494Z
13:38:46 
13:38:46 Setting up modified /workspace/src/LocalSettings.php
13:38:46 Restarting php8.1-fpm
13:38:48 [0-1] RUNNING in chrome - /tests/selenium/specs/ipcontributions.js
13:38:49 [0-0] RUNNING in chrome - /tests/selenium/specs/contributions.js
13:39:23 [0-1] PASSED in chrome - /tests/selenium/specs/ipcontributions.js
13:40:37 [0-0] Error in "IPInfo on Special:Contributions.should show geo data for temp user with edits if agreement was already accepted"
13:40:37 Error: Timeout of 60000ms exceeded. The execution in the test "IPInfo on Special:Contributions should show geo data for temp user with edits if agreement was already accepted" took too long. Try to reduce the run time or increase your timeout for test specs (https://webdriver.io/docs/timeouts). (/workspace/src/extensions/IPInfo/tests/selenium/specs/contributions.js)
13:40:37     at createTimeoutError (/workspace/src/extensions/IPInfo/node_modules/mocha/lib/errors.js:498:15)
13:40:37     at Runnable._timeoutError (/workspace/src/extensions/IPInfo/node_modules/mocha/lib/runnable.js:429:10)
13:40:37     at Timeout.<anonymous> (/workspace/src/extensions/IPInfo/node_modules/mocha/lib/runnable.js:244:24)
13:40:37     at listOnTimeout (node:internal/timers:581:17)
13:40:37     at process.processTimers (node:internal/timers:519:7)

For clarity/posterity, it is currently 14:16 UTC. So, the entire queue of 50+ patches is blocked on a job that failed 40 minutes ago and is waiting for absolutely nothing to happen. I hope we can all agree that this needs to be looked into...

so these patches will be merged as soon as the Minerva one finishes

To be clear, they merge iff the parent-for-CI-purposes MinervaNeue patch also passes and is merged, because CI doesn't know whether, e.g., the Minerva patch is the only reason they pass. If CI for the parent patch fails (as it did in this case), their CI results are discarded and they start again, making things even slower.
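The cost of that discard-and-retest behaviour can be seen with a toy model of a speculative merge queue (this is a deliberately simplified sketch, NOT Zuul's actual implementation): each change is tested on top of the changes ahead of it, so a failure at the head invalidates every result built behind it.

```python
def process_queue(results):
    """Toy model of a speculative gate-and-submit queue.

    results: list of (change, eventually_passes) in queue order.
    Returns (merged, retested), where `retested` counts builds whose
    results had to be thrown away because a change ahead of them failed.
    """
    merged = []
    retested = 0
    queue = list(results)
    while queue:
        change, passed = queue.pop(0)
        if passed:
            merged.append(change)
        else:
            # Everything behind the failure was tested on top of it:
            # those results are invalid and all those builds rerun.
            retested += len(queue)
    return merged, retested

# The situation described above: a failing head patch with passing
# patches queued behind it (names are illustrative).
merged, retested = process_queue([("minerva", False), ("a", True), ("b", True)])
```

Here both trailing patches pass on retest, but their original green results were wasted, which is roughly why merges were taking ~3 hours at this point.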

Change #1129873 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] [DNM] Attempt to reproduce stuck selenium jobs

https://gerrit.wikimedia.org/r/1129873

so these patches will be merged as soon as the Minerva one finishes

To be clear, they merge iff the parent-for-CI-purposes MinervaNeue patch also passes and is merged, because CI doesn't know whether, e.g., the Minerva patch is the only reason they pass. If CI for the parent patch fails (as it did in this case), their CI results are discarded and they start again, making things even slower.

Yeah, indeed... Which is why stuff is taking ~3 hours to merge right now. But lo and behold, another job got stuck: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/4492/console

Preliminary findings from @dancy: at 16:27Z while the job was stuck, there were ffmpeg processes running, e.g.

ffmpeg -f x11grab -video_size 1280x1024 -i :94 -loglevel error -y -pix_fmt yuv420p /workspace/log/Verify-checkuser-can-make-checks%3A-Should-be-able-to-run-'Get-actions'-check-2025-03-20T15-49-34-031Z.mp4

And the last lines in the console output for the job were:

15:49:34 [0-0] Error in "CheckUser.With CheckUser user group.Verify checkuser can make checks:.Should be able to run 'Get IPs' check"
15:49:34 Error: Timeout of 60000ms exceeded. The execution in the test "Verify checkuser can make checks: Should be able to run 'Get IPs' check" took too long. Try to reduce the run time or increase your timeout for test specs (https://webdriver.io/docs/timeouts). (/workspace/src/extensions/CheckUser/tests/selenium/specs/checkuser.js)
15:49:34     at createTimeoutError (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/errors.js:498:15)
15:49:34     at Runnable._timeoutError (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/runnable.js:429:10)
15:49:34     at Timeout.<anonymous> (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/runnable.js:244:24)
15:49:34     at listOnTimeout (node:internal/timers:581:17)
15:49:34     at process.processTimers (node:internal/timers:519:7)
15:50:34 [0-0] Error in "CheckUser.With CheckUser user group.Verify checkuser can make checks:.Should be able to run 'Get actions' check"
15:50:34 Error: Timeout of 60000ms exceeded. The execution in the test "Verify checkuser can make checks: Should be able to run 'Get actions' check" took too long. Try to reduce the run time or increase your timeout for test specs (https://webdriver.io/docs/timeouts). (/workspace/src/extensions/CheckUser/tests/selenium/specs/checkuser.js)
15:50:34     at createTimeoutError (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/errors.js:498:15)
15:50:34     at Runnable._timeoutError (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/runnable.js:429:10)
15:50:34     at Timeout.<anonymous> (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/runnable.js:244:24)
15:50:34     at listOnTimeout (node:internal/timers:581:17)
15:50:34     at process.processTimers (node:internal/timers:519:7)
15:51:34 [0-0] Error in "CheckUser.With CheckUser user group.Verify checkuser can make checks:.Should be able to run 'Get users' check"
15:51:34 Error: Timeout of 60000ms exceeded. The execution in the test "Verify checkuser can make checks: Should be able to run 'Get users' check" took too long. Try to reduce the run time or increase your timeout for test specs (https://webdriver.io/docs/timeouts). (/workspace/src/extensions/CheckUser/tests/selenium/specs/checkuser.js)
15:51:34     at createTimeoutError (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/errors.js:498:15)
15:51:34     at Runnable._timeoutError (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/runnable.js:429:10)
15:51:34     at Timeout.<anonymous> (/workspace/src/extensions/CheckUser/node_modules/mocha/lib/runnable.js:244:24)
15:51:34     at listOnTimeout (node:internal/timers:581:17)
15:51:34     at process.processTimers (node:internal/timers:519:7)

So, this specific issue of builds timing out after 1 hour when a selenium test times out might be related to the selenium runner being unable to terminate ffmpeg. Manually terminating the ffmpeg process did not have any effect, though, so it could also be something else that keeps the test "running" after the timeout.

I'll take a closer look. Obviously, there's also the question of why the test is timing out in the first place, i.e., why we're seeing so many timeouts. But for now I'd like to at least fix the stuck-jobs situation.
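One common cause of this pattern (a runner that "cannot" terminate a recorder process) is killing only the direct child while its helpers survive, or vice versa. A usual fix direction is to start the recorder in its own session/process group and reap the whole group at once. The sketch below is a hypothetical illustration of that technique in Python; the real job uses ffmpeg spawned by the wdio video-recording setup, and its actual process handling may differ:

```python
import os
import signal
import subprocess

def start_recorder(cmd):
    # start_new_session=True puts the child in a new session whose
    # process-group id equals the child's pid, so the whole subtree
    # (the recorder plus anything it spawns) shares one pgid.
    return subprocess.Popen(cmd, start_new_session=True)

def stop_recorder(proc, grace=5.0):
    try:
        os.killpg(proc.pid, signal.SIGTERM)  # terminate the whole group
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # escalate if TERM is ignored
        proc.wait()

# Stand-in for a long-running ffmpeg: a plain sleep.
proc = start_recorder(["sleep", "60"])
stop_recorder(proc)
```

If instead the runner signals only the immediate child (or, as in the manual attempt above, only ffmpeg itself), a surviving member of the tree can keep file descriptors or the X display busy and leave the build "running" until the 1-hour Jenkins timeout fires.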

Also, snapshot taken before the build timed out:

UID PID PPID C STIME TTY TIME CMD
nobody 1 0 0 15:40 ? 00:00:00 /sbin/docker-init -- quibble-with-supervisord --reporting-url=https://earlywarningbot.toolforge.org --packages-source vendor --db mysql --db-dir /workspace/db --git-parallel=8 --memcached-server=integration-castor05.integration.eqiad1.wikimedia.cloud:11211 --success-cache-key-data=wmf-quibble-selenium-php81 --success-cache-key-data=docker-registry.wikimedia.org/releng/quibble-bullseye-php81:1.13.0-s1 --reporting-url=https://earlywarningbot.toolforge.org --run selenium
nobody 7 1 0 15:40 ? 00:00:04 /usr/bin/python3 /usr/local/bin/quibble --web-backend=external --web-url=http://127.0.0.1:9413 --reporting-url=https://earlywarningbot.toolforge.org --packages-source vendor --db mysql --db-dir /workspace/db --git-parallel=8 --memcached-server=integration-castor05.integration.eqiad1.wikimedia.cloud:11211 --success-cache-key-data=wmf-quibble-selenium-php81 --success-cache-key-data=docker-registry.wikimedia.org/releng/quibble-bullseye-php81:1.13.0-s1 --reporting-url=https://earlywarningbot.toolforge.org --run selenium
nobody 9 1 0 15:40 ? 00:00:00 /usr/bin/python3 /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
nobody 55 9 0 15:40 ? 00:00:00 /bin/sh /usr/sbin/apache2ctl -DFOREGROUND
nobody 56 9 0 15:40 ? 00:00:05 memcached
nobody 78 55 0 15:40 ? 00:00:00 /usr/sbin/apache2 -DFOREGROUND
nobody 87 78 0 15:40 ? 00:00:00 /usr/sbin/apache2 -DFOREGROUND
nobody 88 78 0 15:40 ? 00:00:00 /usr/sbin/apache2 -DFOREGROUND
nobody 1966 7 0 15:41 ? 00:00:00 git cat-file --batch-check
nobody 2696 7 0 15:42 ? 00:00:00 git cat-file --batch-check
nobody 2697 7 0 15:42 ? 00:00:00 git cat-file --batch
nobody 2734 7 0 15:42 ? 00:00:00 git cat-file --batch-check
nobody 2735 7 0 15:42 ? 00:00:00 git cat-file --batch
nobody 2772 7 0 15:42 ? 00:00:00 git cat-file --batch-check
nobody 2773 7 0 15:42 ? 00:00:00 git cat-file --batch
nobody 3220 7 0 15:42 ? 00:00:08 /usr/sbin/mysqld --skip-networking --innodb-print-all-deadlocks --datadir=/workspace/db/quibble-mysql-gshw2w8d --log-error=/workspace/log/mysql-error.log --pid-file=/workspace/db/quibble-mysql-gshw2w8d/mysqld.pid --socket=/workspace/db/quibble-mysql-gshw2w8d/socket
nobody 3307 7 11 15:42 ? 00:06:27 Xvfb :94 -screen 0 1280x1024x24 -nolisten tcp -nolisten unix
nobody 3308 7 0 15:42 ? 00:00:00 chromedriver --port=4444 --url-base=/wd/hub
nobody 8224 7 0 15:47 ? 00:00:00 npm run selenium-test
nobody 8235 8224 0 15:47 ? 00:00:00 sh -c wdio tests/selenium/wdio.conf.js
nobody 8236 8235 0 15:47 ? 00:00:01 node /workspace/src/extensions/CheckUser/node_modules/.bin/wdio tests/selenium/wdio.conf.js
nobody 8274 1 0 15:47 ? 00:00:00 php-fpm: master process (/etc/php/8.1/fpm/php-fpm.conf)
nobody 8279 8274 0 15:47 ? 00:00:03 php-fpm: pool www
nobody 8280 8274 0 15:47 ? 00:00:03 php-fpm: pool www
nobody 8282 8274 0 15:47 ? 00:00:02 php-fpm: pool www
nobody 8283 8274 0 15:47 ? 00:00:02 php-fpm: pool www
nobody 8284 8236 0 15:47 ? 00:00:01 /usr/bin/node --no-wasm-code-gc /workspace/src/extensions/CheckUser/node_modules/@wdio/local-runner/build/run.js tests/selenium/wdio.conf.js
nobody 9021 8274 0 15:47 ? 00:00:03 php-fpm: pool www
nobody 9092 8274 0 15:48 ? 00:00:03 php-fpm: pool www
nobody 9093 8274 0 15:48 ? 00:00:03 php-fpm: pool www
nobody 9103 8274 0 15:48 ? 00:00:02 php-fpm: pool www
nobody 9377 8274 0 15:48 ? 00:00:02 php-fpm: pool www
nobody 9923 8274 0 15:49 ? 00:00:00 php-fpm: pool www
nobody 10049 0 0 16:40 ? 00:00:00 ps -ef
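When reading a snapshot like this, it helps to reconstruct the parent/child relationships from the PID/PPID columns to see which subtrees are still alive. A small sketch of that (assuming the standard `ps -ef` column layout, where CMD may contain spaces; the snapshot below is an abbreviated excerpt):

```python
from collections import defaultdict

def parse_ps(text):
    """Parse `ps -ef` output into a parent->children map plus pid->command."""
    children = defaultdict(list)
    cmds = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        # Split on whitespace at most 7 times so the command keeps its spaces.
        parts = line.split(None, 7)
        pid, ppid, cmd = int(parts[1]), int(parts[2]), parts[7]
        cmds[pid] = cmd
        children[ppid].append(pid)
    return children, cmds

# Abbreviated excerpt of the snapshot above.
snapshot = """\
UID PID PPID C STIME TTY TIME CMD
nobody 1 0 0 15:40 ? 00:00:00 /sbin/docker-init -- quibble-with-supervisord
nobody 7 1 0 15:40 ? 00:00:04 /usr/bin/python3 /usr/local/bin/quibble
nobody 3307 7 11 15:42 ? 00:06:27 Xvfb :94 -screen 0 1280x1024x24
nobody 8224 7 0 15:47 ? 00:00:00 npm run selenium-test
"""
children, cmds = parse_ps(snapshot)
quibble_children = children[7]  # direct children of the quibble process
```

In the full snapshot, for instance, Xvfb (pid 3307) has accumulated over 6 minutes of CPU time, while no ffmpeg process appears at all, consistent with the idea that something other than ffmpeg is keeping the job alive.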

Daimona lowered the priority of this task from Unbreak Now! to High. Mar 20 2025, 7:14 PM

I will continue investigating the stuck jobs in T389536, which I marked as UBN. Lowering the priority of this one. Still High, because the tests should not be reaching the 60-second timeout in the first place, but the #1 priority now is to make sure that those timeouts do not waste 30+ minutes of CI time.

Seen a new one today: 2025-03-21 14:56:39Z r1130127 (CampaignEvents) https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/4857/console

14:56:33   Community Configuration Example Page
14:56:39     ✓ should save configuration changes and verify them on the example page (6450ms)
14:56:39     Form elements and basic functionality
14:57:01       1) should have all expected form elements and labels
14:57:02       ✓ should have a save button (1137ms)
14:57:03       ✓ should have a disabled save button for logged-out users (872ms)
14:57:04       ✓ should update a simple string via API and verify the update on the form (1094ms)
14:57:04 
14:57:04 
14:57:04   4 passing (31s)
14:57:04   1 failing
14:57:04 
14:57:04   1) Community Configuration Example Page
14:57:04        Form elements and basic functionality
14:57:04          should have all expected form elements and labels:
14:57:04      AssertionError: Timed out retrying after 20000ms: Expected to find element: `#CCExample_String`, but never found it.
14:57:04       at eval (webpack://community-configuration/./cypress/e2e/ccexample.cy.ts:13:33)
14:57:04   at Array.forEach (<anonymous>)
14:57:04       at Context.eval (webpack://community-configuration/./cypress/e2e/ccexample.cy.ts:12:0)
hashar subscribed.

We would need a stacktrace of some sort. I have filed that as a subtask, in case anyone is interested in finding out how to take a stacktrace: T390125

We would need a stacktrace of some sort. I have filed that as a subtask, in case anyone is interested in finding out how to take a stacktrace: T390125

For the future, definitely. For this specific task, I was thinking that maybe we could call it done once T389536 is resolved. With all the changes we made, as well as the added CI agents, these issues may have been resolved, and we can always create new tasks otherwise. T380061 is an exception, but I could not figure that one out.

For this specific task, I was thinking that maybe we could call it done once T389536 is resolved

Yes, I agree. I have only filed T390125 for the future :)

Hey folks!

QS-Test-Automation is where I'm putting tickets that could benefit from the attention of a member of the QS team who understands automation best practices, or simply some dedicated attention. Not the platform (i.e., the hardware or underlying tech powering test automation), but also not feature-specific work.

For QS-Test-Automation team:

If we find that the timeout errors are due to something in the test automation code (versus true slowness and lag in the system), we can help determine other ways to address them and discuss it as a group. Let's file specific tickets for the specific failing tests to do that work and groom them.

I think this task can now serve as just a tracking task, since all the important timeouts have been filed as subtasks. I suppose the question is whether a tracking task like this is useful, also considering that more timeouts exist out there that aren't listed here, and that March 2025 was a while ago.

That makes sense. And yeah, timeouts are definitely still a thing. I'm curious how many of them are caused by truly slow performance versus something simpler, like a missing or incorrect selector. If it's the latter, QS-Test-Automation could help.