
Catch uncaught exceptions and rejections for webdriver.io tests
Closed, Resolved (Public)

Description

In T389536 there is an example of tests timing out after 60 seconds, after which nothing happens until CI kills the job. I suspect this is caused by an uncaught exception or promise rejection that neither we nor WebDriver.io catch.

We should catch those and log them. Then we will at least know whether we are hitting uncaught exceptions or unhandled rejections of this kind.
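A minimal sketch of what "catch and log" could look like in plain Node.js. This is not the merged patch: the helper name `installCrashLogging`, the log prefixes, and the choice to set `exitCode` are assumptions made for illustration. Only the `'uncaughtException'` and `'unhandledRejection'` process events are real Node.js API.

```javascript
// Hypothetical helper (name and log format are assumptions, not from the patch).
// `proc` and `log` are injectable so the helper can be exercised without
// crashing the real process.
function installCrashLogging(proc = process, log = console.error) {
  const onException = (error, origin) => {
    // `origin` is 'uncaughtException' or 'unhandledRejection'.
    log(`[uncaught exception] origin=${origin}`, error);
    proc.exitCode = 1; // remember the failure; the process itself keeps running
  };
  const onRejection = (reason) => {
    log('[unhandled rejection]', reason);
    proc.exitCode = 1;
  };

  proc.on('uncaughtException', onException);
  proc.on('unhandledRejection', onRejection);

  // Return a cleanup function that removes both listeners again.
  return () => {
    proc.off('uncaughtException', onException);
    proc.off('unhandledRejection', onRejection);
  };
}

// Assumed usage, for example near the top of a test runner config file:
// installCrashLogging();
```

Note that registering these listeners replaces Node.js's default behaviour of printing a stack trace and exiting, so a log-only handler like this one leaves the process running after the error.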

Event Timeline

Change #1129948 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[mediawiki/core@master] Catch uncaught errors and rejections.

https://gerrit.wikimedia.org/r/1129948

Change #1129956 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/skins/MinervaNeue@master] [DNM] Test selenium oddities

https://gerrit.wikimedia.org/r/1129956

Change #1129956 abandoned by Daimona Eaytoy:

[mediawiki/skins/MinervaNeue@master] [DNM] Test selenium oddities

Reason:

MinervaNeue tests should be in a better state now, so this patch is no longer needed for the time being.

https://gerrit.wikimedia.org/r/1129956

Change #1129948 merged by jenkins-bot:

[mediawiki/core@master] Catch uncaught errors and rejections.

https://gerrit.wikimedia.org/r/1129948

Change #1131716 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/core@master] wdio-mediawiki: Release 2.7.1

https://gerrit.wikimedia.org/r/1131716

Please report this to upstream wdio as well. It seems very unlikely to me that this is the way it is meant to work. I suspect something we are doing actively prevents the process from exiting, in which case the same root cause may be behind other problems as well.

Example:

function example() {
  return Promise.reject(new Error('Ohnosecond'));
}

const x = example();

setTimeout(function() {
  console.log('Still alive');
}, 5000);

$ node tmp.js
/Users/krinkle/Temp/tmp.js:2
  return Promise.reject(new Error('Ohnosecond'));
                        ^

Error: Ohnosecond
    at example (/Users/krinkle/Temp/tmp2.js:2:25)
    at Object.<anonymous> (/Users/krinkle/Temp/tmp2.js:5:11)
    at node:internal/main/run_main_module:33:47

Node.js v23.6.0

[exit-code=1] [runtime=0s] $ 

By default, this exits immediately given that:

  • Since Node.js 15, the default behaviour is --unhandled-rejections=throw, which means an unhandled rejection with no 'unhandledRejection' listener is raised as an uncaught exception, so the 'uncaughtException' path covers both uncaught errors and unhandled rejections.
  • The default handling of an uncaught exception is to print a stack trace and exit the process immediately. https://nodejs.org/api/process.html#event-unhandledrejection

Changed default mode to throw. Previously, a warning was emitted.

By default, Node.js handles such exceptions by printing the stack trace to stderr and exiting with code 1, overriding any previously set process.exitCode.

This means that for the process not to exit, either we or wdio must have overridden that default.
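One way that default can be overridden, sketched under assumptions: any 'unhandledRejection' listener that only logs is enough to stop Node.js from exiting on the rejection. The script below runs the same rejection in two fresh child processes, once with no listener and once with a hypothetical log-only listener, and compares exit codes. The listener is a stand-in for illustration; nothing here claims this is what wdio or our config actually does.

```javascript
const { spawnSync } = require('node:child_process');

// Same failure as the example above: a rejected promise that nobody handles.
const rejectOnly = "Promise.reject(new Error('Ohnosecond'));";

// Hypothetical log-only listener, registered before the rejection happens.
const withListener =
  "process.on('unhandledRejection', (reason) => console.error('logged:', reason.message));" +
  rejectOnly;

// Run a snippet in a fresh Node.js process and return its exit code.
// Child output is captured (not printed), so only the exit codes are shown.
function exitCodeOf(source) {
  return spawnSync(process.execPath, ['-e', source], { encoding: 'utf8' }).status;
}

const results = {
  defaultBehaviour: exitCodeOf(rejectOnly),
  withLogOnlyListener: exitCodeOf(withListener),
};

console.log(results);
```

On Node.js 15 or later with default flags, the first run should exit with code 1 and the second with code 0, because the listener counts as handling the rejection.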

Change #1131716 merged by jenkins-bot:

[mediawiki/core@master] wdio-mediawiki: Release 2.7.1

https://gerrit.wikimedia.org/r/1131716

> Please report this to upstream wdio as well. It seems very unlikely to me that this is the way it is meant to work. I suspect something we are doing actively prevents the process from exiting, in which case the same root cause may be behind other problems as well.

There definitely seems to be something. But the thing is, we're running 2 major versions behind, so I don't think we're really in a position to file a bug report... Also, judging from T389536#10676487, it seems that unhandled rejections are not what makes the process hang, or at least not the only thing. It's not clear what it is, though...

With those catches I wanted to understand if it happens on our side or in webdriver.io.

I think one of the problems is that we are behind on major versions of webdriver.io: 8 is LTS and we are on 7. Once we are on the same version, it will be easier to report upstream (when we understand where the problem is).

There's also the problem in CI where jobs time out after 60 minutes, and the logs on the server say that Chromium crashes and webdriver.io does not catch it. I tried to reproduce that locally with --crash-test, but then webdriver.io does catch the crash. I'll report that too if it continues to happen.