Page MenuHomePhabricator

Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins
Closed, ResolvedPublic

Description

For some reason, the quibble-vendor-mysql-hhvm-docker test is constantly failing for unrelated changes of EventBus Gerrit 490141. What's more interesting is the fact that every time a new PS is uploaded, the test fails for a different reason, even though nothing in the code of this CR relates to the failures. Some example failed runs:

Event Timeline

mobrovac triaged this task as High priority.Feb 13 2019, 8:36 PM
mobrovac created this task.
Restricted Application added a project: Analytics. · View Herald TranscriptFeb 13 2019, 8:36 PM
Pchelolo added a subscriber: Pchelolo.

This is getting in our way since we can't merge an otherwise ready patchset

EventBus itself seems to be fine since a change got merged yesterday by CI https://gerrit.wikimedia.org/r/#/c/491591/

So the issue is most probably within the proposed patchset: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/490141/ Note that it got uploaded when WMCS had some issue which might have explained a few of the issue. However a new patchset got send yesterday and still fail. Specially the selenium tests from mediawiki.

Notably at some point it is not able to login:

[20:21:20] [E] [MWBOT] Login failed: WikiAdmin@http://127.0.0.1:9412/

00:02:24.244 [chrome #0-1] 1) Page should be re-creatable:
00:02:24.244 [chrome #0-1] invalidjson: No valid JSON response
00:02:24.245 [chrome #0-1] Error: invalidjson: No valid JSON response

00:02:24.250 [chrome #0-1] 2) Page should have history @daily:
00:02:24.250 [chrome #0-1] Could not login: WrongToken
00:02:24.250 [chrome #0-1] Error: Could not login: WrongToken

:/

I have tried to resubmit the change: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/491014/ with no luck.

We have quite a few other patches in development for the extension and they all pass Jenkins tests. I'm not sure what exactly is different in this particular change..

The tests are being run with Quibble which should let us reproduce the failure. Specially if one reuse the Docker container.

It has some documentation at https://doc.wikimedia.org/quibble/#docker-container

I might try to reproduce tomorrow.

Antoine, I tried to recreate the test results locally using the instructions in the link you shared but it failed (see below). Anything I am doing wrong?

docker run -it --rm \                                                                                        ✔  1188  18:55:27  
  -v "$(pwd)"/cache:/cache \
  -v "$(pwd)"/log:/workspace/log \
  -v "$(pwd)"/ref:/srv/git:ro \
  -v "$(pwd)"/src:/workspace/src \
  docker-registry.wikimedia.org/releng/quibble-stretch:latest
DEBUG:quibble.cmd:Running stages: phpunit, npm-test, composer-test, qunit, selenium
WARNING:quibble.cmd:ZUUL_PROJECT not set. Assuming mediawiki/core
INFO:quibble.cmd:Adding mediawiki/vendor
INFO:quibble.cmd:Projects: mediawiki/core, mediawiki/skins/Vector, mediawiki/vendor
INFO:zuul.CloneMapper:Workspace path set to: /workspace/src
INFO:zuul.CloneMapper:Mapping projects to workspace...
INFO:zuul.CloneMapper:  mediawiki/core -> /workspace/src
INFO:zuul.CloneMapper:  mediawiki/skins/Vector -> /workspace/src/skins/Vector
INFO:zuul.CloneMapper:  mediawiki/vendor -> /workspace/src/vendor
DEBUG:zuul.CloneMapper:Checking overlap in destination directories...
INFO:zuul.CloneMapper:Expansion completed.
INFO:zuul.Cloner:Preparing 3 repositories
INFO:zuul.Cloner:Creating repo mediawiki/core from cache /srv/git/mediawiki/core.git
INFO:zuul.Cloner:Updating origin remote in repo mediawiki/core to https://gerrit.wikimedia.org/r/p/mediawiki/core
DEBUG:zuul.Repo:Resetting repository /workspace/src
DEBUG:zuul.Repo:Updating repository /workspace/src
INFO:zuul.Cloner:Falling back to branch master
DEBUG:zuul.Repo:Checking out remotes/origin/master
INFO:zuul.Cloner:Prepared mediawiki/core repo with branch master at commit 0601395efe518d960575e2b1a5fa4cc37cd3ca80
INFO:zuul.Cloner:Creating repo mediawiki/skins/Vector from cache /srv/git/mediawiki/skins/Vector.git
INFO:zuul.Cloner:Updating origin remote in repo mediawiki/skins/Vector to https://gerrit.wikimedia.org/r/p/mediawiki/skins/Vector
DEBUG:zuul.Repo:Resetting repository /workspace/src/skins/Vector
DEBUG:zuul.Repo:Updating repository /workspace/src/skins/Vector
INFO:zuul.Cloner:Falling back to branch master
DEBUG:zuul.Repo:Checking out remotes/origin/master
INFO:zuul.Cloner:Prepared mediawiki/skins/Vector repo with branch master at commit 36e5af4cba8dbaff62f793d2f575e960ca74aed0
INFO:zuul.Cloner:Creating repo mediawiki/vendor from cache /srv/git/mediawiki/vendor.git
INFO:zuul.Cloner:Updating origin remote in repo mediawiki/vendor to https://gerrit.wikimedia.org/r/p/mediawiki/vendor
DEBUG:zuul.Repo:Resetting repository /workspace/src/vendor
DEBUG:zuul.Repo:Updating repository /workspace/src/vendor
INFO:zuul.Cloner:Falling back to branch master
DEBUG:zuul.Repo:Checking out remotes/origin/master
INFO:zuul.Cloner:Prepared mediawiki/vendor repo with branch master at commit 287b0dcc1f2472e74a2b7119f6528a9559db7983
INFO:zuul.Cloner:Prepared all repositories
INFO:quibble.cmd:Updating git submodules of extensions and skins
Traceback (most recent call last):
  File "/usr/local/bin/quibble", line 11, in <module>
    load_entry_point('quibble==0.0.0', 'console_scripts', 'quibble')()
  File "/usr/local/lib/python3.5/dist-packages/quibble/cmd.py", line 558, in main
    cmd.execute()
  File "/usr/local/lib/python3.5/dist-packages/quibble/cmd.py", line 455, in execute
    self.mw_install()
  File "/usr/local/lib/python3.5/dist-packages/quibble/cmd.py", line 270, in mw_install
    db = dbclass(base_dir=self.db_dir, dump_dir=self.dump_dir)
  File "/usr/local/lib/python3.5/dist-packages/quibble/backend.py", line 207, in __init__
    if php_is_hhvm():
  File "/usr/local/lib/python3.5/dist-packages/quibble/__init__.py", line 80, in php_is_hhvm
    return b'HipHop' in subprocess.check_output(['php', '--version'])
  File "/usr/lib/python3.5/subprocess.py", line 316, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.5/subprocess.py", line 383, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1282, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'php'

My bad sorry. The container releng/quibble-stretch no more ship with PHP since:

ed2233f0 - Turn quibble-stretch into a PHP version agnostic image (5 months ago)

Thus it only serves as a base image for various PHP flavors:

docker-registry.wikimedia.org/releng/quibble-stretch-hhvm:latest
docker-registry.wikimedia.org/releng/quibble-stretch-php70:latest
docker-registry.wikimedia.org/releng/quibble-stretch-php71:latest
docker-registry.wikimedia.org/releng/quibble-stretch-php72:latest
docker-registry.wikimedia.org/releng/quibble-stretch-php73:latest

docker-registry.wikimedia.org/releng/quibble-stretch-php70:latest would do.

Reproduced on my Debian Stretch with the command line:

DISPLAY='' \
ZUUL_URL=https://gerrit.wikimedia.org/r/p \
ZUUL_PROJECT=mediawiki/extensions/EventBus \
ZUUL_BRANCH=master \
ZUUL_REF=refs/changes/41/490141/7 \
  quibble \
    --packages-source vendor \
    --db sqlite --git-cache /home/hashar/projects \
    --workspace "/home/hashar/workspace" \
   --run selenium

(for next runs, additional pass: --skip-zuul --skip-deps to skip updating repositories and to skip composer/npm install)

The first Mediawiki tests (BlankPage) pass:

[chrome #0-0]   BlankPage
[chrome #0-0]       ✓ should have its title @daily
[chrome #0-0]
[chrome #0-0]
[chrome #0-0] 1 passing (1s)

The first two Page tests pass ("Page should be previewable" and "Page should be creatable"). Then there is a login error of some sort. Finally, there is UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'currentRetry' of undefined.

Also note that ffmpeg (which records a video) ends up being killed and reports a null signal (which is wrong?!).

If I reset EventBus to latest master. The tests pass just fine. So it is definitely https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/490141/ introducing a fault of some sort :(

Milimetric moved this task from Incoming to Radar on the Analytics board.Feb 21 2019, 5:45 PM
hashar added a comment.EditedFeb 21 2019, 6:58 PM

Found it.

The Selenium test suite uses Mocha and eventually for the Page test it dies with:

(node:1165) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 4): TypeError: Cannot read property 'currentRetry' of undefined
00:01:36.689             if (!suiteStats.hooks[uid]) throw Error(`Unrecognised hook [${runner.title}] for suite [${runner.parent}]`);
00:01:36.689 Error: Unrecognised hook ["after each" hook for "should be deletable"] for suite [Page]
00:01:36.704 (node:1165) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 5): Error: channel closed

And exit 1. That leaves no chances for the spec reporter to output a stacktace for the failing. The report for each test is only output after the suite has completed, but since the suite is aborted there is no report.

On a test failure, the test object has a message and a stack trace. Using *cough* console.log(), I eventually got:

Running afterTest hook for  { type: 'afterTest',
  err: 
   { message: 'internal_api_error_Error: [235a68ee4bfb54ac29f1eb26] Exception caught: Call to undefined method EventBus::createPerformerAttrs()',
     stack: 'Error: internal_api_error_Error: [235a68ee4bfb54ac29f1eb26] Exception caught: Call to undefined method EventBus::createPerformerAttrs()\n    at rawRequest.then (node_modules/mwbot/src/index.js:257:31)\n    at tryCatcher (node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (node_modules/bluebird/js/release/promise.js:512:31)\n    at Promise._settlePromise (node_modules/bluebird/js/release/promise.js:569:18)\n    at Promise._settlePromise0 (node_modules/bluebird/js/release/promise.js:614:10)\n    at Promise._settlePromises (node_modules/bluebird/js/release/promise.js:694:18)\n    at _drainQueueStep (node_modules/bluebird/js/release/async.js:138:12)\n    at _drainQueue (node_modules/bluebird/js/release/async.js:131:9)\n    at Async._drainQueues (node_modules/bluebird/js/release/async.js:147:5)\n    at Immediate.Async.drainQueues (node_modules/bluebird/js/release/async.js:17:14)\n    at runCallback (timers.js:794:20)\n    at tryOnImmediate (timers.js:752:5)\n    at processImmediate [as _immediateCallback] (timers.js:729:5)',
     type: 'Error',
     expected: undefined,
     actual: undefined },
  title: 'should be re-creatable',
  parent: 'Page',
  fullTitle: 'Page should be re-creatable',
  pending: false,
  file: '/home/hashar/workspace/src/tests/selenium/specs/page.js',
  currentTest: 'should be re-creatable',
  passed: false,
  duration: 1127 }

internal_api_error_Error: [235a68ee4bfb54ac29f1eb26] Exception caught: Call to undefined method EventBus::createPerformerAttrs()
Error: internal_api_error_Error: [235a68ee4bfb54ac29f1eb26] Exception caught: Call to undefined method EventBus::createPerformerAttrs()
    at rawRequest.then (node_modules/mwbot/src/index.js:257:31)
    at tryCatcher (node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (node_modules/bluebird/js/release/promise.js:694:18)
    at _drainQueueStep (node_modules/bluebird/js/release/async.js:138:12)
    at _drainQueue (node_modules/bluebird/js/release/async.js:131:9)
    at Async._drainQueues (node_modules/bluebird/js/release/async.js:147:5)
    at Immediate.Async.drainQueues (node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:794:20)
    at tryOnImmediate (timers.js:752:5)
    at processImmediate [as _immediateCallback] (timers.js:729:5)

So really what we are looking for is:

'Error: internal_api_error_Error:
[235a68ee4bfb54ac29f1eb26] Exception caught: Call to undefined method EventBus::createPerformerAttrs()

From the logs of one of the failing build ( https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-docker/6090/artifact/log/ ), mw-exception.log has it as well:

2019-02-21 10:07:29 7a0726019460 wikidb: [a1deb45ce24c6cb1769ff261] /api.php?format=json
Error from line 109 of /workspace/src/extensions/EventBus/includes/EventBusHooks.php: Call to undefined method EventBus::createPerformerAttrs()

Looking at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/490141/7 the method EventBus::createPerformerAttrs() has been moved to EventFactory::createPerformerAttrs() but it is still statically called with the old class.

Thank you a lot @hashar . I guess this ticket can be closed, however I have a last question - do you think we could somehow improve quibble to output this info without the need to place *cough* console.log()?

Definitely, the test suite is broken. Quibble is fine itself, it just run commands. The issue is somewhere in the webdriver.io material of mediawiki/core :(

It is similar to T198125: MediaWiki webdriver afterTest hook should catch mwbot issues/exceptions somehow. @mobrovac made the same assumption on T198125#4949749 , need a task to be filled ;)

The normal behavior is that once all test of a suite have completed, webdriver-spec-reporter craft a report that has the error message as well as the stacktrace. That would have made it obvious. The reason for the exit is left to be figured out. Probably as a follow up task and an easier way to reproduce ;)

hashar closed this task as Resolved.EditedFeb 22 2019, 12:57 PM
hashar claimed this task.

Root cause figured out. I have filled another task to cover the weird webdriver.io reporting: T216818