Fix Flow random test failures
Open, Medium, Public

Description

Parent task to group together cases of random test failures solved by "recheck"

Event Timeline

As a first step, let's mark all of these as "Broken" then figure out how we can fix them.

Change 472263 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/Flow@master] Temporarily mark flaky tests as broken

https://gerrit.wikimedia.org/r/472263

Could someone in Release-Engineering-Team please help me better understand the environment the tests run in for the wmf-quibble-vendor-mysql-{php}-docker job? Some questions:

  1. Is there a master/replica setup in this environment?
  2. Can I run the Docker container(s) locally to try to reproduce these failures?
  3. Where can I view the contents of the DebianJessieDocker Dockerfile?

Thanks!

Change 472263 merged by jenkins-bot:
[mediawiki/extensions/Flow@master] Temporarily mark flaky tests as broken

https://gerrit.wikimedia.org/r/472263

Change 472456 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/Flow@REL1_32] Temporarily mark flaky tests as broken

https://gerrit.wikimedia.org/r/472456

Unassigning myself for now. When coming back to this I'd like to check:

  • Whether MySQL replication is set up in Quibble
  • Whether the failures reproduce locally when running the Quibble tests with the tests currently marked @Broken removed from the Broken group
  • Whether we can dump the MySQL error log from Quibble if/when tests fail; I suspect MySQL is not able to keep up with the write queries

@hashar The following command took 1 hour and 45 minutes to build, and then failed with the output below:

docker run -it --rm --env-file ./.env -v "$(pwd)"/cache:/cache -v "$(pwd)"/log:/log -v "$(pwd)"/ref:/srv/git:ro -v "$(pwd)"/src:/workspace/src docker-registry.wikimedia.org/releng/quibble-jessie-hhvm

There were 2 failures:

1) Warning
The data provider specified for ResourcesTest::testFileExistence is invalid.
ResourceLoaderFileModule::readStyleFile: style file not found: "/workspace/src/extensions/VisualEditor/lib/ve/lib/color-picker/color-picker.css"

/workspace/src/maintenance/doMaintenance.php:94

2) ResourcesTest::testMissingMessages
Message 'visualeditor-diff-no-changes' required by 'ext.visualEditor.mwsave' must exist
Failed asserting that false is true.

/workspace/src/tests/phpunit/structure/ResourcesTest.php:110
/workspace/src/tests/phpunit/MediaWikiTestCase.php:424
/workspace/src/maintenance/doMaintenance.php:94

FAILURES!
Tests: 10936, Assertions: 137753, Failures: 2, Skipped: 160.
Traceback (most recent call last):
  File "/usr/local/bin/quibble", line 9, in <module>
    load_entry_point('quibble==0.0.0', 'console_scripts', 'quibble')()
  File "/usr/local/lib/python3.4/dist-packages/quibble/cmd.py", line 558, in main
    cmd.execute()
  File "/usr/local/lib/python3.4/dist-packages/quibble/cmd.py", line 486, in execute
    junit_file=junit_dbless_file)
  File "/usr/local/lib/python3.4/dist-packages/quibble/test.py", line 201, in run_phpunit_databaseless
    run_phpunit(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/quibble/test.py", line 191, in run_phpunit
    subprocess.check_call(cmd, cwd=mwdir, env=phpunit_env)
  File "/usr/lib/python3.4/subprocess.py", line 561, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['php', 'tests/phpunit/phpunit.php', '--debug-tests', '--testsuite', 'extensions', '--exclude-group', 'Broken,ParserFuzz,Stub,Database', '--log-junit', '/workspace/log/junit-dbless.xml']' returned non-zero exit status 1

The test whose failure I wanted to reproduce is in the Database group, and that group never ran as part of the full suite because the run errored out during the databaseless test phase. While the container was running, I did try executing one of the integration tests I'm interested in with php tests/phpunit/phpunit.php extensions/Flow/tests/phpunit --filter testGetLastRevision and it passed. I'm not sure I'll be able to reproduce the random failure locally; I might need to provide a fix and then wait for someone to flag it if the failure recurs.
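
One way to chase an intermittent failure like this locally is to rerun the single test in a loop until it fails. This is only a minimal sketch: the helper name `run_until_failure` and the default attempt count are illustrative assumptions, not part of Quibble or MediaWiki.

```python
import subprocess
import sys


def run_until_failure(cmd, attempts=50, cwd=None):
    """Run cmd up to `attempts` times.

    Returns the 1-based attempt number of the first failing run,
    or None if every run exited with status 0.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            cmd, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if result.returncode != 0:
            # Only the output of the failing run is printed.
            sys.stdout.write(result.stdout.decode("utf-8", errors="replace"))
            return attempt
    return None
```

From the MediaWiki root inside the container, it could be called with the same command as the manual run, for example `run_until_failure(["php", "tests/phpunit/phpunit.php", "extensions/Flow/tests/phpunit", "--filter", "testGetLastRevision"])`. A run that never fails does not prove the test is stable; it only means the race was not hit in that many attempts.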

One thing I was curious about -- could we dump the MySQL / PHP / Apache error logs somewhere (stdout would do if nowhere else) when a build fails, in case that surfaces useful debugging information for failed CI builds?
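
To make the idea concrete, here is a rough sketch of dumping the tail of each log to stdout after a failed run. The function names and the 100-line default are my own assumptions, and the actual log paths inside the container would have to be filled in; nothing here is existing Quibble code.

```python
import os
import sys
from collections import deque


def tail_lines(path, max_lines=100):
    """Return up to the last max_lines lines of a text file."""
    with open(path, "r", errors="replace") as handle:
        return list(deque(handle, maxlen=max_lines))


def dump_logs(paths, max_lines=100, out=sys.stdout):
    """Write the tail of each existing log file to `out`.

    Returns the list of paths that were actually dumped.
    """
    dumped = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # a service may not have written a log at all
        out.write("==> %s <==\n" % path)
        out.writelines(tail_lines(path, max_lines))
        dumped.append(path)
    return dumped
```

Something like this could be called from an `except subprocess.CalledProcessError` handler around the test run, with the list of MySQL, PHP and Apache log paths passed in.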

ResourceLoaderFileModule::readStyleFile: style file not found: "/workspace/src/extensions/VisualEditor/lib/ve/lib/color-picker/color-picker.css"

mediawiki/extensions/VisualEditor has a submodule for VisualEditor/VisualEditor. Quibble should have initialized the submodule when cloning the repository. You can do it locally via:

git -C src/extensions/VisualEditor submodule update --init

(Take care with file ownership: the Docker container runs as nobody:nogroup, so you might need to run chown -R nobody:nogroup src/extensions/VisualEditor.)

Message 'visualeditor-diff-no-changes' required by 'ext.visualEditor.mwsave' must exist

Ditto: the message is defined in the submodule (in lib/ve/i18n/qqq.json and other such files).

Change 472456 abandoned by Kosta Harlan:
Temporarily mark flaky tests as broken

Reason:
No comments/activity on this so I assume it's not needed anymore.

https://gerrit.wikimedia.org/r/472456

Same issue in https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-hhvm-docker/10779/console (for 497357):

16:24:01 ResourceLoaderFileModule::readStyleFile: style file not found: "/workspace/src/extensions/VisualEditor/lib/ve/lib/color-picker/color-picker.css"
...
16:24:01 Message 'visualeditor-diff-no-changes' required by 'ext.visualEditor.mwsave' must exist

Seems like either a bug in quibble's submodule initialization logic, or git clone flakiness.

Indeed, it did try to process the submodules, but since Gerrit was unresponsive or down, the clones failed:

INFO:quibble.cmd:Updating git submodules of extensions and skins
extensions/VisualEditor/.gitmodules
+ git submodule foreach git clean -xdff -q
+ git submodule update --init --recursive
Submodule 'lib/ve' (https://gerrit.wikimedia.org/r/p/VisualEditor/VisualEditor.git) registered for path 'lib/ve'
Cloning into '/workspace/src/extensions/VisualEditor/lib/ve'...
fatal: could not read Username for 'https://gerrit.wikimedia.org': No such device or address
fatal: clone of 'https://gerrit.wikimedia.org/r/p/VisualEditor/VisualEditor.git' into submodule path '/workspace/src/extensions/VisualEditor/lib/ve' failed
Failed to clone 'lib/ve'. Retry scheduled
Cloning into '/workspace/src/extensions/VisualEditor/lib/ve'...
fatal: could not read Username for 'https://gerrit.wikimedia.org': No such device or address
fatal: clone of 'https://gerrit.wikimedia.org/r/p/VisualEditor/VisualEditor.git' into submodule path '/workspace/src/extensions/VisualEditor/lib/ve' failed
Failed to clone 'lib/ve' a second time, aborting
...
INFO:backend.MySQL:Initializing MySQL data directory

But since git submodule update does not exit non-zero on failure, Quibble carries on regardless. The bug is tracked in T198980.
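
Independently of the exit status, a caller could verify the result afterwards by checking that every submodule path is populated. The following is only a sketch under the assumption that a failed or uninitialized submodule leaves a missing or empty directory; `unpopulated_submodules` is a hypothetical helper, not how Quibble handles this.

```python
import os
import re

# Matches lines such as "path = lib/ve" in a .gitmodules file.
_PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def unpopulated_submodules(repo_dir):
    """Return submodule paths listed in .gitmodules whose directory
    is missing or empty under repo_dir."""
    gitmodules = os.path.join(repo_dir, ".gitmodules")
    if not os.path.isfile(gitmodules):
        return []
    missing = []
    with open(gitmodules, "r") as handle:
        for line in handle:
            match = _PATH_LINE.match(line)
            if not match:
                continue
            sub_dir = os.path.join(repo_dir, match.group(1))
            # A checked-out submodule contains at least a .git entry.
            if not os.path.isdir(sub_dir) or not os.listdir(sub_dir):
                missing.append(match.group(1))
    return missing
```

For the log above, calling it on /workspace/src/extensions/VisualEditor after the update step would be expected to report lib/ve, at which point the job could abort instead of going on to initialize MySQL.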

Per T198980, since Quibble 0.0.30, failing to update the submodules is now a failure as one would have expected (previously the result was always ignored).

Change 514810 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/Flow@master] Disable two flaky tests

https://gerrit.wikimedia.org/r/514810

I just came across T150430: Simultaneous edits to posts can cause an exception for the losing party, and am wondering if that is what is happening in some of these cases.

Change 514810 merged by jenkins-bot:
[mediawiki/extensions/Flow@master] Disable two flaky tests

https://gerrit.wikimedia.org/r/514810