Page MenuHomePhabricator

phpunit drops dead on some extension tests on hhvm
Closed, DuplicatePublic

Description

Sometimes, when running quibble/extension tests, phpunit just drops dead without any indication of what test failed and what happened.

Example in https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/37252/console:

22:00:40 ............................................................. 2806 / 4166 ( 67%)
22:00:43 .................................Traceback (most recent call last):
22:00:45   File "/usr/local/bin/quibble", line 11, in <module>
22:00:45     load_entry_point('quibble==0.0.0', 'console_scripts', 'quibble')()
22:00:45   File "/usr/local/lib/python3.5/dist-packages/quibble/cmd.py", line 558, in main
22:00:45     cmd.execute()
22:00:45   File "/usr/local/lib/python3.5/dist-packages/quibble/cmd.py", line 530, in execute
22:00:45     junit_file=junit_db_file)
22:00:45   File "/usr/local/lib/python3.5/dist-packages/quibble/test.py", line 196, in run_phpunit_database
22:00:45     run_phpunit(*args, **kwargs)
22:00:45   File "/usr/local/lib/python3.5/dist-packages/quibble/test.py", line 191, in run_phpunit
22:00:45     subprocess.check_call(cmd, cwd=mwdir, env=phpunit_env)
22:00:45   File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
22:00:45     raise CalledProcessError(retcode, cmd)
22:00:45 subprocess.CalledProcessError: Command '['php', 'tests/phpunit/phpunit.php', '--debug-tests', '--testsuite', 'extensions', '--group', 'Database', '--exclude-group', 'Broken,ParserFuzz,Stub', '--log-junit', '/workspace/log/junit-db.xml']' returned non-zero exit status -11

There's no artifacts in the build and as far as I can see, no indication of which test failed. -11 looks like segfault, but it's not clear which test caused it.
I think we need to create some way for knowing what happened with such tests.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 28 2019, 10:07 PM

This time the log is produced and ends with:

PHPUnitCommand] Start test Scribunto_LuaTextLibraryTest::testLua with data set #70
Parser: using preprocessor: Preprocessor_Hash
[Scribunto] Scribunto_LuaStandaloneInterpreter::__construct: creating interpreter: 'exec' '/bin/sh' '/workspace/src/extensions/Scribunto/includes/engines/LuaStandalone/lua_ulimit.sh' '30' '31' '48828' ''\''/workspace/src/extensions/Scribunto/includes/engines/LuaStandalone/binaries/lua5_1_5_linux_64_generic/lua'\'' '\''/workspace/src/extensions/Scribunto/includes/engines/LuaStandalone/mw_main.lua'\'' '\''/workspace/src/extensions/Scribunto/includes'\'' '\''439'\'' '\''8'\'''

So it may be Scribunto test crashing?

Smalyshev triaged this task as Unbreak Now! priority.
Restricted Application added subscribers: Liuxinyu970226, TerraCodes. · View Herald TranscriptMar 7 2019, 11:32 PM
greg added a subscriber: greg.

Adding Editing-team per Dev/Maintainers. Please take a look at this test failure.

Anomie added a subscriber: Anomie.Mar 8 2019, 3:06 PM

This time the log is produced and ends with:

PHPUnitCommand] Start test Scribunto_LuaTextLibraryTest::testLua with data set #70
Parser: using preprocessor: Preprocessor_Hash
[Scribunto] Scribunto_LuaStandaloneInterpreter::__construct: creating interpreter: 'exec' '/bin/sh' '/workspace/src/extensions/Scribunto/includes/engines/LuaStandalone/lua_ulimit.sh' '30' '31' '48828' ''\''/workspace/src/extensions/Scribunto/includes/engines/LuaStandalone/binaries/lua5_1_5_linux_64_generic/lua'\'' '\''/workspace/src/extensions/Scribunto/includes/engines/LuaStandalone/mw_main.lua'\'' '\''/workspace/src/extensions/Scribunto/includes'\'' '\''439'\'' '\''8'\'''

So it may be Scribunto test crashing?

Which log are you looking at to see that? When I look at https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/38428/artifact/log/mw-debug-cli.log I see it passes that test without issue. I also note that that seems to be number 7119 / 20210, while https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/38428/console shows the failure occurred at somewhere around 13369 / 20210 (13334 on the previous line plus 34 dots before the termination).

Counting the "Start test" lines in mw-debug-cli.log shows 13368 tests were started. The end of the log is

[PHPUnitCommand] Start test CirrusSearch\SearcherTest::testSearchText with data set "insource_001-default"
[Wikibase] Wikibase\SettingsArray::getSetting: setting conceptBaseUri was given as a closure, resolve it to 'http://ed7893fd734a/entity/'
[session] CentralAuthSessionProvider::provideSessionInfo: Not enabled, falling back to core sessions
[session] SessionBackend "6525agjnrv9goclt8u80b5ut5pmcv3h3" is unsaved, marking dirty in constructor
[session] SessionBackend "6525agjnrv9goclt8u80b5ut5pmcv3h3" save: dataDirty=1 metaDirty=1 forcePersist=0
IP: 127.0.0.1
[PHPUnitCommand] End test CirrusSearch\SearcherTest::testSearchText with data set "insource_001-default"

So apparently it finished that test and crashed before starting the next? Which I guess would have been CirrusSearch\SearcherTest::testSearchText with data set "insource_001-fullyfeatured-interwiki" based on the pattern.

Hmm I was looking at the same log but I got different data that time. Now I see what you were seeing. Maybe it was somehow partially copied that time? No idea.
Anyway, this suggests the crash happens between tests, which is even stranger. But I guess not Scribunto's fault then.

So here's another instance: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/38573/

Again, as I download mw-debug-cli.log right now, it ends with Scribunto_LuaSiteLibraryTest::testLua and is only 5M size (against 11M size of the log now in previous one) so I wonder if the log somehow is not completely copied yet?

That one broke at around test 6893 of 19954, 34.54% of the way through the run versus 66.15% for the other one. Why such a big difference?

The mw-debug-cli.log there is probably complete, I see 6893 instances of "Start test" which matches the progress getting to 6844 / 19954 plus 48 dots before the termination.

I'm doubting this has much to do with the specific tests. I see both of these say they ran on integration-slave-docker-1051, is it possible there's something wrong with that instance?

As you see from the task description, the point where it breaks seems to be random - it can even break in different phpunit run. This one: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/38637/console broke at 4425 / 19953 ( 22%). So it's probably not a specific test... Also the last one was on integration-slave-docker-1037 so not sure it's instance related... does docker even have "instances"? Not sure how it works now.

Smalyshev renamed this task from phpunit drops dead on some extension tests to phpunit drops dead on some extension tests on hhvm.Mar 13 2019, 12:25 AM
Smalyshev added a project: HHVM.
Restricted Application removed a subscriber: Liuxinyu970226. · View Herald TranscriptMar 13 2019, 12:28 AM

For the record, that was due to an issue in libc6/libpthread which got fixed in a more recent Debian package than the one we had in the Docker container.

glibc (2.24-11+deb9u4) stretch; urgency=medium

  • Fix a use after free in pthread_create(). Closes: #916925.