
mediawiki/core CI failure with test WANObjectCacheTest::testGetWithSetCallback
Closed, Resolved · Public

Description

Hello,

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/550594 has a test failure with:

09:50:22 There was 1 failure:
09:50:22 
09:50:22 1) WANObjectCacheTest::testGetWithSetCallback with data set #0 (array(), false)
09:50:22 Value still returned after expired (in grace)
09:50:22 Failed asserting that two strings are identical.
09:50:22 --- Expected
09:50:22 +++ Actual
09:50:22 @@ @@
09:50:22 -'xxx1'
09:50:22 +'xxx2'
09:50:22 
09:50:22 /workspace/src/tests/phpunit/includes/libs/objectcache/WANObjectCacheTest.php:466
09:50:22 /workspace/src/maintenance/doMaintenance.php:99
09:50:22 
09:50:22 FAILURES!

This looks like an unrelated failure. To be sure, I've submitted a recheck at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/550638, which failed as well.

Could this be fixed, please?


Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 13 2019, 10:47 AM
hashar added subscribers: aaron, Krinkle, hashar.

Could it be that this test is flapping / unreliable, or has some kind of time-based race condition?

+ @aaron and @Krinkle, who are familiar with WANObjectCache (note that Krinkle is attending a conference this week)

At least for the CI containers, no change occurred recently, so the php packages are exactly the same.

Yep, encountered this multiple times now as well, thought it was related to some changes in master, but it's flaky :(

Maybe we should first mark the test as broken/skippable?

I took a look at the test, but it is huge and complicated, and I can't really understand its intent. Since it passes locally every time, I don't feel very comfortable working on it :( Maybe whoever can fix this could also take a look at the test and try to make it easier to read and understand? :)

Change 551360 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Add more logging to getWithSetCallback()

https://gerrit.wikimedia.org/r/551360

PS1 of my logging change showed it failing due to worthRefreshPopular() returning true. Since the times are mocked, the chance of this happening should be a constant 1/825. To me, it seems to be failing far more often than that, almost always. SamplingStatsdClientTest calls mt_srand(0), which makes every later mt_rand() result predictable, so maybe exactly the wrong number of mt_rand() calls happens between SamplingStatsdClientTest and WANObjectCacheTest.

Problematic call counts would be:

php > mt_srand(0); for ( $i = 0; $i < 10000; $i++) { if ( mt_rand( 1, 1e9 ) <= 1e9 /825 ) print "$i\n"; }
408
702
1368
1879
2207
3353
3792
3950
5497
7118
8284
9272
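
The one-liner above can be wrapped into a small function to make the point explicit: once the generator is seeded, the "unlucky" call counts are fixed, so a test that lands on one of them fails on every run rather than 1 time in 825. This is only a sketch. findUnluckyCallCounts() is a hypothetical helper, not part of MediaWiki, and it is not necessarily what change 551364 does. The exact indices depend on the PHP version's mt_rand() implementation, so none are hard-coded here.

```php
<?php
/**
 * Hypothetical helper (not MediaWiki code): return the 0-based call indices
 * at which a seeded mt_rand() draw falls under a 1-in-$odds threshold.
 */
function findUnluckyCallCounts( int $seed, int $calls, int $odds = 825 ): array {
	// Seeding makes the whole sequence of mt_rand() results reproducible.
	mt_srand( $seed );
	$hits = [];
	for ( $i = 0; $i < $calls; $i++ ) {
		if ( mt_rand( 1, 1000000000 ) <= 1000000000 / $odds ) {
			$hits[] = $i;
		}
	}
	// Reseed randomly so this helper does not leave a fixed seed behind
	// for whatever code runs next.
	mt_srand();
	return $hits;
}

$first = findUnluckyCallCounts( 0, 10000 );
$second = findUnluckyCallCounts( 0, 10000 );

// Same seed, same draws: both runs report identical indices.
$deterministic = ( $first === $second );
```

With 10000 draws at odds of 1/825, about 10000 / 825 ≈ 12 hits are expected, which matches the 12 indices listed above.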

Change 551364 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Fix random WANObjectCacheTest failures

https://gerrit.wikimedia.org/r/551364

Change 551364 merged by jenkins-bot:
[mediawiki/core@master] Fix random WANObjectCacheTest failures

https://gerrit.wikimedia.org/r/551364

Florian closed this task as Resolved. · Nov 21 2019, 9:14 AM
Florian claimed this task.

Should be fixed. If not, we can still reopen this task :) Thanks @tstarling!

Change 551360 merged by jenkins-bot:
[mediawiki/core@master] Add more logging to getWithSetCallback()

https://gerrit.wikimedia.org/r/551360

hashar renamed this task from Jenkins fails for mediawiki/core to mediawiki/core CI failure with test WANObjectCacheTest::testGetWithSetCallback. · Dec 20 2019, 8:46 AM