Page MenuHomePhabricator

Flaky test: MediaWiki\CheckUser\Tests\Integration\Investigate\Services\CompareServiceTest::testGetTargetsOverLimit with data set "One target is over limit"
Closed, ResolvedPublic

Description

See https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php72-docker/110578/console

13:40:32 There was 1 failure:
13:40:32 
13:40:32 1) MediaWiki\CheckUser\Tests\Integration\Investigate\Services\CompareServiceTest::testGetTargetsOverLimit with data set "One target is over limit" (array(array('1.2.3.4', 'User1', '1.2.3.5'), array('1.2.3.5'), 4), array('1.2.3.4'))
13:40:32 Failed asserting that two arrays are equal.
13:40:32 --- Expected
13:40:32 +++ Actual
13:40:32 @@ @@
13:40:32  Array (
13:40:32 -    0 => '1.2.3.4'
13:40:32  )
13:40:32 
13:40:32 /workspace/src/extensions/CheckUser/tests/phpunit/integration/Investigate/Services/CompareServiceTest.php:285
13:40:32 /workspace/src/tests/phpunit/MediaWikiIntegrationTestCase.php:516
13:40:32 === Logs generated by test case
13:40:32 [objectcache] [debug] MainWANObjectCache using store {class} {"class":"EmptyBagOStuff"}
13:40:32 [localisation] [debug] LocalisationCache using store LCStoreNull []
13:40:32 [objectcache] [debug] MainWANObjectCache using store {class} {"class":"EmptyBagOStuff"}
13:40:32 ===
13:40:32 
13:40:32 FAILURES!
13:40:32 Tests: 4637, Assertions: 16201, Failures: 1, Skipped: 53.

Pretty sure I've seen this more than once but only find that one example at the moment, seen in gate-and-submit of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/815923

Event Timeline

Daimona triaged this task as Unbreak Now! priority.Sep 10 2022, 12:10 AM
Daimona subscribed.

Also seen for a core patch: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/831158. Even if not consistently broken, it seems sufficiently broken to warrant UBN.

I don't see anything wrong with the code of the test itself. This particular test class adds to the test table 5 rows with the IP of 1.2.3.4. It then checks to see if this IP has more than 4 entries in the DB. As it does it should always return as being over limit.

The tests run fine independently from other extensions and even after running them locally around 5 times I've not seen this error. It also does not seem to be failing consistently. Therefore, I think it has something to do with the order that the tests are run in. For both examples, the only 'F' (failure) is marked as being earlier in the running of the tests. While running all Database tests locally (core + installed extensions) the Investigate tests run much later in the cycle and do not fail locally. This could explain the difference and why it's not consistently failing, because I would presume the order the tests are run in is not kept the same run between run and so if the Investigate tests running early matters for some reason then this could explain the inconsistent failure.

Because the test is fails because 1.2.3.4 is not returned, this means that the method is not receiving all the rows it expects. This removes the explanation that another extension has forgot to properly clear up the DB test table as the extra rows would not affect the expected output. The reasons I can think of are:

  • The test cases in this test file have the addDBData method called before the DB is ready for the test case making some or all of the test data not be present on the actual DB (unlikely) or is called when the test is underway (should be impossible)
  • The test case is flaky somehow
  • The method being tested reads from a replica DB, so perhaps replication is not occuring before the tests run (seems unlikely)
  • The method being tested is flaky and sometimes returns incorrect values even though the DB state is correct

I can't seem to find the issue, but as I'm not even able to re-produce this error locally I suspect it would be hard to diagnose.

Further thoughts:

  • If the DB does not support limit and order with a select union query then the method being tested returns an empty array. However, the last data set for this method "Two targets are over limit" does not fail when "One target is over limit" fails. This means that this should not be returning here unless something is broken with how the tests run.
  • This is the only data set that has 'excludeTargets' have a non empty array.

Change 831194 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/CheckUser@master] tests: Use fake timer for CompareServiceTest

https://gerrit.wikimedia.org/r/831194

Thanks for working on a fix. This has been +2'd. Will leave to @Umherirrender to close in case there are other fixes that are needed for this.

Change 831194 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] tests: Use fake timer for CompareServiceTest

https://gerrit.wikimedia.org/r/831194

Change 831206 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/CheckUser@REL1_39] tests: Use fake timer for CompareServiceTest

https://gerrit.wikimedia.org/r/831206

Change 831206 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@REL1_39] tests: Use fake timer for CompareServiceTest

https://gerrit.wikimedia.org/r/831206

Thanks for working on a fix. This has been +2'd. Will leave to @Umherirrender to close in case there are other fixes that are needed for this.

No, it should be no longer flaky