Page MenuHomePhabricator

Flaky unit test AbuseFilterConsequencesTest
Closed, ResolvedPublic

Description

https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php70-docker/906/console
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/TimedMediaHandler/+/452974/

There was 1 failure:

1) AbuseFilterConsequencesTest::testFilterConsequences with data set #10 ('Test to check that degroup an...twice.', array(1, 3, 7, 12), array('edit', 'User:FilteredUser', '', 'A couple of lines about me...', 'My user page'), array(array(12), array(7, 12)))
The expected block option "edittalk" options does not match the actual one. Test description: Test to check that degroup and block are both executed and degroup warning is shown twice.

/workspace/src/extensions/AbuseFilter/tests/phpunit/AbuseFilterConsequencesTest.php:584
/workspace/src/tests/phpunit/MediaWikiTestCase.php:475
/workspace/src/maintenance/doMaintenance.php:94

Might be related:

Details

Related Gerrit Patches:
mediawiki/extensions/AbuseFilter : masterRaise tolerance for time-related unit tests to 10 seconds
mediawiki/extensions/AbuseFilter : masterFix wrong error message for PHPUnit

Event Timeline

Krinkle created this task.Aug 16 2018, 4:40 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 16 2018, 4:40 PM

@Daimona @Huji Could you re-review https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/444442/ to see if there's anything that stands out that could explain the test sometimes failing? If not, it is also possible that the failure is caused by a change in core or another extension.

@Krinkle Currently I'm not able to do any testing, however this particular test should be very similar to the other ones, and it doesn't seem to require more features than the others (i.e. I wouldn't expect it to break alone due to an external change). In fact, now it has just been executed correctly.
It probably deserves further investigation to determine why it failed, though. Could be due to internal AF logic, or due to some conflicts between tests, or some cache issue, but I don't really have more specific ideas.

Oh well, there's something we can do anyway. The error message

'The expected block option "edittalk" options does not match the actual one.'

is repeated twice through the code. This one is correct, while this one is not, since it refers to block duration. I'll try to see if I can submit a patch for it via the web gerrit editor.
Knowing this, I'm pretty sure this specific error comes from line 529, where we got some problem in comparing block durations. This is very likely to be caused by a tiny difference in testing timings, it's probably a matter of seconds. If the logic at this line is correct, then we should probably set a tolerance higher than 2, and do it in a couple of other similar tests as well.

Change 453163 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] We're currently emitting the same error twice, but in one of those cases it's completely wrong. Damned copy&pasting!

https://gerrit.wikimedia.org/r/453163

The patch above fixes the message problem and will display a more accurate error message. Now I'm wondering what tolerance would be acceptable. 5 seconds? Maybe 10? Also, do we have some standardised logic for this kind of tests to take into account execution time?

I am merging this patch (I'm to blame for not noticing the duplication).

As for execution time, I don't think we have standards in place. One idea commonly used is to save the timestamp, and instead of actually using the system time, use that saved time.

Huji assigned this task to Daimona.Aug 16 2018, 6:38 PM
Huji triaged this task as Medium priority.

Change 453163 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Fix wrong error message for PHPUnit

https://gerrit.wikimedia.org/r/453163

As for execution time, I don't think we have standards in place. One idea commonly used is to save the timestamp, and instead of actually using the system time, use that saved time.

Hmm, would this be feasible in this specific case? Anyway, I think setting a tolerance of 5-10 seconds should be fine, depending on the specific point of the code (for instance, in this point where we're checking block durations, 10 seconds shouldn't matter at all). It should help mitigate these failures, which I think are pretty rare anyway.

Change 454285 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Raise tolerance for time-related unit tests to 10 seconds

https://gerrit.wikimedia.org/r/454285

Indeed, done. Many thanks!

Change 454285 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Raise tolerance for time-related unit tests to 10 seconds

https://gerrit.wikimedia.org/r/454285

Daimona closed this task as Resolved.Aug 21 2018, 5:21 PM
Daimona removed a project: Patch-For-Review.

This should have fixed the problem. If it will appear again, we should think about a different solution.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM