[Bug] Scribunto_LuaSandboxTests repeatedly fails because it relies on code execution time
Closed, ResolvedPublic
Actions

Description

There is a flaky test in the Scribunto extension that started to repeatedly fail for the same reason, wrecking my nerves and doing nothing good but (literally) wasting time:

Just one of many recent examples: https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4322/console

16:09:00 1) LuaSandbox: Scribunto_LuaSandboxTests::testArgumentParsingTime
16:09:00 Unused arguments not counted in preprocess
16:09:00 Failed asserting that 0.55842105400000008 is less than 0.25.
16:09:00 
16:09:00 /mnt/jenkins-workspace/workspace/mwext-testextension-php55-composer/src/extensions/Scribunto/tests/engines/LuaSandbox/SandboxTest.php:75

The trivial reason why this fails is that this runs on a virtualized CI machine. There are other things running the same time, delaying execution. One simply can not rely on any minimum or maximum execution time of a specific snippet of code. This is guaranteed to fail.

Details

Subject	Repo	Branch	Lines +/-
Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust	mediawiki/extensions/Scribunto	master	+25 -7
Kill bogus "benchmarking" test	mediawiki/extensions/Scribunto	master	+0 -87
Replace time waster in SandboxTest with simple counter	mediawiki/extensions/Scribunto	master	+6 -15

Customize query in gerrit

Related Objects

Mentioned Here: rELUA12acfa95efd9: Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust
T47684: getExpandedArguments should not count toward the 10-second Lua execution limit

Event Timeline

thiemowmde created this task.Aug 19 2016, 7:56 AM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 19 2016, 7:56 AM

thiemowmde triaged this task as Unbreak Now! priority.Aug 19 2016, 7:56 AM

Restricted Application added subscribers: Luke081515, TerraCodes. · View Herald TranscriptAug 19 2016, 7:56 AM

thiemowmde added a project: Wikidata-Sprint-2016-08-16.Aug 19 2016, 7:56 AM

thiemowmde added a project: Patch-For-Review.

thiemowmde moved this task from incoming to in progress on the Wikidata board.

thiemowmde moved this task from Proposed to Review on the Wikidata-Sprint-2016-08-16 board.

Change 305499 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Replace time waster in SandboxTest with simple counter

https://gerrit.wikimedia.org/r/305499

Change 305620 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Kill bogus "benchmarking" test

https://gerrit.wikimedia.org/r/305620

Change 305649 had a related patch set uploaded (by Hoo man):
Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust

https://gerrit.wikimedia.org/r/305649

I understand your frustration but two things:

This is not UBN!, it is High, as you/the scribunto maintainer have been dealing with this for a while now. This is just semantics.

Please be more diplomatic when communicating with your co-workers. Words like "bogus", "time waster", "ridiculous" in the commit messages do not add any real value and instead only incite the other party.

@Anomie: can you help me understand what this test is trying to test for and what it means that it is now failing (iow: a failing test should imply broken code: where's the broken code?).

In T143389#2567454, @gerritbot wrote:

Change 305649 had a related patch set uploaded (by Hoo man):
Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust

https://gerrit.wikimedia.org/r/305649

It looks like this patch is moving forward nicely in a productive working relationship way. Thank you @hoo, for providing that.

In T143389#2568039, @greg wrote:

@Anomie: can you help me understand what this test is trying to test for and what it means that it is now failing (iow: a failing test should imply broken code: where's the broken code?).

Scribunto has a feature where it limits the total CPU time used to a configurable limit, to prevent a module from running indefinitely during the parse.

Some things that nominally occur "inside" Lua really should not be counted against this time limit. For example, the wikitext {{#invoke:MyModule|myFunc| {{some template}} }} calls the function myFunc in Module:MyModule, passing as a parameter the expanded wikitext of {{some template}}. For performance, however, MediaWiki does not actually expand the wikitext for {{some template}} until it's actually accessed inside MyModule. Per T47684: getExpandedArguments should not count toward the 10-second Lua execution limit, this argument expansion time should not be counted against the time limit. On the other hand, if something inside MyModule parses wikitext instead of having it passed in as an argument, the time inside that recursive parse should count against the time limit.

This test verifies that the time is accounted as described above, i.e. that Scribunto calls luasandbox's pauseUsageTimer() and unpauseUsageTimer() in the right places to get the proper outcome. It does that by constructing a simple version of {{some template}} that is intended to reliably use 0.500s of CPU time, then asks Scribunto how much CPU time was actually measured. In the cases where the argument expansion time should not be counted, Scribunto should have only accounted somewhere around 0.001s. The test itself checks whether accounted time is more or less than 0.250s.

This task is about the fact that something ~~in WMDE's CI infrastructure~~ is causing Scribunto to measure accounted time far in excess of 0.001s during this test. I can think of a few possibilities, but nothing that gives me something to do about it:

luasandbox appears to count both 'user' and 'system' time, while the test only checks 'user' time when implementing the delay. But that shouldn't cause the test to fail: it would make the delay longer than 0.5s of total CPU time, but that shouldn't matter when it's not supposed to be counted and the cases where it is counted aren't the problem here.
There might be a bug in luasandbox that makes it count time incorrectly. As far as I know, though, we've never run into this on WMF's CI infrastructure for Scribunto tests and I've never seen it in local testing, so I have no way to begin reproducing it to test this hypothesis.
WMDE's CI infrastructure might be using a version of luasandbox compiled without the presence of the CLOCK_THREAD_CPUTIME_ID constant for clock_gettime(), in which case it would measure wall time rather than CPU time. I have no way to check this.
WMDE's virtualized CI infrastructure might be emulating a system with a variable clock speed, so sometimes instead of a 2GHz processor it's running a 2KHz processor. That could cause even parts of the code that normally take only 0.001s to take radically more time than that due to the excessively slow processor. I have no way to test this.
WMDE's virtualized CI infrastructure might have buggy POSIX clock or timer behavior in the virtualized environment that somehow breaks luasandbox's accounting. For example, it might implement CLOCK_THREAD_CPUTIME_ID in terms of wall clock time or host CPU time instead of virtualized environment CPU time. I have no way to test this either.
WMDE's virtualized CI infrastructure might have bugs that cause it to randomly block for relatively long periods of time in system calls in a way that still counts as system time used. I have no way to test this either.
Something involved in one of their extensions being tested in conjunction with Scribunto might be causing extreme slowdowns in parts of the code outside the bit that's specifically not accounted. It seems unlikely that they wouldn't have noticed this themselves, and I wouldn't know where to begin to even start with testing that theory.

Edit: ~~Struck~~ and added some text to account for something I missed.

Note, afaict: this is running on *our* CI infra (https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4322/consoleFull ):

Building remotely on integration-slave-trusty-1017 (phpflavor-hhvm contintLabsSlave UbuntuTrusty phpflavor-php55) in workspace /mnt/jenkins-workspace/workspace/mwext-testextension-php55-composer

Yeah, it looks like you're right. Strike most of the ones about broken infrastructure. But I still haven't seen this on Scribunto extension tests ever.

In T143389#2568372, @Anomie wrote:

But I still haven't seen this on Scribunto extension tests ever.

Since this issue isn't surfacing for the Scribunto extension itself during it's runs, I'm removing the CI-Infra project as that tells me this is unlikely to be a CI-Infra issue. Please re-add if it is determined to be so in the future. (I'll still stay subscribed, of course.)

Affected job runs (might be inconclusive, I only scanned from https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/3800/console onwards):

All of them seem to be Wikibase jobs, but that could very well be because our commit activity is quite high.

Change 305499 abandoned by Thiemo Mättig (WMDE):
Replace time waster in SandboxTest with simple counter

Reason:
Please merge Ic40c8d7. Multiple people spend hours working on this issue.

https://gerrit.wikimedia.org/r/305499

Change 305620 abandoned by Thiemo Mättig (WMDE):
Kill bogus "benchmarking" test

Reason:
Please merge Ic40c8d7. Multiple people spend hours working on this issue.

https://gerrit.wikimedia.org/r/305620

Lydia_Pintscher moved this task from in progress to ready to go on the Wikidata board.Oct 17 2016, 2:37 PM

Rical subscribed.Sep 7 2017, 11:04 AM

Change 305649 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@master] Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust

https://gerrit.wikimedia.org/r/305649

ReleaseTaggerBot added a project: MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)).Oct 10 2017, 9:00 AM

Legoktm added a project: LuaSandbox.Feb 3 2018, 10:53 PM

Does rELUA12acfa95efd9: Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust seem to have fixed this for you, i.e. can we close this now?

Thanks for asking. Personally, I have never seen this again since then. For a reason unknown to me the issue even disappeared with the patch being unmerged for more than a year. Maybe because of a luasandbox update?

Rical unsubscribed.Mar 20 2018, 9:53 AM

Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:40 PM

[Bug] Scribunto_LuaSandboxTests repeatedly fails because it relies on code execution timeClosed, ResolvedPublicActions

Description

Details

Related Objects

Event Timeline

[Bug] Scribunto_LuaSandboxTests repeatedly fails because it relies on code execution time
Closed, ResolvedPublic
Actions