Page MenuHomePhabricator

[Bug] Scribunto_LuaSandboxTests repeatedly fails because it relies on code execution time
Closed, ResolvedPublic

Description

There is a flaky test in the Scribunto extension that started to repeatedly fail for the same reason, wrecking my nerves and doing nothing good but (literally) wasting time:

Just one of many recent examples: https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4322/console

16:09:00 1) LuaSandbox: Scribunto_LuaSandboxTests::testArgumentParsingTime
16:09:00 Unused arguments not counted in preprocess
16:09:00 Failed asserting that 0.55842105400000008 is less than 0.25.
16:09:00 
16:09:00 /mnt/jenkins-workspace/workspace/mwext-testextension-php55-composer/src/extensions/Scribunto/tests/engines/LuaSandbox/SandboxTest.php:75

The trivial reason why this fails is that this runs on a virtualized CI machine. There are other things running the same time, delaying execution. One simply can not rely on any minimum or maximum execution time of a specific snippet of code. This is guaranteed to fail.

Suggested solutions:

  1. https://gerrit.wikimedia.org/r/305499 removes any code that measures CPU cycles with trivial counts. In consequence the test does not test any more what it was intended to test. The counts the patch introduces are correct: The hook is expected to be called three times. But the first two times should not add on top of the execution timer and the result getCPUUsage returns (search for luasandbox_timer_pause and luasandbox_timer_unpause in the luasandbox extension). @Anomie decided to block this patch with no way forward.
  2. A proper solution would be to mock the code that collects the benchmark results this test is currently testing. But the code doing this is not even part of this extension! Search for getCPUUsage. This is in luasandbox, which is a PHP extension written in C. I believe it's not even possible to mock this.
  3. https://gerrit.wikimedia.org/r/305620 just kills the thing.
  4. Remove Scribunto from the Wikibase CI. Seriously. If the extension is broken and we can not rely on their tests any more, having them in our CI is of no value any more. The contrary: all it does at the moment is causing harm. We can re-add it later when the issue is solved.

Event Timeline

thiemowmde triaged this task as Unbreak Now! priority.Aug 19 2016, 7:56 AM

Change 305499 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Replace time waster in SandboxTest with simple counter

https://gerrit.wikimedia.org/r/305499

Change 305620 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Kill bogus "benchmarking" test

https://gerrit.wikimedia.org/r/305620

Change 305649 had a related patch set uploaded (by Hoo man):
Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust

https://gerrit.wikimedia.org/r/305649

greg lowered the priority of this task from Unbreak Now! to High.Aug 19 2016, 4:57 PM
greg subscribed.

I understand your frustration but two things:

  1. This is not UBN!, it is High, as you/the scribunto maintainer have been dealing with this for a while now. This is just semantics.
  1. Please be more diplomatic when communicating with your co-workers. Words like "bogus", "time waster", "ridiculous" in the commit messages do not add any real value and instead only incite the other party.

@Anomie: can you help me understand what this test is trying to test for and what it means that it is now failing (iow: a failing test should imply broken code: where's the broken code?).

Change 305649 had a related patch set uploaded (by Hoo man):
Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust

https://gerrit.wikimedia.org/r/305649

It looks like this patch is moving forward nicely in a productive working relationship way. Thank you @hoo, for providing that.

@Anomie: can you help me understand what this test is trying to test for and what it means that it is now failing (iow: a failing test should imply broken code: where's the broken code?).

Scribunto has a feature where it limits the total CPU time used to a configurable limit, to prevent a module from running indefinitely during the parse.

Some things that nominally occur "inside" Lua really should not be counted against this time limit. For example, the wikitext {{#invoke:MyModule|myFunc| {{some template}} }} calls the function myFunc in Module:MyModule, passing as a parameter the expanded wikitext of {{some template}}. For performance, however, MediaWiki does not actually expand the wikitext for {{some template}} until it's actually accessed inside MyModule. Per T47684: getExpandedArguments should not count toward the 10-second Lua execution limit, this argument expansion time should not be counted against the time limit. On the other hand, if something inside MyModule parses wikitext instead of having it passed in as an argument, the time inside that recursive parse should count against the time limit.

This test verifies that the time is accounted as described above, i.e. that Scribunto calls luasandbox's pauseUsageTimer() and unpauseUsageTimer() in the right places to get the proper outcome. It does that by constructing a simple version of {{some template}} that is intended to reliably use 0.500s of CPU time, then asks Scribunto how much CPU time was actually measured. In the cases where the argument expansion time should not be counted, Scribunto should have only accounted somewhere around 0.001s. The test itself checks whether accounted time is more or less than 0.250s.

This task is about the fact that something in WMDE's CI infrastructure is causing Scribunto to measure accounted time far in excess of 0.001s during this test. I can think of a few possibilities, but nothing that gives me something to do about it:

  • luasandbox appears to count both 'user' and 'system' time, while the test only checks 'user' time when implementing the delay. But that shouldn't cause the test to fail: it would make the delay longer than 0.5s of total CPU time, but that shouldn't matter when it's not supposed to be counted and the cases where it is counted aren't the problem here.
  • There might be a bug in luasandbox that makes it count time incorrectly. As far as I know, though, we've never run into this on WMF's CI infrastructure for Scribunto tests and I've never seen it in local testing, so I have no way to begin reproducing it to test this hypothesis.
  • WMDE's CI infrastructure might be using a version of luasandbox compiled without the presence of the CLOCK_THREAD_CPUTIME_ID constant for clock_gettime(), in which case it would measure wall time rather than CPU time. I have no way to check this.
  • WMDE's virtualized CI infrastructure might be emulating a system with a variable clock speed, so sometimes instead of a 2GHz processor it's running a 2KHz processor. That could cause even parts of the code that normally take only 0.001s to take radically more time than that due to the excessively slow processor. I have no way to test this.
  • WMDE's virtualized CI infrastructure might have buggy POSIX clock or timer behavior in the virtualized environment that somehow breaks luasandbox's accounting. For example, it might implement CLOCK_THREAD_CPUTIME_ID in terms of wall clock time or host CPU time instead of virtualized environment CPU time. I have no way to test this either.
  • WMDE's virtualized CI infrastructure might have bugs that cause it to randomly block for relatively long periods of time in system calls in a way that still counts as system time used. I have no way to test this either.
  • Something involved in one of their extensions being tested in conjunction with Scribunto might be causing extreme slowdowns in parts of the code outside the bit that's specifically not accounted. It seems unlikely that they wouldn't have noticed this themselves, and I wouldn't know where to begin to even start with testing that theory.

Edit: Struck and added some text to account for something I missed.

Note, afaict: this is running on *our* CI infra (https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4322/consoleFull ):

Building remotely on integration-slave-trusty-1017 (phpflavor-hhvm contintLabsSlave UbuntuTrusty phpflavor-php55) in workspace /mnt/jenkins-workspace/workspace/mwext-testextension-php55-composer

Yeah, it looks like you're right. Strike most of the ones about broken infrastructure. But I still haven't seen this on Scribunto extension tests ever.

But I still haven't seen this on Scribunto extension tests ever.

Since this issue isn't surfacing for the Scribunto extension itself during it's runs, I'm removing the CI-Infra project as that tells me this is unlikely to be a CI-Infra issue. Please re-add if it is determined to be so in the future. (I'll still stay subscribed, of course.)

Affected job runs (might be inconclusive, I only scanned from https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/3800/console onwards):

https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4099/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4105/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4113/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4123/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4124/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4132/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4171/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4244/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4255/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4256/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4262/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4299/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4305/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4322/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4337/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4358/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4360/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4366/console
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer/4383/console

All of them seem to be Wikibase jobs, but that could very well be because our commit activity is quite high.

Change 305499 abandoned by Thiemo Mättig (WMDE):
Replace time waster in SandboxTest with simple counter

Reason:
Please merge Ic40c8d7. Multiple people spend hours working on this issue.

https://gerrit.wikimedia.org/r/305499

Change 305620 abandoned by Thiemo Mättig (WMDE):
Kill bogus "benchmarking" test

Reason:
Please merge Ic40c8d7. Multiple people spend hours working on this issue.

https://gerrit.wikimedia.org/r/305620

Change 305649 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@master] Make Scribunto_LuaSandboxTests::testArgumentParsingTime more robust

https://gerrit.wikimedia.org/r/305649

Thanks for asking. Personally, I have never seen this again since then. For a reason unknown to me the issue even disappeared with the patch being unmerged for more than a year. Maybe because of a luasandbox update?