2017-05-25 10:23:50 [WSawtgpAIDYAADAqCrEAAABK] mw1184 frwiki 1.30.0-wmf.1 exception ERROR: [WSawtgpAIDYAADAqCrEAAABK] /wiki/Cat%C3%A9gorie:D%C3%A9c%C3%A8s_en_novembre_1698 MWException from line 335 of /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php: Scribunto_LuaSandboxInterpreter::callFunction: LuaSandboxFunction::call returned false {"exception_id":"WSawtgpAIDYAADAqCrEAAABK","caught_by":"mwe_handler"} [Exception MWException] (/srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php:335) Scribunto_LuaSandboxInterpreter::callFunction: LuaSandboxFunction::call returned false #0 /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaCommon/LuaCommon.php(154): Scribunto_LuaSandboxInterpreter->callFunction(LuaSandboxFunction, array) #1 /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaCommon/SiteLibrary.php(91): Scribunto_LuaEngine->registerInterface(string, array, array) #2 /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaCommon/LuaCommon.php(488): Scribunto_LuaSiteLibrary->register() #3 /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaCommon/LuaCommon.php(125): Scribunto_LuaEngine->instantiatePHPLibrary(string, string, boolean) #4 /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php(37): Scribunto_LuaEngine->load() #5 /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/common/Hooks.php(115): Scribunto_LuaSandboxEngine->getResourceUsage(integer) #6 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Parser.php(3408): ScribuntoHooks::invokeHook(Parser, PPTemplateFrame_Hash, array) #7 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Parser.php(3128): Parser->callParserFunction(PPTemplateFrame_Hash, string, array) #8 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Preprocessor_Hash.php(1039): Parser->braceSubstitution(array, PPTemplateFrame_Hash) #9 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Parser.php(3282): PPFrame_Hash->expand(PPNode_Hash_Tree) #10 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Preprocessor_Hash.php(1039): Parser->braceSubstitution(array, PPTemplateFrame_Hash) #11 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Preprocessor_Hash.php(1472): PPFrame_Hash->expand(PPNode_Hash_Tree, integer) #12 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Parser.php(3279): PPTemplateFrame_Hash->cachedExpand(string, PPNode_Hash_Tree) #13 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Preprocessor_Hash.php(1039): Parser->braceSubstitution(array, PPFrame_Hash) #14 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Parser.php(2942): PPFrame_Hash->expand(PPNode_Hash_Tree, integer) #15 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Parser.php(1297): Parser->replaceVariables(string) #16 /srv/mediawiki/php-1.30.0-wmf.1/includes/parser/Parser.php(451): Parser->internalParse(string) #17 /srv/mediawiki/php-1.30.0-wmf.1/includes/cache/MessageCache.php(1172): Parser->parse(string, Title, ParserOptions, boolean) #18 /srv/mediawiki/php-1.30.0-wmf.1/includes/Message.php(1221): MessageCache->parse(string, Title, boolean, boolean, Language) #19 /srv/mediawiki/php-1.30.0-wmf.1/includes/Message.php(872): Message->parseText(string) #20 /srv/mediawiki/php-1.30.0-wmf.1/includes/Message.php(958): Message->toString(string) #21 /srv/mediawiki/php-1.30.0-wmf.1/includes/CategoryViewer.php(751): Message->parseAsBlock() #22 /srv/mediawiki/php-1.30.0-wmf.1/includes/CategoryViewer.php(414): CategoryViewer->getCountMessage(integer, string, string) #23 /srv/mediawiki/php-1.30.0-wmf.1/includes/CategoryViewer.php(119): CategoryViewer->getSubcategorySection() #24 /srv/mediawiki/php-1.30.0-wmf.1/includes/page/CategoryPage.php(117): CategoryViewer->getHTML() #25 /srv/mediawiki/php-1.30.0-wmf.1/includes/page/CategoryPage.php(72): CategoryPage->closeShowCategory() #26 /srv/mediawiki/php-1.30.0-wmf.1/includes/actions/ViewAction.php(68): CategoryPage->view() #27 /srv/mediawiki/php-1.30.0-wmf.1/includes/MediaWiki.php(499): ViewAction->show() #28 /srv/mediawiki/php-1.30.0-wmf.1/includes/MediaWiki.php(293): MediaWiki->performAction(CategoryTreeCategoryPage, Title) #29 /srv/mediawiki/php-1.30.0-wmf.1/includes/MediaWiki.php(862): MediaWiki->performRequest() #30 /srv/mediawiki/php-1.30.0-wmf.1/includes/MediaWiki.php(523): MediaWiki->main() #31 /srv/mediawiki/php-1.30.0-wmf.1/index.php(43): MediaWiki->run() #32 /srv/mediawiki/w/index.php(3): include(string) #33 {main}
Description
Details
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | PRODUCTION ERROR | Anomie | T166348 Scribunto_LuaSandboxInterpreter::callFunction: LuaSandboxFunction::call returned false | ||
Resolved | MoritzMuehlenhoff | T171166 Build and push a new hhvm-luasandbox package |
Event Timeline
Unfortunately, this doesn't tell us much. Visiting https://fr.wikipedia.org/wiki/Cat%C3%A9gorie:D%C3%A9c%C3%A8s_en_novembre_1698 doesn't seem to re-trigger the error, nor does purging the page.
Do we log any of PHP's warnings that should correlate with these exceptions?
Yes, they're just in a different file (hhvm.log) so you have to manually line up request times/servers. At a very quick glance, I noticed that
LuaSandboxFunction::call(): recursion detected in /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 329
is showing up in fatalmonitor, unsure if that's related or a separate issue.
Looks like the corresponding warnings for the exception in the topic are probably
May 25 10:23:50 mw1184: #012Warning: LuaSandboxFunction::call(): recursion detected in /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 329 May 25 10:23:50 mw1184: [Thu May 25 10:23:50 2017] [hphp] [...] [] \nWarning: LuaSandboxFunction::call(): recursion detected in /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 329 May 25 10:23:50 mw1184: #012Warning: LuaSandboxFunction::call(): unable to convert argument 1 to a lua value in /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 329 May 25 10:23:50 mw1184: [Thu May 25 10:23:50 2017] [hphp] [...] [] \nWarning: LuaSandboxFunction::call(): unable to convert argument 1 to a lua value in /srv/mediawiki/php-1.30.0-wmf.1/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 329
Given those warnings, what seems to be going on in this particular stack trace is that the $info being constructed in Scribunto_LuaSiteLibrary::register() is somehow getting an object or array that contains itself. Any similar stack trace that has "SiteLibrary.php(91)" at #1 is probably the same thing.
But nothing in that method jumps out at me as being at all likely to produce such an object/array.
Change 358988 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Scribunto@wmf/1.30.0-wmf.5] Add some logging to debug T166348
Change 358988 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@wmf/1.30.0-wmf.5] Add some logging to debug T166348
Change 358993 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Scribunto@wmf/1.30.0-wmf.5] Add some logging to debug T166348, take 2
Change 358993 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@wmf/1.30.0-wmf.5] Add some logging to debug T166348, take 2
Change 361508 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Scribunto@wmf/1.30.0-wmf.6] Add some logging to debug T166348
Change 361508 merged by jenkins-bot:
[mediawiki/extensions/Scribunto@wmf/1.30.0-wmf.6] Add some logging to debug T166348
Looking at the logged data, it appears that there's some sort of a false positive in the luasandbox extension's recursion detection. I'm seeing it report the error when passing something as simple as [ "lang" => "ru" ].
I don't see any reason why it would false-positive though, unless it's possible for something to somehow throw an exception and exit from the middle of luasandbox_push_hashtable so it misses removing the entry from protectionSet, then later it happens to use the same pointer address for a different HashTable and hits the false-positive. Or unless IMPLEMENT_THREAD_LOCAL doesn't actually work.
It seems like a strange decision to block the train now when this has been going on since March 29 or earlier (exception.log-20170329.gz is the earliest available on mwlog1001).
Anomie: the logspam level has just been increasing and it's downing out other errors. It's gotten very frequent since wmf.6 so something has changed recently that is triggering this more often.
I would normally let @greg make the call but he's on vacation this week. @demon, @thcipriani, what do you think?
The top 3 errors on my fatalmonitor dashboard are
Warning: LuaSandboxFunction::call(): recursion detected in /srv/mediawiki/php-1.30.0-wmf.6/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 329 31 Warning: LuaSandboxFunction::call(): recursion detected in /srv/mediawiki/php-1.30.0-wmf.7/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 312 14 Warning: LuaSandboxFunction::call(): unable to convert argument 1 to a lua value in /srv/mediawiki/php-1.30.0-wmf.6/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 329 6
(those counts are for a 30 minute window)
I could just filter these but we've made a policy of zero tolerance for logspam. The decision was to block the train on any significant new Wikimedia-production-error events that are introduced with a deployment. I'm not sure how this one flew under the radar for so long but it looks significant to me as of wmf.6.
Except this wasn't introduced with a deployment, at least not any deployment in the past three months.
If my suspicions[1] are correct, no change deployed with the train caused these and no change deployed with the train is going fix these log messages. It'll need a new version of the hhvm-luasandbox package built and deployed to fix the problem.
[1]: After looking at it a bit more, I'm pretty sure the "unless it's possible for something to somehow throw an exception and exit from the middle of luasandbox_push_hashtable" I mentioned in T166348#3382705 is indeed possible.
My C++ is too terrible/non-existant to diagnose this; however, the error rate climb does seem to line-up with the rollout of HHVM 3.18, and it's entirely possible this error is coming from hhvm-luasandbox (https://github.com/wikimedia/mediawiki-php-luasandbox/blob/fe4addd002163a33101e19c9c40d260b0ae6ee52/data_conversion.c#L210). @Anomie your theory is that some interface in HHVM changed which means that protectionSet is not being managed properly (i.e., not thread-safe or luasandbox_push_hashtable is blowing up somewhere we can't see).
If this is the case, there's certainly nothing we can do about this error message from the perspective of the train. Maybe @MoritzMuehlenhoff and @tstarling can verify the problem/take a look at the extension's use of HHVM bits?
As far as I can tell it was happening even before HHVM 3.18. In exception.log-20170330.gz, I see the error logged by mw1189, mw1229, mw1235, mw1280, mw1281, and mw1282. According to https://wikitech.wikimedia.org/wiki/Server_Admin_Log, however, mw1189 got HHVM 3.18 on May 11, mw1229 and mw1235 got it on June 12, and the other three got it on June 13.
Change 362301 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/php/luasandbox@master] Redo PHP->Lua data structure recursion protection
Change 364268 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/puppet@production] scap: logstash_checker ignore for T166348
Change 364268 merged by Andrew Bogott:
[operations/puppet@production] scap: logstash_checker ignore for T166348
Change 362301 merged by jenkins-bot:
[mediawiki/php/luasandbox@master] Redo PHP->Lua data structure recursion protection
Next step is for someone to build an updated deb for luasandbox and push it out, then we can see whether the warnings go away (and stay gone, I suspect restarting HHVM on a server would also stop/reduce the warnings for a short time).
HHVM-Luasandbox has been upgraded to 2.0.13 in production, let's see whether this fixes the bug.
I see on fluorine mwlog1001 that the "LuaSandboxFunction::call(): recursion detected", "LuaSandboxFunction::call(): unable to convert argument 1 to a lua value", and "LuaSandboxFunction::call returned false" messages have not been logged since 09:56:57 UTC. That's a good sign.
No instances of the log messages since 2017-08-23 09:56:57 UTC. Let's call this fixed.