Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 due to OOM in Lua→PHP→Lua calls
Closed, ResolvedPublic

Description

No page access since 10 days to: https://commons.wikimedia.org/wiki/Category:SVG_logos_of_the_United_Kingdom !?
But only if logged in (probably due the default English language, see below), testet with Firefox (give 503) and Chrome (give 500) on two different machines (Win7).

Request from xx.xx.xxx.xx via cp3042 cp3042, Varnish XID 867253101
Error: 503, Service Unavailable at Sat, 22 Jul 2017 21:38:12 GMT

So this is probably a gadget error or similar user setting.
Edit: Some people reported a relation with the language setting: COM:HD
Edit: As cause Template:Countries of Europe now emerges (117893 transclusions)

So this server error only appears on non-English language setting.
Workaround

  • switch to EN (clear cache and switch back to the user language).
  • append ?uselang=en to the URL
There are a very large number of changes, so older changes are hidden. Show Older Changes

I've made an ugly workaround in https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox/5?uselang=de, which AFAICT doesn't crash. But it does not help the issue that 1. LangSwitch do more expanding than necessary, and 2. the inner wikitext OOMs (apparently, someone with access can check)

zhuyifei1999 renamed this task from Category page HTTP ERROR 500/503: (Commons, probably language setting) to Commons pages transcluding Template:Countries_of_Europe with prefix= set HTTP 500/503 when accessed from non-English languages specified in the template.Jul 23 2017, 10:28 AM
Vachovec1 added a comment.EditedJul 23 2017, 10:35 AM

@zhuyifei1999: I removed the {{Countries of Europe}} template from both above mentioned affected pages and they are accessible again. The {{Countries of Europe}} template definitely needs fixing/reworking, though.

@Vachovec1 it's transcluded on 117893 pages, many have the potential to break (although the exact conditions are still hard to tell). If someone really needs to access the page, they are free to ?uselang=en. I don't see much benefit in removing the templates from the affected pages.

zhuyifei1999 renamed this task from Commons pages transcluding Template:Countries_of_Europe with prefix= set HTTP 500/503 when accessed from non-English languages specified in the template to Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template.Jul 23 2017, 11:01 AM

10 days => July 13 => Possibly related to MediaWiki train 1.30.0-wmf.7->1.30.0-wmf.9?

On July 12, Kürschner stated at Commons Help Desk that he had experienced the behaviour for "at least two days".

@Vachovec1 it's transcluded on 117893 pages, many have the potential to break (although the exact conditions are still hard to tell). If someone really needs to access the page, they are free to ?uselang=en. I don't see much benefit in removing the templates from the affected pages.

I see. But I would say that fully functional page without "language switch" template is definitely better than (mostly) unaccessible page.

Btw. I am not sure about your "langswitch" theory. The page https://commons.wikimedia.org/wiki/Category:Kiosks_in_Spain is unaccessible (blank page in Firefox) for me even without langswitch sufix. However, with ?uselang=en it works. Maybe it is due to my language setting (non-english)?

even without langswitch sufix

What do you mean by "langswitch sufix"?

Maybe it is due to my language setting (non-english)?

Let me clarify my guess:

Assume a template A with complex logic, and a template B with:

{{LangSwitch
|default={{A|some parameters}}
|aa={{A|some parameters}}
}}

and a page C transcluding template B with {{B}}.

When page C is accessed with the interface language as en, only default= of the Langswitch will be expanded, leaving aa= unexpanded, therefore template A will be effectively called only once.
When page C is accessed with the interface language as aa, default= will be unnecessarily expanded for LangSwitch Error: no default check, and aa= will be expanded as usual, and template A will be effectively be called twice. Due to either out-of-memory or timeout or a combination of both, the rendering crashes.

If this theory is true the question would be a. how to prevent calling template A twice, and b. how to optimize A or the underlying logic so that it does not out-of-memory or timeout.

even without langswitch sufix

What do you mean by "langswitch sufix"?

For example ?uselang=de sufix in the URL.

hoo added a subscriber: hoo.

With https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox/3?uselang=de (logged out) I get:

Jul 23 11:33:17 mw1184:  #012Fatal error: unknown exception
Jul 23 11:33:17 mw1184:  [Sun Jul 23 11:33:17 2017] [hphp] [14687:7f5656fff700:23857:000001] [] \nFatal error: unknown exception

This is happening quite a lot :/

Just as a note: I didn't manage to get any further error logs, just what I pasted above :/

hoo added a subscriber: greg.Jul 23 2017, 12:13 PM

For example ?uselang=de sufix in the URL.

By "LangSwitch" I mean the template LangSwitch.

?uselang= is just one method to change the interface language (which is what matters in this bug); another method is setting it in the preferences. I have my interface English as I'm bad at almost every single language except en and zh.

Perhelion updated the task description. (Show Details)Jul 23 2017, 1:05 PM
Anomie added a subscriber: Anomie.Jul 24 2017, 4:48 PM

This reminds me of T161361: Unknown exception from Thomas Mann page on rowiki, which is currently stalled on being unable to reproduce it. I'll close that as a duplicate of this one, since this one can be reproduced.

The reproduction case in mwrepl:

$title = Title::newFromText( 'User:Zhuyifei1999/sandbox/3' );
$ctx = new RequestContext();
$ctx->setTitle( $title );
$ctx->setRequest( new FauxRequest( [ 'title' => $title->getPrefixedText(),'uselang'=>'de' ] ) );
$article = new Article( $title );
$article->setContext( $ctx );
Action::factory( 'view', $article, $ctx )->show();

It prints basically what hoo mentioned in T171392#3463694, which itself isn't very helpful.

It turns out that there are a ton of caught exceptions, so I wound up using the following in gdb (just before inputting the last line of the reproduction) to print every exception as it's thrown without requiring manual intervention to continue.

set pagination off
catch throw
commands
print $_exception
backtrace
continue
end

(although occasionally the print $_exception errors out and needs a manual continue)

Then I also set a break unwind-inl.h:136 to actually stop it when it creates the "unknown exception" exception. The trace just before that point is

Thread 1 "hhvm" hit Catchpoint 1 (exception thrown), 0x00007ffff02b6dbd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
$324 = (lua_longjmp *) 0x7fffffffb800
#0  0x00007ffff02b6dbd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fffe65b2e7f in luaD_throw (L=L@entry=0x7fffd300d690, errcode=errcode@entry=4) at ldo.c:97
#2  0x00007fffe65b6ffa in luaM_realloc_ (L=L@entry=0x7fffd300d690, block=block@entry=0x0, osize=osize@entry=0, nsize=nsize@entry=40) at lmem.c:81
#3  0x00007fffe65b4326 in luaF_newCclosure (L=L@entry=0x7fffd300d690, nelems=nelems@entry=0, e=0x7fffd300ab70) at lfunc.c:24
#4  0x00007fffe65af061 in lua_pushcclosure (L=0x7fffd300d690, fn=0x7fffe67e1d50 <luasandbox_attach_trace(lua_State*)>, n=0) at lapi.c:491
#5  0x00007fffe67e7ce1 in luasandbox_call_helper (L=0x7fffd300d690, sandbox_zval=0x7fffd412b860, sandbox=0x7fffd300e410, args=0x7fffc94ed270, numArgs=2, return_value=return_value@entry=0x7fffc958f9c0, tsrm_ls=0x7fffe1527100) at /home/jmm/hhvm/dbg/php-luasandbox-2.0.12~jessie3/debian/build-hhvm/luasandbox.c:1330
#6  0x00007fffe67e8245 in zim_LuaSandboxFunction_call (ht=2, return_value=0x7fffc958f9c0, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>, tsrm_ls=0x7fffe1527100) at /home/jmm/hhvm/dbg/php-luasandbox-2.0.12~jessie3/debian/build-hhvm/luasandbox.c:1230
#7  0x0000000001d31620 in HPHP::zend_wrap_func (ar=0x7fffe137b020) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/ext_zend_compat/hhvm/zend-wrap-func.cpp:99
#8  0x00000000021c5894 in HPHP::iopNativeImpl (pc=<optimized out>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:5425
#9  HPHP::iopWrapper (pc=<optimized out>, fn=<optimized out>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:6232
#10 HPHP::dispatchImpl<false> () at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:6905
#11 0x0000000000f4484f in HPHP::exception_handler<void (*)()> (action=<optimized out>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/unwind-inl.h:30
#12 0x0000000000f44cfa in HPHP::enterVMCustomHandler<HPHP::enterVM(HPHP::ActRec*, Action) [with Action = HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*)>::<lambda()>]::<lambda()> > (action=..., ar=0x7fffe137eaa0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1574
#13 HPHP::enterVM<HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*)>::<lambda()> > (action=..., ar=0x7fffe137eaa0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1580
#14 HPHP::ExecutionContext::<lambda(HPHP::ActRec*)>::operator() (ar=0x7fffe137eaa0, __closure=<synthetic pointer>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1651
#15 HPHP::ExecutionContext::invokeFuncImpl<HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::TypedValue&)>, HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*, HPHP::TypedValue&)>, HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*)> > (doEnterVM=..., doInitArgs=..., doStackCheck=..., useWeakTypes=false, invName=0x7fffd3a1c580, argc=2, cls=0x7fffd39d7ad0, thiz=0x7fffd412b820, f=0x7fffd3a17600, this=0x7fffe853f010) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1542
#16 HPHP::ExecutionContext::invokeFunc (this=0x7fffe853f010, f=f@entry=0x7fffd3a17600, args_=..., thiz=0x7fffd412b820, cls=0x7fffd39d7ad0, varEnv=varEnv@entry=0x0, invName=0x7fffd3a1c580, flags=HPHP::ExecutionContext::InvokeCuf, useWeakTypes=false) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1655
#17 0x0000000001d1dcfe in zend_call_function (fci=0x7fffffffb550, fci_cache=<optimized out>, tsrm_ls=<optimized out>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/ext_zend_compat/php-src/Zend/zend_execute_API.cpp:222
#18 0x00007fffe67e8685 in luasandbox_call_php (L=0x7fffd300d690) at /home/jmm/hhvm/dbg/php-luasandbox-2.0.12~jessie3/debian/build-hhvm/luasandbox.c:1616
#19 0x00007fffe65b3570 in luaD_precall (L=L@entry=0x7fffd300d690, func=func@entry=0x7fffe872e160, nresults=nresults@entry=1) at ldo.c:320
#20 0x00007fffe65be101 in luaV_execute (L=L@entry=0x7fffd300d690, nexeccalls=nexeccalls@entry=1) at lvm.c:591
#21 0x00007fffe65b39cd in luaD_call (L=L@entry=0x7fffd300d690, func=0x7fffe872e130, nResults=nResults@entry=1) at ldo.c:378
#22 0x00007fffe65bc7f6 in callTMres (L=L@entry=0x7fffd300d690, res=res@entry=0x7fffe872e120, p1=<optimized out>, p2=<optimized out>, f=<optimized out>, f=<optimized out>) at lvm.c:88
#23 0x00007fffe65bcdf5 in luaV_gettable (L=L@entry=0x7fffd300d690, t=<optimized out>, key=<optimized out>, val=val@entry=0x7fffe872e120) at lvm.c:125
#24 0x00007fffe65bd9c4 in luaV_execute (L=L@entry=0x7fffd300d690, nexeccalls=2, nexeccalls@entry=1) at lvm.c:471
#25 0x00007fffe65b39cd in luaD_call (L=0x7fffd300d690, func=0x7fffd307ca40, nResults=<optimized out>) at ldo.c:378
#26 0x00007fffe65b2c6a in luaD_rawrunprotected (L=L@entry=0x7fffd300d690, f=f@entry=0x7fffe65ae3c0 <f_call(lua_State*, void*)>, ud=ud@entry=0x7fffffffb880) at ldo.c:116
#27 0x00007fffe65b3b3b in luaD_pcall (L=L@entry=0x7fffd300d690, func=func@entry=0x7fffe65ae3c0 <f_call(lua_State*, void*)>, u=u@entry=0x7fffffffb880, old_top=48, ef=<optimized out>) at ldo.c:464
#28 0x00007fffe65af7ac in lua_pcall (L=0x7fffd300d690, nargs=<optimized out>, nresults=-1, errfunc=<optimized out>) at lapi.c:821
#29 0x00007fffe67e7b51 in luasandbox_call_lua (sandbox=sandbox@entry=0x7fffd300e410, sandbox_zval=sandbox_zval@entry=0x7fffd412b860, nargs=nargs@entry=1, nresults=nresults@entry=-1, errfunc=errfunc@entry=2, tsrm_ls=tsrm_ls@entry=0x7fffe1527100) at /home/jmm/hhvm/dbg/php-luasandbox-2.0.12~jessie3/debian/build-hhvm/luasandbox.c:1282
#30 0x00007fffe67e7d94 in luasandbox_call_helper (L=0x7fffd300d690, sandbox_zval=0x7fffd412b860, sandbox=0x7fffd300e410, args=<optimized out>, numArgs=1, return_value=return_value@entry=0x7fffd419cfa0, tsrm_ls=0x7fffe1527100) at /home/jmm/hhvm/dbg/php-luasandbox-2.0.12~jessie3/debian/build-hhvm/luasandbox.c:1352
#31 0x00007fffe67e8245 in zim_LuaSandboxFunction_call (ht=1, return_value=0x7fffd419cfa0, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>, tsrm_ls=0x7fffe1527100) at /home/jmm/hhvm/dbg/php-luasandbox-2.0.12~jessie3/debian/build-hhvm/luasandbox.c:1230
#32 0x0000000001d31620 in HPHP::zend_wrap_func (ar=0x7fffe137ead0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/ext_zend_compat/hhvm/zend-wrap-func.cpp:99
#33 0x00000000021c5894 in HPHP::iopNativeImpl (pc=<optimized out>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:5425
#34 HPHP::iopWrapper (pc=<optimized out>, fn=<optimized out>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:6232
#35 HPHP::dispatchImpl<false> () at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:6905
#36 0x00000000012b49b3 in HPHP::dispatch () at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:6922
#37 HPHP::enterVMAtFunc (enterFnAr=enterFnAr@entry=0x7fffe137ff90, stk=<optimized out>, varEnv=varEnv@entry=0x7fffe85448d0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/bytecode.cpp:1512
#38 0x0000000000f44cd9 in HPHP::ExecutionContext::<lambda(HPHP::ActRec*)>::<lambda()>::operator() (__closure=<synthetic pointer>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1650
#39 HPHP::exception_handler<HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*)>::<lambda()> > (action=...) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/vm/unwind-inl.h:30
#40 HPHP::<lambda()>::operator() (__closure=<synthetic pointer>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1580
#41 HPHP::enterVMCustomHandler<HPHP::enterVM(HPHP::ActRec*, Action) [with Action = HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*)>::<lambda()>]::<lambda()> > (action=..., ar=0x7fffe137ff90) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1571
#42 HPHP::enterVM<HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*)>::<lambda()> > (action=..., ar=0x7fffe137ff90) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1580
#43 HPHP::ExecutionContext::<lambda(HPHP::ActRec*)>::operator() (ar=0x7fffe137ff90, __closure=<synthetic pointer>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1651
#44 HPHP::ExecutionContext::invokeFuncImpl<HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::TypedValue&)>, HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*, HPHP::TypedValue&)>, HPHP::ExecutionContext::invokeFunc(const HPHP::Func*, const HPHP::Variant&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags, bool)::<lambda(HPHP::ActRec*)> > (doEnterVM=..., doInitArgs=..., doStackCheck=..., useWeakTypes=false, invName=0x0, argc=0, cls=0x0, thiz=0x7fffd381f8d0, f=0x0, this=0x7fffe853f010) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1542
#45 HPHP::ExecutionContext::invokeFunc (this=this@entry=0x7fffe853f010, f=0x0, args_=..., thiz=0x7fffd381f8d0, thiz@entry=0x0, cls=cls@entry=0x0, varEnv=varEnv@entry=0x7fffe85448d0, invName=0x0, flags=HPHP::ExecutionContext::InvokePseudoMain, useWeakTypes=false) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:1655
#46 0x0000000000f4768a in HPHP::ExecutionContext::evalPHPDebugger (this=this@entry=0x7fffe853f010, unit=0x7fffd67f9280, frame=<optimized out>, frame@entry=0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:2115
#47 0x0000000000f48021 in HPHP::ExecutionContext::evalPHPDebugger (this=0x7fffe853f010, code=<optimized out>, frame=frame@entry=0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/execution-context.cpp:2044
#48 0x00000000010bd2ea in HPHP::Eval::DebuggerProxy::ExecutePHP (this=this@entry=0x7fffe84e1010, php="<?php Action::factory( 'view', $article, $ctx )->show();;", output=..., frame=0, flags=<optimized out>) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/debugger/debugger_proxy.cpp:797
#49 0x00000000010fa615 in HPHP::Eval::CmdEval::onServer (this=0x7fffd3e4ee30, proxy=...) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/debugger/cmd/cmd_eval.cpp:84
#50 0x00000000010bfba0 in HPHP::Eval::DebuggerProxy::processInterrupt (this=this@entry=0x7fffe84e1010, cmd=...) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/debugger/debugger_proxy.cpp:730
#51 0x00000000010c34b4 in HPHP::Eval::DebuggerProxy::interrupt (this=0x7fffe84e1010, cmd=...) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/debugger/debugger_proxy.cpp:340
#52 0x00000000010ce9a8 in HPHP::Eval::Debugger::Interrupt (type=type@entry=0, program=program@entry=0x3173558 <std::string::_Rep::_S_empty_rep_storage+24> "", site=site@entry=0x0, error=error@entry=0x0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/debugger/debugger.cpp:261
#53 0x00000000010cee8f in HPHP::Eval::Debugger::InterruptSessionStarted (file=0x3173558 <std::string::_Rep::_S_empty_rep_storage+24> "", error=error@entry=0x0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/debugger/debugger.cpp:193
#54 0x00000000010cf100 in HPHP::Eval::Debugger::DebuggerSession (options=..., restart=restart@entry=false) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/debugger/debugger.cpp:153
#55 0x0000000000fe2fac in HPHP::execute_program_impl (argc=argc@entry=3, argv=argv@entry=0x7fffffffe5f8) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/program-functions.cpp:1802
#56 0x0000000000fe50ee in HPHP::execute_program (argc=argc@entry=3, argv=argv@entry=0x7fffffffe5f8) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/program-functions.cpp:1148
#57 0x0000000000b56d6a in main (argc=<optimized out>, argv=0x7fffffffe5f8) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/hhvm/main.cpp:83

It looks like what's going on is similar to T61130: Fatal error: LuaSandboxFunction::call(): PANIC: unprotected error in call to Lua API (not enough memory) in LuaSandbo, the fix for that in rMLUS49a0fe33d2cd: Allow memory over-allocation in unprotected Lua calls doesn't trigger for Lua→PHP→Lua calls. HHVM catches the C++ exception thrown by Lua and turns it into the "unknown exception" PHP exception. I'm working on verifying this and (if verified) writing a patch.

It looks like my supposition is indeed the case. I can reproduce it without MediaWiki or Scribunto involved at all:

$sandbox = new LuaSandbox;
$sandbox->setMemoryLimit( 100000 );
$sandbox->loadString( 'local f = ...; f()' )->call(
    $sandbox->wrapPhpFunction( function () {
        global $sandbox;
        $sandbox->loadString( 'return 0' )->call( str_repeat( 'x', 100001 ) );
    } )
);

In HHVM that gives the "unknown exception" fatal error. With Zend PHP 5.6.30, it segfaults. For some reason 7.0.20 works fine, I have no idea why.

We can partially fix it by changing the check in rMLUS49a0fe33d2cd: Allow memory over-allocation in unprotected Lua calls, testing if the in_lua count is >= the in_php count or adding another flag. But that seems likely to get out of sync after the first memory error, and won't help if some module is passing huge enough values from Lua→PHP to exceed the "slop" added in that patch.

To really fix it, I suppose we'll have to catch Lua errors at the PHP→Lua call boundary in luasandbox_load_helper(), LuaSandbox::callFunction(), LuaSandbox::wrapPhpFunction(), LuaSandboxFunction::call(), LuaSandbox::registerLibrary(), and probably LuaSandboxFunction::dump(). Unfortunately Lua seems to offer only one public method to catch Lua errors in C, lua_cpcall(), which doesn't seem particularly convenient to use. So this will probably be an annoyingly complicated bit of coding. (there is LUAI_TRY in luaconf.h, but it's not documented and seems to need a fair bit of undocumented surrounding code to function).

Anomie claimed this task.Jul 26 2017, 4:51 PM

Change 367935 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/php/luasandbox@master] Catch Lua errors at PHP→Lua call boundaries

https://gerrit.wikimedia.org/r/367935

Due to more incoming reports of categories being broken, I "nowiki"-ized the template so the double inclusion does not happen.

Interesting note: some pages like https://commons.wikimedia.org/wiki/Category:1775_in_Norway continue to refuse to render even after T171392#3484073, and accessing in English.

Change 367935 merged by jenkins-bot:
[mediawiki/php/luasandbox@master] Catch Lua errors at PHP→Lua call boundaries

https://gerrit.wikimedia.org/r/367935

@zhuyifei1999 : I see that this bug blocks solving fallout from T171392 for Commons. The problematic template is "nowikized", but the pages are still broken (from this bug, not from T171392) until cache invalidation is forced? Perhaps Mediawiki-Cache project should be added and contacted. Evidently, the cache invalidation for (rather) all pages with the problematic template is needed, but "normal" ways won't work.

Restricted Application added a subscriber: jeblad. · View Herald TranscriptAug 25 2017, 6:04 PM
Ankry added a subscriber: Ankry.EditedAug 25 2017, 6:19 PM

While attempting to purge manually categories listed in Category:Pages_with_script_errors I noticed that some of them render OK in preview, but fail to render while saving. When they render, very high LUA memory usage is reported (~49MB), just below the limit. Eg. Category:1897_in_Turkey sometimes renders OK reporting usage of 48.9MB, sometimes fails to render or reports LUA OOM error and usage of 50MB. Unsure why it is not stable.

Does anybody know why the LUA scripts used here are so much memory consuming? I think they were not some time ago. Did some internationalization changes break them? Another hints?

If retrieving information from Wikidata is so much memory consuming, maybe static templates containing this info should be created instead, and updated regularily by a bot?

Can anybody debug (or at least hint how to do it) why theese templates are so expensive?

Nowiki-ization was a bad hack to reduce the memory consumption. It worked
for a few pages, such as the two in the initial report, but some still have
a memory consumption higher than the threshold, unfortunately.

jeblad removed a subscriber: jeblad.Aug 25 2017, 9:33 PM

Nowiki-ization was a bad hack to reduce the memory consumption. It worked
for a few pages, such as the two in the initial report, but some still have
a memory consumption higher than the threshold, unfortunately.

Yes, but works the hack if the page is still cached in "prehack" version? If the cache renewal for the affected page is succesful, the OOM still persists?

Following a discussion at en:WP:VPT I wrote a very small module (commons:Module:Country) to test the overhead from fetching the Wikidata label for each country in Europe. I used a generic module name in the hope that I might extend it to provide a more efficient alternative to commons:Template:Countries of Europe.

Results from running the module are at commons:Module talk:Country. They show that merely fetching labels from Wikidata gives Lua memory usage: 42.92 MB/50 MB. An elaborate caching system would be required to fix the overhead, but that would be hard due to the large number of possible languages.

It is unlikely the 50 MB memory limit will be lifted in the near future, so some drastic workaround is needed to fix the problems related to Template:Countries_of_Europe at commons:Category:Pages with script errors.

There is a very interesting suggestion in that discussion that should be taken into account: use mw.wikibase.label. I can confirm that using mw.wikibase.label takes basically no memory (504K with the call, 503K without the call) compared to 42M+ when loading the whole page for tens of countries.

Mentioned in SAL (#wikimedia-operations) [2017-08-31T09:10:03Z] <moritzm> installing experimental hhvm-luasandbox package on mw1261 (canary app server) with backported patch for T171392 / T173705

There is a very interesting suggestion in that discussion that should be taken into account: use mw.wikibase.label. I can confirm that using mw.wikibase.label takes basically no memory (504K with the call, 503K without the call) compared to 42M+ when loading the whole page for tens of countries.

T173194 / T173262

I updated Module:Country and believe it now displays the same result as Template:Countries of Europe. See the demonstration at module talk and the discussion at template talk. As far as I know there is no realistic way of comparing outputs from the current template and the sandbox template (which invokes the module) for all possible languages, so replacing the current template would be courageous, but it would avoid the out-of-memory errors.

zhuyifei1999 renamed this task from Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template to Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 due to OOM in Lua→PHP→Lua calls.Sep 3 2017, 3:16 PM
Verdy_p added a subscriber: Verdy_p.Sep 6 2017, 8:17 AM

The Module:Country version is not needed at all (and in fact it is badly named). There was no issue until recently when something got changed in MediaWiki causing unnecessary expansions (notably since MediaWiki 1.28 and later, where lazy/delayed expansion is no longer used as it was before, causing ALL template (or modules) parameters to be expanded even if they are not to be returned and will be unused.
This is another case of regression in MediaWiki 1.28 that caused lot of nightmares (including problems with defered tasks/background workers that no logner work as expected and crash now in lot of wikis):
I don't understand why full expansion of parameters is needed immediately: for Mediawiki templates it should never be necessary. But if you pass parameters to a Lua modules, I wonder how Lua modules will react if they receive parameter values that are still not expanded and will auto-expand on first evaluation.
These memory exhaustion come from this full expansion of parameters. This causes lot of crashes in wikis running on small servers, or heavy load on them (and their incability to support as many coincurrent requests as before or severe degradation of performance with lot of I/O caused by heavy swaps to disk, and many concurrent tasks will crash or too many objects will be flushed too early from caches and will need to be regenerated on next evaluation, this can cause the same parameter which was lazily expanded to be expanded multiple times, because the previous evaluation was lost but its allocated memory was still not garabage collected).
There was no such problems with MW 1.27 and we did not need so many defered jobs and had much smaller job queues. MediaWiki 1.28+ is a failure since the begining, and none of the following versions or patches have solve this problem.

Note that the current "kludge" using "nowiki" to avoid immediate expansion works. It proves that lazy expansion of template or module parameters works! This is the way to go, but we should not need to use such "nowiki"/"unwiki" trick in out templates.

Anomie added a comment.Sep 6 2017, 3:21 PM

There was no issue until recently when something got changed in MediaWiki causing unnecessary expansions (notably since MediaWiki 1.28 and later, where lazy/delayed expansion is no longer used as it was before, causing ALL template (or modules) parameters to be expanded even if they are not to be returned and will be unused.

This is incorrect, and can be tested by passing a unique <ref> tag in an unused parameter. If the parameter is expanded, the reference shows up in the output of <references/>. If the parameter is not expanded, it doesn't show up.

But if you pass parameters to a Lua modules, I wonder how Lua modules will react if they receive parameter values that are still not expanded and will auto-expand on first evaluation.

That's how it already works. A parameter from frame.args isn't expanded (via a call to PPFrame->getArgument() or the like) until you access it, you pass the args table to pairs, or you call frame:preprocess().

That last is a bit weird, but is needed for T47684 and has been the case since MW 1.21, and can be avoided in some cases by using frame:expandTemplate() or frame:callParserFunction() instead.

These memory exhaustion come from this full expansion of parameters. This causes lot of crashes in wikis running on small servers

Considering the MediaWiki parser and Scribunto both apply limits that should be well short of the memory limits of the server, this too seems very unlikely.

With respect to this particular bug, there is not any server memory exhaustion going on at all. The problem is that the configured memory limit for Lua is being reached in a particular way that causes the resulting Lua error to be caught unexpectedly by HHVM rather than by Lua, at which point HHVM raises a fatal error.

Note that the current "kludge" using "nowiki" to avoid immediate expansion works. It proves that lazy expansion of template or module parameters works! This is the way to go, but we should not need to use such "nowiki"/"unwiki" trick in out templates.

That's a problem in Module:Fallback expanding more arguments than necessary, not a problem in MediaWiki.

Note that the current "kludge" using "nowiki" to avoid immediate expansion works. It proves that lazy expansion of template or module parameters works! This is the way to go, but we should not need to use such "nowiki"/"unwiki" trick in out templates.

That's a problem in Module:Fallback expanding more arguments than necessary, not a problem in MediaWiki.

I don't know where Module:Fallback "expands" any argument, it just selects one of them by only testing if one is set or not, but does not invoke directly any expansion on them, their values are returned "as is" to the caller. If this is caused by testing if an arg is not null, and it is null, this wil not expand anything and will not cause more memory to be allocated. If the value is not null, it is the value effectively returned, that will need to be finally expanded.
So if an unnecessary expansion occured, this was even before the Lua module was invoked, and here it can only be done by MediaWiki itself.
But the problem of "expansion" is that it is still a *deep* one instead of being performed mazily on demand.

Anomie added a comment.Sep 6 2017, 5:44 PM

I don't know where Module:Fallback "expands" any argument,

That's clear enough. But just because you don't know it doesn't mean it isn't happening, so your reasoning based on that assumption is faulty.

Glancing at the current version of c:Module:Fallback, I see several argument expansions in the course of a call to the langSwitch method.

  • Line 58 expands the first of 'en', 'default', or 'nocat' that isn't nil.
  • Line 61 expands 'lang'.
  • Line 65 expands all arguments, if that line is reached (i.e. 'lang' isn't provided or is empty).
    • That could be avoided by using frame:callParserFunction( 'int', 'lang' ) instead of frame:preprocess( "{{int:lang}}" ).
  • Line 40 would again expands the first of 'en', 'default', or 'nocat' that isn't nil, but that already happened on line 58.
  • The loop on lines 49–52 expands at least one argument, possibly more depending on what winds up in parselist and how many of those arguments are set to the empty string.

Other externally-callable functions in that module expand arguments too, but there's little point in my going through them all for you.

Jarekt added a comment.Sep 6 2017, 5:52 PM

I can look through c:Module:Fallback and see if I can clean it up. It is one of the earliest modules written on Commons and one of the most used.

My understanding of the situation is:

Reworking c:Module:Fallback would be desirable but that is unlikely to eliminate the underlying issue, namely that {{Countries of Europe}} uses a lot of Lua memory.

Given that results from the template always use the current user's language, why not use mw.wikibase.label as done by the new c:Module:Country?

A comparison can be seen at:

  • sandbox shows {{Countries of Europe|prefix=:Category:}}
  • sandbox2 shows {{Countries of Europe/sandbox|prefix=:Category:}} which uses c:Module:Country

An extract from the NewPP limit report for these pages shows:

NewppTemplateSandboxLimit
Lua time usage:4.1190.049/10.000 seconds
Lua memory usage:44.471.11 MB/50 MB
Legoktm added a subscriber: Legoktm.Sep 7 2017, 5:09 AM

AIUI the fix for this bug is currently being rolled out as part of T173705: HHVM: Unknown exception.

Given that results from the template always use the current user's language, why not use mw.wikibase.label as done by the new c:Module:Country?

int:lang makes a message appear in the user language. I looked at https://www.mediawiki.org/wiki/Extension:Wikibase_Client/Lua#mw.wikibase.label and according to the history returning it in the user language with fallback was implemented about a year ago. Not sure when it was actually enabled and fully functional on Commons. If this is all working alright now, we can probably get rid of a lot of int:lang cases (cc @Jarekt )

Verdy_p added a comment.EditedSep 7 2017, 9:30 AM

In addition I think the major problem is actually not here but with these two bugs, that are the real cause of excessive memory usage.

T173194: Module:Wikidata label should use mw.wikibase.label and mw.wikibase.sitelink
T173262: Get arbitrary label from Item without loading the full Item (via Lua).

I don't know where Module:Fallback "expands" any argument,

That's clear enough. But just because you don't know it doesn't mean it isn't happening, so your reasoning based on that assumption is faulty.

Glancing at the current version of c:Module:Fallback, I see several argument expansions in the course of a call to the langSwitch method.

  • Line 58 expands the first of 'en', 'default', or 'nocat' that isn't nil.

No, it does not expand anything there. It does only test is they are nil or not nil:

args = mw.getCurrentFrame():getParent().args

only to retrieve the list of arguments as an array (without asking for their expansion).

  • Line 61 expands 'lang'.

No problem here. This is a single string, even if it causes it to expand a "{{int:lang}}" given in parameter, this expansion will occur once per page and will be cached.

  • Line 65 expands all arguments, if that line is reached (i.e. 'lang' isn't provided or is empty).
    • That could be avoided by using frame:callParserFunction( 'int', 'lang' ) instead of frame:preprocess( "{{int:lang}}" ).

I don't see how this can save anything, this should be functionally equivalent and will expand the same amount of text.

  • Line 40 would again expands the first of 'en', 'default', or 'nocat' that isn't nil, but that already happened on line 58.

Same thing, there is not any explicit expansion. If this occurs, this is a problem of Scribunto. Testing if a variable is nil should is not a dereference.

  • The loop on lines 49–52 expands at least one argument, possibly more depending on what winds up in parselist and how many of those arguments are set to the empty string.

Here again this should not have any impact: the loop just expands what is necessary and terminates on the first one that is not nil. Those that are nil should not have any significant memory impact.

Other externally-callable functions in that module expand arguments too, but there's little point in my going through them all for you.

My opinion is that these are bugs of the Scribunto integration layer (not directly Lua), doing the expansion of all arguments too early, even if they are unused and we actaulyl don't need this full expansion (that should be made explicitly if the module does not want to see the unexpanded wikitext given in value).

We need a way to test empty parameters without requiring their value to be fully expanded. Scribunto is doing too much and is not smart enough to delay these things.

Anomie added a comment.Sep 7 2017, 2:33 PM

My understanding of the situation is:

  • HHVM rather than Lua catches the error when the Lua memory limit is hit.
  • That can result in "503, Service Unavailable" responses.

Correct. In particular, HHVM generates a fatal error when it catches the Lua error.

This is not directly related to the 503 issue. But it's an indication of the sort of Lua errors that might be causing the bug in luasandbox to manifest.

In particular, this bug will be closed when the memory errors stop causing the 503 errors, even if the Commons template is still hitting the memory limit. If people want to track the work of improving that template in Phabricator, I'd recommend splitting it off into another task. Or you're welcome to track it on Commons if you'd rather.

  • Line 58 expands the first of 'en', 'default', or 'nocat' that isn't nil.

No, it does not expand anything there. It does only test is they are nil or not nil:

Yes, it does. Your denying the fact does not change it.

args = mw.getCurrentFrame():getParent().args

only to retrieve the list of arguments as an array (without asking for their expansion).

Correct, there's no expansion in the quoted line. But no one ever said there were expansions happening on that line (line 59).

  • Line 61 expands 'lang'.

No problem here.

I never said it was a problem. I was pointing out that it is an expansion of an argument, since you claimed that you didn't know where any argument expansions were occurring.

  • Line 65 expands all arguments, if that line is reached (i.e. 'lang' isn't provided or is empty).
    • That could be avoided by using frame:callParserFunction( 'int', 'lang' ) instead of frame:preprocess( "{{int:lang}}" ).

I don't see how this can save anything, this should be functionally equivalent and will expand the same amount of text.

frame:callParserFunction passes explicit arguments to the parser function without doing any expansion of the arguments itself. frame:preprocess processes arbitrary wikitext in the context of a frame, which means all that frame's arguments are available for use. And as I mentioned earlier, to avoid T47684 that means that all the arguments have to be pre-expanded (or we'd need to do complex things to the frame to be able to catch the expansion of each argument and exempt them from the time usage tracking).

  • Line 40 would again expands the first of 'en', 'default', or 'nocat' that isn't nil, but that already happened on line 58.

Same thing, there is not any explicit expansion. If this occurs, this is a problem of Scribunto. Testing if a variable is nil should is not a dereference.

Lua doesn't give us any way to know if a field is being accessed in order to test for being nil or for falseiness versus being accessed to test for some other purpose. We're not going to be rewriting the internals of Lua to try to change this.

The MediaWiki PPFrame class similarly doesn't give us a way to test if an argument exists in the frame without also expanding it. That could be changed, but is out of scope for this task. Feel free to file a feature request.

I also note the resulting test would wind up looking more like frame:hasArgument( 'en' ) than args.en == nil or not args.en.

  • The loop on lines 49–52 expands at least one argument, possibly more depending on what winds up in parselist and how many of those arguments are set to the empty string.

Here again this should not have any impact:

Here again I was pointing out a place where arguments are expanded, since you claimed to not know of any such places.

and terminates on the first one that is not nil

Not quite. It does not terminate if the argument is the empty string.

My opinion is that these are bugs of the Scribunto integration layer (not directly Lua), doing the expansion of all arguments too early, even if they are unused and we actaulyl don't need this full expansion (that should be made explicitly if the module does not want to see the unexpanded wikitext given in value).

You're welcome to hold incorrect opinions. Continuing to expound on them in this task is not productive, however.

We need a way to test empty parameters without requiring their value to be fully expanded.

As I said above, that's a matter for a separate feature request. Be careful in defining "empty", since testing whether an argument is passed at all, testing whether the argument is set to the empty string, and testing whether the argument is set to something that expands to the empty string are all very different but could all be considered "test empty parameters". If you want to continue that discussion, please file the feature request and take it there.

Verdy_p added a comment.EditedSep 7 2017, 3:22 PM

My understanding of the situation is:

  • HHVM rather than Lua catches the error when the Lua memory limit is hit.
  • That can result in "503, Service Unavailable" responses.

Correct. In particular, HHVM generates a fatal error when it catches the Lua error.

This is not directly related to the 503 issue. But it's an indication of the sort of Lua errors that might be causing the bug in luasandbox to manifest.

In particular, this bug will be closed when the memory errors stop causing the 503 errors, even if the Commons template is still hitting the memory limit. If people want to track the work of improving that template in Phabricator, I'd recommend splitting it off into another task. Or you're welcome to track it on Commons if you'd rather.

  • Line 58 expands the first of 'en', 'default', or 'nocat' that isn't nil.

No, it does not expand anything there. It does only test is they are nil or not nil:

Yes, it does. Your denying the fact does not change it.

args = mw.getCurrentFrame():getParent().args

only to retrieve the list of arguments as an array (without asking for their expansion).

Correct, there's no expansion in the quoted line. But no one ever said there were expansions happening on that line (line 59).

  • Line 61 expands 'lang'.

No problem here.

I never said it was a problem. I was pointing out that it is an expansion of an argument, since you claimed that you didn't know where any argument expansions were occurring.

  • Line 65 expands all arguments, if that line is reached (i.e. 'lang' isn't provided or is empty).
    • That could be avoided by using frame:callParserFunction( 'int', 'lang' ) instead of frame:preprocess( "{{int:lang}}" ).

I don't see how this can save anything, this should be functionally equivalent and will expand the same amount of text.

frame:callParserFunction passes explicit arguments to the parser function without doing any expansion of the arguments itself. frame:preprocess processes arbitrary wikitext in the context of a frame, which means all that frame's arguments are available for use. And as I mentioned earlier, to avoid T47684 that means that all the arguments have to be pre-expanded (or we'd need to do complex things to the frame to be able to catch the expansion of each argument and exempt them from the time usage tracking).

Thanks now you locate the only real problem: it is "frame:preprocess" only that cases the early expansion of all parameters in the parent frame (very strange behavior and probably affecting many Lua modules).

One year ago I wondered why there was no direct access in Lua to the MediaWiki property for the user language directly ln the Lua's "mw:" object. I was replied: not a bug, don't care, not needed, won't fix, not a problem. Here we see this is a real problem: we should not even need to use "int:Lang" to query what is really a Mediawiki's native PHP variable, accessible directly in PHP code by Mediawiki extensions (including Scribunto itself). If it had been present we should just use "mw:userLanguage" instead of trying to expand "{{int:Lang}}" here, and we would not have been impacted by this stupid bug (that was revealed later only because the number of translations in parameters of templates or many more data in Wikidata about country names, was largely increased by more translations now available in many more languages) !

And there's still the problem in Wikidata client that is not able to retrieve only selected properties and loads the full objects (this is the real cause of the problem we see now: we load too many data properties that we then don't use).

In summary there was no bug in LangSwitch itself, except for the way to expand "{{int:Lang}}" which has an unexpected side effect while Scribunto prepares its execution: and this is a design bug of Scribunto itself.

  • Line 40 would again expands the first of 'en', 'default', or 'nocat' that isn't nil, but that already happened on line 58.

Same thing, there is not any explicit expansion. If this occurs, this is a problem of Scribunto. Testing if a variable is nil should is not a dereference.

Lua doesn't give us any way to know if a field is being accessed in order to test for being nil or for falseiness versus being accessed to test for some other purpose. We're not going to be rewriting the internals of Lua to try to change this.

The MediaWiki PPFrame class similarly doesn't give us a way to test if an argument exists in the frame without also expanding it. That could be changed, but is out of scope for this task. Feel free to file a feature request.

I also note the resulting test would wind up looking more like frame:hasArgument( 'en' ) than args.en == nil or not args.en.

  • The loop on lines 49–52 expands at least one argument, possibly more depending on what winds up in parselist and how many of those arguments are set to the empty string.

Here again this should not have any impact:

Here again I was pointing out a place where arguments are expanded, since you claimed to not know of any such places.

You pointed another place than the lines where you claimed it was expanded there, and it is not at this line. When I say "I don't know" it is not on this code, but would have occured before, or within the internals of Lua and its bindings to PHP via Scribunto.

But up to this line, there was still nothing cuasing this undesired effect.

and terminates on the first one that is not nil

Not quite. It does not terminate if the argument is the empty string.

Oh yes, but this does not change the point: if it is an empty string (treating exactly like nil, i.e. the absence of the parameter), there is still no expansion, no memory will be needed for that (except if the unexpanded value was a complex wiki template returning an effective empty string, but this is not the case in this bug).

My opinion is that these are bugs of the Scribunto integration layer (not directly Lua).

I reconfirm it: Scribunto needs correction to avoid the unexpected, undocumented, and costly side effects of frame:preprocess() as much as possible.

Anomie added a comment.Sep 7 2017, 4:00 PM

Thanks now you locate the only real problem:

No, the real problem here is the fact that Lua errors are being caught by HHVM. The memory usage of the widely-used Commons template merely serves to make the issue more obvious.

But yes, use of frame:preprocess is probably a contributor to the template's memory usage by causing expansion of all arguments, even those otherwise unused.

One year ago I wondered why there was no direct access in Lua to the MediaWiki property for the user language directly ln the Lua's "mw:" object. I was replied: not a bug, don't care, not needed, won't fix, not a problem.

I don't know where you wondered that. The only task in Phabricator I'm aware of requesting this T68051, which is blocked on T4085 with the note that it's not likely to happen in Scribunto for the same reasons T4085 has never been done.

Here we see this is a real problem: we should not even need to use "int:Lang" to query what is really a Mediawiki's native PHP variable,

The reasons for it not being available are described in T4085.

And there's still the problem in Wikidata client that is not able to retrieve only selected properties and loads the full objects (this is the real cause of the problem we see now: we load too many data properties that we then don't use).

That's yet another bug. Take it to the appropriate task, or file one if it doesn't exist yet.

Two days ago I cleaned up c:Module:Fallback a bit and it no longer uses frame:preprocess among other improvements. Did it helped?

@Jarekt c:Category:SVG logos of the United Kingdom no longer has a problem, but that is because I implemented the template with a module which uses mw.wikibase.label. See T171392#3587254 above for a table comparing the performance of the old template with the module—the module uses under 0.05 seconds and 1.2 MB. Some notes on the reduction in the number of pages listed at the error category are at template talk.

The main template now invokes the module, and the sandbox has the old template wikitext. Examples are at the sandbox and sandbox2 links in T171392#3587254. I purged the two pages and the old template still needs over 4 seconds and 43 MB.

Verdy_p added a comment.EditedSep 11 2017, 8:31 AM

@Jarekt, your cleanup has changed the list of fallbacks. Commons had more fallbacks than the default in Mediawiki (what you have named now "old fallback list in Commons" in your comments, and reported by the last function present in the modulewhere you've unexpectedly also reordered the implemented functions that were listed in dependency order)

All functions on Commons were based on this "old" fallback list that were expected for many regional languages. : the last function that retrieved the fallback list to use was the first one in source and was used by all other functions in the module, now it is there but no longer used, but it was needed.

That list of "old" fallbacks were reported several years ago to get corrections in MediaWiki's builtin list which had various expected items missing in MediaWiki's default fallbacks, so Commons now no longer displays those items (such as "co->en" instead of expected "co->fr").

And note that now the fallback lists requested can now contain duplicates, such as (en, en).

Looks like the module implementation works (many thanks to @Johnuniq). So this bug can probably be closed (or at least the priority should be lowered).

Nevertheless, a few notes.

First: the same problem may affect other {{Countries of ...}} templates in future (like Template:Countries of Asia) . There are no "langlinks/langswitches" implemented yet, but it's a sister template to Template:Countries of Europe so it's very probable someone will try it the future. Perhaps similar modules like Module:Countries of Europe should be prepared and implemented in advance?

Second: see Category:People of Turkey in the 2010s. The template Template:Countries of Europe is duplicated there, firstly from the category page itself, secondly through the Template:PeopleDecadebyCountry . This should definitely be solved/avoided (but this is out of scope of this task).

Third: it's true that Category:Pages with script errors is almost clean of categories with this error, but I still see that there are still about 5-10 "suspicious" categories that probably need to be purged (above metioned Category:People of Turkey in the 2010s is one of them). Are they newly popped or were they overlooked?

Verdy_p added a comment.EditedSep 11 2017, 8:51 AM

Thanks now you locate the only real problem:

No, the real problem here is the fact that Lua errors are being caught by HHVM.

I disagree, this real problem for the template in this thread was in the memory usage of frame:preprocess(), and it is effectively the first and main bug in Scribunto (which can affect really a lot of modules using that frame:preprocess function).


What you report is the second bug in Scribunto that occurs later because of this one (excessive use of memory), but it could occur for many other unrelated reasons (but you're right, that second bug is critical because it causes the server to fail with HTTP error 500 with uncaught exceptions: the HTTP error is solved, but the error is still reported (but without any usable call traces).

Note that you've changed the initial title of this current bug to reflect you view on the cause but this is not the only problem that occured and only a derived effect.


The third bug is the impossibility to retrieve only selected properties from Wikidata: this is a bug of Wikibase as we only get the full list of properties: this started to bug on Commons because Qnnn objects for languages have widely been incresed in size with many more translations added than what was initially defined. This is critical too because Wikidata is constantly adding new properties on many objects and all modules using its data son't need so many properties and their performance is then progressively decreasing and their memory usage is constantly growing: these other modules could as well report the same kind of bugs we had here for the countries of Europe.

And I'm convinced we are seeing similar reports. It was noted for this template because it was largely used used in Commons. But it could happen in various modules using Wikidata to get their translated labels, or to process data such as lists of administrative divisions, or aggrgating statistics data about countries, or goelocation templates, or author's data templates (e.g. birth dates, death dates, data about current location of pieces of art (museum, room...), or list of books from authors, or just getting an ISBN from a book data item.

So what has changed and caused these bugs to ccour was the huge growth of Wikidata object sizes, the increase of properties and notably the addition of many translated labels (from one or two dozens of languages, now frequently several hundreds, notably in objects related to very frequently referenced items such as country names).

This third bug about wikidata client still needs to be solved (it is now critical) You cannot solve it by just increasing the memory quota for Lua, or even by solving the second bug to catch more cleanly the memory exhaustion error: solving that second bug will still not solve what we saw here and which remained a memory overuse by Wikidata client requests.

A separate question is how to mange the growth of properties everywhere in Wikidata (it is a more generic problem; now Wikidata objects are becoming very huge, this is a data modelisation problem in Wikidata itself and that impacts its usability, we are too far from clean data OO design) and editing any item in Wikidata is now extremely complex for many users that see "giant" pages with too much entries (in addition these entries are in random order, making edits increasingly difficult to locate).


For now the new Module:Country just implements a workaround, it does not solve any of these 3 bugs. And the workaround is not stable (I do not see why we get incoherent befhvort between frame:expandTemplate("namespace","templatename") and frame:preprocess("{{namespace:templatename"), and both expansions should be cached instead oif occuring multiple times.

So the workaround may still fail at any time with anyone of the 3 bugs above.

Strainu removed a subscriber: Strainu.Sep 11 2017, 8:59 AM
Verdy_p added a comment.EditedSep 11 2017, 9:14 AM

If you want a similar kind of errors (memory overuse by Wikidata client in Lua) look at

https://commons.wikimedia.org/wiki/Wiki_Loves_Monuments_2015_winners
https://commons.wikimedia.org/wiki/Wiki_Loves_Monuments_2016_winners

: Something was changed in these pages, it no longer bugs, but we are still very close to the memory limit (47.9MB on 50MB max for the 2015 version, and it is really excessive for what we really need, just the translated label, possibly with fallbacks)

We have the same effect and the same causes (possibly also the use of frame:preprocess() to expand "{{int:lang}}", but also memory overuse in Wikidata clients that just ned selected properties instead of full objects).

Here the Lua error is caused when trying to get translated labels from Wikidata for "Photography", or "Author", or "Licence".
The same requests is also not cached, and repeated over the same page to retrieve exactly the same properties of the same object. This is also a design bug in Wikidata client (this absence of caching may also be another cause of the bug occuring in coutnries of Europe, as we repeatedly use the same requests to expand "{{int:lang}}" multiple times and will repeately expand all unprocessed arguments in frame.

I fear that the absence of cache is still affecting the performance even after your "correction" in language fallbacks implementation. frame:preprocess() is stupidly expanding things we don't need and that should not be done at all (it would avoid expanding multiple times these unneeded parameters in parent frame, but even if it does, caching these should help: we are abusing with repeated wikidata requests for the same objects on the same page).

Caching is what was used when template-only emplementations were used before Wikidata: The same should be done in Wikidata client.

MediaWiki internally uses a cache for template expansions (so expanding "{{int:lang}}" mutliple times would also not cause unnecessary expansions multiple times: frame:preprocess() at least should also cache the expansion of "{{int:lang}}". Mediawiki uses also a cache for all builtin "magic keywords", a cache for every already loaded transcluded pages, a cache for every tested page with #ifexist (really tested only once). This is general.

Note: the updated Fallback module no longer uses the Commons-specific fallback list (at top of the script) it just uses the default MediaWiki fallbacks (which is missing various entries that were tuned progressively for use on Commons). This is a radical unexpected and undiscussed change that impacts regional and minority languages that are now exposed too frequently with undesired English fallbacks.

@Vachovec1 The module I wrote was originally Module:Country because I thought it would be enhanced to cover more than Europe. The module was moved and I explained my plans at c:Module talk:Countries of Europe. In a few days I am likely to have something to demonstrate and we could decide on a better name such as Module:Countries.

I purged all the pages I could see in the tracking category two days ago. Every few hours a couple more pages that need purging pop up. For example, I had to purge c:Category:1876 in Finland 12 hours ago.

Verdy_p added a comment.EditedSep 11 2017, 11:08 AM

Note that one or the three bugs I listed are still present, see this sample:
https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox/3

The workround module for countries (only a temporary fix which does not solve really any one of the problems we found here) is not used ; this is still using the Country/list template, and the memory error still occurs because of repeated uncached wikidata queries for country labels each invokation of the template will allocate the same huge amount of memory for the uncached wikidata queries.

: Oh something just changed while posting this, and now instead of Lua error, we get some of the result (but the memory usage is still largely excessive: nearly 40MB on the 50MB limit just for this small list). The list is still incorrect because it is now truncated to only 28 items (up to Ukraine, missing Vatican and further items).
: Apparently there are ongoing works to save memory in Wikidata client module.
: Note also the resources (and time) needed for just the simple transclusion of

10.91%  316.366      1 Template:Germany
 5.33%  154.533      1 Template:Russia
 4.60%  133.309      1 Template:France
 4.16%  120.616      1 Template:Romania
 3.95%  114.482      1 Template:Italy

Germany comes first because its Wikidata labels and properties are twice larger than those for Russia or other countries, but we only need here a single translated label (with fallbacks) and a single property (a category name on Commons); this is clearly a proof of issue in Wikidata client.

JFTR, the patch by @Anomie (https://gerrit.wikimedia.org/r/367935) has now been fully rolled out in production (by means of the 2.0.14 release).

Verdy_p added a comment.EditedSep 13 2017, 10:07 PM

So it solves a part of the problem, but instead of having one kind of error, we have now another error (a bit more informative, but still not solving the memory overuse problem, as seen in https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox/3 where there are missing items in the list and too much memory used by wikidata queries for localized labels and localized wikipedia targets).

So there's more work needed for at least two or three other bugs (frame:preprocess doing bad things, no cache in Wikidata client library for repeated queries, and non selective queries for labels or properties in Wikidata, retriving more than wanted: the extra languages don't need to be returned, they can just remain in local caches of the Wikidata SQL server if a selective query does not return something useful and another query is performed for another language)

zhuyifei1999 closed this task as Resolved.Sep 13 2017, 11:05 PM

This bug is about 500s and 503s as result of HHVM should catching OOM in Lua→PHP→Lua calls; this behavior is fixed, hence closing. How Wikidata returns data and how parser works are out of scope of this task. Optimizations to Wikidata label querying are tracked at T173194 / T173262.

It was in scope of this bug when it was first open (but then reduced by altering its title) At that time the internal HHVM behavior was not even understood. what was reported was a crash starting to occur for an unknown cause (when the page was working before).

And I disagree: T173194 / T173262 are not enough (they concentrate only on selective label queries, but not on selective property queries, such as T142903 for getting site links) and many other similar queries for other properties. Nothing has changed for https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox/3 which still does not work at all and whose FALSE result gets truncated SILENTLY (here the solution has caught the exception just to discard it, and this is probably worse than having an explicit error) !

Speravir removed a subscriber: Speravir.Sep 14 2017, 2:15 PM