ResourceLoader seems to sometimes not see Gadget modules (due to mw1151)
Closed, Resolved, Public

Details

Reference: bz65424
bzimport set Reference to bz65424.
He7d3r created this task. May 16 2014, 10:47 PM

content hidden as private in Bugzilla

Change 133871 had a related patch set uploaded by Krinkle:
[wmf debug] resourceloader: Output servedBy when load.php has an error

https://gerrit.wikimedia.org/r/133871

content hidden as private in Bugzilla

[23:19] <greg-g> now I just need to figure out what is causing: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Skin_and_gadget_issues_16_May_2014
[23:24] <MatmaRex> huh, i reproduced it
[23:24] <MatmaRex> and looking at this URL: https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=ext.echo.badge%7Cext.gadget.BugStatusUpdate%2CDRN-wizard%2CReferenceTooltips%2CWatchlistChangesBold%2Ccharinsert%2Cedittop%2CmySandbox%2CrefToolbar%2Csearch-new-tab%2Cteahouse%7Cext.geshi.language.css%2Chtml4strict%2Cjavascript%2Ctext%7Cext.geshi.local%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cmw.PopUpMediaTransform%7Cskins.vector.styles%7Cwikibase.client.init&only=styles&skin=vector&*
[23:24] <MatmaRex> i see a debug comment
[23:24] <MatmaRex> Problematic modules: {"ext.gadget.BugStatusUpdate":"missing","ext.gadget.DRN-wizard":"missing","ext.gadget.ReferenceTooltips":"missing","ext.gadget.WatchlistChangesBold":"missing","ext.gadget.charinsert":"missing","ext.gadget.edittop":"missing","ext.gadget.mySandbox":"missing","ext.gadget.refToolbar":"missing","ext.gadget.search-new-tab":"missing","ext.gadget.teahouse":"missing"}
[23:24] <MatmaRex> so, gadgets have magically disappeared
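The `modules=` parameter in MatmaRex's URL uses a packed notation, and expanding it makes clear that the ten "missing" entries are exactly the ext.gadget.* modules requested. Below is a minimal Python sketch of that expansion. It assumes the packing rule is: `|` separates groups, and in a group containing commas everything before the last `.` is a shared prefix for the comma-separated suffixes. That rule is read off the URL itself and is not a copy of ResourceLoader's PHP; the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def expand_module_names(packed):
    """Expand a packed load.php `modules=` value into full module names.

    Assumed packing rule: '|' separates groups; in a group that contains
    commas, everything before the last '.' is a shared prefix and the
    comma-separated items are suffixes ("ext.gadget.a,b" -> two modules).
    """
    names = []
    for group in packed.split("|"):
        if "," not in group:
            names.append(group)  # a single, unpacked module name
            continue
        prefix, dot, rest = group.rpartition(".")
        if not dot:
            names.extend(group.split(","))  # comma list with no shared prefix
        else:
            names.extend(f"{prefix}.{suffix}" for suffix in rest.split(","))
    return names


def modules_from_url(url):
    """Return the expanded module list requested by a load.php URL."""
    # parse_qs also percent-decodes %7C -> '|' and %2C -> ','
    query = parse_qs(urlsplit(url).query)
    return expand_module_names(query["modules"][0])
```

Applied to the URL above, this yields 26 module names; the ten ext.gadget.* entries among them are the same ten that the debug comment lists as "missing", and no non-gadget module appears in that comment.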

[23:26] <ori> they're in the startup module
[23:26] <ori> mw.loader.getState('ext.gadget.teahouse')
[23:26] <ori> > "ready"
[23:27] <ori> nothing missing for me
[23:27] <ori> in that url
[23:28] <MatmaRex> ori: took a few page refreshes for me
[23:28] <ori> oh yeah
[23:28] <MatmaRex> i refreshed that URL now and it loaded right
[23:28] <ori> i got it now

[00:49] <greg-g> ori: MatmaRex odder can still repro
[00:50] greg-g just asked Krinkle to take a look now that he's back online
[00:51] <MatmaRex> yeah, i can reproduce still too
[00:52] <MatmaRex> seems to happen randomly, like 20% of time when i load an uncached URL
[00:53] <MatmaRex> all gadget modules are "missing"

[01:32] <MatmaRex> Krinkle: unless you have better ideas, i'd check if Gadget::loadStructuredList is sometimes returning null when it shouldn't be, and if yes, why is it doing that
[01:33] <Krinkle> MatmaRex: yeah, I'm mw-evalling now
[01:35] <Krinkle> Gadget::loadStructuredList() and the underlying memcached object is fine
[01:35] <Krinkle> at least not critical, let me inspect it

The summary so far: the modules that Gadgets generates are sometimes missing from load.php responses (so the gadgets are not loaded), and we don't know why, but top people are on it. :P

It doesn't seem to affect any other modules or any of Gadgets' UI (such as the special page or the preferences).

Change 133871 merged by jenkins-bot:
[wmf debug] resourceloader: Output servedBy when load.php has an error

https://gerrit.wikimedia.org/r/133871

Of the 4 application servers for bits in eqiad (mw1149, mw1150, mw1151, mw1152), I've identified mw1151 as the problematic one.

I used requests like the following, changing the bust suffix each time so that every request is a cache miss:

https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=ext.echo.badge%7Cext.gadget.BugStatusUpdate%2CDRN-wizard%2CReferenceTooltips%2CWatchlistChangesBold%2Ccharinsert%2Cedittop%2CmySandbox%2CrefToolbar%2Csearch-new-tab%2Cteahouse%7Cext.geshi.language.css%2Chtml4strict%2Cjavascript%2Ctext%7Cext.geshi.local%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cmw.PopUpMediaTransform%7Cskins.vector.styles%7Cwikibase.client.init&only=styles&skin=vector&*bust123
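The manual procedure (a fresh bust suffix per request, then look for the "Problematic modules" comment) can be scripted. A hedged Python sketch follows. It assumes the debug comment keeps the format quoted in the IRC log and that the base URL ends in `&*` like the one above; the helper names and the `bust` + random-hex suffix are illustrative, not MediaWiki tooling. It only counts failures and does not parse the servedBy output added by change 133871, whose exact format is not shown here.

```python
import json
import re
import uuid
from urllib.request import urlopen

# Hypothetical helpers, not part of MediaWiki. The pattern matches the debug
# comment quoted in the log, e.g. Problematic modules: {"ext.gadget.teahouse":"missing"}
PROBLEM_RE = re.compile(r"Problematic modules: (\{.*?\})")


def problematic_modules(body):
    """Return the module -> state map from a load.php debug comment, or {}."""
    match = PROBLEM_RE.search(body)
    return json.loads(match.group(1)) if match else {}


def fetch_text(url):
    """Fetch a URL and return its body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def count_failures(base_url, attempts=20, fetch=fetch_text):
    """Request `base_url` with a fresh cache-busting suffix each time.

    `base_url` is assumed to end in '&*', so appending a unique token gives
    a URL the cache has not seen, forcing a backend hit.
    Returns (failures, attempts).
    """
    failures = 0
    for _ in range(attempts):
        body = fetch(base_url + "bust" + uuid.uuid4().hex)
        if "missing" in problematic_modules(body).values():
            failures += 1
    return failures, attempts
```

If requests are spread evenly over the four eqiad bits backends (an assumption) and only one of them is broken, the expected failure rate on cache misses is about 1/4, or 25%, roughly in line with the ~20% MatmaRex reported.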

Ori noticed that mw1151 shows a CPU spike in Ganglia and has disk issues.

Using mwscript on the local Apache, I confirmed that its memcached is unable to retrieve or store a value for the cache key 'enwiki:gadgets-definition:7' used by Gadget::loadStructuredList().
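Krinkle's check ran inside MediaWiki via mwscript. As an illustration of the same kind of test (can this memcached store a value and read it back?), here is a minimal Python sketch that speaks the memcached ASCII protocol directly. Assumptions: the daemon listens on localhost at memcached's default port 11211, and the probe writes to a hypothetical key `enwiki:gadgets-definition:7:probe` instead of the real key, so it never overwrites the cached gadget definition. This is not the command that was actually run.

```python
import socket
import uuid

# Hypothetical probe key: deliberately NOT the real cache entry, so the
# check cannot clobber the cached gadget definition.
PROBE_KEY = "enwiki:gadgets-definition:7:probe"


class MemcacheTextClient:
    """Minimal memcached client speaking the ASCII protocol (set/get only)."""

    def __init__(self, host="127.0.0.1", port=11211, timeout=2.0):
        # Short timeout so that an unresponsive daemon fails fast.
        self.sock = socket.create_connection((host, port), timeout=timeout)
        self.reader = self.sock.makefile("rb")

    def set(self, key, value, exptime=60):
        data = value.encode()
        header = f"set {key} 0 {exptime} {len(data)}\r\n".encode()
        self.sock.sendall(header + data + b"\r\n")
        return self.reader.readline() == b"STORED\r\n"

    def get(self, key):
        self.sock.sendall(f"get {key}\r\n".encode())
        line = self.reader.readline()
        if line == b"END\r\n":
            return None  # cache miss
        # Hit: "VALUE <key> <flags> <bytes>", then the data block, then "END".
        parts = line.split()
        if len(parts) < 4 or parts[0] != b"VALUE":
            raise OSError(f"unexpected reply: {line!r}")
        size = int(parts[3])
        data = self.reader.read(size + 2)[:size]  # drop the trailing \r\n
        self.reader.readline()  # consume the closing "END\r\n"
        return data.decode()

    def close(self):
        self.reader.close()
        self.sock.close()


def round_trip_ok(client, key):
    """Store a fresh random token under `key` and check it reads back intact.

    A refused store, a missing or different value, or any socket error
    (including a timeout) counts as a failure.
    """
    token = uuid.uuid4().hex
    try:
        return client.set(key, token) and client.get(key) == token
    except OSError:
        return False
```

Typical use is `round_trip_ok(MemcacheTextClient(), PROBE_KEY)`, with the client construction also wrapped in `try`/`except OSError`, because a refused connection is raised there before the probe runs. Using a separate probe key trades a little fidelity (the real entry is not exercised) for safety on a production cache.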

We should depool that node and have ops look into it.

ori added a comment. May 17 2014, 1:19 AM

Depooled. There is no indication that this was caused by a software fault, so I'm closing this bug as resolved. Once we have the full story on what happened to that server, a postmortem will be posted to https://wikitech.wikimedia.org/wiki/Incident_documentation.

He7d3r added a comment. Jun 4 2014, 5:50 PM

Thanks! :-)
