Page MenuHomePhabricator

Remove usage of templates in QuickSurveys: Muhogan approach in RelatedArticles and QuickSurveys is error prone
Closed, ResolvedPublic5 Estimated Story Points

Description

Our client side error counting spiked from 3k errors a second to 10k due a regression likely caused by T205128. This broke RelatedArticles for some users.
The problem was likely due to the code in mediawiki.template.hogan loading before mobile.startup (which contained a missing webpack runtime module).

While this problem should not happen again it shed some light on the ResourceLoaderMuHoganModule that lives in RelatedArticles and QuickSurveys as being brittle. In fact, in Minerva desktop this code can lead to both Mustache and Hogan being loaded in the client.

original bug report

Exception in module-execute in module mediawiki.template.muhogan:
load.php?debug=false&lang=en&modules=startup&only=scripts&skin=minerva&target=mobile:2 Error: Unknown compiler hogan
The_Church_of_Jesus_Christ_of_Latter-day_Saints_in_Alabama:153

I'm seeing this error when I click Special:Random (and land on a lesser used page)
On refreshing, I do not see the error.
I can only reproduce this error on iOS Safari 12.0 on old pages, which makes me think this is a cache problem.

The error triggers statsv, so I think could be contributing to this nasty spike:

Screen Shot 2018-11-02 at 10.03.14 AM.png (581×846 px, 51 KB)

mw.loader.getState('mediawiki.template.hogan')
"ready"
mw.loader.getState('mediawiki.template.muhogan')
"error"

Screen Shot 2018-11-02 at 9.57.26 AM.png (633×1 px, 225 KB)

Developer notes

I feel like we should stop using templates in this repo.

Acceptance criteria

  • ResourceLoaderMuHoganModule should not rely on the targets system to load either mediawiki.template.mustache or mediawiki.template.hogan.

Event Timeline

Jdlrobson renamed this task from MinervaWebClientError: Unknown compiler hogan showing up in iOS Safari to Regression MinervaWebClientError: Unknown compiler hogan showing up in iOS Safari.Nov 2 2018, 5:17 PM
Jdlrobson triaged this task as High priority.
Jdlrobson added a project: Regression.
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)

@Jdlrobson When you pinged me about this (before the task existed) I found a couple of things that might be useful. It's inconclusive though, but it might help get you closer to the culprit.

Using the same browser configuration and browsing around the wiki, I was also able to reproduce this. For me it happened at https://en.m.wikipedia.org/wiki/Cathedral_glass. And I can consistently reproduce it there now.

For convenience, I also tried reproducing it on desktop and while it doesn't reproduce in Chrome (macOS), it does reproduce on regular desktop Safari 12.0 (macOS 10.13) as long as "Developer > User Agent > Safari - iPhone" is set. (In all cases, using the m-dot domain in the url.)

Which means it probably isn't specific to how iOS or Mobile Safari behave, but rather might be specific to how our code behaves when given a User-Agent string. Maybe that'll help you find the underlying cause?

Artefacts

Below are a couple of artefacts I captured that might help with debugging. The following modules queued from the cached HTML in Varnish for the above url:

html/head/modules
RLPAGEMODULES=["mediawiki.page.startup","mediawiki.user","skins.minerva.watchstar","skins.minerva.editor","skins.minerva.options","skins.minerva.scripts.top","skins.minerva.scripts","skins.minerva.toggling","ext.gadget.switcher","ext.centralauth.centralautologin","ext.visualEditor.targetLoader","mobile.site","mobile.init","ext.relatedArticles.readMore.bootstrap","ext.eventLogging.subscriber","ext.wikimediaEvents","ext.navigationTiming","ext.centralNotice.geoIP","ext.centralNotice.startUp"];mw.loader.load(RLPAGEMODULES);});

Parser cache timestamp, and web server that originally populated it last month:

html/body/parseroutput
"cachereport":{"origin":"mw1327","timestamp":"20181014231805","ttl":1900800,"transientcontent":false}

Web server that served it in the last 24 hours and Varnish layers that cached and re-used it since:

http/Response headers
server: mw1265.eqiad.wmnet
x-cache: cp1083 hit/1, cp3040 hit/4, cp3033 miss
x-cache-status: hit-local
Duplicate module

This might be a red-herring, but I noticed part of the mediawiki.template.muhogan code might not be working as intended.

  1. The QuickSurveys and RelatedArticles extensions both specify a module called mediawiki.template.muhogan in extension.json. The module registry is an array in PHP, which means only one of these will exist. I don't know which one wins in production, but they have logically identical contents so the error can come from either, and both will need to be patched in a similar way. (Note that which one wins may differ between prod, beta and local dev.)
  2. The implementation from RelatedArticles (source) is actually slightly different in that it has a bit of extra logic that tries to obtain the compiler from QuickSurveys first. It falls back gracefully, but per point 1, the extra bit is redundant because the two modules are mutually exclusive.
Next steps

What I've got so far is that:

  • mediawiki.template.muhogan has a declared dependency on either mustache or hogan. This is implemented server-side in ResourceLoaderMuHoganModule::getDependencies by varying the return value based whether target === 'mobile'. I've confirmed that this works and makes its way to the browser without issue, by inspecting mw.loader.moduleRegistry['mediawiki.template.muhogan'] in Safari on the problematic page and I see it does include dependencies:["mediawiki.template.hogan"].
  • By the time its script executes, one or those modules will always be "ready" – given ResourceLoader dependency resolution.
  • Somehow, when the user-agent is Mobile Safari, neither is present via mw.template.getCompiler despite one of two modules having reached state "ready".

Based on this I would conclude that:

  1. mediawiki.template.hogan is failing to (consistently) register the hogan compiler as part of its initial script execution. Rather, it sometimes happens only later based on some external factor.
  2. This external factor seems to vary its behaviour based on the user agent.

I haven't found the external factor, but I have confirmed the first conclusion. Inspecting mw.loader.moduleRegistry['mediawiki.template.muhogan'].script shows that its implementation contains the following:

MobileFrontend/mediawiki.template.hogan
this[void 0]=(window.webpackJsonp=window.webpackJsonp||[]).push([[2],{/* ....... */ 52:function(n,t,r){var e=r(53);mw.template.registerCompiler("hogan",e)},53:function(n,t,r){var e=r(14);n.exports={compile:function(n){return e.compile(n)}}}},[[52,0]]]);

Which means that implementing "mediawiki.template.hogan" basically just consists of the following two statements:

window.webpackJsonp = [];
window.webpackJsonp.push( ... );

The transported code that calls mw.template.registerCompiler() is not executed! (It is only stored as a function inside an array.)

Presumably there is another piece of code somewhere that defines the real webpackJsonp and calls the pushed functions. However, that code is not a dependency of mediawiki.template.hogan.

Presumably there is yet another piece of code that, on desktop UAs and on most pages for mobile UAs, happens to trigger this earlier.

I'll leave it to you to investigate further. Good luck :)

Mmm interesting.. I wonder if https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/469153/ will take care of this when it gets deployed...

^^ Yeah my bet is that the webpack runtime code isn't being loaded in time (which I think that patch will address)

In which case this might get fixed in 1.33.0-wmf.2 (and we'll know Monday)

Jdlrobson lowered the priority of this task from High to Medium.Nov 5 2018, 11:00 PM

New train rolled out with T208549 becoming unblocked. Already i am seeing a drop in errors so looks like https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/469153/ was likely cause.

Jdlrobson renamed this task from Regression MinervaWebClientError: Unknown compiler hogan showing up in iOS Safari to Muhogan approach in RelatedArticles and QuickSurveys is error prone - can cause error spikes.Nov 5 2018, 11:02 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson renamed this task from Muhogan approach in RelatedArticles and QuickSurveys is error prone - can cause error spikes to Muhogan approach in RelatedArticles and QuickSurveys is error prone.Nov 6 2018, 1:29 AM
Jdlrobson lowered the priority of this task from Medium to Low.
Jdlrobson updated the task description. (Show Details)

Looking into Timo's description in T208605#4716965 I can see clearly now that load order is defined by ResourceLoaderMuHoganModule by abusing the targets system (loading a dependency based on the target).
One thing that is interesting to learn is that Minerva desktop is loading both Mustache and Hogan so we'll probably want to address this problem before dismantling that system.

mw.loader.getState('mediawiki.template.hogan')
"ready"
mw.loader.getState('mediawiki.template.mustache')
"ready"

Thus, the issue was happening due to a failure in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/469153/ to load the Hogan compiler.
I think we're safe from this bug showing its face again, and there's not much we can do here, other than find a way to remove the need for ResourceLoaderMuHoganModule (ideally by standardising on one template library across desktop and mobile).

There are 2 templates:

<strong>{{question}}</strong>
<p>{{description}}</p>
<div class="survey-button-container"></div>
<div class="survey-footer">{{{footer}}}</div>
<strong>{{finalHeading}}</strong>
<div class="survey-footer">{{{footer}}}</div>

Switching these to jQuery would be pretty trivial.

Jdlrobson renamed this task from Muhogan approach in RelatedArticles and QuickSurveys is error prone to Remove usage of templates in QuickSurveys: Muhogan approach in RelatedArticles and QuickSurveys is error prone.Feb 26 2019, 5:35 PM
Jdlrobson raised the priority of this task from Low to Medium.
Jdlrobson set the point value for this task to 5.Feb 27 2019, 5:04 PM
Jdlrobson triaged this task as Medium priority.Mar 8 2019, 9:35 PM

Change 495903 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[mediawiki/extensions/QuickSurveys@master] Removes Muhogan module & templates in favor of jQuery

https://gerrit.wikimedia.org/r/495903

Change 495903 merged by jenkins-bot:
[mediawiki/extensions/QuickSurveys@master] Removes Muhogan module & templates in favor of jQuery

https://gerrit.wikimedia.org/r/495903

Jdlrobson added a subscriber: Jdrewniak.

I'm skipping QA for this one, as QA involves verifying the surveys are displaying and we have 3 tasks already in QA which will make sure that is true. Since I merged the code I'm looking for a developer that's not be to sign this off.

pmiazga updated the task description. (Show Details)