Page MenuHomePhabricator

mw.centralNotice CRASH on client side
Closed, ResolvedPublic1 Estimated Story Points

Description

Since a couple of days, perhaps last week, I am bothered since my console is flooded with long streams of error messages.

The error is not persistent, but happens on every fifth page request, no matter which page is accessed for which action.

It is obviously result of a race condition. I failed to reproduce it with &debug=true, probably since time flow remedies the problem, e.g. slows down.

Clearly there is bad synchronisation, missing declaration of dependencies or lack of waiting for event completion since sub-unit ext.centralNotice.choiceData is already executed before main application component mw.centralNotice has been created.

rECNO/CentralNotice.modules.php tells me on choiceData that dependencies are set dynamically based on context which is obviously doing a bad job, or forgot to wait until initialization of entire application has been completed.

The first executable code within ext.centralNotice.choiceData reads as follows:

mw.loader.implement( 'ext.centralNotice.choiceData',
                     function($,jQuery) {
      mw.centralNotice.choiceData = [{"name":"WikipediaToTheMoon", ...
  • Nowhere above a definition of mw.centralNotice by any other submodule has been made in this package.
This cannot work. If that package happens to be loaded and executed before anybody else has created mw.centralNotice your stuff must fail.
  • The client is free to execute module packages in any sequence.
  • Just before this code an ext.centralNotice.bannerController has been implemented.
    • Recently this has been declared as legacy and is empty now, but might have served its purpose a while ago.

rECNO/subscribing/ext.centralNotice.startUp.js

  • has dependency ext.centralNotice.choiceData
  • does not define mw.centralNotice but is relying that someone else did it before. As a matter of fact, since dependant of choiceData and choiceData is already populating the object this assumption is okay.

rECNO/resources/subscribing/ext.centralNotice.kvStoreMaintenance.js tells me wisely:

  • Don't assume mw.centralNotice has or hasn't been initialized

Traceback

Exception in module-execute in module ext.centralNotice.choiceData: load.php:178:216
"TypeError: mw.centralNotice is undefined"
Stack-Trace:
@https://de.wikipedia.org/w/load.php?debug=false&lang=de&modules=Spinner%7Cext.centralNotice.bannerController%2CchoiceData%2CgeoIP%2CkvStoreMaintenance%2CstartUp%7Cext.centralauth.centralautologin.clearcookie%7Cext.uls.init%2Cinterface%2Cpreferences%2Cwebfonts%7Cext.visualEditor.desktopArticleTarget.init%7Cext.visualEditor.supportCheck%2Ctrack%2Cve%7Cjquery.byteLength%2Ccookie%2CembedPlayer%2CloadingSpinner%2CmwEmbedUtil%2CtabIndex%2Cthrottle-debounce%2CtriggerQueueCallback%7Cmediawiki.Title%2CUri%2Capi%2Ccldr%2Ccookie%2CjqueryMsg%2Clanguage%2Ctemplate%2Cuser%7Cmediawiki.api.options%2Cuser%7Cmediawiki.language.data%2Cinit%7Cmediawiki.libs.pluralruleparser%7Cmediawiki.page.startup%7Cmediawiki.special.changeslist.visitedstatus%7Cmediawiki.template.regexp%7Cmw.EmbedPlayer.loader%7Cmw.MediaWikiPlayer.loader%7Cmw.MwEmbedSupport%2CPopUpMediaTransform%7Cmw.MwEmbedSupport.style%7Cmw.PopUpMediaTransform.styles%7Cmw.TimedText.loader%7Cskins.vector.js%7Cuser.defaults&skin=vector&version=028be523f503:5:1009

...

@https://de.wikipedia.org/w/load.php?debug=false&lang=de&modules=Spinner%7Cext.centralNotice.bannerController%2CchoiceData%2CgeoIP%2CkvStoreMaintenance%2CstartUp%7Cext.centralauth.centralautologin.clearcookie%7Cext.uls.init%2Cinterface%2Cpreferences%2Cwebfonts%7Cext.visualEditor.desktopArticleTarget.init%7Cext.visualEditor.supportCheck%2Ctrack%2Cve%7Cjquery.byteLength%2Ccookie%2CembedPlayer%2CloadingSpinner%2CmwEmbedUtil%2CtabIndex%2Cthrottle-debounce%2CtriggerQueueCallback%7Cmediawiki.Title%2CUri%2Capi%2Ccldr%2Ccookie%2CjqueryMsg%2Clanguage%2Ctemplate%2Cuser%7Cmediawiki.api.options%2Cuser%7Cmediawiki.language.data%2Cinit%7Cmediawiki.libs.pluralruleparser%7Cmediawiki.page.startup%7Cmediawiki.special.changeslist.visitedstatus%7Cmediawiki.template.regexp%7Cmw.EmbedPlayer.loader%7Cmw.MediaWikiPlayer.loader%7Cmw.MwEmbedSupport%2CPopUpMediaTransform%7Cmw.MwEmbedSupport.style%7Cmw.PopUpMediaTransform.styles%7Cmw.TimedText.loader%7Cskins.vector.js%7Cuser.defaults&skin=vector&version=028be523f503:5:477
load.php:178:254

Event Timeline

Hi @PerfektesChaos! Thanks so much for your detailed bug report!

A few questions:

  • When did this start happening?
  • Is it still happening?
  • Can you provide some details of which country/languages/projects/pages it happened on?
  • Can you tell us your device, browser and OS version?

Thanks!!

The first executable code within ext.centralNotice.choiceData reads as follows: [...] Nowhere above a definition of mw.centralNotice by any other submodule has been made in this package.

The dependencies of choiceData are set server-side. Whenever the server can tell that a user might see at least one banner, it adds ext.centralNotice.display as a dependency, and that defines mw.centralNotice (CNChoiceDataResourceLoaderModule.php, line 170). So, ahead of choiceData, ext.centralNotice.display.js should always be executed (and that ensures mw.centralNotice... see line 573 and below).

Just before this code an ext.centralNotice.bannerController has been implemented. [...] Recently this has been declared as legacy and is empty now, but might have served its purpose a while ago.

Currently bannerController is just bringing in ext.centralNotice.startUp and ext.centralNotice.geoIP as dependencies.

So... that's how it's supposed to work. It's true that we could make choiceData also ensure the existence of mw.centralNotice, even when mw.centralNotice.display already should have, but I'm worried that that might end up masking some other serious error. This part of CentralNotice hasn't changed recently. Maybe something changed with ResourceLoader? @Krinkle?

TL;DR

If the getDependencies() method at CNChoiceDataResourceLoaderModule.php would have been accessed everything would have been fine.

Questionnaire

  • When did this start happening?
    • As the first words of task description read, Since a couple of days, perhaps last week.
  • Is it still happening?
    • I am onwiki now for one hour; currently silent, but there have been intermittent calm periods.
  • Can you provide some details of which country/languages/projects/pages it happened on?
    • language: de as stack trace URL tells.
    • country: DE
    • project: wikipedia as stack trace URL tells.
    • pages: various; no pattern; as the second sentence of task description discloses.
  • Can you tell us your device, browser and OS version?
    • device: desktop
    • browser: Firefox, some recent versions.
    • OS: Both MSWIn and Linux.

This is not a browser issue. The JS delivered by server behaves exactly as expected.

Delivered code

(formatted from &debug=false; URL as above)

mw.loader.implement('ext.centralNotice.bannerController', function ($, jQuery) {
});
mw.loader.implement('ext.centralNotice.choiceData', function ($, jQuery) {
  mw.centralNotice.choiceData = [
    {
      'name': 'WikipediaToTheMoon',
      'start': 1461249000,
      'end': 1461459600,
      'preferred': 1,
      'throttle': 100,
      'bucket_count': 1,
      'geotargeted': !1,
      'banners': [
        {
          'name': 'WikipediaToTheMoonBanner',
          'bucket': 0,
          'weight': 25,
          'category': 'WikipediaToTheMoon',
          'display_anon': !1,
          'display_account': !0,
          'devices': [
            'android',
            'desktop',
            'ipad',
            'iphone',
            'unknown'
          ]
        }
      ],
      'mixins': {
        'impressionDiet': {
          'cookieName': 'WikipediaToTheMoon',
          'maximumSeen': 15,
          'restartCycleDelay': 86400,
          'skipInitial': 0
        }
      }
    }
  ];
});

Remark

In general it is a safe way to provide a fallback object generation, as

  • ext.centralNotice.display.js does at line 569 in luxury style
    • mistrusting people would ask for !=='object' (same price, but even more safe, if no function expected)
  • kvStoreMaintenance.js rough but heartfully

This makes code more robust in case of stormy wheather.

Questionnaire addendum

  • Is it still happening?
    • currently silent, but there have been intermittent calm periods
    • Wed, Apr 27, 5:52 PM * Ooops, I did it again.

K, I just got the error going to https://de.wikipedia.org/wiki/Wikipedia:Hauptseite. (I did try there a few times before but hadn't seen anything...)

In theory, you should be getting ext.centralNotice.display code one way or another (either via load.php or from the LocalStorage module cache).

I'll make a patch with the obvious fix (in CNChoiceDataResourceLoaderModule.php) and we can investigate why the display module isn't loading or running when it should later.

Thanks again!!!

AndyRussG triaged this task as Unbreak Now! priority.Apr 27 2016, 4:24 PM
AndyRussG set the point value for this task to 1.

Change 285671 had a related patch set uploaded (by AndyRussG):
choiceData always checks for mw.centralNotice object

https://gerrit.wikimedia.org/r/285671

Change 285671 merged by jenkins-bot:
choiceData always checks for mw.centralNotice object

https://gerrit.wikimedia.org/r/285671

Change 285674 had a related patch set uploaded (by AndyRussG):
choiceData always checks for mw.centralNotice object

https://gerrit.wikimedia.org/r/285674

Looking more closely, it seems the ext.centralNotice.display RL module isn't being loaded at all. So the above patch probably won't help, since ext.centralNotice.startUp.js will throw exceptions when it doesn't find the methods that ext.centralNotice.display.js provides.

The problem is for some reason under certain circumstances the ResourceLoader module gets registered with no dependencies.' I'd abandon all assumptions and get to the bottom of why that is.

mw.loader.moduleRegistry['ext.centralNotice.choiceData'].dependencies = []; on dewiki where the issue is but is a list of 4 elements on enwiki.

Thanks so much @Jdlrobson!!! Yeah, so far it looks like the problem is that contents of choiceData and the dependencies do not correspond as they should. (Should have the dependencies when there is stuff in choiceData, none when there is not.) I'm actually more concerned right now about the content of choiceData, since the Moon campaign is over. No one should be getting that campaign right now.

Strangely, I'm getting different choiceData content from the following two URLs. The first shows an empty choiceData, as expected, but the second brings in the moon campaign info... even though the version URL param is the same...

https://de.wikipedia.org/w/load.php?debug=false&lang=de&modules=ext.centralNotice.choiceData&skin=vector&version=838d3c9ab63d

https://de.wikipedia.org/w/load.php?debug=false&lang=de&modules=Spinner|ext.centralNotice.bannerController%2CchoiceData%2CgeoIP%2CkvStoreMaintenance%2CstartUp|ext.centralauth.centralautologin|ext.uls.init%2Cinterface%2Cpreferences%2Cwebfonts|ext.visualEditor.desktopArticleTarget.init|ext.visualEditor.supportCheck%2Ctrack%2Cve|jquery.byteLength%2Ccookie%2CembedPlayer%2CloadingSpinner%2CmwEmbedUtil%2CtabIndex%2Cthrottle-debounce%2CtriggerQueueCallback|mediawiki.Title%2CUri%2Capi%2Ccldr%2Ccookie%2CjqueryMsg%2Clanguage%2Ctemplate%2Cuser|mediawiki.api.options%2Cuser|mediawiki.language.data%2Cinit|mediawiki.libs.pluralruleparser|mediawiki.page.startup|mediawiki.template.regexp|mw.EmbedPlayer.loader|mw.MediaWikiPlayer.loader|mw.MwEmbedSupport%2CPopUpMediaTransform|mw.MwEmbedSupport.style|mw.PopUpMediaTransform.styles|mw.TimedText.loader|skins.vector.js|user.defaults&skin=vector&version=838d3c9ab63d

Here is a Curl that seems to consistently get the wrongly cached choiceData:

curl 'https://de.wikipedia.org/w/load.php?debug=false&lang=de&modules=Spinner%7Cext.centralNotice.bannerController%2CchoiceData%2CgeoIP%2CkvStoreMaintenance%2CstartUp%7Cext.centralauth.centralautologin%7Cext.uls.init%2Cinterface%2Cpreferences%2Cwebfonts%7Cext.visualEditor.desktopArticleTarget.init%7Cext.visualEditor.supportCheck%2Ctrack%2Cve%7Cjquery.byteLength%2Ccookie%2CembedPlayer%2CloadingSpinner%2CmwEmbedUtil%2CtabIndex%2Cthrottle-debounce%2CtriggerQueueCallback%7Cmediawiki.Title%2CUri%2Capi%2Ccldr%2Ccookie%2CjqueryMsg%2Clanguage%2Ctemplate%2Cuser%7Cmediawiki.api.options%2Cuser%7Cmediawiki.language.data%2Cinit%7Cmediawiki.libs.pluralruleparser%7Cmediawiki.page.startup%7Cmediawiki.template.regexp%7Cmmv.head%7Cmw.EmbedPlayer.loader%7Cmw.MediaWikiPlayer.loader%7Cmw.MwEmbedSupport%2CPopUpMediaTransform%7Cmw.MwEmbedSupport.style%7Cmw.PopUpMediaTransform.styles%7Cmw.TimedText.loader%7Cskins.vector.js%7Cuser.defaults&skin=vector&version=f03c8171a7f3' -H 'Host: de.wikipedia.org' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: https://de.wikipedia.org/wiki/Wikipedia:Hauptseite' -H 'Cookie: GeoIP=CA:QC:Montr_al:45.46:-73.63:v4; CP=H2; WMF-Last-Access=27-Apr-2016' -H 'Connection: keep-alive' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'

Just add a | grep WikipediaToTheMoon to the end of it to quickly check for the stale campaign.

However, if I just request ext.centralNotice.choiceData module, I don't get the stale campaign:

curl 'https://de.wikipedia.org/w/load.php?debug=false&lang=de&modules=ext.centralNotice.choiceData&skin=vector&version=f03c8171a7f3' -H 'Host: de.wikipedia.org' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: https://de.wikipedia.org/wiki/Wikipedia:Hauptseite' -H 'Cookie: GeoIP=CA:QC:Montr_al:45.46:-73.63:v4; CP=H2; WMF-Last-Access=27-Apr-2016' -H 'Connection: keep-alive' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'

On IRC... @Catrope explained this was probably due to a race condition that can cause stale ResouceLoader content to be served, and @BBlack purged the required parts of the cache. Thanks!!! The problem seems to be solved now--@PerfektesChaos, how do things look on your end?

how do things look on your end?

  • Thanks a lot; unchanged (frequent errors).
  • It is Thursday, early afternoon.
  • On Thursday evening, we receive recent deployments; no idea, whether deployment might change, and whether included.
If centralNotice breaks for any reason, all subsequent code in package is not executed.
  • Therefore centralNotice needs to be implemented rather robust and failsafe.
  • Nobody will complain that an advertisement is not visible for two days, but when you are reader of some content or at least author and JS environment does not work you have a problem.
  • There are many configuration changes in CN; another campaign, another country, another banner design, and these are many reasons why centralNotice happen to cause errors. This needs to be taken in account, not relying that always all changes will work everywhere since it fits now for the WMF editor. Changes in campaigns are frequent, sometimes daily, but obviously not thoroughly tested for a week under all conditions like other code.
  • For seven years now I am used that Wikimedia banners and centralNotice are broken, crashing pages on a rather regular base. Especially every autumn again for several months. Mostly it gets remedied after some days, therefore I did not react immediately when the problem arrived last week.

Failsafe and robust implementation

The problem is for some reason under certain circumstances

Yeah, SNAFU, and has to be anticipated.

WRT CNChoiceDataResourceLoaderModule.php there are two ways to improve:

  1. Line 119, you provide a fallback creation, but only if no choices.

Return string should always start with this fallback creation, and then add either an empty Array or Array contents.

  1. A less meaningful approach is at line 146 to preset $dependencies with 'ext.centralNotice.display' under all conditions.

That would avoid execution of ext.centralNotice.choiceData and prevents from recent error situation, but causes always a ResourceLoader failure message if display is not supposed to be loaded.

>how do things look on your end?
Thanks a lot; unchanged (frequent errors).

Ohnoes! Hmmm, maybe the cache purge missed some URLs? Can you please do the following:

  • Open the network tab in the developer tools of your browser.
  • On a pageview when the error occurs, right-click on all the requests to load.php, select "copy as cURL" and paste them into a text file.
  • Redact any personally identifying information from the text file that you do not wish to share (such as user name, user agent details, GeoIP cookie details).
  • Post the text file with the redacted cRULs here, or if you prefer, send them by e-mail to fr-tech at wikimedia.org.

Thanks!!

P.S. I'll try to respond some other issues you brought up a little later...

Change 286119 had a related patch set uploaded (by AndyRussG):
Ensure consistent ordering of choiceData for RL module hashes

https://gerrit.wikimedia.org/r/286119

Change 286126 had a related patch set uploaded (by AndyRussG):
Ensure consistent ordering of choiceData for RL module hashes

https://gerrit.wikimedia.org/r/286126

Change 286119 abandoned by AndyRussG:
Ensure consistent ordering of choiceData for RL module hashes

Reason:
Superseded by I13ae9c696

https://gerrit.wikimedia.org/r/286119

Change 286126 merged by jenkins-bot:
Ensure consistent ordering of choiceData for RL module hashes

https://gerrit.wikimedia.org/r/286126

Change 286183 had a related patch set uploaded (by Jforrester):
Ensure consistent ordering of choiceData for RL module hashes

https://gerrit.wikimedia.org/r/286183

Change 286183 merged by jenkins-bot:
Ensure consistent ordering of choiceData for RL module hashes

https://gerrit.wikimedia.org/r/286183

Hi!

Here's a separate task to make CN more robust in the case of this specific error: T134310.

Are you still seeing this specific problem?

Below are some replies to your comments.

If centralNotice breaks for any reason, all subsequent code in package is not executed.
  • Therefore centralNotice needs to be implemented rather robust and failsafe.

Yes, this is definitely true. In our favour, I'd note that we'd never seen this specific error before, and that it was due to an unusual condition that's being addressed (see T117587).

  • Nobody will complain that an advertisement is not visible for two days, but when you are reader of some content or at least author and JS environment does not work you have a problem.

FWIW, CentralNotice banners are an essential medium for community announcements, various sorts of editor campaigns, and fundraising campaigns.

  • There are many configuration changes in CN; another campaign, another country, another banner design, and these are many reasons why centralNotice happen to cause errors. This needs to be taken in account, not relying that always all changes will work everywhere since it fits now for the WMF editor. Changes in campaigns are frequent, sometimes daily, but obviously not thoroughly tested for a week under all conditions like other code.

We're been working to move the JS that's used in banners into Gerrit and our code review process, and have made progress on that over the past year. The JS that's included in Fundraising banners is pretty stable.

Any contributions you'd like to make to QA of banner content would be greatly appreciated!! :)

  • For seven years now I am used that Wikimedia banners and centralNotice are broken, crashing pages on a rather regular base. Especially every autumn again for several months. Mostly it gets remedied after some days, therefore I did not react immediately when the problem arrived last week.

Please don't hesitate to reach out anytime you see errors!!

The problem is for some reason under certain circumstances

Yeah, SNAFU, and has to be anticipated.

Well, not exactly. It's preferable not to load ext.centralNotice.display if there are no campaign choices, since in that case it's not needed, and not doing so improves site performance. The issue was, in fact, that choices were being sent when they shouldn't have been, due to a RL-related caching issue. There's now a patch in the pipeline to make this failsafe. As for a general principle, I think we need to balance handling unexpected error states and keeping our code lean and straightforward. In this case we certainly erred on the wrong side of the issue, but it's often not cut-and-dried.

Thanks again for the time you spent on this!!

P.S. Please feel free to try your hand at any CentralNotice tasks marked "Easy", if you're so inclined. :)

Krinkle lowered the priority of this task from Unbreak Now! to High.May 9 2016, 4:33 PM

Should we mark this as resolved?

@AndyRussG – see T136387 – Fundraising and centralNotice is a permanent source of severe crashes on top level for nearly a decade now. Poorly tested, and obviously very strange syntax.

Two simple ways for reviewers and coders:

  1. Look at all definitions above and below. Always by anonymous function declaration.
  2. Put the code of rECNO /resources/subscribing/ext.centralNotice.display.state.js into http://jshint.com/ and it will tell you Line 361, 'concise methods' is available in ES6 (use 'esversion: 6') or Mozilla JS extensions (use moz).

Change 285674 abandoned by Krinkle:
choiceData always checks for mw.centralNotice object

Reason:
Looks like this was landed in master and presumably found its way into the branch already since then.

https://gerrit.wikimedia.org/r/285674