Page MenuHomePhabricator

Don't count startup script resourceloader.exception events in WebClientError error counting
Closed, ResolvedPublic2 Estimated Story Points

Description

Following investigations into T252223 we are seeing resource loader exceptions on every page view on iOS 13 and the number of errors from iOS in our error counting is climbing by the minute.

The bug can be replicated by running the code

try{
    var i = 0;
    while ( true ) { i++;
        localStorage.setItem('test', new Array(i * 100000).join('a'));
    }
}catch(error){
    alert("test stopped at i: " + i);
}

to fill localStorage.

The error in question appears to be in the startup module.

Proposed fix

Given this is not likely to represent a user error, we should filter it out in the Minerva error counting by checking the error source

                mw.trackSubscribe( 'resourceloader.exception', function ( topic, data ) {
alert(topic + JSON.stringify(data)); 
                        $( 'body' ).addClass( 'user-jdlrobson-error');
                } );
source === 'store-localstorage-update'

Event Timeline

Restricted Application added a project: Performance-Team. · View Herald TranscriptMay 19 2020, 12:27 AM
Restricted Application added subscribers: Masumrezarock100, Aklapper. · View Herald Transcript

I'm merging this back into T252223 to avoid forking the conversation so I won't have to explain it in two places.

Jdlrobson reopened this task as Open.May 19 2020, 3:00 PM

@Krinkle the spike was about investigating why the graphs were spiking. This task is about patching Minerva to not include those results so reopening.

As this one is reported via resourceloader.exception (and not global.errror) it means it was correcly caught and dealt with. In the new client error infra, we only track global.error. I have recommended in the past to ignore these other ones for Minerva. Worth reconsidering I suppose :)

^ this is exactly what this task is proposing albeit I was taking the more conservative approach of just ignoring this particular message.

Jdlrobson updated the task description. (Show Details)May 19 2020, 3:00 PM
Jdlrobson renamed this task from Startup script is causing high amount of client errors on iOS 13 browsers in Minerva to Don't count startup script ResourceLoader exceptions in WebClientError error counting.May 19 2020, 5:01 PM

(hopefully this title and the edit makes this clearer)

ovasileva set the point value for this task to 2.May 19 2020, 5:15 PM
Krinkle added a comment.EditedMay 19 2020, 5:24 PM

OK. Just to clarify, I would not call these "startup exceptions". We still very much want to count actual exceptions from anywhere, including startup/resourceloader.

While admitedly confusingly named, the resourceloader.exception topic is not a subset or dedicated version of global.error. It is entirely unrelated.

It communicates issues with the environment (the user's browser), and/or with the actual core/extension modules that are being loaded. They can't go to global.error because we need to handle them internally first to protect the state machine.

Ignoring these means you'll loose a lot of useful information I think. Such as:

  • Error for module having an unknown dependency.
  • Error for module cache being corrupt (e.g. the browser or another script overwrite the module cache with an invalid value).
  • Error for if a mw.loader.using() callback throws an error.
  • Error if any module's initial index file / init function causes an exception of any kind.
  • Error if the localStorage was found to be full.

If we are only interested in 100% pure signal of actual unhandled exceptions in the main JS thread, then global.error will get you that. Although it will still have a lot of noise from user scripts and browser extensions.

Similarly, resourceloader.exception has expected noise as well such as localStorage being full. The call is yours, but these seem like useful bits of information. It just requires some judgement and balancing for how to respond and investigate them. The root cause is not ResourceLoader.

Jdlrobson renamed this task from Don't count startup script ResourceLoader exceptions in WebClientError error counting to Don't count startup script resourceloader.exception events in WebClientError error counting.May 19 2020, 5:33 PM

I guess it depends on how frequent they are. If a gadget on enwiki fills up localStorage for everyone, that seems worth knowing. Similarly, if VE or MF stores things there without ever evicting or shrinking its footprint, that's might also be important to find out about and monitor at least quantitatively.

Change 598721 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[mediawiki/skins/MinervaNeue@master] Exclude localStorage errors from error-logging

https://gerrit.wikimedia.org/r/598721

Change 598721 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Exclude localStorage errors from error-logging

https://gerrit.wikimedia.org/r/598721

I can QA this one but it's blocked for now.

hmm 1.35.0-wmf.34; 2020-05-26 should be everywhere now but I'm not seeing much difference to the graph.

I guess our theory here was wrong and we actually have a notable bug hidden somewhere in our code or the caching of iOS is more aggressive then we thought.

Jdlrobson closed this task as Resolved.Jun 3 2020, 6:59 PM

Sorry @Jdrewniak we didn't dent those iOS errors. Back to the drawing board, but it was worth a try! At least we can now conclude that not many users are exhausting localStorage which is interesting data to learn.

I guess […] or the caching of iOS is more aggressive then we thought.

If this is about JS code, then it is not possible for new page views to still use old code after 10min have past, unless the browser violates HTTP semantics that I believe WebKit to follow. Speaking within valid HTTP semantics –  the availability, effectiveness and eagerness of a browser cache can only infuence how much of the max-age the browser will still have and use a cached response for. It is not valid to use a response beyond its max-age.