Page MenuHomePhabricator

Strategy for avoiding or excluding client side errors from gadgets and user scripts
Open, Needs TriagePublic

Description

Decision Statement Overview Artifact & Decision Record

What is the problem or opportunity?

We currently log JavaScript errors that our users encounter. This helps us make sure the code across our repositories are well maintained and resilient (to connections, skins, across pages).

The same tooling also helps us identify errors in gadgets maintained by editors on wiki.

Problematically we cannot easily tell the difference between errors originating in our code and errors originating in gadgets without manually triaging or fixing every issue. This is costly work. This manual work has been done however and at least 20% of our errors originate in gadgets.

This is a problem as editors currently do not have access to these logs without signing an NDA, may not share the same motivation/values around code quality or may have written their gadgets some time ago and have since left our projects. Where Wikimedia-staff have fixed errors, there has sometimes been friction has some users take issue with staff editing scripts that editors see themselves as "owners".

Historically Wikimedia has set the informal expectation that it does not maintain community gadgets, however, if gadget errors are equal alongside Wikimedia code errors, it's hard to ignore them. When gadgets change jQuery version or introduce other code that breaks experiences for users, it becomes hard to ignore.

It would be useful to get interested parties involved in a conversation to reflect on the status quo, how much we should care and if we don't care modify our code to reflect this.

Goals:

  1. Define a policy around gadget/user scripts that lives on a wiki page somewhere. It should make clear to WMF staff and gadget developers expectations around who can edit, what is expected with regards to quality and errors, and what should happen to gadgets where the maintainer has retired or lost interest.
  2. Our error logging system should be built around this policy, potentially programatically distinguishing between WMF errors and gadget errors. This may come with trade-offs such as no longer running gadgets in global scope.
  3. Gadget developers should be able to improve their code if they wish. It should be decided whether the current tooling with NDA i sufficient or if appropriate tooling needs to be provided.
  4. The existing gadget system should be modified if necessary to reflect the new policy.

What does the future look like if it's achieved?

  • WMF spends less time triaging and fixing broken gadgets
  • When building and deploying new products WMF is able to clearly determine impact of changes that break gadgets
  • Users gadgets do not suddenly stop working
  • User gadget developers and WMF staff enjoy a better relationship, better communication.

What happens if we do nothing

Certain staff members including myself will continue to try to prop up the system by keeping an eye on issues on-wiki and fixing them as they arise. They will do this until they lose interest/burn out.

When nobody is looking at errors, unchecked the amount of errors will rise, leading to alerts in grafana and noise in the error logging tooling will grow large.

Eventually this might lead to a high volume of errors that make the tooling useless, and potentially break the EventLogging platform it's built upon. If this ever happens, worse case scenario is that this valuable infrastructure will lose its value, and be turned off, and we will go back to how it was before we had error logging.

More background

Blog posts

Motivation for this ticket

https://gerrit.wikimedia.org/g/mediawiki/core/+/52a0f90a9bd0888078a2bab8f95dc4018539bfb0/resources/src/startup/mediawiki.js#1940

In a 12 hr period 251 / 2,530 production errors came from gadgets e.g. Error: module already implemented: ext.gadget.Cat-a-lot - none of these are actionable as it's not clear how they were loaded. Note the number is likely a lot larger as many of these are likely to be surfacing as "Script error" due to cross domain loading.

Presumably, this is because modules can be loaded via different means from different projects via global scripts.

If a module is already implemented I would prefer it logs a warning rather than an error. Do we really need to throw an error here (asking ResourceLoader expert)? If so, could we at least disable the error logging for modules that are gadgets

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

none of these are actionable as it's not clear how they were loaded.

This applies to all JavaScript errors from an asynchronous stack trace. E.g. the same would happen from an ajax callback or promise involving an undefined variable. All we'd have to go on is the variable name.

This this case we have the module name. Global Search (toolforge) would likely reveal which user scripts are importing a hardcoded load.php URL for implementing ext.gadget.Cat-a-lot. From there you'd either find that the same script does so twice (obvious mistake, unlikely) or indirectly find two such wrapper user scripts that are both imported by the same user.

Note the number is likely a lot larger as many of these are likely to be surfacing as "Script error" due to cross domain loading.

I don't think that applies to load.php, which afaik opens these CORS stacks up as it requires all responses to be public. If you find evidence of the contrary, then we can fix that separately. It's quite possible, as so far these issues have mainly be reported directly by end-users where Dev Tools isn't affected by the cross-domain restrictions, so you always get all details.

Presumably, this is because modules can be loaded via different means from different projects via global scripts.

Yeah. It's quite possible that two wrapper user scripts are importing a gadget manually as a way to not have to enable it on each wiki separately. It's also quite possible that these are different versions fo the gadget, not exact duplicates. E.g. you might be importing the one from enwiki and by commonswiki, the names are only locally unique, and HotCat and Cat-a-lot are the kinds of scripts that have been copied around and either out of date or intentionally modified.

If a module is already implemented I would prefer it logs a warning rather than an error. Do we really need to throw an error here? If so, could we at least disable the error logging for modules that are gadgets

If RL's state machine were to break and fetch or execute a module twice or if client storage corrupts in a way we don't detect, that's a serious issue and applies to all modules equally, gadgets or otherwise. Gadgets if anything are more likely to excercise such bugs due to our own code being relatively consistent and simple compared to the unaudited/excotic code out there in the wild.

Slightly off-topic: I don't know how much further it makes sense to treat client errors with the current precision and urgency that we do. It's inherently largely out of our control, and unlike our server monitoring, I don't think "Inbox zero" can or should be our objective. All of the data is useful to have to be able to go back and find debugging information about known issues reported or found by other means. But in terms of the logs themselves being used a direct source, I think we may want to just set a threshold mentally for N frequency over X time before setting up a task and Logstash filter for it.

Maybe it's only by volume that that we can triage and filter it effectively. I'm not sure what other options there are beyond options that are a lot of work for little pay-off.

Slightly off-topic: I don't know how much further it makes sense to treat client errors with the current precision and urgency that we do. It's inherently largely out of our control, and unlike our server monitoring, I don't think "Inbox zero" can or should be our objective.

Inbox zero is not the goal. Rather the goals are:

  1. reducing the noise of gadgets is the goal so we can roll out client-side error tracking on all projects and get to a point where client side error spikes block deployments.
  2. Service Wikimedia developers better by helping them know which bugs are high priority and which are not so that this data can be used to help product owners prioritize.

Right now there's a huge burden on developers (right now me) to make sense of the errors coming in and filtering out issues that are not our concern (e.g. gadgets). Right now gadgets are drowning the majority of errors. My hope in the future is that certain trusted volunteers can be given logstash access and help us address problems in the gadgets, but for now I'm taking more of an interest as we roll out to more wikis - total coverage is important to me, and it's my understanding that volume to this endpoint is a challenge and restricts what we can do.

In the last 24 hrs, 6,775 errors were recorded.
A whooping 2,214 of these errors came from this bug - that's 32% of all our errors.
This number is going to grow as we roll out to more wikis as far as I can see there's little we can do about it.

this case we have the module name. Global Search (toolforge) would likely reveal which user scripts are importing a hardcoded load.php URL for implementing ext.gadget.Cat-a-lot. From there you'd either find that the same script does so twice (obvious mistake, unlikely) or indirectly find two such wrapper user scripts that are both imported by the same user.

This is a lot of work and not really practical and doesn't seem like valuable work since it's not actually causing any problems.
The work is also not practical because a lot of gadgets are impacted and particularly because the stack trace doesn't reveal which users are doing this and how.

My thinking was that these are not real problems impacting end-users as far as I can see, so I'm not sure while we'd spend energy trying to get these fixed. All that's happening is gadgets are loading the same gadget from multiple sources and whichever loads first is the one they end up with. I don't think the burden on editors should be about knowing where and when they need to load the gadgets they want.

Right now I just want these errors out of logstash and given these account for 10% of all our errors I don't want this to block rolling the client error tracking to English Wikipedia. There's absolutely no value in them to me and suppressing these errors seems well within our control.

Put another way, what value is throwing this error giving us as developers?

Other suggestions welcome, but the goal should be not to throw an error in this situation. If we need to record it, could statsd be used instead rather than throwing the JS error? Could the gadgets extension be modified to ignore throwing this error when gadgets are loaded?

Put another way, what value is throwing this error giving us as developers?

I've already answered this:

If RL's state machine were to break and fetch or execute a module twice or if client storage corrupts in a way we don't detect, that's a serious issue and applies to all modules equally, gadgets or otherwise. Gadgets if anything are more likely to excercise such bugs due to our own code being relatively consistent and simple compared to the unaudited/excotic code out there in the wild.

Like all errors, the eror should be impossible. And in general when using dependencies, or using(), or load(), fetches are naturally coellesced, re-used and de-duplicated. Thus if these standard interfaces are used and have no bugs, the error does not happen. Catching when that goes wrong for whatever reason is hugely important.

In this case, a gadget is fetching source code directly from a load.php endpoint, and then fetching a similar payload from another load.php endpoint, both of which bypass caching and de-duplication, at which point the second one is an error as it can't execute a different payload under the same name. There isn't a way to distinguish that from real errors. And this is in fact a real error. The second payload ended up being rejected and not actually execute, despite the user asking for it. If there was a tolerant way to perform this operation, I don't think it would take off in practice since whover writes the original code would want to have error detection etc if they're a more experienced gadget developer. Or, more likely, they will have just copied what MW does internally and thus not use any "alternate" way we might invent or document. All that is still fine. It's only when two incompatibe scripts are combined by another end-user that it causes this noise.

it's my understanding that volume to this endpoint is a challenge and restricts what we can do.

There are indeed limits to how much we're comfortable piping into EventGate, Kafka, and Logstash. However, that is largely dicated by the amount of page views the wikis get in total. Which errors do or do not get logged is imho not of significance here, as it provides no meaningful protection against floods. The pipes are opened to catch errors when they do happen, judged by how much they could handle at most – not by how much they handle currently.

Right now gadgets are drowning the majority of errors.
[…]
In the last 24 hrs, 6,775 errors were recorded. A whooping 2,214 of these errors came from this bug - that's 32% of all our errors.

Those numbers sound big, but are they? Based on Logstash, the clientError collection seems enabled on 5 major wikis. I believe it has no sampling or other specificity currently. Numbers from Logstash and stats.wikimedia.org:

wikiLogstash: 24h total no filters + mobileStats: Page views (Aug 2020)Stats: Uniques (Aug 2020)Errors per viewErrors per device
commons.wikimedia.org3,745 + 13529M/day (878.7M/mo)21M0.013% views0.018% devices
he.wikipedia.org266 + 183.200M/day (95M/mo)6.1M0.008% views0.004% devices
ca.wikipedia.org39 + 131.030M/day (31M/mo)2.1M0.005% views0.002% devices
meta.wikimedia.org1,500 + 1320.430M/day (13M/mo)1.3M0.37% views0.12% devices
www.mediawiki.org160 + 20.400M/day (12M/mo)0.6M0.04% views0.02% devices

I can't judge whether this is high or not. Either way, it'd be useful to establish a threshold.

I'm happy to repurpose this task about this, but request as stated is declined.

As most errors, they are serious when they happen in code we care about, and not so much in others. One could argue that for any kind of error, it isn't providing value when the samples originate from code we inherently don't own.

I can't judge whether this is high or not. Either way, it'd be useful to establish a threshold.

I'll let @jlinehan chime in to answer this one.

m happy to repurpose this task about this, but request as stated is declined.
Like all errors, the eror should be impossible.

I'm not sure why it's supposed to be impossible.

I'm happy to repurpose but the bug is still valid and a real problem even if my strawman proposal for fixing it is not valid. I'm not interested in arguing whether this is a bug but in finding a good solution.

The replication steps are straightforward.

  • I enable the cat a lot gadget on https://commons.wikimedia.org/wiki/Special:Preferences#mw-prefsection-gadgets
  • I add the line mw.loader.load('//commons.wikimedia.org/w/load.php?modules=ext.gadget.Cat-a-lot'); to my global JS at meta.wikimedia.org/wiki/User:Jdlrobson/global.js as I want the gadget on every wiki I use and some wikis do not have it.
  • Now every page view on Commons results in the error Uncaught Error: module already implemented: ext.gadget.Cat-a-lot

Now is this an issue with global gadgets or gadgets, or ResourceLoader I don't know, but I as a user shouldn't be punished for that.

We need to close the source of this error as I see this as the main thing hindering us from rolling out to English Wikipedia.

I can't judge whether this is high or not. Either way, it'd be useful to establish a threshold.

I'll let @jlinehan chime in to answer this one.

To clarify, I was referring to a threshold from a product perspective in terms of what is "serious" for end-users.

I was not refering to a threshold for the in-take infrastructure. As far as I know, those values were already reviewed and approved prior to roll-out. But, it doesn't hurt to double-check that we are indeed comfortable accepting upto 5 potential errors from this number of combined page views.

Put another way, what value is throwing this error giving us as developers?

Like all errors, the eror should be impossible. […] they are serious when they happen in code we care about, and not so much in others.

I'm not sure why it's supposed to be impossible.

I was merely stating the obvious that applications or interpreters generally only throw uncaught errors for things that aren't supposed to happen.

It was my understanding that this task is about bad code written by end-users being a nuisance. And thus you questioned whether the error condition can actually be a serious error or not, which if not, would have presented an easy way out of this by removing the exception clause. I hope the above shows that isn't the case.

Looking specifically at your latest example, I don't think that's obviously "bad code". It's hard to confirm of course, but it seems plausible that the ones we are seeing in Logstash could come from this kind of code.

The replication steps are straightforward.

  • I enable the cat a lot gadget on https://commons.wikimedia.org/wiki/Special:Preferences#mw-prefsection-gadgets
  • I add the line mw.loader.load('//commons.wikimedia.org/w/load.php?modules=ext.gadget.Cat-a-lot'); to my global JS at meta.wikimedia.org/wiki/User:Jdlrobson/global.js as I want the gadget on every wiki I use and some wikis do not have it.
  • Now every page view on Commons results in the error Uncaught Error: module already implemented: ext.gadget.Cat-a-lot

Now is this an issue with global gadgets or gadgets, or ResourceLoader I don't know, […]

"Global gadgets" as a feature doesn't exist in production currently. Any user-invented workaround is unofficial, only works sometimes, and thus not supported. If it was this simple, I would've rolled it out years ago. (For the record, T22153 tracks that effort, or used to track; that depends on whether the several months of combined labour from Roan, Kunal and myself will be retroactively wasted, or put to use by an owner giving it priority and resourcing.)

Broadly speaking I've seen three or four different "clever" ways emerge over the years to get the very basics of a "global gadget" to work. As an avid end-user myself, I've also done my share of hackery to come up with a way that works for me.

In a nut shell, the line of code you show above is invalid. It's not meant to work and doesn't demonstrate any kind of issue with Gadgets or ResourceLoader.

It's no different other module systems whether AMD, RequireJS, Node.js or ES6 export/imports. If you were to run define("mymodule, function () {…}); and later run define("mymodule, function () {…});. Executing that twice isn't valid. Even if both use a copy of the same source code, each will be parsed as its own unique JavaScript function with its own identity and state. I don't think any module system could reasonably do something other than throw an error upon encountering the second one. load.php is an internal end-point for use by mw.loader.load(modules) or mw.loader.using(modules). The mw.loader.load(URL) signature is a public method for downloading and executing any arbitrary script URL.

If you want to know unofficially how one could try to load gadgets across wikis, here's a few ways I've seen:

  1. Request the source files directly as raw wiki pages. This was common prior to 2011.
    • No minification. No CSS before JS load order. Can only do one file. For multi-file or dependencies, need to do serially with promises. This may still cause things to load twice, naturally. E.g. ifs both your local common.js and global.js do it, or both in your personal script and a random wiki's MediaWiki:Common.js. Some gadgets export a global variable and have a conditional wrappers to ignore any wastefully loaded other versions. This causes its own bugs of course. Some people use legacy importScriptURI which can sometimes prevent a duplicate script from loading a second time.
  2. Not supported! Inject the gadget into the registry with a hopefully unique name using raw URLs. E.g. mw.loader.register('user.omgglobalgadgets.mygadget', 'someversion', [mydependencies]); then mw.loader.implement('user.omgglobalgadgets.mygadget', [ otherwiki load.php mygadget with only=scripts&raw=1 ], [ otherwiki load.php mygadget with only=styles&raw=1 ]) then mw.loader.load('user.omgglobalgadgets.mygadget');
    • ALL of these are private internals and NOT supported. (load.php, raw=1, register, implement)
    • This uses register() to naturally get dependencies without a using() call. It uses raw=1 so that the cross-wiki responses come through without any mw.loader.implement() wrapper (which is like RequireJS define). This means any double loading of the underlying scripts will NOT be detected and "just" happen, along with any consequences that might have. If something were to do this exact thing twice, including the registration, it would fail at the register() call and throw a global error that would show up in Logstash as module already registered. This could be mitigated by wrapping the whole thing in a if ( !mw.loader.getState('…omgglobalgadgets…')) check.
  3. Not supported! Request the styles and scripts raw from another wiki. E.g. mw.loader.load( otherwiki load.php mygadget only=styles&raw=1, 'text/css'); mw.loader.using(mydependencies).then(function() { return mw.loader.getScript( otherwiki load.php mygadget only=scripts&raw=1 ); });
    • This is much like method 2, but procedural rather than declarative. Using load.php or raw=1 is internal and NOT supported.
  4. Not supported! Conditionally load the gadget locally, or request directly from another wiki. E.g. mw.loader.getState('ext.gadget.mygadget') ? mw.loader.load('ext.gadget.mygadget') : mw.loader.load('https://www.mediawiki.org/w/load.php?debug=false&modules=ext.gadget.mygadget');
    • Using load.php directly is internal and NOT supported.
    • This is in my opinion the simplest and actually what I use in some of my own gadgets (RTRC.js , RTRC-docs).
    • Note that if someone created something by that name on the wiki you're on, it will load that. Even if it's not the same version, or in a different language, or even something entirely different. But it will generally avoid the duplicate implementation error if you're on mediawiki.org and enabled it locally on that wiki. Of course, it can still cause the same error if you're on a different wiki if someone created a gadget there under a different name (e.g. "RealTimeRC"), with the internal code loaded directly from mediawiki.org in the same manner, thus defining the module a second time.

@Krinkle I'm sorry for this misunderstanding here but I'm really not interested in how users are meant to load gadgets or loading gadgets myself. I'm interested in making sure these "errors" (which don't seem like errors) are not polluting what is becoming a useful error logging system. I am aware this is not "serious" for end-users but from a product perspective, I'm trying to tell you that it's creating a challenge for us in rolling out error logging for all our projects which is our current goal. Hence the "noise" I am referring to in the subject line. Hopefully, the EventStreams tag makes this clearer. I tagged gadgets and ResourceLoader as I wanted ideas on how to stop these JS exceptions from being thrown not because I want to help users load gadgets cross wiki - whether we should support that or not is off topic.

"Global gadgets" as feature doesn't exist in production currently. Any user-invented workaround is unofficial, only works sometimes, and thus not supported. There are  176 [2] global.js pages using this method. If this method is not supported then the error is valid and we need to prevent editors from using it. Given we have  allowed users to run arbitary JS on their pages, I think we need to be a little more responsible and opinionated about what's allowed and what is not.

At least 176 users are using this method [1]. If it's not supported the fact people can sounds like a bug to me and we should prevent them from doing that rather than throwing an Error that we can't do much about.

Off the top of my head I have multiple ideas:

  1. this could be communicated socially and we get those 176 instances fixed with some kind of if guard to make sure gadgets that are global are not loading. This doesn't however stop this from happening again and is a lot of work.
  2. A save to global.js could detect this and disallow it
  3. mw.loader.load could be modified to disallow loads from foreign domains.
  4. In addition to the error we use mw.notify to tell the users to fix this issue.
  5. global gadgets should refuse to load code from the same domain.
  6. we stop logging errors for logged in users (I'd rather avoid this if possible)
  7. Using statsd rather than throwing new Error
  8. ... .. im sure there are many more! please suggest

[1] https://global-search.toolforge.org/?q=gadget%5C.Cat%5C-a%5C-lot&regex=1&namespaces=&title=.*%5C%2Fglobal%5C.%28js%29
[2] https://global-search.toolforge.org/?q=gadget%5C..*&regex=1&namespaces=&title=.*%5C%2Fglobal%5C.%28js%29

Krinkle renamed this task from Large amount of noise in production client side error logging from "module already implemented" error to Strategy for avoiding or excluding client side errors from gadgets and user scripts.Sep 15 2020, 7:56 PM
  1. this could be communicated socially and we get those 176 instances fixed with some kind of if guard to make sure gadgets that are global are not loading. This doesn't however stop this from happening again and is a lot of work.

[…]

  1. In addition to the error we use mw.notify to tell the users to fix this issue.

Indeed. This doesn't scale and doesn't solve the underlying problem. I've updated the task title and tags accordingly.

  1. A save to global.js could detect this and disallow it
  2. mw.loader.load could be modified to disallow loads from foreign domains.

The primary purpose of Gadgets is to share and execute JavaScript code that by design we don't support and by design must not be limited to publicly supported APIs.

Restricting some method calls seems expensive to build and maintain, but would also be ineffective. We would still get the same errors, but through plain JavaScript code. Instead of calling mw.loader.load, an editor would call $.getScript or document.createElement('script').

  1. global gadgets should refuse to load code from the same domain.

We don't have "Global gadgets" today and can't be refused as such. What we have are local gadgets and local user scripts, loading from local domains.

I don't see what we can do here, short of disallowing all cross-domain fetches. Which 1) we need for our own products, 2) would be a big setback for the community and result in more copy-paste distribution wasting volunteer time but also ultimately cost more time for staff through indirect causes, and 3) only prevents this one single random JS error - tomorrow we'll see a ten other ones.

  1. Using statsd rather than throwing new Error

This might work for errors that are only relevant to gadget code. That isn't true for the error this task started for and seems unlikely overall. When do see such issues, those would be easy to change to mw.log.warn() instead. Worth documenting!

As a general strategy though, this would be mutually-exclusive with having a client error pipeline. We can't remove all throw statements just because a user script can in theory (and thus will inevitably) trigger it. Not to mention custom exceptions by user scripts themselves and native exceptions from the browser.

Going back to only having statsd counters would be a step back. But, I do think it's important that we adopt a strategy that has us act based on volume. There has to be a treshold under which we simply will not need to look at the dashboard. (Individual teams and product can still have dedicated dashboards that we look for errors relating to their own code, regardless of thresholds being reached.)

Client errors, unlike server errors, are largely out of our control. We don't control browser extensions (which can execute scripts in the main thread as well, not just from their own custom file paths), don't control user scripts, and don't control random packets sent to our beacon intake.

It may very well be that the "majority" we see today simply in totality is low volume noise. This brings me back to: We need a threshold over which errors warrant developer attention.

Earlier today we had 50k errors in 12hrs about 50% of coming that from gadgets which I've had to since fix to get these numbers down. T263688 is another example of something that's becoming a challenge as we roll out further. The fact we can't filter them presents a bit of a challenge.

Would it be feasible to wrap all gadgets in a try/catch and when they throw an error throw a modified error which marks them as gadgets? Practically speaking any module that comes from ResourceLoaderWikiModule. It seems we already wrap them in closures...

We already catch runtime errors during the module initialisation to protect the state machine and ensure execution of other modules. They don't propagate to window.onerror generally afaik. For async call stacks, a higher-level try-catch wouldn't do anything as those calls will have returned already.

I'm unassinging for now pending a decision on the threshold.

We can revisit after that and see whether there would still be outliers from user scripts, and if there are whether they are rare enough to deal the same way as issues in our own softwarre (reach out, filter out etc.) or whether it makes sense to invest in a general strategy to filter them out. Either way, there will always be a low amount of noise from all sources – not limited to user scripts and gadgets.

FWIW in the last 12 hrs after filtering to domains we care about, there were 28,702 errors.
It's really difficult to filter out gadgets accurately. I have tried to create a filter for gadgets, but it still doesn't filter them all - and others are hitting this issue too in T265131.

After filtering known gadget patterns I see only 9,912 of these errors are coming from gerrit hosted code.
This means at least 46% of our errors are coming from gadgets. At minimum we need a way to more reliably detect these gadgets so we can get them fixed.

I do agree we need thresholds, but I think we need error logging on English Wikipedia to define this and this bug does make me uncomfortable about rolling out error logging to English Wikipedia where we know gadget use is high.

Some tags would be useful as a starting point, by providing a Gadget extension helper function that tells returns the names of installed gadgets
e.g.
Object.keys(mw.loader.moduleRegistry).filter((key) => mw.loader.getState(key) === 'ready' && key.indexOf('ext.gadget') > -1)

It would also be using for an mw.loader helper method that returns true if any user scripts have been loaded and are non-empty, and have a tag associated with that. This should at least help us narrow down errors.

Based on a days worth of data I created this pie chart to visualize where our errors are coming from. For the next deployment I believe I've dealt with the bulk of errors from TimedMediaHandler but as you can see the gadget problem is very very real.

What critiria is used for classifying an error as being "Gadget"?

What critiria is used for classifying an error as being "Gadget"?

A very tedious query that I've been manually constructing over time based on stack traces I've seen in the wild. It's not perfect - for example my guess is that many in the "other" category are also caused by gadgets.

Based on a days worth of data I created this pie chart to visualize where our errors are coming from. […]

What critiria is used for classifying an error as being "Gadget"?

A very tedious query that I've been manually constructing over time based on stack traces I've seen in the wild. It's not perfect - for example my guess is that many in the "other" category are also caused by gadgets.

This pie chart seems very suspicious to me since it doesn't at all match emperical data from my own experience over the past ten years investigating client-side errors. I would expect that as soon as anything "real" happens with any noteworthy frequency, all of this becomes <1% noise and disappears. I'd also expect that even when we're idle with little to no actual actionable errors, that the majority of the events classified as Gadgets here are actually related to some other category.

  • Regarding the current classification, I've commented in more detail at change 634295, but in a nutshell, the checks used in that patch and used in Logstash under "Gadgets (DSL)" capture far too many unrelated errors, and miss some others. It captures all errors relating to any wiki where one or more gadgets are enabled by default, regardless of what actually caused the error (due to batch loading). It also captures all errors relating to an anonymoous inline script or inline event handler, which afaik would mainly capture CentralNotice, MobileFrontend, and browser extensions. Not gadgets.
  • There seems to be something fundamentally broken about the pipeline since the overall client-error traffic, even after filtering out logged-in traffic, seems to be heavily biased toward Firefox. More than 95% in fact. Barring a miracle growth in market share in the past 24 hours, or something special about Commons' audience in pariticular, I would guess there is a bug in the instrumentation that is causing the vast majority of (real) errors to dropped by our instrumentation and never reported.
    • I suspect the file_url check in the instrumentation is at fault. This was added to filter out a bit of noise from cross-site script imports of which the URL is hidden by the browser for security reasons, but I suspect it is instead causing > 90% of genuine errors to be dropped as well because cached script run from localStorage and thus don't have an HTTP-based URL. Due to performance issues in Firefox, we have disabled use of localStorage in Firefox, which would explain why the errors we do get are mostly from the <5% of users using Firefox, or from edge cases where scripts couldn't be cached.
  • I've injected a temporary fake error stack in Commons for 1% of traffic on desktop (edit) and mobile (edit). This rolled out at 21:00 UTC. After waiting ~10min, I monitored Logstash for 30 minutes from 21:12-21:42, during which only a mere 110 errors were reported as found by the query meta.domain:("commons.wikimedia.org" OR "commons.m.wikimedia.org") AND message:*undefinedExample* of which 107/110 were Firefox and 3/110 were WebKit or Chromium. In total, Commons got 234 errors during that same time frame. Clearly we're missing a ton of data here as Commons receives a lot more traffic than that.

It captures all errors relating to any wiki where one or more gadgets are enabled by default, regardless of what actually caused the error (due to batch loading). It also captures all errors relating to an anonymoous inline script or inline event handler, which afaik would mainly capture CentralNotice, MobileFrontend, and browser extensions. Not gadgets.

Can you point me to some precise examples that you think are being wrongly filtered by the Gadgets DSL query? I should note whenever I use this (including in the creation of this graph) I use it after running all the other filters active there. I am aware it is not 100% accurate, but I've been revising the filters with every new discovery I make. When I update that gadget query I've been very careful and have crossed checked stack traces with codesearch and global-search.

e.g.

It's good to be suspicious, but some concrete examples would help here. The fair bulk of these gadget errors as I can see are due to the "Error: module already implemented" bug which I originally reported here (about 54%). These are clearly gadget bugs. If any gadgets are being falsely identified they are a really small percentage. In fact reviewing the top 20 now I can confirm the gadget DSL query is pretty spot on with its classification.

There seems to be something fundamentally broken about the pipeline since the overall client-error traffic, even after filtering out logged-in traffic, seems to be heavily biased toward Firefox.

Can you cut a bug for this? This seems to warrant further investigation, but yes i have noticed an unusual amount of Firefox issues comparatively to Chrome. I've not had a chance to dive into this yet, as my main focus is been on being able to categorize them accurately.

Regarding the current classification, I've commented in more detail at change 634295, but in a nutshell, the checks used in that patch and used in Logstash under "Gadgets (DSL)" capture far too many unrelated errors, and miss some others

I've clarified on the ticket but the goal is to provide a hint that a gadget may be involved not to say with 100% certainty this came from a gadget.

[update]

I suspect the file_url check in the instrumentation is at fault.

FWIW I replicated your experiment locally and I can confirm that errors in Chrome 86 have an empty string as the url field and no file_url entry. However I think this is a good thing as the errors Chrome is throwing are inactionable.

I've created T266517 to look into this further inside ResourceLoader. I'm not sure why this is acting differently from other modules.

This error doesn't seem to impact code using ResourceLoaderFileModule, so my read of this would mean errors from gadgets are even higher.

Doesn't at all match emperical data from my own experience over the past ten years investigating client-side errors.

I'm a bit curious about how you did this without client-side error logging so would appreciate some expansion on what "own experience" means here.

@Jdlrobson wrote at 266720:

The volume of this error is extremely problematic and happening too often to be ignored. It happens at levels that in the past we have considered UBN if it occurred for newly deployed […] At the time of writing in a 24hr period, there were 9,782 instances out of 32,079 unresolved errors meaning this accounts for 30% of the errors we track (this excludes errors from browser extensions and non-wikimedia domains).

That doesn't seem to me like a signicant amount to me. Especially given the trafic the instrumented wikis receive, and that the instrumentation isn't sampled. Where was this threshold decided?

Like I said before, I'm fairly sure most anything serious would immediately drown this out completely – leaving aside the known issue that almost all client errors are currently being dropped due to the faulty file_uri check.

I'll write a longer reply tomorrow but must reply to this comment now:

leaving aside the known issue that almost all client errors are currently being dropped due to the faulty file_uri check.

That's not correct. As I've pointed out in my investigations in T266517 it looks like we're only dropping client errors in Chrome from ResourceLoaderWiki modules - and to be honest, anything that drowns out gadget errors right now is a good thing for now as it reduces the risk of rolling out to English Wikipedia. I encourage you to do your own investigations here. I am sure you will reach the same conclusion.

Some evidence that goes against your theory:

  • Right now our highest volume error is in TimedMediaHandler (T262947) which will be fixed in the next deploy and that's happening mostly on Chrome so clearly we're not dropping "almost all". That was the highest error before we added the file_uri check and continued to be the highest after and wasn't impacted by the change to WikimediaEvents.
  • If we consider all errors, in a 24hr period 11,771 of our errors come from Chrome 86 and 5,241 from Firefox 82. If I filter by the (albeit flawed) gadgets DSL filter, the Firefox to Chrome ratio changes drastically.

(you must remember this has been my obsession for the last month and I've been logging at logstash daily and I would have noticed if we were dropping high profile errors with the file_uri change)

The initiative on https://meta.wikimedia.org/wiki/User:Jdlrobson/User_scripts_with_client_errors and various Mobile.js/Common.js changes to flag common errors seem to be helping. In the last 24hrs there were much fewer gadget errors than the last time I ran it.


TimedMediaHandler is much less buggy now.
Gadgets are slowly getting fixed which is good to see.

I've published a blog post about this particular problem (see point #5) https://diff.wikimedia.org/2021/03/08/sailing-steady%e2%80%8a-%e2%80%8ahow-you-can-help-keep-wikimedia-sites-error-free/

I encourage any thoughts people have on this subject here.

Looking at the data for today, gadgets still account for almost 20% of all our errors and that's going to increase this week as we roll out error logging to errors without a file URI for debugging.

If I am to investigate a technical problem I need:

  1. Wiki (wgServerName)
  2. Browser (navigator.userAgent)
  3. skin
  4. wgAction
  5. wgNamespaceNumber
  6. Any JavaScript error
  7. Any special page (wgCanonicalSpecialPageName)

None of this information is sensitive, and none of this information is at https://grafana.wikimedia.org/d/000000566/overview?viewPanel=16&orgId=1

I don't need any sensitive information like the entire page name (wgPageName) or the user name (wgUserName).

On second thought: Forget about wgServer and wgServerName. Just give us location.host which differentiates between desktop and mobile.

Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)

On second thought: Forget about wgServer and wgServerName. Just give us location.host which differentiates between desktop and mobile.

I think we can provide a lot more information. Whatever is provided however will require some kind of processing to scrub the errors to remove sensitive information before publishing. A tool could be built to do this.

For now the graphs are the only public artifact we have so at least when making site wide changes it should be obvious when an error has introduced, particularly on a larger wiki.