Page MenuHomePhabricator

Error: vs Uncaught Error: in client error messages
Closed, ResolvedPublic

Description

Browser A:
Error: module already implemented: ext.gadget.Navigation_popups
Browser B:
Uncaught Error: module already implemented: ext.gadget.Navigation_popups

Should these error messages be merged in post-processing?

Event Timeline

jlinehan moved this task from Inbox to Doing on the Product-Data-Infrastructure board.
jlinehan lowered the priority of this task from High to Medium.Sep 28 2020, 3:18 PM
jlinehan moved this task from Doing to Next on the Product-Data-Infrastructure board.

This seems reasonable, but I'd like the reason to be made more explicit so that we understand what the investment is for and so that the logic in question has a reason for existing and thus a possible exit path if that reason becomes obsolete.

My general recommendation would be to 1) upon proactive triage look at the top 3 most common ones by aggregate normalized_message over the past N hours and then move on with other chores, and 2) only go further if and when the total volume for a given wiki domain has reached a predefined alerting threshold, at which point we'd see where the majority is coming from and fix or exclude accordingly. […]

The variations are quite limited, basically by browser engine (Firefox/SpiderMonkey vs WebKit/JavaScriptCore vs Chromium/V8). I'll also note that there are many more variations possible when variables, callbacks or computed propertries are involved:

var obj = {};
key = Math.random().toString().slice(2);
obj.nope[key]
// Chrome:
//> Uncaught TypeError: Cannot read property '7348806558734886' of undefined

In other words: Aggregation will never be perfect, and we need to primarily respond by volume. Beyond that, when aggregation is easy and possible, we should obviously do it to make triage easier.

I think that for simple cases where no variables are involved, this aggregation is already working well enough to surface things to the top of the trending normalized_message panel - which seems to be all we need to then file a task for.

You can go a long way by replacing numbers and quoted string literals in the error message (we do that for SQL errors for example).

Although in the long term I think it would be a better aggregation strategy to rely on stack trace hashes (once we have proper source maps). There are other things affecting error messages which cannot be accounted for, like i18n.

This would be really helpful as I'm frequently updating filters as the logstash UI doesn't do a great job of making the required filter of 2 things easy.

Change 668240 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/WikimediaEvents@master] Normalize "Uncaught error"

https://gerrit.wikimedia.org/r/668240

All, I've +1'd @Jdlrobson's patch above because it's a relatively small change that introduces a nice seam for normalising errors on the client. That said, I can also make the argument for keeping the client as lightweight as possible and doing any/all normalisation at ingestion time.

Change 668240 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] clientError: Normalize "Uncaught error"

https://gerrit.wikimedia.org/r/668240