
Set up and test Sentry on Labs for JS error logging
Open, Normal, Public

Description

Per T525, Sentry seems to be the most mature FOSS error logging software. We should set up a test instance: as unlikely as it is to scale up to the traffic of WMF sites, it would spare us a lot of development time if it does. And even if it does not, it is commercial-quality error logging code, and reviewing it will help us figure out how to build our own error logger.

While we are at it, we could probably also try using it to log PHP errors (and whatever else there is).

Related Objects

Status      Assigned
Open        Tgr
Resolved    Tgr
Resolved    Gilles
Open        None
Open        None
Open        Tgr
Open        Tgr
Resolved    Tgr
Resolved    csteipp
Resolved    Tgr
Resolved    Tgr
Resolved    Aklapper
Resolved    Tgr
Resolved    Tgr
Open        None
Resolved    Tgr
Declined    Tgr
Declined    Tgr
Stalled     Tgr
Resolved    Tgr
Stalled     Tgr
Resolved    Tgr
Open        Tgr
Resolved    Krinkle
Declined    None
Open        Tgr
Resolved    Tgr
Open        Tgr
Open        None
Invalid     None
Stalled     Tgr
Resolved    Tgr
Open        None
Resolved    jcrespo
Resolved    Tgr

Event Timeline

Tgr created this task. Nov 20 2014, 1:19 AM
Tgr updated the task description.
Tgr raised the priority of this task to Normal.
Tgr added a project: Multimedia.
Tgr changed Security from none to None.
Tgr added subscribers: Tgr, He7d3r, Gilles.

@Legoktm mentioned that there's already an unused Sentry server running on labs.

Gilles moved this task from Untriaged to Prototyping on the Multimedia board. Nov 24 2014, 4:09 PM
Gilles moved this task from Prototyping to Next up on the Multimedia board. Dec 3 2014, 5:54 PM
Tgr claimed this task. Dec 4 2014, 2:42 AM
Tgr added a comment. Dec 11 2014, 4:00 AM

For other uses of Sentry, see T70820.

Tgr added a comment. Dec 15 2014, 6:12 AM

I set up raven.js on http://multimedia-alpha.wmflabs.org/ to log to the Sentry instance Bryan set up earlier: http://sentry-beta.wmflabs.org/jserrors/jserrors/ (I'll push the integration code to mediawiki/extensions/Sentry once it is created).

Sentry uses a JS logger called Raven, which is essentially TraceKit (an error-catching/browser-compatibility library) plus remote logging plus a plugin system.

TraceKit's browser compatibility seems decent:

Supports:

  • Firefox: full stack trace with line numbers, plus column number on top frame; column number is not guaranteed
  • Opera: full stack trace with line and column numbers
  • Chrome: full stack trace with line and column numbers
  • Safari: line and column number for the top frame only; some frames may be missing, and column number is not guaranteed
  • IE: line and column number for the top frame only; some frames may be missing, and column number is not guaranteed

In theory, TraceKit should work on all of the following versions:

  • IE5.5+ (only 8.0 tested)
  • Firefox 0.9+ (only 3.5+ tested)
  • Opera 7+ (only 10.50 tested; versions 9 and earlier may require Exceptions Have Stacktrace to be enabled in opera:config)
  • Safari 3+ (only 4+ tested)
  • Chrome 1+ (only 5+ tested)
  • Konqueror 3.5+ (untested)
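
In practice, this compatibility list means that each engine formats error.stack differently. Below is a minimal TypeScript sketch of the normalization step, assuming only the two most common formats: Chrome's "at fn (url:line:col)" and Firefox's "fn@url:line:col". The Frame type and the parseStack function are hypothetical and far simpler than TraceKit's actual parser, which also has to cope with IE, Opera, eval frames and missing stacks.

```typescript
// Hypothetical normalized frame shape (not TraceKit's real data structure).
interface Frame {
  func: string;
  url: string;
  line: number;
  column: number | null; // not every browser reports column numbers
}

// Chrome/V8: "    at fn (http://host/a.js:10:5)" or "    at http://host/a.js:10:5"
const CHROME_RE = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?\s*$/;
// Firefox: "fn@http://host/a.js:10:5" (older versions omit the column)
const GECKO_RE = /^(.*?)@(.+?):(\d+)(?::(\d+))?\s*$/;

function parseStack(stack: string): Frame[] {
  const frames: Frame[] = [];
  for (const raw of stack.split("\n")) {
    const m = CHROME_RE.exec(raw) ?? GECKO_RE.exec(raw);
    if (!m) continue; // skip the message line and unrecognized formats
    frames.push({
      func: m[1] || "?", // anonymous frames have no function name
      url: m[2],
      line: Number(m[3]),
      column: m[4] !== undefined ? Number(m[4]) : null,
    });
  }
  return frames;
}
```

Lines the sketch does not recognize, including the leading message line, are dropped. A missing column comes back as null, which matches the "column number is not guaranteed" caveat in the list above.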

Raven adds support for automatically wrapping native functions (like setTimeout) and jQuery event and AJAX handlers. It generates event IDs on the client side, without any deduping. Client-side throttling/sampling is supported via callbacks. The error reporting mechanism is not configurable, but it is done via an invisible pixel, so integration with Varnish should be possible. There are all sorts of nice features for attaching additional information.
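
The following minimal TypeScript sketch illustrates two of the mechanisms mentioned here: a client-side sampling decision and reporting via an image request. The ErrorReport shape, the endpoint and the "data" query parameter are hypothetical and do not reflect Raven's or Sentry's actual wire format. In a browser, the resulting URL would be fired with new Image().src = url.

```typescript
// Hypothetical error payload; real Sentry events carry many more fields.
interface ErrorReport {
  message: string;
  url: string;
  eventId: string;
}

// Returns a predicate that keeps roughly `rate` of the calls (0 <= rate <= 1).
// `random` is injectable so the behavior can be tested deterministically.
function makeSampler(rate: number, random: () => number = Math.random): () => boolean {
  if (rate < 0 || rate > 1) throw new RangeError("rate must be in [0, 1]");
  return () => random() < rate;
}

// Encodes the report into the query string of an image ("pixel") URL.
// The endpoint and the `data` parameter name are assumptions for illustration.
function pixelUrl(endpoint: string, report: ErrorReport): string {
  return `${endpoint}?data=${encodeURIComponent(JSON.stringify(report))}`;
}
```

Because the report travels as a plain GET request, it could in principle be logged by Varnish like any other pixel hit. The trade-offs are that URL length limits cap the payload size, so long stack traces may need truncating, and that sampled counts have to be scaled back up by 1/rate when estimating the real error volume.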

Sentry claims to support source maps. There is a Phabricator plugin for easy ticket creation.

Did some dummy testing: exception and custom message reporting works; onerror logging works but seems flawed (the message is not logged). Stack traces don't show up in either case. This could just be a side effect of my triggering the errors from the debug console; I need to test with more realistic errors. Deduping should also be tested: it works for the trivial case of generating the same error in the same browser multiple times, but should be checked across different browsers / languages.

On the whole, seems promising for small MediaWiki installations, assuming stacktraces work. I doubt it scales to WMF traffic but maybe that can be handled by channelling through Varnish and logstash and deduping there.

In T1345#847274, @Tgr wrote:

I doubt it scales to WMF traffic

It does, but with a very different server-side setup from the default one. They provide guidelines on how to run it in very high-traffic environments. The default setup should be fine for beta, though.

As for deduping, they don't do much there, but the little they do is more than the alternatives do. According to them, nobody has bothered solving the deduping issue at large (e.g. across languages), be it open or closed source.

Tgr added a comment. Dec 15 2014, 8:59 PM

Tested by adding a nonexistent function call to an MMV file; the results are not so great so far.

Change 179987 had a related patch set uploaded (by Gergő Tisza):
Initial commit

https://gerrit.wikimedia.org/r/179987

Patch-For-Review

Tgr added a comment. Dec 15 2014, 9:41 PM

Judging from the comments in https://github.com/getsentry/sentry/issues/889, Sentry dedupes strictly based on the stack trace, so it should be able to merge errors across browsers/languages.
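
Grouping on the stack trace rather than on the message is what allows merging across languages: message text can differ between browsers and, in some browsers, between UI languages, while the file and line of each frame usually do not. Below is a minimal TypeScript sketch, assuming the frames have already been normalized. The key format and the type names are assumptions for illustration, and so is the decision to ignore column numbers (which not every browser reports). This is not Sentry's actual grouping algorithm.

```typescript
// Hypothetical normalized frame and event shapes.
interface Frame {
  func: string;
  url: string;
  line: number;
  column: number | null;
}

interface ReportedError {
  message: string; // may be localized, so deliberately ignored when grouping
  frames: Frame[];
}

// Builds a grouping key from file, line and function of every frame.
// Columns are left out because not every browser reports them.
function groupingKey(ev: ReportedError): string {
  return ev.frames.map((f) => `${f.url}:${f.line}:${f.func}`).join("\n");
}

// Buckets events that share a grouping key.
function groupEvents(events: ReportedError[]): Map<string, ReportedError[]> {
  const groups = new Map<string, ReportedError[]>();
  for (const ev of events) {
    const key = groupingKey(ev);
    const bucket = groups.get(key);
    if (bucket) bucket.push(ev);
    else groups.set(key, [ev]);
  }
  return groups;
}
```

With this approach, the same failure reported with an English message and a column number by one browser, and with a translated message and no column by another, would end up in a single group.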

Tgr added a comment. Dec 15 2014, 10:05 PM

TraceKit, by the way, seems abandoned, with no contributions in 2014 and key pull requests (like support for exceptions in Chrome's window.onerror) pending for several months.

Raven.js has a forked version of TraceKit, and that seems well-maintained.

Tgr added a comment. Dec 15 2014, 11:14 PM
In T1345#848946, @Tgr wrote:

This is apparently the interplay of three different bugs. Reported upstream:
https://github.com/getsentry/raven-js/issues/300
https://github.com/getsentry/sentry/issues/1334

Tgr added a comment. Dec 16 2014, 12:28 AM

Errors that happen when a link is clicked, and that do not block navigation, seem to suffer from the same logging problem that EventLogging has with such clicks (T44815). This will be solved eventually via https://github.com/getsentry/raven-js/issues/293

Tgr added a comment. Dec 16 2014, 1:11 AM

Wrapping all JS modules via Raven.wrap (i.e. not relying on window.onerror) makes error logging completely reliable (though it still misidentifies the file). Doing that might require a patch to ResourceLoader, and it would mean that raven.js is loaded as a startup module, blocking page rendering (that's about 25 KB, which is a 14% increase in the startup module size).
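
Conceptually, wrapping works roughly as follows: run the function inside try/catch, report any exception, then rethrow it so that normal error handling still happens. The minimal TypeScript sketch below illustrates the idea. The wrap and Reporter names are hypothetical, and this is not raven.js code. Because the exception object itself is caught, its stack is available regardless of what window.onerror would have received.

```typescript
// Hypothetical reporting hook; in practice this would send the error to Sentry.
type Reporter = (error: unknown) => void;

// Returns a function with the same signature as `fn` that reports any
// exception thrown by `fn` and then rethrows it unchanged.
function wrap<A extends unknown[], R>(fn: (...args: A) => R, report: Reporter): (...args: A) => R {
  return function (this: unknown, ...args: A): R {
    try {
      return fn.apply(this, args); // preserve `this` for method-style callers
    } catch (e) {
      report(e);
      throw e; // let the page's own error handling continue as before
    }
  };
}
```

This only catches exceptions thrown synchronously inside the wrapped function. Callbacks scheduled from it (setTimeout, event handlers, AJAX callbacks) run later and need their own wrapping, which is why Raven also wraps those native functions and the jQuery handlers.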

Tgr added a comment. Dec 16 2014, 1:32 AM

To sum it up, blockers for using Sentry:

  • wrap scripts in error reporting code (T513) if it is deemed acceptable (the code is already provided by raven.js)
  • source maps for non-debug mode (T47514)
  • raven.js#300 (could be fixed in at least two different ways, one of which sounds trivial)
  • figure out scaling

None of those seem extremely hard to solve, while redeveloping the same functionality would be a huge effort (just the browser compatibility logic in TraceKit seems like it would take months to figure out). So this should definitely be the way to go, even before considering the value of possibly tracking all kinds of client- and server-side errors in the same application.

I guess the next step would be to ask Roan or Timo about the ResourceLoader changes, and then get the extension deployed on the beta cluster? We could more easily test issues like CORS and handling of multiple wikis there.

Tgr moved this task from Next up to Needs code review on the Multimedia board. Dec 16 2014, 1:33 AM
In T1345#849752, @Tgr wrote:

Wrapping all JS modules via Raven.wrap (i.e. not relying on window.onerror) makes error logging completely reliable

Indeed, the Sentry guys told me that it's the only way to get reliable reporting; relying on onerror should be avoided, since it provides insufficient/unreliable information. This isn't specific to their library, but a larger issue with onerror.

You should join Sentry on freenode, they're very responsive there if you have any questions.

Tgr added a comment. Dec 16 2014, 8:51 AM

Indeed, the Sentry guys told me that it's the only way to get reliable reporting; relying on onerror should be avoided, since it provides insufficient/unreliable information. This isn't specific to their library, but a larger issue with onerror.

onerror should be reliable on modern browsers; the problem with it is that older ones don't provide the stack trace. TraceKit does something complicated for IE8 compatibility, rethrowing exceptions and using a timeout to see whether they hit onerror, and I think that messes up logging on Chrome.
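
The argument list is the crux here. window.onerror is called as (message, source, lineno, colno, error), but older browsers pass only the first three arguments, so there is no Error object and therefore no stack. Below is a minimal TypeScript sketch that normalizes both cases into one report. The OnErrorReport shape and the fromOnError name are hypothetical.

```typescript
// Hypothetical report shape built from window.onerror arguments.
interface OnErrorReport {
  message: string;
  stack: string | null; // null when the browser supplied no Error object
  url: string;
  line: number;
  column: number | null;
}

// Mirrors the argument order of window.onerror:
// (message, source, lineno, colno, error). Older browsers pass only the
// first three, so the column and the Error object are optional here.
function fromOnError(message: string, url: string, line: number, column?: number, error?: Error): OnErrorReport {
  return {
    message,
    stack: error?.stack ?? null, // full stack only when an Error object exists
    url,
    line,
    column: column ?? null,
  };
}
```

With only message, URL and line, the best a logger can do is a single top frame, which matches the IE/Safari entries in the compatibility list above. Separately, for scripts loaded from another origin without CORS headers, browsers pass only a generic "Script error." with no details, so CORS is worth checking on the beta cluster as well.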

You should join Sentry on freenode, they're very responsive there if you have any questions.

Yeah, I did (and indeed they are quick to answer) but it's still nice to try things out.

Change 180309 had a related patch set uploaded (by Gergő Tisza):
Add jobs for Sentry

https://gerrit.wikimedia.org/r/180309

Patch-For-Review

Change 180309 merged by jenkins-bot:
Add jobs for Sentry

https://gerrit.wikimedia.org/r/180309

Change 179987 merged by jenkins-bot:
Initial commit

https://gerrit.wikimedia.org/r/179987

Tgr moved this task from Ready for testing to Next up on the Multimedia board. Dec 22 2014, 7:22 PM

Sentry is now live on the beta cluster and collecting JS errors (well, some of them; it will be more useful once T78809 is merged). Pinging WMF-Legal so they are aware, as this includes private visitor data (IP addresses, user agents). The (temporary) Sentry instance is at http://sentry-beta.wmflabs.org/jserrors/jserrors/ ; for now, access is given manually, and only to WMF employees and volunteers with NDAs. Ping me if interested.

Tgr moved this task from Next up to Prototyping on the Multimedia board. Feb 10 2015, 6:54 AM
Tgr added a project: Sentry. Mar 6 2015, 1:52 AM
Gilles moved this task from Prototyping to Untriaged on the Multimedia board. Apr 6 2015, 9:22 AM
Tgr moved this task from Backlog to Goals on the Sentry board. Jul 20 2015, 3:01 PM
Tgr moved this task from Goals to Backlog on the Sentry board. Jul 24 2015, 11:23 PM
Sitic added a subscriber: Sitic. Aug 8 2015, 12:49 AM
Restricted Application added a subscriber: JEumerus. Apr 14 2016, 1:23 AM
ZhouZ moved this task from Backlog to Assigned on the WMF-Legal board. Apr 14 2016, 1:23 AM
ZhouZ moved this task from Assigned to Legal Done on the WMF-Legal board. Apr 19 2016, 6:50 PM
Restricted Application added a subscriber: TerraCodes. Apr 19 2016, 6:50 PM