RfC: Server-side Javascript error logging
Closed, Declined · Public

Assigned To: None

Authored By: Qgil, Aug 28 2014, 7:44 AM

Description

WARNING: The plan outlined here is obsolete or, at best, indefinitely stalled. The current focus is on a simpler JS error logging system without Sentry, which is outlined in T226986: Client side error logging production launch.

Main document: Server-side Javascript error logging RfC.

Implementation steps:

T525: Review existing JS error logging solutions
T499: Create error logging JS module
T500: Create basic endpoint for JS error logging (MVP for vanilla MW)
T501: Create WMF endpoint for error logging - part 1 (producer)
T502: Create WMF endpoint for error logging - part 2 (consumer) (experimental MVP for Wikimedia)
T526: Add sampling and throttling support to JS error logging
T521: Make sure JS error logging respects user privacy (stable MVP for Wikimedia)
T519: Improve error id generation in JS error logging
T514: Collect environment information for JS error logging
T512: Deal with some browsers providing less details for JS error logging
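As a rough illustration of what T526 (sampling and throttling) could involve on the client, here is a minimal sketch. `createErrorLogger` and its options (`sampleRate`, `maxPerWindow`, `windowMs`, `send`) are hypothetical names invented for this example, not part of MediaWiki or this RfC; it assumes the sampling decision is made once per page view and that throttling uses a simple fixed time window.

```javascript
// Hypothetical sketch: createErrorLogger and its options are illustrative names,
// not part of MediaWiki or any Wikimedia logging endpoint.
function createErrorLogger({
  sampleRate, // fraction of page views that report errors at all (0..1)
  maxPerWindow, // throttle: maximum number of reports per window
  windowMs, // throttle window length in milliseconds
  send, // transport, e.g. a function that POSTs the report to an endpoint
  random = Math.random, // injectable for testing
  now = Date.now, // injectable for testing
}) {
  // Sample once per page view so a sampled page reports all of its errors.
  const inSample = random() < sampleRate;
  let windowStart = -Infinity;
  let sentInWindow = 0;

  return function logError(error) {
    if (!inSample) {
      return false; // this page view was not sampled
    }
    const t = now();
    if (t - windowStart >= windowMs) {
      // Start a new fixed throttling window.
      windowStart = t;
      sentInWindow = 0;
    }
    if (sentInWindow >= maxPerWindow) {
      return false; // dropped by the throttle
    }
    sentInWindow += 1;
    send({ message: error.message, stack: error.stack });
    return true;
  };
}
```

Sampling per page view rather than per error keeps all errors from a sampled page together, which helps if reports are later deduplicated or correlated (T523). A sliding window or token bucket would handle bursts at window boundaries more smoothly than this fixed window.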

Harder / experimental stuff:

T520: Deal with minified scripts in JS error logging
T507: Measure how many users have CORS-hostile proxies
T508: Use CORS-enabled fetch of scripts to avoid same-domain limitations in JS error logging
T513: Wrap scripts with exception handling for automatic JS error logging
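T513 (and the later T85262 and T92247) revolve around wrapping functions in try..catch so that exceptions are reported with full details even where `window.onerror` provides little. The sketch below assumes a hypothetical `wrapWithErrorLogging(fn, report)` helper; it is not the `mw.errorLogging` API from Extension:Sentry.

```javascript
// Hypothetical helper illustrating try..catch wrapping; not MediaWiki's API.
function wrapWithErrorLogging(fn, report) {
  return function wrapped(...args) {
    try {
      // Preserve the caller's `this` and arguments.
      return fn.apply(this, args);
    } catch (error) {
      report(error); // hand the original Error (with its stack) to the logger
      throw error; // rethrow so existing behaviour is unchanged
    }
  };
}

// Usage: wrap a callback before handing it to an asynchronous API, e.g.
// setTimeout(wrapWithErrorLogging(onTick, reportError), 1000);
```

Rethrowing keeps the page's existing behaviour intact. A synchronous try..catch only sees synchronous throws, though: a rejected promise returned by an `async` function passes straight through and would need separate handling, for example an `unhandledrejection` listener.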

(Vague) interface ideas:

T522: Add JS error counts to graphite
T523: Deduplicate JS error logs
T524: Interface to display JS error logs

Details

Reference: fl575

Related Objects

Status	Subtype	Assigned	Task
Declined		None	T106915 Use Sentry in production
Declined		Tgr	T91649 Deploy Sentry (JavaScript error logging) to production, configured to log only a limited subset of users/pages
Resolved		jlinehan	T88399 Improve Javascript error logging coverage
Declined		None	T382 RfC: Server-side Javascript error logging
Resolved		jlinehan	T499 Create error logging JS module
Duplicate		None	T500 Create basic endpoint for JS error logging
Resolved		Ottomata	T501 Create WMF endpoint for error logging - part 1 (producer)
Resolved		None	T502 Create WMF endpoint for error logging - part 2 (consumer)
Open		None	T508 Use CORS-enabled fetch of scripts to avoid same-domain limitations in JS error logging
Resolved		Tgr	T507 Measure how many users have CORS-hostile proxies
Declined		None	T512 Deal with some browsers providing less details for JS error logging
Declined		None	T513 Wrap scripts with exception handling for automatic JS error logging
Declined		None	T514 Collect environment information for JS error logging
Declined		None	T519 Improve error id generation in JS error logging
Declined		None	T520 Deal with minified scripts in JS error logging
Resolved		tstarling	T47514 ResourceLoader: Implement support for Source Maps
Duplicate		None	T235667 ResourceLoader: Implement source map support for package files
Resolved		Krinkle	T235672 ResourceLoader: Don't add a version hash param in debug mode
Resolved		Krinkle	T302465 Deprecate "/static/current" at WMF in favour of similar long-cache unversioned /w/ URLs
Resolved		tstarling	T343407 ResourceLoader source map on localStorage cache hit
Resolved	BUG REPORT	tstarling	T348280 Source map invalid for multi-line backtick strings (breaks when debugging MultimediaViewer extension)
Declined		None	T521 Make sure JS error logging respects user privacy
Declined		None	T522 Add JS error counts to graphite
Declined		None	T523 Deduplicate JS error logs
Resolved		Tgr	T524 Interface to display JS error logs
Resolved		Gilles	T525 Review existing JS error logging solutions
Resolved		Tgr	T1345 Set up and test Sentry on Labs for JS error logging
Resolved		Tgr	T78807 Deploy Sentry extension on beta cluster
Resolved		csteipp	T86677 Quick/short security review of Extension:Sentry
Resolved		Tgr	T78809 Implement module wrapping for Sentry
Declined		Tgr	T85262 Add startup script to automatically wrap asynchronous functions in try..catch
Declined		Tgr	T86058 Find out whether any browser supported by MediaWiki needs TraceKit's "rethrow to window.onerror" logic for stack traces
Declined		Tgr	T92247 Wrap jQuery AJAX callbacks in try..catch via mw.errorLogging
Resolved		Tgr	T92701 Test the impact of Javascript error logging on performance
Declined		Tgr	T93392 Make sure async wrapping for Javascript error logging only happens when the call is truly async
Resolved		Tgr	T85263 Track module initialization errors in ResourceLoader
Resolved		Tgr	T84957 Vagrantize Sentry
Resolved		Tgr	T90773 Fix log directory permissions in Sentry vagrant role
Resolved		Tgr	T84956 Create basic puppet role for Sentry
Resolved		jcrespo	T112228 Need to run postgresql::user twice to set the password
Resolved		Aklapper	T84955 Create MediaWiki-extensions-Sentry project
Resolved		Tgr	T89384 Add error id generation to mw.errorLogging
Resolved		Tgr	T105374 Investigate if the XSS vulnerability addressed in Sentry 7.6.1 affects us
Declined		None	T526 Add sampling and throttling support to JS error logging
Declined		None	T91357 Update server-side JS error-logging RfC

Event Timeline


Gilles moved this task from Untriaged to Prototyping on the Multimedia board. (Nov 24 2014, 3:57 PM)

Prtksxna subscribed. (Dec 5 2014, 2:28 PM)

Tgr mentioned this in T77309: Show error details. (Dec 11 2014, 4:02 AM)

Krinkle subscribed. (Dec 12 2014, 11:43 PM)

Tgr added a project: Epic. (Dec 21 2014, 4:30 PM)

matmarex merged a task: T53857: Log JavaScript errors. (Dec 21 2014, 6:28 PM)

matmarex added subscribers: Mattflaschen-WMF, Legoktm, He7d3r and 5 others.

Some interesting stuff from this presentation:

defunctzombie/browser-stacks - collected examples of stack traces in various browsers, might be useful for testing
Error.captureStackTrace / Error.prepareStackTrace - experimental V8 API which allows direct access to the stack trace object (with function objects and everything)
zone.js - scary but powerful tool to define execution contexts which track async calls
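The V8 API mentioned above can be tried directly in Node.js or Chromium. The sketch below temporarily replaces `Error.prepareStackTrace` so that `.stack` evaluates to the raw array of CallSite objects instead of a formatted string; `captureCallSites` and `describeStack` are hypothetical helper names. The API is non-standard and specific to V8, so other engines may not implement it.

```javascript
// Sketch of V8's non-standard stack trace API; names here are illustrative only.
function captureCallSites() {
  const original = Error.prepareStackTrace;
  try {
    // Make `.stack` evaluate to the array of CallSite objects.
    Error.prepareStackTrace = (_error, callSites) => callSites;
    const holder = {};
    // Omit captureCallSites itself (and any frames above it) from the trace.
    Error.captureStackTrace(holder, captureCallSites);
    // V8 formats the trace lazily, so read it before restoring the hook.
    return holder.stack;
  } finally {
    Error.prepareStackTrace = original;
  }
}

function describeStack() {
  // Turn each CallSite into plain data that could be sent to a logging endpoint.
  return captureCallSites().map((site) => ({
    functionName: site.getFunctionName(),
    file: site.getFileName(),
    line: site.getLineNumber(),
    column: site.getColumnNumber(),
  }));
}
```

CallSite objects also expose `getFunction()` (the "function objects" mentioned above), but V8 returns `undefined` for it in strict-mode code, so a logger should not rely on it.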

Tgr removed Tgr as the assignee of this task. (Jan 30 2015, 2:38 AM)

Liuxinyu970226 subscribed. (Feb 3 2015, 8:23 AM)

daniel assigned this task to brooke. (Feb 25 2015, 8:14 PM)

daniel subscribed.

Spage moved this task from P1: Define to Old on the TechCom-RFC board. (Feb 26 2015, 6:51 PM)

Tgr changed the status of subtask T513: Wrap scripts with exception handling for automatic JS error logging from Open to Stalled. (Mar 25 2015, 11:42 PM)

Spage added a subtask: T91357: Update server-side JS error-logging RfC. (Mar 25 2015, 11:49 PM)

It sounds like @Tgr is working on improving the RFC in T91357: Update server-side JS error-logging RfC, so assigning this to him. Tgr, when you're done, move this task to "under discussion" on #MediaWiki-RfCs and the architecture committee will probably move it to "approved" \o/ (The rest of the blockers are for implementation.)

Spage assigned this task to Tgr. (Apr 6 2015, 8:16 AM)

Gilles moved this task from Prototyping to Untriaged on the Multimedia board. (Apr 6 2015, 9:23 AM)

Prtksxna unsubscribed. (Apr 6 2015, 9:49 AM)

Mattflaschen-WMF unsubscribed. (Apr 14 2015, 2:56 AM)

Spage removed a project: TechCom. (May 15 2015, 1:03 AM)

Tgr closed subtask T525: Review existing JS error logging solutions as Resolved. (Jul 16 2015, 5:09 PM)

Restricted Application added subscribers: Matanya, Aklapper. (Jul 16 2015, 5:09 PM)

Tgr closed subtask T524: Interface to display JS error logs as Resolved. (Jul 16 2015, 5:09 PM)

Tgr closed subtask T507: Measure how many users have CORS-hostile proxies as Resolved. (Aug 12 2015, 6:00 AM)

Tgr closed subtask T508: Use CORS-enabled fetch of scripts to avoid same-domain limitations in JS error logging as Declined. (Aug 12 2015, 11:41 PM)

Tgr reopened subtask T508: Use CORS-enabled fetch of scripts to avoid same-domain limitations in JS error logging as Open. (Aug 12 2015, 11:55 PM)

Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board. (Sep 4 2015, 6:44 PM)

Jdforrester-WMF removed a project: Multimedia. (Sep 21 2015, 4:22 PM)

DarTar unsubscribed. (Sep 21 2015, 10:22 PM)

It looks like a large number of the blockers here have been left without any projects.

daniel added a subscriber: adrianheine. (Sep 25 2015, 5:00 PM)

daniel added a subscriber: Jonas.

Added them to Sentry.

Qgil unsubscribed. (Sep 30 2015, 8:33 PM)

JanZerebecki subscribed. (Oct 1 2015, 5:41 PM)

Nemo_bis added a project: MediaWiki-ResourceLoader. (Nov 26 2015, 3:05 PM)

Krinkle moved this task from Inbox to Backlog on the MediaWiki-ResourceLoader board. (Dec 14 2015, 8:03 PM)

Ricordisamoa subscribed. (Feb 20 2016, 5:52 AM)

Danny_B added a project: Proposal. (May 2 2016, 10:21 PM)

RobLa-WMF lowered the priority of this task from High to Medium. (May 25 2016, 5:32 AM)

RobLa-WMF raised the priority of this task from Medium to High.

RobLa-WMF mentioned this in E187: RFC Meeting: triage meeting (2016-05-25, #wikimedia-office). (May 25 2016, 7:03 AM)

Agabi10 subscribed. (May 25 2016, 7:33 AM)

Krinkle removed a project: MediaWiki-ResourceLoader. (May 25 2016, 9:52 PM)

RobLa-WMF mentioned this in E198: RFC Meeting: Security is all of our jobs (2016-06-01, #wikimedia-office). (May 27 2016, 11:20 PM)

Amire80 subscribed. (Jan 19 2017, 3:04 PM)

Krinkle removed a project: Proposal. (Dec 21 2017, 11:39 PM)

Hey @Tgr, I'm going to work on this as a side project, to get more familiar with mediawiki. I'm going to read up on the status in the various subtasks, let me know if there's somewhere obvious I should start.

Tgr mentioned this in T77110: Push messages to logstash from JS. (Apr 10 2019, 11:16 PM)

Tgr mentioned this in T209295: [EPIC] Enable WebClientError on production. (Apr 10 2019, 11:19 PM)

fgiunchedi subscribed. (Apr 11 2019, 8:43 AM)

@Milimetric cool! Note though that all this is very outdated and currently most of the work is not in MediaWiki. I wrote a more current summary in T217142#5103038.

Thanks @Tgr, that's a big pivot from what I was expecting, but hey, let's do it! How/when/who is making the decision on each bullet point? I could drive client and pipeline work, and beg ops for help with the Sentry server / Logstash part. In terms of owning the work going forward, I think Analytics is overloaded at the moment but it makes the most sense there. Extension:Sentry is broadly similar to Extension:EventLogging, and it sounds like EventGate/Kafka is the preferred choice for the pipeline, so that's squarely in our world.

I propose that I work on Extension:Sentry and use @Jdforrester-WMF's help to decide between sentry/browser and a custom implementation. Maybe we could look at sentry/browser and trim it down into some sort of sentry/browser-lite that we could convince upstream to maintain. Surely other people are interested in a client lighter than 50k.

If there are no objections by Monday, I'll reach out to James.

@phuedx can you comment? You probably have a better grasp of the current status.

In T382#5108028, @Milimetric wrote:

Thanks @Tgr, that's a big pivot from what I was expecting, but hey, let's do it!

<3 <3 <3

How/when/who is making the decision on each bullet point?

As yet, this is unclear. @Tgr has done an incredible job breaking down this work and implemented at least one MVP for Multimedia IIRC; I've poked and prodded a bit in T217142; and @fgiunchedi (and SRE) have very recently picked up the proverbial torch.

I could drive client and pipeline work, and beg ops for help with the Sentry server / Logstash part.

SRE seem willing to drive the Kafka/Logstash part of the pipeline. IIRC they're looking at Q1-2 FY19-20.

In terms of owning the work going forward, I think Analytics is overloaded at the moment but it makes the most sense there. Extension:Sentry is broadly similar to Extension:EventLogging, and it sounds like EventGate/Kafka is the preferred choice for the pipeline, so that's squarely in our world.

I'm a little nervous about ownership of an infrastructure piece as critical as client-side error logging being shared by more than one team as it could lead to friction when prioritising bugfixes/maintenance/feature requests. In practice, though, this likely won't be so clean cut, e.g. the client-side component could be maintained by Readers Infrastructure in Audiences. Let's talk about this sooner rather than later.

Shipping an MVP by any means makes sense though!

I propose that I work on Extension:Sentry and use @Jdforrester-WMF's help to decide between sentry/browser and a custom implementation. Maybe we could look at sentry/browser and trim it down into some sort of sentry/browser-lite that we could convince upstream to maintain. Surely other people are interested in a client lighter than 50k.

I wonder how many browsers @sentry/browser supports that we don't deliver JavaScript to. One path might be to not go completely custom but to trim out any parts that won't apply to the Wikipedias, if any.

@Milimetric are you still interested in moving this forward?

@daniel sorry I didn't update the group on this, but this project is being led by Filippo and progressing nicely. There's a working version deployed in beta and progress on a production launch is tracked here: T226986. I'm not sure at this point how that interacts with this RfC. Maybe we should update it when we have a good idea what the production stack should look like.

@Milimetric That sounds like this RFC has been overtaken by reality and can be closed as invalid. Or do you think it's still useful?

I suppose the key question is: is an RFC still needed to make sure that there is agreement about the service interface and the backend technology? How were the requirements/constraints gathered, and are they documented somewhere? If yes, would it make sense to just put this on last call?

I think that's right, we have a pretty good working group of the people that care about this. If they agree I don't see too much need for an RfC, there would be too much context for someone else to catch up with. So I would vote to close as invalid. As for requirements/constraints, I think the phab tasks describe those for now, and we'll document them as we go forward (there's still thinking/testing on what the stack should be)

FWIW I concur we can close this as invalid; there are good ideas and input in the child tasks, some of which we could integrate in T217142: [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors. @Tgr what do you think?

In T382#5364914, @fgiunchedi wrote:

FWIW I concur we can close this as invalid; there are good ideas and input in the child tasks, some of which we could integrate in T217142: [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors. @Tgr what do you think?

@Tgr: Could you please answer the last comment? Thanks :)

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Tgr updated the task description. (Jun 20 2020, 1:21 PM)

Tgr closed subtask T91357: Update server-side JS error-logging RfC as Declined. (Jun 20 2020, 2:35 PM)

The fact that this is open while much of it has been implemented is causing me some confusion. Per T382#6241903, can this be merged into T226986, as that seems to be the canonical link for tracking this work?

This is an RfC, and as such is Stalled awaiting a potential owner. It's not appropriate to merge into T226986: Client side error logging production launch as that's "merely" implementation work rather than a governance decision.

The proposal outlined in this task was about building dedicated infrastructure around error monitoring, with an elaborate client and dedicated server-side and UI component (Sentry).

This has not been actively pursued for a couple of years now; much of the effort has shifted toward re-using our existing infrastructure first (statsv, Kafka, Logstash/Kibana), with the most recent related RFC being the Modern Event Platform (MEP), T201963.

This task hasn't been on the active RFC track for 5+ years so I'll close this for now, but feel free to re-open or create a new one at any time.

https://www.mediawiki.org/wiki/Requests_for_comment

Krinkle closed this task as Declined. (Jul 15 2020, 8:15 PM)

Tgr closed subtask T499: Create error logging JS module as Resolved. (Sep 16 2020, 11:45 PM)

Tgr closed subtask T501: Create WMF endpoint for error logging - part 1 (producer) as Resolved. (Sep 16 2020, 11:49 PM)

Tgr closed subtask T502: Create WMF endpoint for error logging - part 2 (consumer) as Resolved. (Sep 16 2020, 11:52 PM)

Tgr mentioned this in T499: Create error logging JS module. (Sep 17 2020, 12:00 AM)

Aklapper closed subtask T523: Deduplicate JS error logs as Declined. (Oct 4 2020, 12:00 PM)

Aklapper closed subtask T522: Add JS error counts to graphite as Declined.

Aklapper closed subtask T521: Make sure JS error logging respects user privacy as Declined.

Aklapper closed subtask T519: Improve error id generation in JS error logging as Declined.

Aklapper closed subtask T514: Collect environment information for JS error logging as Declined.