
RfC: Server-side Javascript error logging
Closed, Declined · Public

Description

WARNING: The plan outlined here is obsolete or, at best, indefinitely stalled. The current focus is on a simpler JS error logging system without Sentry, which is outlined in T226986: Client side error logging production launch.

Main document: Server-side Javascript error logging RfC.

Implementation steps:

  1. T525: Review existing JS error logging solutions
  2. T499: Create error logging JS module
  3. T500: Create basic endpoint for JS error logging (MVP for vanilla MW)
  4. T501: Create WMF endpoint for error logging - part 1 (producer)
  5. T502: Create WMF endpoint for error logging - part 2 (consumer) (experimental MVP for Wikimedia)
  6. T526: Add sampling and throttling support to JS error logging
  7. T521: Make sure JS error logging respects user privacy (stable MVP for Wikimedia)
  8. T519: Improve error id generation in JS error logging
  9. T514: Collect environment information for JS error logging
  10. T512: Deal with some browsers providing less details for JS error logging
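
As a rough illustration of steps 2 and 6 (a client-side logging module with sampling and throttling), a reporter could look something like the sketch below. It is only a sketch under stated assumptions: `createReporter`, `ReporterOptions`, and `ErrorReport` are hypothetical names that do not come from the RfC, MediaWiki, or Sentry, and the transport is injected rather than tied to any real endpoint.

```typescript
// Hypothetical sketch: none of these names come from the RfC, MediaWiki, or Sentry.
interface ErrorReport {
  message: string;
  url: string;
  stack: string | undefined;
}

interface ReporterOptions {
  send: (report: ErrorReport) => void; // transport, e.g. a POST to a logging endpoint
  sampleRate: number;                  // fraction of errors to report, between 0 and 1
  maxPerWindow: number;                // throttle: maximum reports per time window
  windowMs: number;                    // length of the throttle window in milliseconds
  random?: () => number;               // injectable for testing
  now?: () => number;                  // injectable for testing
}

function createReporter(options: ReporterOptions): (error: Error, url: string) => boolean {
  const random = options.random ?? Math.random;
  const now = options.now ?? Date.now;
  let windowStart = now();
  let sentInWindow = 0;

  // Returns true if the error was actually handed to the transport.
  return (error: Error, url: string): boolean => {
    // Sampling: drop a random share of errors to limit overall volume.
    if (random() >= options.sampleRate) {
      return false;
    }
    // Throttling: start a new window once the current one has expired.
    const t = now();
    if (t - windowStart >= options.windowMs) {
      windowStart = t;
      sentInWindow = 0;
    }
    if (sentInWindow >= options.maxPerWindow) {
      return false;
    }
    sentInWindow += 1;
    options.send({ message: error.message, url, stack: error.stack });
    return true;
  };
}
```

In a browser, the returned function would presumably be called from a global error handler. Sampling runs before throttling here so that unsampled errors do not use up the per-window budget, which is a design choice of this sketch rather than something the RfC specifies.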

Harder / experimental stuff:

  1. T520: Deal with minified scripts in JS error logging
  2. T507: Measure how many users have CORS-hostile proxies
  3. T508: Use CORS-enabled fetch of scripts to avoid same-domain limitations in JS error logging
  4. T513: Wrap scripts with exception handling for automatic JS error logging
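
The wrapping idea in T513 can be sketched as a higher-order function that catches an exception, passes it to a logger, and rethrows it so the original behaviour is unchanged. Browsers typically give a global error handler only a generic "Script error." message for scripts loaded from another origin without CORS, whereas a wrapper like this sees the actual exception object. `wrapWithErrorLogging` and its `log` parameter are hypothetical names used for illustration only; nothing here is taken from the subtask itself.

```typescript
// Hypothetical sketch of wrapping a function with exception handling.
// Note: this simple version does not forward `this` to the wrapped function.
function wrapWithErrorLogging<A extends unknown[], R>(
  fn: (...args: A) => R,
  log: (error: unknown) => void,
): (...args: A) => R {
  return (...args: A): R => {
    try {
      return fn(...args);
    } catch (error) {
      // Record the error, then rethrow so callers still see the failure.
      log(error);
      throw error;
    }
  };
}
```

A caller would wrap entry points such as event handlers or timer callbacks, for example `wrapWithErrorLogging(handler, logError)`, where `logError` is whatever reporting function the page uses.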

(Vague) interface ideas:

  1. T522: Add JS error counts to graphite
  2. T523: Deduplicate JS error logs
  3. T524: Interface to display JS error logs

Details

Reference
fl575

Related Objects

Status      Assigned
Declined    None
Declined    Tgr
Open        None
Declined    None
Resolved    jlinehan
Duplicate   None
Resolved    Ottomata
Resolved    None
Open        None
Resolved    Tgr
Declined    None
Stalled     None
Declined    None
Declined    None
Open        None
Open        None
Open        None
Stalled     None
Declined    None
Declined    None
Declined    None
Resolved    Tgr
Resolved    Gilles
Resolved    Tgr
Resolved    Tgr
Resolved    csteipp
Resolved    Tgr
Declined    Tgr
Declined    Tgr
Declined    Tgr
Resolved    Tgr
Declined    Tgr
Resolved    Tgr
Resolved    Tgr
Resolved    Tgr
Resolved    Tgr
Resolved    jcrespo
Resolved    Aklapper
Resolved    Tgr
Resolved    Tgr
Open        None
Declined    None

Event Timeline

Qgil edited projects, added TechCom-RFC; removed Architecture. Oct 22 2014, 8:45 PM
Gilles moved this task from Untriaged to Prototyping on the Multimedia board. Nov 24 2014, 3:57 PM
Tgr added a project: Epic. Dec 21 2014, 4:30 PM
Tgr added a comment. Dec 30 2014, 7:32 AM

Some interesting stuff from this presentation:

Tgr removed Tgr as the assignee of this task. Jan 30 2015, 2:38 AM
daniel assigned this task to brion. Feb 25 2015, 8:14 PM
daniel added a subscriber: daniel.
Spage moved this task from P1: Define to Old on the TechCom-RFC board. Feb 26 2015, 6:51 PM
Spage removed brion as the assignee of this task. Mar 26 2015, 12:49 AM
Spage added a subscriber: brion.

It sounds like @Tgr is working on improving the RFC in T91357: Update server-side JS error-logging RfC, so assigning this to him. Tgr, when you're done, move this task to "under discussion" on TechCom-RFC and the architecture committee will probably move it to "approved" \o/ (The rest of the blockers are for implementation.)

Spage assigned this task to Tgr. Apr 6 2015, 8:16 AM
Gilles moved this task from Prototyping to Untriaged on the Multimedia board. Apr 6 2015, 9:23 AM
Prtksxna removed a subscriber: Prtksxna. Apr 6 2015, 9:49 AM
Restricted Application added subscribers: Matanya, Aklapper. Jul 16 2015, 5:09 PM
Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board. Sep 4 2015, 6:44 PM

It looks like a large number of the blockers here have been left without any projects.

daniel added a subscriber: Jonas.
Tgr added a comment. Sep 25 2015, 5:35 PM

Added them to Sentry.

Qgil removed a subscriber: Qgil. Sep 30 2015, 8:33 PM
RobLa-WMF lowered the priority of this task from High to Medium. May 25 2016, 5:32 AM
RobLa-WMF raised the priority of this task from Medium to High.

Hey @Tgr, I'm going to work on this as a side project, to get more familiar with mediawiki. I'm going to read up on the status in the various subtasks, let me know if there's somewhere obvious I should start.

Tgr added a comment. Edited Apr 12 2019, 3:50 AM

@Milimetric cool! Note though that all this is very outdated and currently most of the work is not in MediaWiki. I wrote a more current summary in T217142#5103038.

Thanks @Tgr, that's a big pivot from what I was expecting, but hey, let's do it! How/when/who is making the decision on each bullet point? I could drive client and pipeline work, and beg ops for help with the Sentry server / Logstash part. In terms of owning the work going forward, I think Analytics is overloaded at the moment but it makes the most sense there. Extension:Sentry is broadly similar to Extension:EventLogging, and it sounds like EventGate/Kafka is the preferred choice for the pipeline, so that's squarely in our world.

I propose that I work on Extension:Sentry and use @Jdforrester-WMF's help to decide between sentry/browser and a custom implementation. Maybe we could look at sentry/browser and trim it down into some sort of sentry/browser-lite that we could convince upstream to maintain. Surely other people are interested in a client lighter than 50k.

If there are no objections by Monday, I'll reach out to James.

Tgr added a subscriber: phuedx. Apr 12 2019, 8:46 PM

@phuedx can you comment? You probably have a better grasp of the current status.

> Thanks @Tgr, that's a big pivot from what I was expecting, but hey, let's do it!

<3 <3 <3

> How/when/who is making the decision on each bullet point?

As yet, this is unclear. @Tgr has done an incredible job breaking down this work and implemented at least one MVP for Multimedia IIRC; I've poked and prodded a bit in T217142; and @fgiunchedi (and SRE) have very recently picked up the proverbial torch.

> I could drive client and pipeline work, and beg ops for help with the Sentry server / Logstash part.

SRE seem willing to drive the Kafka/Logstash part of the pipeline. IIRC they're looking at Q1-2 FY19-20.

> In terms of owning the work going forward, I think Analytics is overloaded at the moment but it makes the most sense there. Extension:Sentry is broadly similar to Extension:EventLogging, and it sounds like EventGate/Kafka is the preferred choice for the pipeline, so that's squarely in our world.

I'm a little nervous about ownership of an infrastructure piece as critical as client-side error logging being shared by more than one team, as it could lead to friction when prioritising bugfixes/maintenance/feature requests. In practice, though, this likely won't be so clear-cut, e.g. the client-side component could be maintained by Readers Infrastructure in Audiences. Let's talk about this sooner rather than later.

Shipping an MVP by any means makes sense though!

> I propose that I work on Extension:Sentry and use @Jdforrester-WMF's help to decide between sentry/browser and a custom implementation. Maybe we could look at sentry/browser and trim it down into some sort of sentry/browser-lite that we could convince upstream to maintain. Surely other people are interested in a client lighter than 50k.

I wonder how many browsers @sentry/browser supports that we don't deliver JavaScript to. One path might be to not go completely custom but to trim out any parts that won't apply to the Wikipedias, if any.

@Milimetric are you still interested in moving this forward?

@daniel sorry I didn't update the group on this, but this project is being led by Filippo and progressing nicely. There's a working version deployed in beta and progress on a production launch is tracked here: T226986. I'm not sure at this point how that interacts with this RfC. Maybe we should update it when we have a good idea what the production stack should look like.

@Milimetric That sounds like this RFC has been overtaken by reality and can be closed as invalid. Or do you think it's still useful?

I suppose the key question is: is an RFC still needed to make sure that there is agreement about the service interface and the backend technology? How were the requirements/constraints gathered, and are they documented somewhere? If yes, would it make sense to just put this on last call?

I think that's right, we have a pretty good working group of the people that care about this. If they agree I don't see too much need for an RfC, there would be too much context for someone else to catch up with. So I would vote to close as invalid. As for requirements/constraints, I think the phab tasks describe those for now, and we'll document them as we go forward (there's still thinking/testing on what the stack should be)

FWIW I concur we can close this as invalid. There's good input/ideas in the child tasks, some of which we could integrate in T217142: [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors. @Tgr what do you think?

@Tgr: Could you please answer the last comment? Thanks :)

Aklapper removed Tgr as the assignee of this task. Jun 19 2020, 4:19 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Tgr updated the task description. Jun 20 2020, 1:21 PM

The fact this is open and that much of this has been implemented is causing me some confusion. Per T382#6241903 can this be merged into T226986 as that seems to be the canonical link for tracking this work?

Jdforrester-WMF changed the task status from Open to Stalled. Jul 9 2020, 5:02 PM

This is an RfC, and as such is Stalled awaiting a potential owner. It's not appropriate to merge into T226986: Client side error logging production launch as that's "merely" implementation work rather than a governance decision.

The proposal outlined in this task was about building dedicated infrastructure around error monitoring, with an elaborate client and dedicated server-side and UI component (Sentry).

This has not been actively pursued for a couple of years now, with a lot of the effort having shifted toward re-using our existing infrastructure first (statsv, Kafka, Logstash/Kibana), and the most recent related RFC being the Modern Event Platform (MEP) - T201963.

This task hasn't been on the active RFC track for 5+ years so I'll close this for now, but feel free to re-open or create a new one at any time.

https://www.mediawiki.org/wiki/Requests_for_comment

Krinkle closed this task as Declined. Jul 15 2020, 8:15 PM