Documentation of client side error logging capabilities on mediawiki
Closed, ResolvedPublic

Description

We have launched the client side error logging client and server code. Errors are flowing in and now this needs to be documented so developers know this functionality is available. @Tgr suggested documenting things this way:

Apart from inline docs, I'd mention the core part at mw:Manual:How to debug#Logging and mw:Help:Locating broken scripts and mw:ResourceLoader/Core modules and mw:Requests for comment/Server-side Javascript error logging, and the Wikimedia-specific part is probably worth its own page on wikitech.

ping @Ottomata @jlinehan @fgiunchedi

Event Timeline

Milimetric moved this task from Incoming to Radar on the Analytics board.

assigned for documentation, ping @phuedx as well

I was hoping to add a Web team chore or dashboard to review errors relevant to the products we steward before any code is merged for T244392. My apologies if this isn't the appropriate place to make an inquiry!

It's my understanding from chatting with @Ottomata that Logstash is likely the mechanism devs will use to track error trends and debug specific errors. We're about to start on significant client side changes and I was wondering if it would be possible to filter out certain configurations from the report. What's most relevant to us now, I think, would be a way to at least filter out reports from certain skins (e.g., BlueSpice). In general, it would be helpful if the documentation could cover some use cases for how devs were expected to monitor error trends relevant to their projects.

It could be that the initial implementation only supports reporting all errors, and that's fine, but documentation that clarifies limitations is useful too. It's really exciting to have these reports but I'm not sure how best to use them in my day to day yet.

@Niedzielski I think it is worth syncing up with members of the product infrastructure team on your end; @jlinehan can help.

Probably the first thing the product team could do is a general cleanup of errors, as there are quite a few that will pollute any deeper dives you might want to do for a particular project: https://logstash.wikimedia.org/app/kibana#/dashboard/AXDBY8Qhh3Uj6x1zCF56?_g=h@1251ff0&_a=h@4f171a4

Thanks, @Nuria! The link you shared says it's unable to be completely restored but hopefully that's just dates and such. I've added T249826 to track any outputs needed.

@Niedzielski If you go to the Logstash home page you can see a link to the MediaWiki frontend error dashboard.

Screen Shot 2020-04-09 at 7.59.31 AM.png (1×2 px, 354 KB)

Any ETA when this will get done? (cc @dcipoletti ) Error logging should be quite important for the upcoming desktop refresh (cc @Jdlrobson so he knows this is available as a tool)

Thanks @Nuria, we're aware of this and actively using it in our error monitoring. I'm actually working on a blog post about how it's helping us. @jlinehan would you or anyone be interested in collaborating on that?

Sure, I'd love to help!

Change 629448 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[eventgate-wikimedia@master] [WIP] Choose HTTP header defaults to set based on schema properties

https://gerrit.wikimedia.org/r/629448

I just wanted to mention that I'm currently wishing for this documentation (especially the Wikimedia-specific part). I'm trying to understand whether the Language team still needs the ContentTranslationError data stream, and it would help to know the capabilities of this new error-logging method and whether, perhaps, JS errors in ContentTranslation are already being captured.

cc @santhosh - my guess is ContentTranslationError data stream can be removed now.

Yeah, could you point me to the code where those get generated? If they're JS errors then they are being captured, but the fields might not be as accessible.

Yes, documenting this is something I'll try to prioritize better. As I've been saying up until now, while @Jdlrobson and I have been scaling things up we keep learning things, so I plan to document once the rate of us learning new things slows down. But we can still chat about it and make a determination w/r/t ContentTranslationError.

I documented alerts here: https://wikitech.wikimedia.org/wiki/Client_errors - perhaps that could be expanded with additional details?

Removing inactive assignee from this open task. (Please update assignees on open tasks after offboarding. Thanks.)

Tgr claimed this task.

Updated the docs.