normalized_message is a JSON dump of the whole event for exceptions in beta logstash
Closed, Resolved, Public

Description

Seems to affect everything that's type:mediawiki, channel:exception. Breaks the "most frequent" feature on the dashboard.

Tgr created this task.Jun 2 2016, 5:29 PM
Restricted Application added subscribers: Zppix, Aklapper.Jun 2 2016, 5:29 PM

Actually not a JSON dump of the event but a JSON dump of the exception: looks like the data from MWExceptionHandler::getStructuredExceptionData.

Tgr added a comment.Jun 7 2016, 1:33 PM

Having that data in a structured form would actually be quite nice as it would allow e.g. filtering on the exception class. But normalized_message is truncated to some character limit so it's not really useful.

greg added a subscriber: greg.Jul 18 2016, 7:16 PM
Tgr added a subscriber: bd808.Nov 22 2016, 3:29 AM

This is caused by the custom normalized_message processor being unable to properly handle the exception-json channel, which consists of events with no context array and a stringified JSON blob as the message.
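
To illustrate the failure mode, here is a minimal Python sketch. It is not the actual processor: it assumes the aggregation key is simply the message text cut to a fixed length, and `MAX_LEN`, `normalized_message()`, the message template, and both sample events are invented for the illustration.

```python
import json

# Hypothetical limit; the real truncation length is not stated in this task.
MAX_LEN = 255


def normalized_message(event):
    """Hypothetical stand-in: use the message text, truncated, as the key."""
    return event["message"][:MAX_LEN]


# 'exception' style event: short message, details live in the context array.
structured = {
    "channel": "exception",
    "message": "Exception thrown in {file}",  # invented template
    "context": {"file": "/srv/mediawiki/index.php"},
}

# 'exception-json' style event: no context, the whole exception is the message.
blob = {
    "channel": "exception-json",
    "message": json.dumps({
        "id": "abc123",
        "type": "RuntimeException",
        "message": "Something broke",
        "url": "/wiki/Foo",
        "backtrace": [{"file": "/srv/mediawiki/index.php", "line": 42}] * 20,
    }),
}

print(normalized_message(structured))  # short text, identical for every occurrence
print(normalized_message(blob))        # truncated JSON that starts with the per-request id
```

Because the blob starts with request-specific data, two occurrences of the same exception end up with different keys, which is consistent with the broken "most frequent" panel.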

@bd808 what's the reason we use that format instead of properly putting the data in the context array like the normal exception channel does? Is that for the benefit of the pre-Monolog handler?

Not specific to the beta cluster.

bd808 added a subscriber: Krinkle.Nov 22 2016, 6:47 AM

@bd808 what's the reason we use that format instead of properly putting the data in the context array like the normal exception channel does? Is that for the benefit of the pre-Monolog handler?

It's stuff that @Krinkle put into core before we had structured logging. I remember using that channel instead of the normal exception channel when ELK was first set up and being fed from the udp2log tap. I thought I switched back to the exception channel in the Logstash config when we got Monolog fully wired into the WMF log stack, but maybe I only meant to and never actually did that?

Tgr added a comment.Nov 23 2016, 2:08 AM

I thought I switched back to the exception channel in the Logstash config when we got Monolog fully wired into the WMF log stack, but maybe I only meant to and never actually did that?

That channel is not sent to Logstash so probably yes.

Change 323111 had a related patch set uploaded (by Gergő Tisza):
Use 'exception' channel in logstash, kill 'exception-json'

https://gerrit.wikimedia.org/r/323111

Change 323330 had a related patch set uploaded (by Gergő Tisza):
Do not send 'exception-json' channel to logstash

https://gerrit.wikimedia.org/r/323330

Change 323351 had a related patch set uploaded (by BryanDavis):
logstash: Add processing rules for MediaWiki's exception channel

https://gerrit.wikimedia.org/r/323351

Change 323351 merged by Filippo Giunchedi:
logstash: Add processing rules for MediaWiki's exception channel

https://gerrit.wikimedia.org/r/323351

Tgr added a comment.Dec 30 2016, 11:41 PM

The error-json channel has similar issues, but unlike exception-json there is no dedicated processing so the error data just ends up as a JSON blob.

Change 323111 merged by jenkins-bot:
Send 'exception' channel to logstash

https://gerrit.wikimedia.org/r/323111

Change 323330 merged by jenkins-bot:
Do not send 'exception-json' channel to logstash

https://gerrit.wikimedia.org/r/323330

Mentioned in SAL (#wikimedia-operations) [2017-02-22T00:48:08Z] <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:323111|Send "exception" channel to logstash]] [[gerrit:323330|Do not send "exception-json" channel to logstash]] T136849 (duration: 00m 40s)

Tgr added a comment.Feb 22 2017, 12:50 AM

exception-json has been replaced with exception on logstash. Example exception. @bd808 the caught_by field has a warning complaining about the lack of mapping. Is that something that needs attention?

Tgr added a comment.EditedFeb 22 2017, 1:22 AM

old, new. (For posterity: old P4968, new P4969)

  • normalized_message is fixed, yay!
  • backtrace is replaced by exception.trace. The format is not great; T151290 should add human-readable traces, but maybe we want the old structured format as well? This will break the trending backtrace file visualization.
  • message (and normalized_message) is more cluttered - before it was just the message, now it's the text rendered by the exception (including hash, URL, file/line, exception class). (The "pure" message is available as exception.message.) I don't mind file/line/class but having the URL in there will break aggregation.
  • exception class wasn't available before, now it is in exception.class. That's nice and should be added to the dashboards.
  • some fields moved around (file & line -> exception.file, code -> exception.code). Hopefully does not break anything.
  • private flag is gone (should not be missed)
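
For anything that has to read both old and new events during the transition, the field moves listed above could be handled roughly like this. This is a sketch only: `OLD_TO_NEW`, `exception_field()`, and the sample events are hypothetical, and the mapping covers just the renames mentioned in the list (how the old line number is represented in the new layout is not handled here).

```python
# Old top-level field -> new key under "exception", per the list above.
OLD_TO_NEW = {
    "message": "message",   # the "pure" message
    "backtrace": "trace",
    "file": "file",
    "code": "code",
}


def exception_field(event, old_name):
    """Look up a field by its old name in either event layout."""
    nested = event.get("exception")
    if isinstance(nested, dict):
        # New layout: the data lives under the "exception" key.
        return nested.get(OLD_TO_NEW.get(old_name, old_name))
    # Old layout: the data is a top-level field.
    return event.get(old_name)


# Invented sample events; all values are made up.
old_event = {"message": "Boom", "file": "/srv/a.php", "code": 0}
new_event = {
    "message": "[abc123] /wiki/Foo RuntimeException: Boom",
    "exception": {
        "class": "RuntimeException",
        "message": "Boom",
        "file": "/srv/a.php",
        "code": 0,
        "trace": "#0 {main}",
    },
}

print(exception_field(old_event, "message"))
print(exception_field(new_event, "message"))
print(exception_field(new_event, "backtrace"))
```
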
Tgr added a comment.Feb 22 2017, 3:36 AM

message could be fixed either in LogstashFormatter or in the logstash config. The first is a mildly horrible hack; the second needs to be written in custom syntax and is a lot more effort to test.
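
Whichever place the fix goes, the idea is roughly the following, shown in Python only to make the aggregation problem concrete. It is neither LogstashFormatter code nor logstash config; `aggregation_key()` and the sample events (including the layout of the rendered message) are invented.

```python
from collections import Counter


def aggregation_key(event):
    """Prefer the bare exception message over the rendered text (hypothetical)."""
    exc = event.get("exception")
    if isinstance(exc, dict) and exc.get("message"):
        return exc["message"]
    return event["message"]


# Two occurrences of the same exception on different URLs (invented data).
events = [
    {
        "message": "[a1] /wiki/Foo RuntimeException: Boom",
        "exception": {"class": "RuntimeException", "message": "Boom"},
    },
    {
        "message": "[b2] /wiki/Bar RuntimeException: Boom",
        "exception": {"class": "RuntimeException", "message": "Boom"},
    },
]

by_rendered = Counter(e["message"] for e in events)
by_bare = Counter(aggregation_key(e) for e in events)

print(len(by_rendered))   # 2: the hash and URL split one error into two buckets
print(by_bare["Boom"])    # 2: both occurrences land in the same bucket
```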

Krinkle closed this task as Resolved.Aug 30 2017, 9:32 PM

It seems this task is resolved. normalized_message is no longer a JSON blob given we now use the exception and error channels from MediaWiki directly.

Tgr moved this task from Backlog to Done on the User-Tgr board.Nov 13 2017, 7:20 AM