normalized_message is a JSON dump of the whole event for exceptions in beta logstash
Closed, Resolved | Public | Production Error

Description

Seems to affect everything that's type:mediawiki, channel:exception. Breaks the "most frequent" feature on the dashboard.

Event Timeline

Actually not a JSON dump of the event but a JSON dump of the exception: looks like the data from MWExceptionHandler::getStructuredExceptionData.

Having that data in a structured form would actually be quite nice as it would allow e.g. filtering on the exception class. But normalized_message is truncated to some character limit so it's not really useful.

This is caused by the custom normalized_message processor being unable to properly handle the exception-json channel, which consists of events with no context array and a stringified JSON blob as the message.
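To illustrate the failure mode, here is a minimal Python sketch, not the actual processor. It assumes the normalizer simply takes the event's message and cuts it at a fixed length. The 255-character limit, the function names and the sample field names are all hypothetical; the task does not state the real limit.

```python
import json

MAX_LEN = 255  # hypothetical limit; the real value is not stated in this task


def normalized_message(event: dict) -> str:
    """Use the message as the aggregation key, truncated to MAX_LEN."""
    return str(event.get("message", ""))[:MAX_LEN]


def try_parse(text: str):
    """Return the decoded JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:
        return None


# 'exception'-style event: short message, structured data next to it.
structured = {
    "channel": "exception",
    "message": "Something failed",
    "exception": {"class": "RuntimeException", "file": "/srv/Foo.php"},
}

# 'exception-json'-style event: no context, the whole exception is the message.
# Field names here are illustrative only.
blob = {
    "channel": "exception-json",
    "message": json.dumps({
        "id": "abc123",
        "type": "RuntimeException",
        "file": "/srv/Foo.php",
        "message": "Something failed",
        "url": "/wiki/" + "X" * 300,
    }),
}

key = normalized_message(blob)
print(key[:40])                              # the key starts with the raw JSON text
print(try_parse(key) is None)                # the truncated blob no longer decodes
print(try_parse(blob["message"])["type"])    # the untruncated message still does
```

Under these assumptions the aggregation key for the second event is a cut-off JSON string: useless for grouping, and no longer decodable to recover fields such as the exception class.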

@bd808 what's the reason we use that format instead of properly putting the data in the context array like the normal exception channel does? Is that for the benefit of the pre-Monolog handler?

It's stuff that @Krinkle put into core before we had structured logging. I remember using that channel instead of the normal exception channel when ELK was first set up and being fed from the udp2log tap. I thought I switched back to the exception channel in the Logstash config when we got Monolog fully wired into the WMF log stack, but maybe I only meant to and never actually did that?

That channel is not sent to Logstash so probably yes.

Change 323111 had a related patch set uploaded (by Gergő Tisza):
Use 'exception' channel in logstash, kill 'exception-json'

https://gerrit.wikimedia.org/r/323111

Change 323330 had a related patch set uploaded (by Gergő Tisza):
Do not send 'exception-json' channel to logstash

https://gerrit.wikimedia.org/r/323330

Change 323351 had a related patch set uploaded (by BryanDavis):
logstash: Add processing rules for MediaWiki's exception channel

https://gerrit.wikimedia.org/r/323351

Change 323351 merged by Filippo Giunchedi:
logstash: Add processing rules for MediaWiki's exception channel

https://gerrit.wikimedia.org/r/323351

The error-json channel has similar issues, but unlike exception-json there is no dedicated processing so the error data just ends up as a JSON blob.

Change 323111 merged by jenkins-bot:
Send 'exception' channel to logstash

https://gerrit.wikimedia.org/r/323111

Change 323330 merged by jenkins-bot:
Do not send 'exception-json' channel to logstash

https://gerrit.wikimedia.org/r/323330

Mentioned in SAL (#wikimedia-operations) [2017-02-22T00:48:08Z] <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:323111|Send "exception" channel to logstash]] [[gerrit:323330|Do not send "exception-json" channel to logstash]] T136849 (duration: 00m 40s)

exception-json has been replaced with exception on logstash. Example exception. @bd808 the caught_by field has a warning complaining about the lack of mapping. Is that something that needs attention?

old, new. (For posterity: old P4968, new P4969)

  • normalized_message is fixed, yay!
  • backtrace is replaced by exception.trace. The format is not great, T151290 should add human-readable traces, but maybe we want the old structured format as well? This will break the trending backtrace file visualization.
  • message (and normalized_message) is more cluttered - before it was just the message, now it's the text rendered by the exception (including hash, URL, file/line, exception class). (The "pure" message is available as exception.message.) I don't mind file/line/class but having the URL in there will break aggregation.
  • exception class wasn't available before, now it is in exception.class. That's nice and should be added to the dashboards.
  • some fields moved around (file & line -> exception.file, code -> exception.code). Hopefully does not break anything.
  • private flag is gone (should not be missed)
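
For anything that still reads the old field names, the moves listed above could be bridged with a small lookup helper. This is a hedged Python sketch: `FIELD_MOVES`, `get_path` and `read_field` are hypothetical names, and only the file and code moves from the list are covered.

```python
# Old top-level name -> new dotted path, taken from the list above.
FIELD_MOVES = {"file": "exception.file", "code": "exception.code"}


def get_path(event: dict, dotted: str):
    """Walk a dotted path through nested dicts; return None if any part is missing."""
    node = event
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def read_field(event: dict, old_name: str):
    """Prefer the new nested location, fall back to the old top-level field."""
    new_path = FIELD_MOVES.get(old_name)
    if new_path is not None:
        value = get_path(event, new_path)
        if value is not None:
            return value
    return event.get(old_name)


old_event = {"file": "/srv/Foo.php", "code": 0}
new_event = {"exception": {"file": "/srv/Foo.php", "code": 0}}
print(read_field(old_event, "file") == read_field(new_event, "file"))
```

The same helper returns a value for both event layouts, so a consumer does not need to know which channel produced the event.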

message could be fixed either in LogstashFormatter or in the logstash config. The first is a mildly horrible hack, the second needs to be written in custom syntax and is a lot more effort to test.
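
Either way, the idea would be to build the aggregation key from the "pure" exception.message rather than from the rendered text. The following is a rough Python sketch of that logic only, not LogstashFormatter or logstash config code. `aggregation_key` is a hypothetical name, and the rendered message strings are made up for illustration.

```python
def aggregation_key(event: dict, max_len: int = 255) -> str:
    """Build a grouping key that ignores per-request details such as the URL."""
    exc = event.get("exception")
    if isinstance(exc, dict) and exc.get("message"):
        # Class plus pure message: stable across requests.
        base = f"{exc.get('class', 'Exception')}: {exc['message']}"
    else:
        # No structured exception data: fall back to the rendered message.
        base = str(event.get("message", ""))
    return base[:max_len]


# Two events for the same error, rendered with different hashes and URLs.
first = {
    "message": "[abc123] /wiki/Foo RuntimeException: Something failed",
    "exception": {"class": "RuntimeException", "message": "Something failed"},
}
second = {
    "message": "[def456] /wiki/Bar RuntimeException: Something failed",
    "exception": {"class": "RuntimeException", "message": "Something failed"},
}

print(aggregation_key(first) == aggregation_key(second))
```

With a key like this the two events group together even though their rendered messages differ, which is what the "most frequent" view needs.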

It seems this task is resolved. normalized_message is no longer a JSON blob given we now use the exception and error channels from MediaWiki directly.

Change 494042 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] logstash: Remove filter for unused 'exception-json' channel

https://gerrit.wikimedia.org/r/494042

Change 494042 merged by Herron:
[operations/puppet@production] logstash: Remove filter for unused 'exception-json' channel

https://gerrit.wikimedia.org/r/494042

mmodell changed the subtype of this task from "Task" to "Production Error". (Aug 28 2019, 11:11 PM)