Page MenuHomePhabricator

Standardise on Logstash field for exceptions with back traces
Closed, ResolvedPublic

Description

We currently have at least four different ways of logging the exception trace when logging messages to Logstash.

  1. For fatal errors (channel:fatal), we use fatal_exception => Exception $e which results in fields like fatal_exception.message, fatal_exception.trace, fatal_exception.class, etc.
  1. For uncaught exceptions (channel:exception), we use exception => Exception $e which results in fields like exception.message, exception.trace, etc.
  1. For failed DeferredUpdates (channel:DeferredUpdates), we use [ 'message' => …, 'trace' => ], which results in fields like c_message, trace, etc. (c_message because "message" is a reserved key).
  1. For various other ad-hoc logging we seem to use [ 'trace' => … ] mostly.

Problem: This currently means that Logstash dashboards for individual teams either have to repeat their searches several times (which is quite messy to do in Kibana last time I tried), or end up missing some variants.

Keeping ad-hoc logging separate might be okay as they're not usually errors but rather traces we tacked onto debug messages that otherwise wouldn't have a trace. We're already consistent with those so they would only require 1 duplicate search to include.

Desired outcome: Decide which way we want to log exceptions, for the first three cases above.

Proposal 1: Use exception, exception.trace etc.

Sub tasks:

Details

SubjectRepoBranchLines +/-
mediawiki/extensions/PageTriagemaster+1 -1
mediawiki/extensions/VisualEditormaster+2 -2
mediawiki/coremaster+14 -13
mediawiki/coremaster+3 -3
mediawiki/extensions/Flowmaster+1 -1
mediawiki/coremaster+5 -4
mediawiki/extensions/AbuseFiltermaster+1 -1
mediawiki/coremaster+0 -2
mediawiki/coremaster+14 -10
mediawiki/coremaster+2 -12
mediawiki/coreREL1_32+4 -4
mediawiki/coremaster+1 -1
mediawiki/coreREL1_31+4 -4
mediawiki/coreREL1_33+6 -6
mediawiki/coreREL1_34+7 -7
mediawiki/coremaster+7 -7
mediawiki/corewmf/1.35.0-wmf.8+7 -7
mediawiki/corewmf/1.35.0-wmf.5+1 -1
Show related patches Customize query in gerrit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Anomie subscribed.
  1. For various other ad-hoc logging we seem to use [ 'trace' => … ] mostly.

For various ad-hoc logging I typically use [ 'exception' => Exception $e ]. Overall that seems to be the more common, too.

  1. For various other ad-hoc logging we seem to use [ 'trace' => … ] mostly.

For various ad-hoc logging I typically use [ 'exception' => Exception $e ]. Overall that seems to be the more common, too.

I didn't think to look for that, thanks. Hadn't seen that one before. Core has 31 uses of trace, and 40 uses of exception (39 if not including the one for the exception channel).

Change 549645 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] exception: Log fatal exception data as 'exception' instead of 'fatal_exception'

https://gerrit.wikimedia.org/r/549645

Change 549645 merged by jenkins-bot:
[mediawiki/core@master] exception: Log fatal exception data as 'exception' instead of 'fatal_exception'

https://gerrit.wikimedia.org/r/549645

Change 553182 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] deferred: Log trace as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/553182

Change 553372 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.35.0-wmf.5] exception: Log fatal exception data as 'exception' instead of 'fatal_exception'

https://gerrit.wikimedia.org/r/553372

Change 553372 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.5] exception: Log fatal exception data as 'exception' instead of 'fatal_exception'

https://gerrit.wikimedia.org/r/553372

Confirmed via mwdebug1001. Messages in channel:fatal having caught_by: mwe_handler (not php-wmerrors), now use exception.* instead of fatal_exception. Thus matching what we do for other uncaught exceptions and what php-wmerrors uses for simpler dashboarding.

Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)

Change 554617 had a related patch set uploaded (by SBassett; owner: SBassett):
[mediawiki/core@master] Log fatal exception data as 'exception' instead of 'trace'

https://gerrit.wikimedia.org/r/554617

Change 554637 had a related patch set uploaded (by Krinkle; owner: SBassett):
[mediawiki/core@wmf/1.35.0-wmf.8] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/554637

Change 554617 merged by jenkins-bot:
[mediawiki/core@master] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/554617

Krinkle triaged this task as Medium priority.Dec 5 2019, 12:00 AM

Change 554637 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.8] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/554637

Mentioned in SAL (#wikimedia-operations) [2019-12-05T22:00:47Z] <krinkle@deploy1001> Synchronized php-1.35.0-wmf.8/includes/libs/rdbms/database/: T233342 (duration: 01m 02s)

Change 556289 had a related patch set uploaded (by Reedy; owner: SBassett):
[mediawiki/core@REL1_34] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556289

Change 556291 had a related patch set uploaded (by Reedy; owner: SBassett):
[mediawiki/core@REL1_33] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556291

Change 556292 had a related patch set uploaded (by Reedy; owner: SBassett):
[mediawiki/core@REL1_32] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556292

Change 556293 had a related patch set uploaded (by Reedy; owner: SBassett):
[mediawiki/core@REL1_31] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556293

Change 556289 merged by jenkins-bot:
[mediawiki/core@REL1_34] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556289

Change 556293 merged by jenkins-bot:
[mediawiki/core@REL1_31] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556293

Change 556291 merged by jenkins-bot:
[mediawiki/core@REL1_33] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556291

Change 556509 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] exception: Remove ErrorException workaround in handleFatalError()

https://gerrit.wikimedia.org/r/556509

Change 556292 merged by jenkins-bot:
[mediawiki/core@REL1_32] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/556292

Change 556509 merged by jenkins-bot:
[mediawiki/core@master] exception: Remove ErrorException workaround in handleFatalError()

https://gerrit.wikimedia.org/r/556509

Change 553182 merged by jenkins-bot:
[mediawiki/core@master] deferred: Log trace as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/553182

I'm not sure I understand the proposal for the "ad-hoc" case, as in

For various other ad-hoc logging we seem to use [ 'trace' => … ] mostly.

What should replace this with? This is not reporting an exception. The idea is to log a warning and include a stack trace as context. Reporting this as exception.trace would be misleading. I think the 'trace' key is fine for this use case.

I'm not sure I understand the proposal for the "ad-hoc" case, as in

For various other ad-hoc logging we seem to use [ 'trace' => … ] mostly.

What should replace this with?

With exception. And various ad-hoc code paths have already been updated this way from the above patches (e.g. rdbms).

Task description

Proposal: Use exception, exception.trace etc.

This is not reporting an exception. The idea is to log a warning and include a stack trace as context. Reporting this as exception.trace would be misleading.

It won't make it reported as an uncaught exception. It's just the name of UI field inside Kibana. Creating new RuntimeException to call getTraceAsString on directly is no more real an exception than creating it and passing it to Logger to handle deal format accordingly.

I think the 'trace' key is fine for this use case.

The problem is that stack traces need to be sanitized as they tend to be copy-pasted to public tasks (T234014). Doing that ad-hoc is a waste of our time, and also will inevitably lead to incidents. The other thing is that by having a consistent key, Kibana almost becomes usable and might make it so that teams can actually be effective in maintaining a dashboard to look our for events relevant to code they maintain. Which channels they monitor is their own choice, but the filter for matching stack traces will be universal (in this case, exception.trace).

Change 615839 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/AbuseFilter@master] Avoid use of unfiltered 'getTraceAsString' in debug logs

https://gerrit.wikimedia.org/r/615839

Change 615867 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] context: Remove duplicate log message from RequestContext::getLanguage()

https://gerrit.wikimedia.org/r/615867

Change 615869 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/615869

Change 615867 merged by jenkins-bot:
[mediawiki/core@master] context: Remove duplicate log message from RequestContext::getLanguage()

https://gerrit.wikimedia.org/r/615867

Change 615839 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Avoid use of unfiltered 'getTraceAsString' in debug logs

https://gerrit.wikimedia.org/r/615839

Change 615869 merged by jenkins-bot:
[mediawiki/core@master] rdbms: Log debug message traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/615869

Change 676440 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/Flow@master] Import: Avoid unredacted trace in OptInController log warning

https://gerrit.wikimedia.org/r/676440

Change 676440 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] Import: Avoid unredacted trace in OptInController log warning

https://gerrit.wikimedia.org/r/676440

Change 721645 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Revision: Log traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/721645

Change 779960 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] rdbms: Log trace as `exception.trace` instead of `trace`

https://gerrit.wikimedia.org/r/779960

Change 779960 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Log trace as `exception.trace` instead of `trace`

https://gerrit.wikimedia.org/r/779960

Change 721645 merged by jenkins-bot:

[mediawiki/core@master] Revision: Log traces as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/721645

Change 791803 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/VisualEditor@master] ApiParsoidTrait: Log trace as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/791803

Change 791804 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/PageTriage@master] ArticleCompile: Log trace as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/791804

Change 791803 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] ApiParsoidTrait: Log trace as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/791803

Change 791804 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] ArticleCompile: Log trace as 'exception.trace' instead of 'trace'

https://gerrit.wikimedia.org/r/791804

Krinkle claimed this task.
Krinkle removed a project: Patch-For-Review.
Krinkle updated the task description. (Show Details)
Krinkle edited projects, added Performance-Team; removed Performance-Team (Radar).
Krinkle added a subscriber: Reedy.