Page MenuHomePhabricator

Log suppressed errors with level=DEBUG
Closed, ResolvedPublic

Description

We currently discard errors that are suppressed via error_reporting(). That's problematic when debugging, as sometimes those are legitimate errors that are causing the operation to fail. They should be logged with level=DEBUG instead.

Current status
  • Logging for suppressed error has worked for years, but goes to the error-json channel instead of error. We want to deprecate that channel so that its messages can be consumed during local development (and by third parties) in a way that is compatible and similar to all other ways we log messages from MediaWiki - not in a custom json format. This will also allow us to remove custom configuration from out Logstash filter in Puppet, which currently exists only for this channel.
  • Before we can move message from error-json to error (as level=DEBUG) we first need to have a way to easily configure wgDebugLogGroups (or something like that) on a plain MW install in a way that only puts messages in a given file for a given channel above a given threshold. Thus enabling developers and CI to keep this file limited to important errors. In particular because CI enforces error-free by asserting that error.log stays empty.
  • These errors currently log at their original (unsuppressed) severity level, which means they show up in general MediaWiki-Logstash queries for errors, which can be confusing.

Event Timeline

Change 338901 had a related patch set uploaded (by Gergő Tisza; owner: Krinkle):
[mediawiki/core@master] exception: Add suppressed errors to "error" channel as level=DEBUG

https://gerrit.wikimedia.org/r/338901

The patch breaks CI which checks that nothing gets logged on the error channel (https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit-jessie/62535/console). Not sure if there's an easy way to fix that in CI config, or we need to reconsider the decision to use severity levels instead of a dedicated channel.

CommunityTechBot raised the priority of this task from High to Needs Triage.Jul 3 2018, 1:53 AM

@Tgr The task as written appears to be working as expected. We currently, and have for years, collect the "suppressed errors" in production as well. They just go to a different channel (error-json, also stored in Logstash).

As stated on Gerrit, I do agree with the direction of merging the channels and using severity levels instead. But that's currently blocked on easily making that work out of the box for third-parties as well as our CI requirements.

Current WMF config discards error-json entirely.

I see your point about this change being problematic with the default config (it does seem to work fine for our CI now). Does that mean this is blocked on T155552: Make Monolog the default debug processing layer and deprecate wfDebug* and LegacyLogger? Or should we have an error-supressed channel instead of just severities, which can be handled easily with the default setup?

@Tgr That seems fine by me as a first step (separate channel).

@hashar Yeah, although it's not a "bug". This was designed on purpose, and T228838 is about changing that. For this task, if we want to collect the "error-json" channel in production, we can just enable that in wmf-config. We do all the time when introducing new channels, no big deal.

This task is about changing AtEase messages to not go into "error-json" anymore, but instead go into the main "error" channel with "level=DEBUG". Making that change is currently blocked because of two things:

  • We want to strictly prevent commits from merging that cause errors. Which is why Jenkins fails commits with a non-empty "error" channel file. If we start sending errors that were intentionally handled ("caught") into that file as well, the build would fail (and we see this already at https://gerrit.wikimedia.org/r/338901).
  • In production, we use Monolog to configure for each channel what the minimum severity level is to decide whether to send a message to a file/Logstash. We probably need to make this part of MediaWiki core by default in an easy to use way, so that we can update CI config from $wgDebugLogGroups['error'] = mw-error.log to something like $wgDebugLogGroups['error'] !? severity=INFO !? … mw-error.log. This is not limited to CI, it would help with general development and for third-party sysadmins as well. Right now the way we configure log channels in production requires a ton of complex configuration that I don't think is reasonable to expect someone to whip up in LocalSettings.php themselves.
Krinkle triaged this task as Medium priority.Aug 27 2019, 2:22 PM

@Tgr Right now, "error-json" is not enabled in wmf-config which means, except for X-Wikimedia-Debug/log requests, these are not sent to Logstash. Do you think their volume is low enough to be acceptable unconditionally? Given they usually caught/suppressed on purpose, it would mean that the 99% of requests (guesstimate) that currently don't report anything to Logstash, would have at least one or two messages at minimum under normal conditions. Years ago when we sometimes introduced something like a PHP error caused by a typo in wmf-config, it led to Logstash problems, but, maybe things are better now?

Change 553528 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] exception: Use level=DEBUG for error-json (surpressed errors)

https://gerrit.wikimedia.org/r/553528

Krinkle added a subscriber: Anomie.

Change 553528 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] exception: Use level=DEBUG for error-json (surpressed errors)

https://gerrit.wikimedia.org/r/553528

@Anomie Could you review this one? This would make the error-json messages about intentionally surpressed errors arrive in Logstash with level:DEBUG instead of e.g. WARNING. This in turn helps reduce noise in team and product level dashboards in Kibana by being able to simply query for type:mediawiki exception.trace:something -level:DEBUG|INFO and not need any awareness of e.g. "erorr-json" being a channel to ignore.

Change 553528 merged by jenkins-bot:
[mediawiki/core@master] exception: Use level=DEBUG for error-json (surpressed errors)

https://gerrit.wikimedia.org/r/553528

@Anomie Could you review this one?

Looks like Gergő beat me to it.

Change 338901 had a related patch set uploaded (by Gergő Tisza; author: Krinkle):

[mediawiki/core@master] exception: Add suppressed errors to "error" channel as level=DEBUG

https://gerrit.wikimedia.org/r/338901

Change 967679 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] logging: Raise 'error' channel threshold to info

https://gerrit.wikimedia.org/r/967679

Change 338901 merged by jenkins-bot:

[mediawiki/core@master] exception: Send silenced errors to a new log channel with level=DEBUG

https://gerrit.wikimedia.org/r/338901

Change 967679 abandoned by Gergő Tisza:

[operations/mediawiki-config@master] logging: Raise 'error' channel threshold to info

Reason:

I34b19837c ended up with a different implementation which doesn't need this.

https://gerrit.wikimedia.org/r/967679

Before we can move message from error-json to error (as level=DEBUG) we first need to have a way to easily configure wgDebugLogGroups (or something like that) on a plain MW install in a way that only puts messages in a given file for a given channel above a given threshold.

According to its documentation, $wgDebugLogGroups has a level option already, so I think this is already resolved? All that would be left then is cleaning up the error-json channel. Do we want to follow a deprecation process, or just do it? Adding a release note about it going to be removed in the next release doesn't have much benefit over just removing it and adding a release note about that, and I'm not what else a deprecation could involve. (I guess we could log a warning if the channel is configured in $wgDebugLogGroups? Doesn't seem terribly useful.)

MWExceptionHandler::logError currently (as of 1.41.1) invokes MWExceptionHandler::jsonSerializeException regardless of whether the error is suppressed by for example AtEase::suppressWarnings. It will in turn invoke FormatJson::encode to serialize the exception. Third-party installations of mediawiki are probably never configured to print the "error-json" log channel. The time spent for json serialization is therefore wasted.

I feel that this feature appears to be specific to WMF's log collection setup. It may make more sense to keep it behind some configuration flag.

Change #1030999 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] exception: Remove "error-json" debug log channel

https://gerrit.wikimedia.org/r/1030999

Change #1030999 merged by jenkins-bot:

[mediawiki/core@master] exception: Remove "error-json" debug log channel

https://gerrit.wikimedia.org/r/1030999

Krinkle removed a project: Patch-For-Review.

@Bewfip Thanks for the reminder!

The JSON serialisation is not likely to be of note given the rarity of these runtime warnings. MediaWiki comes with dozens of other debug channels that are all disabled by default, for which strings and context data is also partly created and then discarded.

https://codesearch.wmcloud.org/core/?q=log.*-%3Edebug%5C%28&files=php&excludeFiles=test&repos=

However, you are of course absolutely right that this channel is no longer needed. This task exists specifically for the purpose of removing the old error-json channel. The required steps to make that possible have been completed now, so the last step of actually removing it, can now be done as well. I've done so!