Page MenuHomePhabricator

Monolog throws RuntimeException on JSON encoding failure
Closed, ResolvedPublic

Description

When we upgraded Monolog to 1.17.2 we found an unwanted behavior change in the Logstash formatter. This formatter uses json_encode to format the log record for Logstash consumption. json_encode requires that "All string data must be UTF-8 encoded." When this is violated it will return false instead of an encoded string. Monolog had changed the behavior of their encoder to raise a RuntimeException when this happens rather than just blindly passing the boolean on to the output layer.

This behavior change is not nice for our environment (or any environment arguably) as even attempting to log non-utf-8 data can now trigger a RuntimeException at the point of logging.

Event Timeline

bd808 created this task.Nov 7 2015, 1:10 AM
bd808 raised the priority of this task from to Unbreak Now!.
bd808 updated the task description. (Show Details)
bd808 added a subscriber: bd808.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 7 2015, 1:10 AM

Change 251679 had a related patch set uploaded (by BryanDavis):
Add wrapper class for Monolog\Formatter\LogstashFormatter

https://gerrit.wikimedia.org/r/251679

Krinkle added a subscriber: Krinkle.Nov 8 2015, 8:54 PM

Can you reference an upstream issue and/or commit?

bd808 claimed this task.Nov 8 2015, 9:14 PM
Restricted Application added a project: User-bd808. · View Herald TranscriptNov 8 2015, 9:14 PM

Upstream introduced the exception throwing behavior in commit 0c8a92a to "fix" bug 545. Previously a json_encode failure in Monolog\Formatter\LogstashFormatter would return a boolean false as the formatted message to the Handler.

Most of the errors I'm still seeing related to this bug are triggered by user-agents or urls that are latin1 encoded and included in the log message via concatenation. zero.log was hitting this quite a bit so I dropped it from Logstash storage for now. We are down to <100 failures a day attributable to this in exception.log now so it's not a huge problem, but those failures are fatal errors for users so we should correct the problem as soon as we can.

Change 252359 had a related patch set uploaded (by BryanDavis):
Monolog: wrap channel handlers in a WhatFailureGroupHandler

https://gerrit.wikimedia.org/r/252359

Change 251679 abandoned by BryanDavis:
Add wrapper class for Monolog\Formatter\LogstashFormatter

Reason:
I3839578ffbb42907e920f88297a752627d1c8920 is a better short term approach to this problem. We can swallow the errors with that just as the old upstream code did and then work with upstream on a less dramatic way of rescuing messages that hold bad UTF-8 input.

https://gerrit.wikimedia.org/r/251679

Having some discussion of the general problem upstream: https://github.com/Seldaek/monolog/issues/545

Initial advice is to use WhatFailureGroupHandler to silently discard exceptions from handle().

Change 252359 merged by jenkins-bot:
Monolog: wrap channel handlers in a WhatFailureGroupHandler

https://gerrit.wikimedia.org/r/252359

bd808 lowered the priority of this task from Unbreak Now! to Medium.Nov 12 2015, 11:10 PM

Lowering from UBN! to normal. The WhatFailureGroupHandler fix gets us back to the previous status quo of log messages with invalid UTF-8 content being silently dropped rather than causing a fatal error do to an unhandled exception.

My fix has been merged upstream. I'll prepare a cherry-pick to backport it to our Monolog version in mediawiki/vendor in advance of a new upstream release.

Change 254081 had a related patch set uploaded (by BryanDavis):
Monolog: Backport Attempt to recover from json encoding errors

https://gerrit.wikimedia.org/r/254081

Change 254081 merged by jenkins-bot:
Monolog: Backport Attempt to recover from json encoding errors

https://gerrit.wikimedia.org/r/254081

bd808 closed this task as Resolved.Nov 19 2015, 4:07 PM

The reencoding fix made upstream in Monolog will reach wikis with 1.27.0-wmf.8. We should keep the WhatFailureGroupHandler fix in our configuration as well to protect against any future regressions disrupting the request-response cycle.

bd808 moved this task from Done to Archive on the User-bd808 board.Dec 10 2015, 5:09 PM