Page MenuHomePhabricator

Logstash pipeline crashes on non-UTF8 log messages.
Closed, ResolvedPublic

Description

A non-UTF8-encoded (or mixed encoding) message causes the Logstash main thread to crash. SystemD dutifully restarts Logstash when it happens, but it occurs on a loop as the message is remains at that offset in Kafka when the pipeline starts up again.

Evidenced in:

[ERROR][logstash.pipeline] Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. 
{
    "exception"=>"invalid byte sequence in UTF-8",
    "backtrace"=> [
        "org/jruby/RubyString.java:6007:in `rstrip!'",
        "org/jruby/RubyString.java:6094:in `strip!'",
        "org/jruby/RubyString.java:6085:in `strip'",
        "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-mutate-3.3.4/lib/logstash/filters/mutate.rb:496:in `strip'",
        "org/jruby/RubyArray.java:1613:in `each'",
        "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-mutate-3.3.4/lib/logstash/filters/mutate.rb:490:in `strip'", 
        "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-mutate-3.3.4/lib/logstash/filters/mutate.rb:257:in `filter'",
        "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'",
        "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:164:in `multi_filter'",
        "org/jruby/RubyArray.java:1613:in `each'",
        "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:161:in `multi_filter'",
        "/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:46:in `multi_filter'",
        "(eval):57894:in `initialize'",
        "org/jruby/RubyArray.java:1613:in `each'",
        "(eval):57891:in `initialize'",
        "org/jruby/RubyProc.java:281:in `call'",
        "(eval):58037:in `initialize'",
        "org/jruby/RubyArray.java:1613:in `each'",
        "(eval):58029:in `initialize'",
        "org/jruby/RubyProc.java:281:in `call'",
        "(eval):5886:in `filter_func'",
        "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:398:in `filter_batch'",
        "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:379:in `worker_loop'",
        "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:342:in `start_workers'"
    ]
}

@bd808 indicated that the error is probably coming from MediaWiki database logging for wikis where the encoding is not UTF-8 (or UTF-8 compatible).

This task is complete when the pipeline no longer crashes at MediaWiki outages.

Event Timeline

Change 538642 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: add mmutf8fix to kafka output actions

https://gerrit.wikimedia.org/r/538642

There are a few options to consider.

herron triaged this task as Medium priority.Sep 24 2019, 2:50 PM
herron subscribed.

IMO mitigation in the logging pipeline is where we should focus, as there are varying applications that could produce malformed messages. Layering safeguards at both the "shipper" and within logstash makes sense to me. Mmutf8fix looks like a good near-term option for the shipper.

One other aspect to think about is splitting pipelines by input/output, so that a logstash pipeline crash wouldn't stop processing of all inputs/outputs. Newer versions provide this via https://www.elastic.co/guide/en/logstash/6.8/multiple-pipelines.html

Change 538642 merged by Cwhite:
[operations/puppet@production] profile: add mmutf8fix to kafka output actions

https://gerrit.wikimedia.org/r/538642

colewhite claimed this task.

We haven't seen this happen in a long while and several potential mitigations have been deployed. We can reopen if it recurs.