Page MenuHomePhabricator

Reduce Cassandra logstash output
Closed, ResolvedPublic

Description

Per @bd808:

15:12 < bd808> urandom: last 24h 1.7M events logged; 99% INFO level; 50% StatusLogger. Just filtering out the StatusLogger messages 
               looks like it could make a pretty big difference (at least in event volume).

StatusLogger are definitely of very limited value from logstash/kibana, and should be filtered to save space.

Event Timeline

I see two alternatives for doing this from the Cassandra-side:

  1. Add a filter to the UDP appender

This will require deploying two new jars (janino-2.7.8, and commons-compiler-2.7.8.jar), and applying the following configuration change:

1--- /home/eevans/logback.xml 2016-03-02 08:02:19.951422003 -0600
2+++ /home/eevans/logback.xml-filtered 2016-03-11 14:40:22.608328641 -0600
3@@ -42,6 +42,13 @@
4 <host>logstash1003.eqiad.wmnet</host>
5 <port>11514</port>
6 <customFields>{"program":"cassandra"}</customFields>
7+ <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
8+ <evaluator>
9+ <expression>return logger.equals("org.apache.cassandra.utils.StatusLogger");</expression>
10+ </evaluator>
11+ <OnMismatch>NEUTRAL</OnMismatch>
12+ <OnMatch>DENY</OnMatch>
13+ </filter>
14 </appender>
15
16 <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">

  1. Set the level for org.apache.cassandra.utils.StatusLogger to something above INFO:

1--- /home/eevans/logback.xml 2016-03-02 08:02:19.951422003 -0600
2+++ /home/eevans/logback.xml-level 2016-03-11 14:41:04.752458032 -0600
3@@ -64,4 +64,5 @@
4 </root>
5
6 <logger name="com.thinkaurelius.thrift" level="ERROR"/>
7+ <logger name="org.apache.cassandra.utils.StatusLogger" level="DEBUG" />
8 </configuration>

The former eliminates the gratuitous StatusLogger messages from the logstash stream, the latter, while a bit simpler removes them everywhere.

I've found these entries to be helpful on one or two occasions, and they are not creating a problem for local logging, so my preference would be for #1. I will test the two solutions though to make sure they work as expected, and don't introduce any obvious performance regressions.

Change 277264 had a related patch set uploaded (by Eevans):
dependencies needed for logstash filtering

https://gerrit.wikimedia.org/r/277264

Change 277265 had a related patch set uploaded (by Eevans):
Filter StatusLogger messages from UDP appender

https://gerrit.wikimedia.org/r/277265

Change 277264 merged by Giuseppe Lavagetto:
dependencies needed for logstash filtering

https://gerrit.wikimedia.org/r/277264

I've queued up https://gerrit.wikimedia.org/r/#/c/277265 for next Tuesdays Puppet SWAT (16:00 UTC).

NOTE: When merged this will go out to all clusters. Cassandra should automagically reload the logback config when the file is changed, but a rolling restart of Cassandra is still recommended.

Change 277265 merged by Muehlenhoff:
Filter StatusLogger messages from UDP appender

https://gerrit.wikimedia.org/r/277265

Stashbot subscribed.

Mentioned in SAL [2016-03-22T16:21:05Z] <urandom> Restarting Cassandra on restbase1007.eqiad.wmnet (canary) : T130393, T128787

Mentioned in SAL [2016-03-22T16:54:55Z] <urandom> Performing rolling restart of AQS Cassandra cluster : T130393, T128787

Mentioned in SAL [2016-03-22T17:03:18Z] <urandom> Rolling restart of AQS Cassandra cluster complete : : T130393, T128787

Mentioned in SAL [2016-03-22T18:39:39Z] <urandom> Rolling restart of Cassandra on maps cluster : T130393, T128787

Mentioned in SAL [2016-03-22T18:41:11Z] <urandom> Aborting restart of Cassandra on maps cluster : T130393, T128787

Mentioned in SAL [2016-03-22T19:53:30Z] <urandom> Rolling restart of RESTBase Cassandra cluster : T130393, T128787

Mentioned in SAL [2016-03-22T20:45:38Z] <urandom> Rolling restart of RESTBase Cassandra cluster complete : T130393, T128787