Page MenuHomePhabricator

Logging level of cassandra should be warning or error but not debug
Closed, ResolvedPublic13 Estimated Story Points

Event Timeline

Setting of INFO logging level: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra/daily/workflow.xml#L254

But it might not be this cause definitely we have DEBUG messages showing up

Also, oozie is logging at info level, this is probably what we want:
[nuriaruiz@nurieta][/workplace/operations/puppet]$ more ./modules/cdh/templates/oozie/oozie-log4j.properties.erb | grep log4j.logger.org.apache.oozie
log4j.logger.org.apache.oozie=INFO, oozie

soo.. changing

<property>
                  <name>mapreduce.map.log.level</name>
                  <value>INFO</value>
              </property>
              <property>
                  <name>mapreduce.reduce.log.level</name>
                  <value>INFO</value>
              </property>

did not do anything

Also tried to add -Dyarn.app.mapreduce.am.log.level=ERROR to the command that runs the job, that did not work either

To be clear, is not that mapreduce.reduce.log.level : ERROR is not taking effect, it is cause i see it listed on task logs

./attempt_1571142484661_60235_m_000000_0.txt: mapreduce.reduce.log.level : ERROR

Issue is that the setting does not seem to have any effect in the logging settings of the javacode that does the actual cassandra login (ReducerToCassandra.java)

Maybe, maybe

On the job that is executing the ReducerToCasandra the bindings choosen for slf4j are:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [file:/var/lib/hadoop/data/g/yarn/local/filecache/37046/refinery-cassandra-0.0.35.jar/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]

But the original job (that triggers the cassandra one):

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [file:/var/lib/hadoop/data/i/yarn/local/filecache/40898/refinery-cassandra-0.0.35.jar/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Given: http://www.slf4j.org/codes.html#multiple_bindings the choosing of bindings when many are defined on your dependency path is random.

Ok, tried gazillion exclusion schemes in pom with no sucess

What is happening here: cassandra loading job is using ch.qos.logback.classic.util.ContextSelectorStaticBinder as logger, we do not have a specific config for this one logger (see http://logback.qos.ch/manual/configuration.html) and then is login DEBUG.

Now, excluding logback from cassandra page via mvn excludes does not work cause we get an exception with other loggers, removing all loggers from cassandra pom.xml breaks as well, so WIP

elukey renamed this task from logging level of cassandra should be warning or error but not debug to Logging level of cassandra should be warning or error but not debug.Nov 19 2019, 2:18 PM
elukey triaged this task as High priority.
elukey subscribed.

Raising to High since this was the cause of T238648 (mediarequests backfilling creating huge log files on HDFS).

I see, will work on this some more today.

Change 551889 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery/source@master] Comenting debug logging

https://gerrit.wikimedia.org/r/551889

Nuria set the point value for this task to 5.Nov 19 2019, 6:25 PM
Nuria moved this task from Paused to Ready to Deploy on the Analytics-Kanban board.

Change 551889 merged by jenkins-bot:
[analytics/refinery/source@master] Comenting debug logging

https://gerrit.wikimedia.org/r/551889

Change 551893 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update cassandra jar in related jobs for logging

https://gerrit.wikimedia.org/r/551893

I'm glad I double checked. Unfortunately the above 2 patches will probably not be enough. I've done a quick analysis on logs from a mediarequest backfilling job (~100Gb logs):

val df = spark.read.text("/var/log/hadoop-yarn/apps/analytics/logs/application_1573208467349_68436")

df.
  selectExpr("(substr(value, 15) like 'main] DEBUG o.w.a.r.cassandra.ReducerToCassandra - Writing new result for line%') as removed_line", "octet_length(value) as size").
  groupBy("removed_line").
  agg(count(lit(1L)).as("count"), sum("size").as("size")).
  show(1000, false)

+------------+---------+-----------+                                            
|removed_line|count    |size       |
+------------+---------+-----------+
|true        |88514207 |16763811434|
|false       |590467017|91221912694|
+------------+---------+-----------+

Removing the debug-line in cassandra-reducer will only reduce log-lines by 13% in number, 15.5% in size :(

Change 552067 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Fix cassandra logging

https://gerrit.wikimedia.org/r/552067

Change 552067 merged by Nuria:
[analytics/refinery/source@master] Fix cassandra logging

https://gerrit.wikimedia.org/r/552067

Nuria changed the point value for this task from 5 to 13.

Change 551893 merged by Mforns:
[analytics/refinery@master] Update cassandra jar in related jobs for logging

https://gerrit.wikimedia.org/r/551893