Details
Related Objects
Event Timeline
Setting of INFO logging level: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra/daily/workflow.xml#L254
But it might not be this cause definitely we have DEBUG messages showing up
Also, oozie is logging at info level, this is probably what we want:
[nuriaruiz@nurieta][/workplace/operations/puppet]$ more ./modules/cdh/templates/oozie/oozie-log4j.properties.erb | grep log4j.logger.org.apache.oozie
log4j.logger.org.apache.oozie=INFO, oozie
soo.. changing
<property> <name>mapreduce.map.log.level</name> <value>INFO</value> </property> <property> <name>mapreduce.reduce.log.level</name> <value>INFO</value> </property>
did not do anything
Also tried to add -Dyarn.app.mapreduce.am.log.level=ERROR to the command that runs the job, that did not work either
To be clear, is not that mapreduce.reduce.log.level : ERROR is not taking effect, it is cause i see it listed on task logs
./attempt_1571142484661_60235_m_000000_0.txt: mapreduce.reduce.log.level : ERROR
Issue is that the setting does not seem to have any effect in the logging settings of the javacode that does the actual cassandra login (ReducerToCassandra.java)
Maybe, maybe
On the job that is executing the ReducerToCasandra the bindings choosen for slf4j are:
SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [file:/var/lib/hadoop/data/g/yarn/local/filecache/37046/refinery-cassandra-0.0.35.jar/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
But the original job (that triggers the cassandra one):
SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [file:/var/lib/hadoop/data/i/yarn/local/filecache/40898/refinery-cassandra-0.0.35.jar/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Given: http://www.slf4j.org/codes.html#multiple_bindings the choosing of bindings when many are defined on your dependency path is random.
What is happening here: cassandra loading job is using ch.qos.logback.classic.util.ContextSelectorStaticBinder as logger, we do not have a specific config for this one logger (see http://logback.qos.ch/manual/configuration.html) and then is login DEBUG.
Now, excluding logback from cassandra page via mvn excludes does not work cause we get an exception with other loggers, removing all loggers from cassandra pom.xml breaks as well, so WIP
Raising to High since this was the cause of T238648 (mediarequests backfilling creating huge log files on HDFS).
Change 551889 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery/source@master] Comenting debug logging
Change 551889 merged by jenkins-bot:
[analytics/refinery/source@master] Comenting debug logging
Change 551893 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update cassandra jar in related jobs for logging
I'm glad I double checked. Unfortunately the above 2 patches will probably not be enough. I've done a quick analysis on logs from a mediarequest backfilling job (~100Gb logs):
val df = spark.read.text("/var/log/hadoop-yarn/apps/analytics/logs/application_1573208467349_68436") df. selectExpr("(substr(value, 15) like 'main] DEBUG o.w.a.r.cassandra.ReducerToCassandra - Writing new result for line%') as removed_line", "octet_length(value) as size"). groupBy("removed_line"). agg(count(lit(1L)).as("count"), sum("size").as("size")). show(1000, false) +------------+---------+-----------+ |removed_line|count |size | +------------+---------+-----------+ |true |88514207 |16763811434| |false |590467017|91221912694| +------------+---------+-----------+
Removing the debug-line in cassandra-reducer will only reduce log-lines by 13% in number, 15.5% in size :(
Change 552067 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Fix cassandra logging
Change 552067 merged by Nuria:
[analytics/refinery/source@master] Fix cassandra logging
Change 551893 merged by Mforns:
[analytics/refinery@master] Update cassandra jar in related jobs for logging