
bothersome output in hive when querying events database
Closed, Resolved | Public | 3 Estimated Story Points

Description

Hive prints a lot of output when querying the events database. How can we make it disappear?

Can't load log handler "java.util.logging.FileHandler"
java.io.FileNotFoundException: /tmp/hive-parquet-logs/parquet-0.log (Permission denied)
java.io.FileNotFoundException: /tmp/hive-parquet-logs/parquet-0.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at java.util.logging.FileHandler.open(FileHandler.java:228)
at java.util.logging.FileHandler.rotate(FileHandler.java:680)
at java.util.logging.FileHandler.openFiles(FileHandler.java:557)
at java.util.logging.FileHandler.<init>(FileHandler.java:281)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstruc

Nov 1, 2018 9:35:31 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 5161 records.
Nov 1, 2018 9:35:31 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Nov 1, 2018 9:35:31 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 21 ms. row count = 5161
Nov 1, 2018 9:35:31 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

Note: this output is printed when the query does not return any results.

Event Timeline

@Nuria: This is related to the fix we deployed to try to prevent the logging lines (https://gerrit.wikimedia.org/r/c/operations/puppet/cdh/+/469499)
Can you specify which machine you were using when you got those logs?

The only reason I can think of for this issue to happen would be that you and someone else have a hive query writing parquet-logs at the same time.
I'd love to be able to provide better names for the log files (embedding the username at least), but the Java logging configuration doesn't easily allow that by default.
Maybe there are other ways to do this?
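
For illustration only, here is a minimal sketch of one such alternative: building the FileHandler pattern in code so that it embeds the user name. This is not what the patch on this task does. The class name PerUserParquetLog, the helper names, and the choice of the "parquet" logger are assumptions made for the example. The only java.util.logging feature relied on is the documented "%u" pattern token, which FileHandler replaces with a unique number to resolve conflicts between files that are already in use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class PerUserParquetLog {

    // Hypothetical helper: builds a FileHandler pattern that embeds the user name.
    // "%u" is a FileHandler token (a unique number used to resolve conflicts).
    static String patternFor(String dir, String user) {
        return dir + "/parquet-" + user + "-%u.log";
    }

    // Hypothetical helper: attaches a per-user FileHandler to the given logger.
    static FileHandler attach(Logger logger, String dir) throws IOException {
        String user = System.getProperty("user.name", "unknown");
        FileHandler handler = new FileHandler(patternFor(dir, user), true); // append mode
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);
        logger.setUseParentHandlers(false); // keep these records off the console
        return handler;
    }

    public static void main(String[] args) throws IOException {
        // A fresh temporary directory stands in for /tmp/hive-parquet-logs.
        Path dir = Files.createTempDirectory("hive-parquet-logs");
        Logger logger = Logger.getLogger("parquet"); // assumed logger name
        FileHandler handler = attach(logger, dir.toString());
        logger.info("RecordReader initialized");
        handler.close();
    }
}
```

With a pattern like this, two users querying at the same time would write to different files, so neither would hit the "Permission denied" error on a file owned by the other. The drawback is that it requires code running inside the JVM rather than a plain logging properties file.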

fdans triaged this task as High priority.
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
fdans added a project: Analytics-Kanban.

Change 471928 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet/cdh@master] Update hive parquet log to HiverServer2 only

https://gerrit.wikimedia.org/r/471928

Change 471928 merged by Elukey:
[operations/puppet/cdh@master] Update hive parquet log destination

https://gerrit.wikimedia.org/r/471928

Nuria set the point value for this task to 3. (Nov 12 2018, 4:00 PM)

This worked great, and the bogus output is no longer there.