In T273474, I found that the result column header was disappearing and a stray log line was being concatenated to the end of the output. See this output:
Feb 25, 2021 12:18:48 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 7 ms. row count = 3899
Feb 25, 2021 12:18:48 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but i
s org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Feb 25, 2021 12:18:48 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: Record
report_date wiki edit_count_bucket template_dialog_add_known_param_success template_dialog_add_unknown_param_success template_dialog_add_known_param_abort
template_dialog_add_unknown_param_abort
2021-02-23 arwiki 100-999 edits 576 0 0 0
2021-02-23 arwiki 1000+ edits 1280 0 0 0
[....]
2021-02-23 zh_yuewiki 1000+ edits 16 0 0 0
2021-02-23 zhwiki 1000+ edits 16 0 0 0
Reader initialized will read a total of 3898 records.
Feb 25, 2021 12:18:48 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block

This seems to be a collision between log buffering and script output. The log line "RecordReader initialized will read a total of 3898 records." is split in two, with the query results landing in the middle. It should probably be solved by replacing the grep -v parquet.hadoop clause with explicitly silenced loggers.
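To illustrate why a line-based filter cannot fix this, here is a minimal Python sketch of one way the collision could play out. It is an assumption about the mechanism, not a trace of the actual script: the helper drop_matching_lines is hypothetical and only mimics grep -v, the header is shortened, and the merged stream is hand-built on the assumption that the first half of the log line arrives without a trailing newline, so the header ends up on the same line.

```python
def drop_matching_lines(text: str, needle: str) -> str:
    """Mimic `grep -v needle`: keep only lines that do not contain needle."""
    return "".join(
        line for line in text.splitlines(keepends=True) if needle not in line
    )


# Hypothetical merged stream: one logger line is interrupted partway
# through by the script's own output (header shortened for readability).
LOG_PREFIX = "INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: "
merged = (
    LOG_PREFIX + "Record"                        # first half, no newline yet
    + "report_date\twiki\tedit_count_bucket\n"   # header lands on the same line
    + "2021-02-23\tarwiki\t100-999 edits\n"      # a normal result row
    + "Reader initialized will read a total of 3898 records.\n"  # second half
)

filtered = drop_matching_lines(merged, "parquet.hadoop")
print(filtered, end="")
```

Under these assumptions the filter drops the header together with the log fragment it is fused to, while the second half of the log line no longer contains parquet.hadoop and slips through at the end. That matches both symptoms, which is why silencing the loggers at the source looks more robust than filtering the merged stream.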