In T273474, I found that the result's column header was disappearing, and a stray log line was concatenated to the end of the output. See this output:
```
Feb 25, 2021 12:18:48 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 7 ms. row count = 3899
Feb 25, 2021 12:18:48 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Feb 25, 2021 12:18:48 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: Record report_date wiki edit_count_bucket template_dialog_add_known_param_success template_dialog_add_unknown_param_success template_dialog_add_known_param_abort template_dialog_add_unknown_param_abort
2021-02-23 arwiki 100-999 edits 576 0 0 0
2021-02-23 arwiki 1000+ edits 1280 0 0 0
[....]
2021-02-23 zh_yuewiki 1000+ edits 16 0 0 0
2021-02-23 zhwiki 1000+ edits 16 0 0 0
Reader initialized will read a total of 3898 records.
Feb 25, 2021 12:18:48 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
```
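This seems to be a collision between log buffering and script output. The log message "RecordReader initialized will read a total of 3898 records." appears to have been split in two, and the column header was printed on the same line as its first half. The `grep -v parquet.hadoop` filter therefore drops the header together with that log line, while the second half of the message ("Reader initialized will read a total of 3898 records.") contains no match and is passed through to the end of the output. This should probably be solved by replacing the `grep -v parquet.hadoop` clause with explicitly silencing the loggers.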
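To make the failure mode concrete, here is a small Python sketch that replays a hypothetical version of the interleaving above and applies a line-based filter equivalent to `grep -v parquet.hadoop`. The shortened header and rows, the chunk boundaries, and the helper names `interleaved_stream` and `grep_v` are illustrative assumptions, not taken from the actual script.

```python
import re

# Hypothetical reconstruction: the logger emitted "...InternalParquetRecordReader: Record"
# and "Reader initialized ..." as two separate chunks, and the query result
# (header + rows) landed in between them on the same output stream.
LOG_PREFIX = "Feb 25, 2021 12:18:48 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: "
HEADER = "report_date\twiki\tedit_count_bucket"
ROWS = ["2021-02-23\tarwiki\t100-999 edits", "2021-02-23\tzhwiki\t1000+ edits"]


def interleaved_stream() -> str:
    """Return the combined output as a single string, as a pipe would see it."""
    chunks = [
        LOG_PREFIX + "Record ",  # first half of the log message, no newline yet
        HEADER + "\n",           # query output continues on the same physical line
        "\n".join(ROWS) + "\n",
        "Reader initialized will read a total of 3898 records.\n",  # second half of the message
    ]
    return "".join(chunks)


def grep_v(text: str, pattern: str) -> list[str]:
    """Line-based equivalent of `grep -v pattern`: keep only lines that do NOT match."""
    return [line for line in text.splitlines() if not re.search(pattern, line)]


filtered = grep_v(interleaved_stream(), r"parquet\.hadoop")
for line in filtered:
    print(line)
```

Because the filter works line by line, a result line that shares a physical line with a parquet.hadoop message is dropped along with it, and a fragment of a log message that lands on its own line slips through. Silencing the loggers at the source avoids depending on where the line breaks happen to fall.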