Currently, when this script outputs the status of raw webrequest partitions, it prints an X for any partition that is not 100% good. Even if there is only 1 duplicate, an X is shown. This makes it difficult to reason about the severity of the data loss or duplication in this report.
2015-03-12T00/1H || . | . | X | . | . |
I'd like to have something more useful here, for example max(abs(percent_different)) whenever any percent_different field is nonzero. Qchris says that to do this, we'd need to edit the dump_dataset_raw_webrequest_partition method in the script.
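
A rough sketch of the proposed cell logic, in Python purely for illustration. The script's actual language and the way it stores percent_different values are not shown in this task, so format_partition_status, render_row, and the two-decimal format are all hypothetical and not taken from the script:

```python
def format_partition_status(percent_different_values):
    """Return '.' when every value is zero, else the largest absolute
    percent difference formatted as a percentage.

    Hypothetical helper: the input is assumed to be a list of numeric
    percent_different values for one partition.
    """
    nonzero = [abs(v) for v in percent_different_values if v != 0]
    if not nonzero:
        return "."
    return "{:.2f}%".format(max(nonzero))


def render_row(label, partitions):
    """Render one report row in the same layout as the current output.

    `partitions` is assumed to be a list of percent_different lists,
    one list per column of the report.
    """
    cells = [format_partition_status(p) for p in partitions]
    return label + " || " + " | ".join(cells) + " |"


# Example with made-up numbers: only the third column has a discrepancy.
row = render_row(
    "2015-03-12T00/1H",
    [[0, 0], [0, 0], [0.0, -0.5, 0.25], [0, 0], [0, 0]],
)
print(row)  # 2015-03-12T00/1H || . | . | 0.50% | . | . |
```

With something like this, a single duplicate would show up as a very small percentage instead of a bare X, while a serious loss would stand out immediately.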