In the Iceberg Working Session, we briefly discussed that we could benefit from moving away from SNAPPY to a more modern compression algorithm better suited to analytics workloads.
In this task, we should:
- Pick one of our big tables (>= 1 TB) that we think would benefit
- Backfill Iceberg tables with different compression algorithms, such as SNAPPY, GZIP, and ZSTD.
- Run a couple of real-world WMF queries against each table.
- Compare SELECT performance gains against compression time and compression ratio.
- Decide whether we should move away from SNAPPY.
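
The steps above could be sketched roughly as follows. This is a minimal sketch, not a settled plan: it assumes the backfill is done with a Spark SQL CTAS and that the codec is selected through the Iceberg table property `write.parquet.compression-codec`. The helper names (`backfill_sql`, `CodecResult`, `summarize`) and the table names in the usage note are hypothetical, and the measurements themselves would have to come from real runs.

```python
from dataclasses import dataclass

# Codecs named in this task; all are assumed to be accepted values of the
# Iceberg table property 'write.parquet.compression-codec'.
CODECS = ("snappy", "gzip", "zstd")


def backfill_sql(source_table: str, target_prefix: str, codec: str) -> str:
    """Build a Spark SQL CTAS statement that copies source_table into a new
    Iceberg table whose Parquet files are written with the given codec.

    Note: a plain CTAS does not carry over partitioning or sort order; a real
    backfill would likely add PARTITIONED BY to match the source table.
    """
    if codec not in CODECS:
        raise ValueError(f"unexpected codec: {codec}")
    return (
        f"CREATE TABLE {target_prefix}_{codec} USING iceberg "
        f"TBLPROPERTIES ('write.parquet.compression-codec' = '{codec}') "
        f"AS SELECT * FROM {source_table}"
    )


@dataclass
class CodecResult:
    """Measurements for one codec, to be filled in from actual runs."""
    codec: str
    table_bytes: int      # total data file size after backfill
    write_seconds: float  # wall-clock time of the backfill
    query_seconds: float  # wall-clock time of the test queries


def summarize(results, baseline="snappy"):
    """Express each codec's size, write time, and query time relative to the
    baseline codec (values below 1.0 mean smaller or faster than baseline)."""
    base = next(r for r in results if r.codec == baseline)
    return [
        {
            "codec": r.codec,
            "size_vs_baseline": r.table_bytes / base.table_bytes,
            "write_time_vs_baseline": r.write_seconds / base.write_seconds,
            "query_time_vs_baseline": r.query_seconds / base.query_seconds,
        }
        for r in results
    ]
```

For example, `backfill_sql("some_db.big_table", "some_db.big_table_test", "zstd")` (placeholder table names) would produce the statement for the ZSTD copy, and `summarize` would turn the collected measurements into ratios against SNAPPY that make the final decision easier to discuss.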