
Benchmark Iceberg tables with SNAPPY vs GZIP vs ZSTD
Open, Needs Triage, Public

Description

In the Iceberg Working Session, we briefly discussed that we could benefit from moving away from SNAPPY to a more modern compression codec suited to analytics workloads.

In this task, we should:

  • Pick one of our big tables (>= 1 TB) that we think would benefit from a different codec.
  • Backfill copies of it into Iceberg tables, one per compression codec, such as SNAPPY, GZIP, and ZSTD.
  • Run a couple of real-world WMF queries against each copy.
  • Compare SELECT performance against compression time and compression ratio.
  • Decide whether we should move away from SNAPPY.
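The backfill and comparison steps could be sketched roughly as follows. This is a minimal, hypothetical sketch assuming Spark SQL with the Iceberg runtime: `write.parquet.compression-codec` is Iceberg's table property for the Parquet codec, while the table names, `CodecResult`, `ctas_statement` and `compare_to_baseline` are illustrative only. It reports each codec's size relative to the SNAPPY copy rather than a true compression ratio. On-disk size could be taken, for example, from the sum of `file_size_in_bytes` in each table's Iceberg `files` metadata table.

```python
from __future__ import annotations

from dataclasses import dataclass

# Codecs under test; Iceberg also accepts others (e.g. lz4, uncompressed).
CODECS = ("snappy", "gzip", "zstd")


def ctas_statement(source_table: str, target_prefix: str, codec: str) -> str:
    """Build a Spark SQL CTAS that copies source_table into a new Iceberg
    table whose Parquet data files are written with the given codec.
    Table names are hypothetical placeholders."""
    if codec not in CODECS:
        raise ValueError(f"codec not part of this benchmark: {codec}")
    target = f"{target_prefix}_{codec}"
    return (
        f"CREATE TABLE {target} USING iceberg "
        f"TBLPROPERTIES ('write.parquet.compression-codec' = '{codec}') "
        f"AS SELECT * FROM {source_table}"
    )


@dataclass
class CodecResult:
    """Measurements for one backfilled copy (hypothetical structure)."""
    codec: str
    write_seconds: float         # wall time of the backfill (compression cost)
    bytes_on_disk: int           # total size of the table's data files
    query_seconds: list[float]   # wall time of each benchmark query


def compare_to_baseline(results: list[CodecResult], baseline: str = "snappy") -> dict:
    """Express size, write time and total query time of each codec as a
    fraction of the baseline codec's values (below 1.0 means better)."""
    by_codec = {r.codec: r for r in results}
    base = by_codec[baseline]
    return {
        r.codec: {
            "size_vs_baseline": r.bytes_on_disk / base.bytes_on_disk,
            "write_time_vs_baseline": r.write_seconds / base.write_seconds,
            "query_time_vs_baseline": sum(r.query_seconds) / sum(base.query_seconds),
        }
        for r in results
    }
```

Keeping SNAPPY as the baseline makes the final decision a direct read-off: a codec is worth switching to only if its size and query ratios are low enough to justify any increase in write time.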