Our data model for HTML content does not distinguish between low-latency, high-volume access to current revisions and long-term archival storage. This leaves room for optimizing each of those two use cases separately.
Compression ratios for HTML content are currently around 15% of the input size. Since changes between revisions are typically small, ratios in the low single-digit percent range ought to be possible. The main obstacle for HTML is currently deflate's 32k window size, which fails to pick up repetitions between revisions of articles larger than 32k; such articles are relatively common. We could add brotli support to Cassandra to get larger windows, but would then need to use extremely large input block sizes to pick up a decent number of repetitions. This in turn would likely make reads slower and more memory-intensive.
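The window-size effect can be illustrated with Python's zlib (a deflate implementation). This is a synthetic sketch, not our storage code: it uses seeded pseudo-random bytes as a stand-in for incompressible article content and compares compression of two identical "revisions" when the copy lies beyond versus within the 32k back-reference window.

```python
import random
import zlib

# Seeded pseudo-random (effectively incompressible) stand-in for a 64 KB article.
random.seed(0)
revision = bytes(random.getrandbits(8) for _ in range(64 * 1024))

# Two identical large revisions compressed in one stream: by the time deflate
# reaches the second copy, the matching bytes lie more than 32 KB back,
# outside the window, so the cross-revision repetition is missed entirely.
big = revision + revision
big_ratio = len(zlib.compress(big)) / len(big)

# With revisions under 32 KB, the second copy falls inside the window and
# compresses down to a short series of back-references.
small_rev = revision[:16 * 1024]
small = small_rev + small_rev
small_ratio = len(zlib.compress(small)) / len(small)

# The large-revision ratio stays near 1.0, while the small-revision
# ratio drops to roughly half.
print(big_ratio, small_ratio)
```

The same effect applies between consecutive stored revisions of an article: once a single revision exceeds the window, deflate cannot exploit inter-revision similarity at all.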
A better option to reduce storage needs might be to chunk content, ideally in alignment with semantic blocks like top-level sections in HTML. If these chunks are smaller than 32k, then deflate will pick up repetitions between chunks. Additionally, most edits only affect a single chunk in a large document. We could skip adding new versions of unchanged chunks altogether, which also reduces the write load. Finally, the first chunk should normally load more quickly than an entire document, reducing the time to first byte.
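A minimal sketch of the chunking idea, under several simplifying assumptions: top-level `<section>` elements are located with a naive regex rather than real parser output, chunks are content-addressed by SHA-1, and a plain dict stands in for the chunk store. The point is only to show how unchanged chunks are skipped on write.

```python
import hashlib
import re


def chunk_html(html):
    """Split rendered HTML at top-level <section> boundaries.

    Naive regex split for illustration only; a real implementation would
    rely on the parser's own section markup.
    """
    parts = re.split(r'(?=<section\b)', html)
    return [p for p in parts if p]


def store_revision(store, html):
    """Content-address each chunk and write only chunks not yet stored.

    Returns the ordered list of chunk hashes that reconstructs the revision.
    """
    manifest = []
    for chunk in chunk_html(html):
        key = hashlib.sha1(chunk.encode()).hexdigest()
        if key not in store:  # unchanged chunks are skipped entirely
            store[key] = chunk
        manifest.append(key)
    return manifest


store = {}
v1 = "<section>intro</section><section>history</section>"
v2 = "<section>intro edited</section><section>history</section>"
m1 = store_revision(store, v1)
m2 = store_revision(store, v2)
# The shared "history" chunk is written once, so two 2-chunk revisions
# occupy only three chunks of storage.
print(len(store))
```

Reading the first chunk of the manifest alone is also what enables a faster time to first byte for large documents.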
Another consideration is a separation of hot from cold storage, so that we can replicate hot data to the edge, but keep cold archival data only in two DCs and possibly on more density-optimized hardware. We can do this relatively easily by storing current revisions in a key-value bucket in addition to archival storage.
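The dual-write scheme might look like the following sketch, with plain dicts standing in for the two Cassandra tables (the function and key names are illustrative, not actual API): every render goes to archival storage keyed by (title, revision), and the hot key-value bucket keyed by title alone holds only the latest render, which is what would be replicated to the edge.

```python
archive = {}  # (title, rev) -> html; cold, kept in two DCs only
current = {}  # title -> (rev, html); hot, replicated to the edge


def save_render(title, rev, html):
    """Write every render to the archive; advance the hot copy only
    when the incoming revision is newer (guards out-of-order writes)."""
    archive[(title, rev)] = html
    if title not in current or current[title][0] < rev:
        current[title] = (rev, html)


def get_current(title):
    """Hot path: a single key-value lookup, no archival read."""
    return current[title][1]


save_render("Foo", 1, "<p>v1</p>")
save_render("Foo", 2, "<p>v2</p>")
print(get_current("Foo"))  # <p>v2</p>
```

Since the hot bucket only ever holds one render per title, its total size stays small enough for edge replication and latency-optimized hardware, while the archive can grow on density-optimized nodes.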
There is also room for optimization in high-volume access to current revisions. A good option might be to store each revision as an individually gzip-compressed blob, ready to be streamed to the client without any recompression. The main gains in this scheme should come from avoiding extra reads and computation in Cassandra, as well as from not having to compress data on the way out.
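A sketch of the pre-compressed serving path, assuming a dict as the blob store and a simplified request model: the blob is gzip-compressed once at write time, and for the common case of a gzip-capable client the stored bytes are streamed as-is with `Content-Encoding: gzip`; decompression happens only in the rare fallback case.

```python
import gzip


def store_current(store, title, html):
    # Compress once at write time; the stored bytes are already the
    # wire format for gzip-capable clients.
    store[title] = gzip.compress(html.encode())


def serve(store, title, accept_encoding):
    blob = store[title]
    if "gzip" in accept_encoding:
        # Common case: stream the stored blob untouched, no recompression.
        return {"Content-Encoding": "gzip"}, blob
    # Rare fallback for clients that do not accept gzip.
    return {}, gzip.decompress(blob)


store = {}
store_current(store, "Foo", "<p>hello</p>")
headers, body = serve(store, "Foo", "gzip, deflate")
print(headers)  # {'Content-Encoding': 'gzip'}
```

Since virtually all HTTP clients accept gzip, the decompression fallback should almost never run, so the hot path is a single read plus a straight copy to the socket.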