## HTML revisions, data-parsoid
Current storage growth is on the order of 60G/day across the cluster.
We currently retain one render per revision, but would like to move to retaining one render per 24 hours so that we keep a history of frequently changing templated pages like [[Main Page]] (use case: stable citations). Old revisions are rendered on demand, but we are not systematically traversing them to backfill the storage. We don't expect to push for storing the full HTML history in this fiscal year.
Assuming no major changes in compression ratios, retaining these additional renders means that the growth rate will increase slightly. The current storage will last us slightly beyond the end of this fiscal year, but it would be good to leave some reserve. Assuming a growth rate of 80G/day, we'll need about **29T of storage for the next fiscal year for HTML revisions**.
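The 29T figure follows from the assumed daily growth rate. A minimal sketch of that arithmetic, assuming a 365-day year, a constant growth rate, and 1T = 1000G (the function name `yearly_storage_t` is only illustrative):

```python
def yearly_storage_t(growth_g_per_day: float, days: int = 365) -> float:
    """Storage consumed over `days` at a constant daily growth rate.

    Input is in G/day, output is in T, assuming 1T = 1000G.
    """
    return growth_g_per_day * days / 1000


if __name__ == "__main__":
    # Current growth rate versus the slightly higher rate assumed for next year.
    for rate in (60, 80):
        print(f"{rate}G/day -> {yearly_storage_t(rate):.1f}T per year")
```

At the current 60G/day the same calculation gives about 22T per year, so the 80G/day assumption accounts for roughly 7T of the estimate.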
## Wikitext history
ExternalStore, the MySQL-based system used to store wikitext revisions, is showing its age. We'll eventually need a system that is operationally simpler, more reliable, and more efficient. Cassandra / RESTBase can provide wikitext revision storage the same way it does for HTML, with the same advantages around compression, replication, load distribution and fail-over. Furthermore, we can use this to speed up wikitext dumps without affecting production latency.
For enwiki, all bzip2-compressed wikitext revisions take up about 90G of space. Assuming a ~50% worse compression ratio in Cassandra (likely lzma with smaller blocks) and three-way replication, enwiki will take up around 600G of storage. Extrapolating to all wikis, we should be able to store **all wikitext revisions across all wikis with ~3T of storage**.
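The enwiki estimate can be reproduced roughly as follows. This is a sketch under stated assumptions: the text does not say whether "~50% worse compression ratio" means the data takes 1.5x the space or that the ratio halves (2x the space), so both readings are shown. The function name and the 1T = 1000G conversion are illustrative assumptions.

```python
def replicated_size_g(compressed_g: float, size_factor: float, replicas: int) -> float:
    """On-disk size in G after a compression penalty and N-way replication."""
    return compressed_g * size_factor * replicas


ENWIKI_BZIP2_G = 90   # bzip2-compressed enwiki wikitext, from the text
REPLICAS = 3          # three-way replication, from the text

# Reading 1: compressed data is 50% larger than with bzip2.
larger_by_half = replicated_size_g(ENWIKI_BZIP2_G, 1.5, REPLICAS)

# Reading 2: the compression ratio is halved, so the data takes twice the space.
ratio_halved = replicated_size_g(ENWIKI_BZIP2_G, 2.0, REPLICAS)

# Factor implied by scaling the ~600G enwiki figure to ~3T for all wikis.
all_wikis_factor = 3000 / 600

if __name__ == "__main__":
    print(f"1.5x size: {larger_by_half:.0f}G, 2x size: {ratio_halved:.0f}G")
    print(f"all wikis are about {all_wikis_factor:.0f}x enwiki")
```

Both readings land below the ~600G quoted, so that figure appears to include some rounding headroom. The ~3T total then corresponds to all wikis together being about five times the size of enwiki.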
## Alternative HTML formats, miscellaneous
The app team is currently developing a service that reshapes HTML for mobile use and wraps the result, along with some metadata, in a JSON response. For performance, we plan to pre-generate this on edit. This only requires keeping current revisions, which means that we should be able to handle this and other, smaller applications with **~2T of storage**.