Eventually we'd like to store all article revisions as HTML. This task tries to establish a rough estimate of the storage this would require.
- After ~1 1/2 months of operation, all HTML revisions rendered across all Wikipedias take up about 6T of space in the cluster. This does not include non-article pages that were not edited or otherwise re-rendered in the last two months. The size of articles only was about 1.5T (both figures include three-way replication).
- On enwiki, the ratio of all pages to articles (ns0) is ~36 million to ~5 million.
- In the large wikis, the mean number of revisions is around 20 revisions / article ([enwiki: 21](https://en.wikipedia.org/wiki/Special:Statistics), dewiki: 28, [frwiki: 17](https://fr.wikipedia.org/wiki/Sp%C3%A9cial:Statistiques)). Smaller wikis tend to have fewer edits per article.
Assuming a mean of 20 revisions per page and three-way replication, this means that:
- we'd need at least 20 * 1.5T = ~30T of storage for articles (ns0) only, and
- we'd need at least 20 * 6T = **~120T** of storage for all pages.
The assumption of 20 revisions / page and uniform page size is probably conservative, so there's a non-zero chance that these numbers could actually work out in real life.
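For anyone who wants to replay the arithmetic with different inputs, here is a minimal sketch of the estimate. It assumes, as the figures above do, that the current storage numbers approximate one revision per page, that revisions are of uniform size, and that the 1.5T and 6T figures already include three-way replication. The names (`estimate_tb`, `MEAN_REVISIONS`, and so on) are purely illustrative.

```python
# Back-of-the-envelope storage estimate using the rough figures from this task.
# Assumptions: current storage ~= one revision per page, uniform revision size,
# and replication already included in the current numbers.

CURRENT_ARTICLES_TB = 1.5   # HTML for articles (ns0) only, three-way replicated
CURRENT_ALL_PAGES_TB = 6.0  # HTML for all pages, three-way replicated
MEAN_REVISIONS = 20         # assumed mean number of revisions per page


def estimate_tb(current_tb, mean_revisions=MEAN_REVISIONS):
    """Scale the current storage footprint by the mean revision count."""
    return current_tb * mean_revisions


articles_tb = estimate_tb(CURRENT_ARTICLES_TB)
all_pages_tb = estimate_tb(CURRENT_ALL_PAGES_TB)

print(f"articles (ns0) only: ~{articles_tb:.0f}T")
print(f"all pages:           ~{all_pages_tb:.0f}T")
```

Changing `MEAN_REVISIONS` to, say, the dewiki figure of 28 shows how sensitive the total is to the revisions-per-page assumption.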