Page MenuHomePhabricator

Design notes for scalable and cost-effective revision archival storage
Closed, DeclinedPublic

Description

As part of T120171, we are moving towards separating current revision storage from those of old revisions. This task is exploring requirements and design options for old revision storage.

Requirements

  • Reasonable, predictable access latency.
  • Sufficient read throughput to sustain HTML dump and research use cases.
  • Reliable, low-maintenance, and cost-effective storage system.

Design considerations

  • When using Cassandra, avoid wide row and tombstone issues described in T144431.
  • Revisions are extremely repetitive. We investigated efficient compression algorithms in T122028, and found that we can compress HTML revisions of an article down to 2-3% of its raw size when laying out multiple revisions in a single, multi-megabyte compression block.
  • Efficient listings are optional. Access by revision is the norm.

Update patterns

To simplify the implementation & improve performance, we would like to avoid any deletes & overwrites. To this end, one of these update strategies can be employed:

  1. Store only one (the first) render per revision. In the case of the cur / old split, this means that a new render would only be stored if the current table did not return any match.
  2. Store only one render per <time unit>. Checks the timestamp of the last entry for a title in the old table, and saves a new render if that timestamp is more than <time unit> in the past.

Options

Compression blocks and metadata table

Outline: Store up to <n> revisions for a key in a single compression block (Cassandra row) to achieve a high compression ratio. Limit compression block size to avoid wide rows in the content table.

  • Add a metadata table along these lines: key (hash) | rev | tid | count (static) | block, with count being a static integer counter, and block being an integer identifying the compression block the actual content is stored under.
  • The storage table is structured as (key | block) | rev | tid | (content). (key | block) is a composite hash key.

Write procedure:

  • Read static per-key counter from metadata table. Calculate target block number with <count> div <blocksize> (config var). Store actual content in storage table, using the target block number. Add a metadata entry for the revision, recording the block number, and increment the static per-title counter (same query).

Read procedure:

  • Read block number for key / rev / tid from metadata table.
  • Read storage entry for key, block, rev, tid, content from content table.

Potential issues

  • The metadata table will still have wide rows. However, the table is very skinny, and there are no overwrites or deletions. If we would like to avoid this, we could instead create two metadata tables:
    • One table holding a single counter per key.
    • A second table holding the mapping from (key, rev) | tid to block id. With (key, rev) being a composite hash key, rows are limited in width to the number of renders we keep per revision. Even if we store one render per day for a page that is *never* edited, we would end up with only about 3650 entries in ten years. Considering the narrow metadata table schema, this seems unlikely to cause problems.