As a sub-task of T120171, this task discusses steps towards storing current revisions only, in a reliable, low-maintenance, and low-latency manner.
## Option: Retention policies using application-level TTLs {icon star spin color=blue}
This approach uses a schema identical to that of the current storage model, one that utilizes so-called wide rows to model a one-to-many relationship between a title and its revisions, and a one-to-many relationship between each revision and its corresponding renders. It differs only in how it approaches retention.
Since renders are keyed on a type-1 UUID, retaining a single current render, and (at least) 24 hours' worth of past renders, is as simple as batching a range delete with each new render, using a `tid` predicate 24 hours earlier than the one being inserted.
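As a minimal sketch of the cutoff calculation described above: a type-1 UUID embeds a 60-bit timestamp counted in 100-nanosecond intervals, so the `tid` bound for the range delete can be derived from the `tid` being inserted. The table name `renders`, its columns, and the helper names `cutoff_tid` and `retention_batch` are assumptions made for illustration; they are not taken from the actual schema.

```python
import uuid

# Type-1 UUID timestamps count 100-nanosecond intervals.
INTERVALS_PER_DAY = 24 * 60 * 60 * 10_000_000


def cutoff_tid(new_tid: uuid.UUID, intervals: int = INTERVALS_PER_DAY) -> uuid.UUID:
    """Return the lowest type-1 UUID whose timestamp is `intervals` before new_tid."""
    if new_tid.version != 1:
        raise ValueError("expected a type-1 (time-based) UUID")
    ts = new_tid.time - intervals
    if ts < 0:
        raise ValueError("cutoff precedes the UUID epoch")
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # version 1
    # 0x80 sets the RFC 4122 variant bits with an all-zero clock sequence;
    # the node is zero as well, giving the smallest UUID for this timestamp.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))


def retention_batch(domain, title, rev, new_tid, value):
    """Build the (statement, parameters) pairs to submit as one batch.

    Hypothetical table and column names; the statements are illustrative only.
    """
    insert = (
        "INSERT INTO renders (domain, title, rev, tid, value) VALUES (?, ?, ?, ?, ?)",
        (domain, title, rev, new_tid, value),
    )
    delete = (
        "DELETE FROM renders WHERE domain = ? AND title = ? AND rev = ? AND tid < ?",
        (domain, title, rev, cutoff_tid(new_tid)),
    )
    return [insert, delete]
```

Because the delete bound trails the new `tid` by a full day, the render being inserted always survives, along with any render written in the preceding 24 hours.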
Limiting revisions is slightly more challenging, since the revision is an integer and carries no temporal context. As a result, additional storage is used to establish this relationship: a timeline that maps timestamps to the corresponding revisions. Records in this timeline are keyed by domain (on the assumption that MediaWiki sharding would never be more granular than this). Updates to the timeline can be performed probabilistically, if necessary, and TTLs can be applied to its records to prevent unbounded growth.
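The timeline can be sketched with an in-memory stand-in for the table. Everything here is hypothetical: the class name `Timeline`, the `sample_rate` parameter used for probabilistic updates, and the use of plain numeric timestamps. It also assumes that revision IDs increase monotonically within a domain.

```python
import bisect
import random
from collections import defaultdict


class Timeline:
    """In-memory stand-in for a per-domain table mapping timestamps to revisions."""

    def __init__(self, sample_rate=1.0, ttl=7 * 24 * 3600, rng=None):
        self.sample_rate = sample_rate  # probability that an update is recorded
        self.ttl = ttl                  # seconds a record is kept
        self._rng = rng or random.Random()
        self._entries = defaultdict(list)  # domain -> sorted list of (ts, rev)

    def record(self, domain, ts, rev):
        """Probabilistically record that `rev` was created at time `ts`."""
        if self._rng.random() >= self.sample_rate:
            return False
        entries = self._entries[domain]
        bisect.insort(entries, (ts, rev))
        # Emulate TTL expiry: drop records older than `ttl` relative to this write.
        expired = bisect.bisect_left(entries, (ts - self.ttl, 0))
        del entries[:expired]
        return True

    def revision_at_or_before(self, domain, ts):
        """Return the newest recorded revision created at or before `ts`, if any."""
        entries = self._entries.get(domain, [])
        idx = bisect.bisect_right(entries, (ts, float("inf")))
        return entries[idx - 1][1] if idx else None
```

Looking up `revision_at_or_before(domain, now - retention)` yields a revision created at least `retention` seconds ago, which can then serve as the bound for a range delete on `rev`. For that lookup to find anything, the TTL has to be longer than the retention period, and a lower `sample_rate` trades timeline write volume for a coarser bound.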
See https://www.mediawiki.org/wiki/RESTBase/StorageDesign#Retention_policies_using_application-level_TTLs for a more thorough explanation.
## Option: Table-per-query
This approach materializes views of results using distinct tables, each corresponding to a query.
### Queries / Tables
- The most current render of the most current revision (table: `current`)
- The most current render of a specific revision (table: `by_rev`)
- A specific render of a specific revision (table: `by_tid`)
### Algorithm
Data in the `current` table must be durable, but the contents of `by_rev` and `by_tid` can be ephemeral (and should be, to prevent unbounded growth), lasting only for a time-to-live after the corresponding value in `current` has been superseded by something more recent. There are two ways of accomplishing this: a) copying the values on a read from `current`, or b) copying them on update, prior to replacing a value in `current`. Neither of these strategies is ideal.
For example, with non-VisualEditor (non-VE) use-cases, copy-on-read is problematic due to the write-amplification it creates (think: HTML dumps). Additionally, in order to fulfill the VE contract, the copy //must// be done in-line to ensure the values are there for the forthcoming save, introducing additional transaction complexity and latency. Copy-on-update over-commits by default, copying from `current` for every new render regardless of how likely it is to be edited, but it happens asynchronously without impacting user requests, and can be done reliably. This proposal uses the //copy-on-update// approach.
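A minimal sketch of copy-on-update, using in-memory dictionaries in place of the three tables. The `Render` type, the `TablePerQueryStore` class, and the injectable `clock` are hypothetical names introduced for illustration. The copy is done synchronously here for brevity, whereas the proposal performs it asynchronously, and serving a revision lookup from `current` first is an assumption of this sketch.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Render:
    rev: int
    tid: str
    html: str


class TablePerQueryStore:
    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self._clock = clock
        self.current = {}  # title -> Render (durable)
        self.by_rev = {}   # (title, rev) -> (Render, expires_at)
        self.by_tid = {}   # (title, rev, tid) -> (Render, expires_at)

    def update(self, title, new):
        """Copy the superseded value to the ephemeral tables, then replace it."""
        old = self.current.get(title)
        if old is not None:
            expires_at = self._clock() + self.ttl
            self.by_rev[(title, old.rev)] = (old, expires_at)
            self.by_tid[(title, old.rev, old.tid)] = (old, expires_at)
        self.current[title] = new

    def _live(self, table, key):
        """Return the stored render unless its time-to-live has elapsed."""
        entry = table.get(key)
        if entry is None:
            return None
        render, expires_at = entry
        if expires_at <= self._clock():
            del table[key]
            return None
        return render

    def get_current(self, title):
        return self.current.get(title)

    def get_by_rev(self, title, rev):
        cur = self.current.get(title)
        if cur is not None and cur.rev == rev:
            return cur
        return self._live(self.by_rev, (title, rev))

    def get_by_tid(self, title, rev, tid):
        cur = self.current.get(title)
        if cur is not None and (cur.rev, cur.tid) == (rev, tid):
            return cur
        return self._live(self.by_tid, (title, rev, tid))
```

The time-to-live only starts when a value is superseded, so `current` never expires, and each update writes to `by_rev` and `by_tid` whether or not the old render is ever requested again. That is the over-commit described above.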
See https://www.mediawiki.org/wiki/RESTBase/StorageDesign#Table-per-query for details.
____
## See also
- {T156209}