We have a need to track several bits of data per *title*, independent of revisions corresponding to this title:
- Page deletions and other protection information. A page corresponding to a given title can be deleted many times, which a versioned `page` table can capture.
- The title's rename history (so that we can reconstruct linear histories), possibly in the form of `renamed_from` and `renamed_to` fields.
- The MediaWiki page_id and other bits from [the page table](https://github.com/wikimedia/mediawiki/blob/master/maintenance/tables.sql#L223-L279). Note that a single page title can map to multiple page ids over time if the content was deleted & re-created or restored, or renamed to & from other titles. Again, a versioned `page` table can capture this with one row per version.
- Efficient and (ideally) ordered listings of all titles (T89564). The order requirement is actually not so trivial to support for large data sets without more secondary index work, so maybe worth deferring?
- Possibly, at a point in the (far) future, a synchronization point for atomic moves and renames.
We should start to support this properly in RESTBase. We should probably expose this information at [/page/title/{title}](http://rest.wikimedia.org/en.wikipedia.org/v1/?doc#!/Page_content/page_title__title__get). This will also give us a natural resource path for page-related events like page creation or deletion.
Logically we can then check whether a page is deleted on each revision access. This would bring the number of queries per revision request to three for old revisions (one additional check for revision deletion), and two for new revisions. In local testing, an extra revision metadata request currently roughly halves throughput, so there is a big advantage in retrieving all protection information in a single request. We might be able to avoid the extra page metadata request by also storing page deletion information in a [static column](http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refStaticCol.html) in each key_rev_value bucket. The static column is shared between all revisions of a given domain & title, so only needs to be updated once on page deletion, for each content type. Static columns can be added & removed with a schema upgrade. In order to keep old revisions of a page hidden while allowing new content creation at the same title, we might need to store a timestamp or timeuuid indicating the time before which content should be suppressed. If we also denormalized revision deletion information per-row, then we could reliably load all suppression information in a single I/O.
## Strawman design (WIP)
Page table schema (yaml syntax):
```
attributes:
title: string
renamed: string
# less sure about these..
restrictions: set<string> # deleted
page_id: int
# more attributes
tid: timeuuid
index:
- type: hash
attribute: title
- type: range
attribute: tid
order: desc
```
Notes:
- one row per change with tid; *all* changes (including deletions or renames) create new rows with new tids
- renamed can be null, or a string "to:<title>", "from:<title>"; alternatively, we could consider separate rows or a map; a rename will insert two rows at {source,destination} with the same tid
- restrictions / deleted likely isn't quite right yet; should go through restriction & query use cases to figure that out:
- page is deleted: all revisions with this title should be hidden
- Page is not current: Queries asking for the 'latest' revision using this title shouldn't return anything, but old revisions can still be retrieved (up until the next `deleted` entry in the page table). This use case might warrant a static deleted_before tid column that is updated on each deletion / undeletion, so that typical checks only require one query.