This is the task for the schema change project documented on the wiki at User:Brion VIBBER/Compacting the revision table round 2. Part of the description is copied below.
Per ongoing discussion in ArchCom and at WikiDev17 about performance, future requirements, and future-proofing for table size, it is proposed to do a major overhaul of the revision table, combining the following improvements:
- Normalization of frequently duplicated data into separate tables, replacing the duplicated strings with integer keys
- Separation of content-specific metadata from general revision metadata, to support:
  - Multi-content revisions, which allow multiple content blobs to be stored per revision. This is not related to compaction, but is of great interest for the structured data additions planned for multimedia and articles.
- A general reduction in revision table width and on-disk size, which will make future schema changes easier
- Avoiding inconsistencies in live index deployments
- Ideally, all indexes should fit on all servers, making it easier to switch database backends around in production
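
As a rough illustration of the normalization item above, the following sketch stores each distinct edit summary once and has revision rows refer to it by integer key. It is only a sketch under stated assumptions: the `demo_comment` and `demo_revision` tables, their columns, and the helper functions are hypothetical names made up for this example, it uses SQLite in memory rather than the production database, and it is not the actual MediaWiki schema or migration code.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Each distinct summary string is stored exactly once.
        CREATE TABLE demo_comment (
            comment_id   INTEGER PRIMARY KEY,
            comment_text TEXT NOT NULL UNIQUE
        );
        -- Revision rows carry only the integer key, not the string.
        CREATE TABLE demo_revision (
            rev_id         INTEGER PRIMARY KEY,
            rev_comment_id INTEGER NOT NULL
                REFERENCES demo_comment (comment_id)
        );
        """
    )
    return conn


def comment_key(conn, text):
    """Return the integer key for a summary, inserting it if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO demo_comment (comment_text) VALUES (?)",
        (text,),
    )
    row = conn.execute(
        "SELECT comment_id FROM demo_comment WHERE comment_text = ?",
        (text,),
    ).fetchone()
    return row[0]


def add_revision(conn, text):
    """Insert a revision row that refers to its summary by key."""
    cur = conn.execute(
        "INSERT INTO demo_revision (rev_comment_id) VALUES (?)",
        (comment_key(conn, text),),
    )
    return cur.lastrowid


conn = build_demo()
for summary in ["fix typo", "revert vandalism", "fix typo", "fix typo"]:
    add_revision(conn, summary)

# Count how many strings are actually stored versus how many rows use them.
distinct_comments = conn.execute(
    "SELECT COUNT(*) FROM demo_comment"
).fetchone()[0]
revisions = conn.execute("SELECT COUNT(*) FROM demo_revision").fetchone()[0]
print(distinct_comments, revisions)
```

In this toy data set the four revisions share two stored strings, so the repeated summary costs one integer per revision instead of a full copy of the text.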
Wikimedia production tasks:
- T166733: Deploy refactored comment storage
- T188327: Deploy refactored actor storage
- T215466: Remove revision_comment_temp and revision_actor_temp
Also, we should take this opportunity to clean up the mismatch between tables.sql and the DBs in the type of the rev_timestamp column, as mentioned in P8433.