The idea is to make mainstash (currently on x2) work similarly to ParserCache (PC); see T373037: Make ParserCache more like a ring
Currently, mainstash lives on x2, which has one primary plus two idle replicas per datacenter. It is an extremely painful part of the infrastructure from the DBAs' perspective, so much so that when we need to do reboots, we currently just reboot the servers live, because that causes fewer user-facing errors than doing a switchover.
The proposal is to introduce three sections, ms1, ms2 and ms3, each with only a primary DB (per datacenter). MediaWiki would spread writes at random, writing each key to two of the three sections. When we need to do maintenance (reboots or hardware issues), we depool one section and let its keys fail over to the other sections. On reads, MediaWiki would also query two sections; if the copy on one of them is missing or has a lower exptime, that copy is ignored. Because any two written sections and any two read sections out of three must share at least one section, this prevents a full loss of data when a section is depooled.
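The write-two/read-two scheme can be illustrated with a small Python sketch (MediaWiki itself is PHP; this is only an illustration under stated assumptions). The names `Section`, `MainStashSketch`, `set` and `get` are hypothetical and are not MediaWiki's BagOStuff API; each section is modeled as an in-memory dict of `key -> (value, exptime)`, and the current time is passed in explicitly. Reads keep the live copy with the highest exptime and ignore missing or expired copies.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Section:
    """One hypothetical mainstash section (e.g. ms1) with a single primary."""
    name: str
    pooled: bool = True
    # key -> (value, exptime as a UNIX timestamp)
    data: Dict[str, Tuple[Any, float]] = field(default_factory=dict)


class MainStashSketch:
    """Write each key to 2 of 3 sections, read from 2, keep the highest exptime."""

    COPIES = 2

    def __init__(self, sections: List[Section], rng: Optional[random.Random] = None):
        self.sections = sections
        self.rng = rng if rng is not None else random.Random()

    def _pick(self) -> List[Section]:
        # Only pooled sections are used; with one depooled, both remaining are picked.
        pooled = [s for s in self.sections if s.pooled]
        return self.rng.sample(pooled, min(self.COPIES, len(pooled)))

    def set(self, key: str, value: Any, ttl: float, now: float) -> None:
        exptime = now + ttl
        for section in self._pick():
            section.data[key] = (value, exptime)

    def get(self, key: str, now: float) -> Optional[Any]:
        best: Optional[Tuple[Any, float]] = None
        for section in self._pick():
            entry = section.data.get(key)
            if entry is None or entry[1] <= now:
                continue  # a missing or expired copy is ignored
            if best is None or entry[1] > best[1]:
                best = entry  # the copy with the lower exptime loses
        return None if best is None else best[0]
```

With one section depooled, `_pick` returns the two remaining sections for both reads and writes, so keys written while all three were pooled are still found on at least one of them.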
There will still be failure scenarios, such as when a key's TTL changes or is set to indefinite, or when two sections have to be depooled at the same time, but all of them seem to be well within the error budget.
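Two of these failure scenarios can be checked with a short enumeration, again as a hypothetical Python sketch rather than production code. `reads_always_find_write` tries every pair of sections a write could have landed on (while all three were pooled) against every set of sections a later read could use, and `resolve` applies the assumed higher-exptime-wins rule. Shortening a key's TTL is the case where that rule misfires: the newer copy has the lower exptime and loses to a stale copy. Indefinite TTLs are not modeled here.

```python
from itertools import combinations
from typing import FrozenSet, Optional, Sequence, Tuple

SECTIONS = ("ms1", "ms2", "ms3")


def reads_always_find_write(depooled: FrozenSet[str]) -> bool:
    """True if every 2-section write overlaps every possible read set."""
    pooled = [s for s in SECTIONS if s not in depooled]
    # Reads use two pooled sections, or all of them if fewer than two remain.
    reads = list(combinations(pooled, min(2, len(pooled))))
    return all(set(w) & set(r) for w in combinations(SECTIONS, 2) for r in reads)


def resolve(copies: Sequence[Optional[Tuple[str, float]]], now: float) -> Optional[str]:
    """Return the value of the live copy with the highest exptime, else None."""
    live = [c for c in copies if c is not None and c[1] > now]
    return max(live, key=lambda c: c[1])[0] if live else None
```

For example, a key written with a 600 s TTL at t=0 and rewritten with a 60 s TTL at t=5 resolves to the old value whenever a read reaches a section that still holds the first copy, and with ms1 and ms2 both depooled, a key that was written only to those two sections is lost.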