When configured with the `NetworkTopologyStrategy` (as the AQS cluster is), Cassandra distributes replicas across distinct failure domains. Cassandra uses the nomenclature //"rack"//, but at the WMF we treat rows in our datacenters as the unit of failure for replica placement. The table below shows the current placement according to the configuration, alongside where the nodes are actually located in the datacenter (eqiad only, at the moment).
| host | cassandra "rack" | datacenter row |
| ---- | ---- | ---- |
| aqs1010 | rack1 | a |
| aqs1013 | rack1 | c |
| aqs1011 | rack2 | b |
| aqs1014 | rack2 | d |
| aqs1012 | rack3 | c |
| aqs1015 | rack3 | d |
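For context, placement like this is driven by the keyspace replication settings. Below is a minimal sketch of how `NetworkTopologyStrategy` replication is declared, shown via the Python driver; the keyspace name and contact point are illustrative assumptions, not the actual AQS schema.

```python
# Illustrative only: keyspace name and contact point are made up, not the AQS schema.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-contact-point.example.org"])
session = cluster.connect()

# Three replicas in eqiad; with three distinct "racks" configured, Cassandra
# will place one replica per rack.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS example_keyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': 3}
""")
```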
We make heavy use of a replica count of 3, and `QUORUM` consistency for both reads & writes. With replicas properly distributed over 3 or more rows, we are able to survive an entire row outage without any disruption to the service(s). The placement above is incorrect, though: there are scenarios where a single row failure //will// result in outages. In our configuration, rows C and D each contain nodes from two different cassandra "racks", so a failure of either row takes down two of the three replicas for some token ranges, dropping them below quorum and creating outage(s).
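To make that concrete, here is a rough sketch (a hypothetical helper, not existing tooling) that enumerates every possible replica set implied by the table above, assuming one replica per cassandra "rack" and a quorum of 2 of 3, and checks which single-row outages leave a replica set below quorum:

```python
from itertools import product

# host -> (cassandra "rack", datacenter row), copied from the table above
placement = {
    "aqs1010": ("rack1", "a"),
    "aqs1013": ("rack1", "c"),
    "aqs1011": ("rack2", "b"),
    "aqs1014": ("rack2", "d"),
    "aqs1012": ("rack3", "c"),
    "aqs1015": ("rack3", "d"),
}

RF, QUORUM = 3, 2  # quorum = floor(RF / 2) + 1

racks = sorted({rack for rack, _ in placement.values()})
rows = sorted({row for _, row in placement.values()})
nodes_in_rack = {r: [h for h, (hr, _) in placement.items() if hr == r] for r in racks}

for row in rows:
    # NetworkTopologyStrategy places one replica per "rack", so every possible
    # replica set is one node drawn from each of rack1, rack2, and rack3.
    broken = 0
    for replica_set in product(*(nodes_in_rack[r] for r in racks)):
        down = sum(1 for host in replica_set if placement[host][1] == row)
        if RF - down < QUORUM:  # fewer than 2 of 3 replicas survive the row outage
            broken += 1
    verdict = "loses quorum" if broken else "survives"
    print(f"row {row}: {verdict} ({broken} of 8 possible replica sets affected)")
```

Run against the placement above, this reports rows A and B as survivable, and rows C and D as each leaving some replica sets with only one live replica.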
Fixing this will mean decommissioning, physically relocating, and bootstrapping servers back into the cluster. Given that this situation isn't new, it probably makes sense to wait until we've deployed the servers for the new expansion (T304173). We should, however, take the moves needed here into account when determining row placement for the new servers (my understanding is that row space is constrained in places).
----
### Proposed
| host | cassandra "rack" | datacenter row | target row |
| ---- | ---- | ---- | ---- |
| aqs1010 | rack1 | a | |
| aqs1013 | rack1 | c | |
| aqs1011 | rack2 | b | |
| aqs1014 | rack2 | d | |
| aqs1012 | rack3 | c | |
| aqs1015 | rack3 | d | |
| aqs1016 | | | |
| aqs1019 | | | |
| aqs1017 | | | |
| aqs1020 | | | |
| aqs1018 | | | |
| aqs1021 | | | |