These are the changes we need to make to get high availability at the rack/rack-switch level for our current cloud ceph cluster:
- We need to spread the 3 mons across different racks, under different switches (a placement check is sketched after this item):
- Current: all the mons are under the B* switch (B2/B4/B7)
- Example of HA: 1 mon under B*, 1 mon under D5 cloudswitch, 1 mon under C8 cloudswitch
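For reference, the mon layout can be listed from any node with an admin keyring; a minimal sketch (mapping each mon host to its rack/switch is still a manual step):

```
# List the monitors currently in the cluster map; after the move, each
# of the 3 mons should sit under a different rack/switch.
root@cloudcephosd1001:~# ceph mon dump
```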
- We need to spread the osds into 3 equal-sized groups (if the sizes differ, the extra space cannot be used until the other racks match it or we bring in another rack with 2x the difference; a way to check the distribution is sketched after this list)
- Current:
- 1 B2
- 1 B4
- 1 B7
- 11 C8
- 10 D5
- Example of HA:
- 7 B*
- 7 D5
- 7 C8
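To check how close we are to 3 equal-sized groups, we can inspect the CRUSH hierarchy and the per-bucket capacity; a sketch (output layout varies between Ceph releases):

```
# Show the CRUSH hierarchy: hosts now, and racks once they exist.
root@cloudcephosd1001:~# ceph osd tree
# Show size and utilization aggregated per CRUSH bucket, useful to
# verify that the 3 groups end up with roughly the same raw capacity.
root@cloudcephosd1001:~# ceph osd df tree
```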
Only when the above is done can we proceed to configure the cluster (if we do it before, the cluster will halt due to lack of high availability).
- We have to configure the osds to report a new bucket (see https://docs.ceph.com/en/latest/rados/operations/crush-map/; a ceph.conf sketch follows this item)
- Current: they only report the host
- Future: we need to report the rack and row, and maybe the datacenter
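A minimal sketch of the osd-side change, assuming we declare the location statically in ceph.conf (the row/rack values below are illustrative, not our real topology; the docs linked above also describe a customizable crush location hook script as an alternative):

```
# /etc/ceph/ceph.conf on each OSD host (values are examples)
[osd]
crush location = root=default row=B rack=B2 host=cloudcephosd1001
```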
- We have to configure the ceph crush map to account for the rack location:
- Current:
We only have host-level spreading:

```
root@cloudcephosd1001:~# ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
```
- Future: We need to add an extra level for the rack (see https://docs.ceph.com/en/latest/rados/operations/crush-map/; a command sketch follows below)
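A sketch of the CRUSH-map side, assuming rack buckets named after the switches and using only commands from the linked docs (host and pool names are placeholders):

```
# Create one bucket per rack and attach it to the default root.
root@cloudcephosd1001:~# ceph osd crush add-bucket B2 rack
root@cloudcephosd1001:~# ceph osd crush move B2 root=default

# Move each OSD host under its rack bucket (host name is an example).
root@cloudcephosd1001:~# ceph osd crush move cloudcephosd1001 rack=B2

# Create a replicated rule whose failure domain is the rack instead of
# the host, then point the pools at it (pool name is a placeholder).
root@cloudcephosd1001:~# ceph osd crush rule create-replicated replicated_racks default rack
root@cloudcephosd1001:~# ceph osd pool set <pool> crush_rule replicated_racks
```

Switching a pool to the new rule triggers data movement, so per the note above it should only happen once the hosts are physically spread across the racks.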