This task is to test and document a way to achieve rack-level HA.
Note that in codfw the hosts are not actually in different racks, so we cannot test a real case, but we can experiment.
Note that the C8 and D5 hosts are grouped together in codfw because there are only 3 hosts there (unlike in eqiad).
= Options
== Setting it manually, all at once
Reference: https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/
* [] Dump the current crushmap:
```
ceph osd getcrushmap -o crushmap.bin
```
* [] Decompile into text
```
crushtool -d crushmap.bin -o crushmap.txt
```
* [] Make a copy
```
cp crushmap.txt crushmap.$(date +%Y%m%d%H%M%S).before_rack_ha.txt
```
* [] Edit the crushmap
** [] Get the lowest ID in the file (bucket IDs are negative numbers; in our case it was -8)
** [] Add the new entries for the new racks (F4, E4, D5 and C8), decrementing the IDs for each `id` entry, example with fake ids:
```
rack E4 {
id -9
id -10 class ssd
alg straw2
hash 0 # rjenkins1
}
rack F4 {
id -11
id -12 class ssd
alg straw2
hash 0 # rjenkins1
}
rack C8 {
id -13
id -14 class ssd
alg straw2
hash 0 # rjenkins1
}
rack D5 {
id -15
id -16 class ssd
alg straw2
hash 0 # rjenkins1
}
```
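Rather than scanning the file by eye for the lowest existing bucket ID, you can extract it with grep/sort. This is a sketch using a sample fragment written to a hypothetical `/tmp/crushmap.sample.txt`; on the real map, run the last pipeline against your `crushmap.txt` instead:

```shell
# Sample fragment standing in for the real decompiled crushmap (illustrative
# data, not the actual codfw map).
cat > /tmp/crushmap.sample.txt <<'EOF'
root default {
    id -1
    id -2 class ssd
}
host cloudcephosd2001-dev {
    id -7
    id -8 class ssd
}
EOF
# Pull every "id -N" line, take the number, sort numerically, keep the lowest.
grep -E '^[[:space:]]*id -' /tmp/crushmap.sample.txt | awk '{print $2}' | sort -n | head -1
# -> -8
```

The new rack buckets then start at that value minus one (here, -9).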
** [] Add each of the hosts to the corresponding racks, and remove them from the root default bucket:
```
rack C8 {
id -13
id -14 class ssd
alg straw2
hash 0 # rjenkins1
item cloudcephosd2001-dev weight 1.746 # example from codfw
}
```
** [] Now comes a tricky part: sum up the weights of all the hosts in each rack bucket, and add the rack buckets to the default root bucket:
```
root default {
id -1 # do not change unnecessarily
id -2 class ssd # do not change unnecessarily
# weight 5.238
alg straw2
hash 0 # rjenkins1
item E4 weight 1.746 # this is the sum of the weights of all the hosts in the E4 rack bucket
item F4 weight 1.746
item C8 weight 1.746
item D5 weight 1.746
}
```
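Summing the host weights by hand is error-prone; an awk one-liner can total the `item ... weight ...` lines inside a bucket. A sketch, assuming one `item` per line as `crushtool -d` emits, and using a sample fragment in a hypothetical `/tmp/rack_c8.sample.txt` (run the awk line against your real `crushmap.txt`):

```shell
# Sample rack bucket (illustrative data, not the real codfw map).
cat > /tmp/rack_c8.sample.txt <<'EOF'
rack C8 {
    id -13
    id -14 class ssd
    alg straw2
    hash 0  # rjenkins1
    item cloudcephosd2001-dev weight 1.746
    item cloudcephosd2004-dev weight 1.746
}
EOF
# Between "rack C8" and the closing brace, add up the weight field ($4)
# of every "item" line.
awk '/^rack C8/,/^}/ { if ($1 == "item") sum += $4 } END { printf "%.3f\n", sum }' /tmp/rack_c8.sample.txt
# -> 3.492
```

The printed total is the weight to put on the rack's `item` line in the `root default` bucket.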
** [] Edit the rules to use the rack instead of the host for the `chooseleaf` step:
```
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type rack # <- here, this was host before
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type rack # <- here too
step emit
}
```
** [] Double check all the above steps
** [] Compile a new crush map:
```
crushtool -c crushmap.txt -o crushmap.bin
```
** [] Test that the rules still work (check that there are no misplaced PGs, i.e. it shows 1024/1024 mapped, and that each device gets roughly the same number of placements; current output example in P44923):
```
crushtool --test -i crushmap.bin --show-utilization --num-rep=3
# any output from this one means some PGs could not be mapped with the new rules
crushtool --test -i crushmap.bin --show-bad-mappings --num-rep=3
```
** [] Load the new crush map and wait for the cluster to shift the data around (this will take a long time)
```
ceph osd setcrushmap -i crushmap.bin
# follow the rebalancing/recovery progress
ceph -s
```