We want to be able to run DB workloads on Ceph, but current performance does not seem good enough, especially latency.
Here are some ideas that could help improve latency:
- Set the CPU governor to performance on both the clients and the Ceph nodes:
    # show the current governor
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # to set it (using linux-cpupower)
    cpupower frequency-set -g performance
    # or directly
    echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    ...
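The "or directly" variant above has to be repeated for every CPU. A minimal sketch of that loop follows; the function names and the CPU_SYSFS variable are made up for illustration, and the write needs root on a real system.

```shell
# Root of the per-CPU sysfs tree. CPU_SYSFS is a hypothetical override,
# handy for trying the functions against a scratch directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Write the given governor (default: performance) to every CPU.
set_governor_all() {
    gov="${1:-performance}"
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue   # no cpufreq support, or glob did not match
        echo "$gov" > "$f"
    done
}

# Print the governor files that do NOT hold the wanted governor.
list_cpus_without_governor() {
    want="${1:-performance}"
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        [ "$(cat "$f")" = "$want" ] || echo "$f"
    done
}
```

Running set_governor_all followed by list_cpus_without_governor should print nothing once every CPU has switched.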
- Disable the C6 idle state of the CPU (133 µs exit latency, as reported by cpupower idle-info):
    dcaro@cloudcephosd2001-dev:~$ cpupower idle-info
    CPUidle driver: intel_idle
    CPUidle governor: menu
    analyzing CPU 0:
    Number of idle states: 4
    Available idle states: POLL C1 C1E C6
    POLL:
    Flags/Description: CPUIDLE CORE POLL IDLE
    Latency: 0
    Usage: 271510
    Duration: 26342578
    C1:
    Flags/Description: MWAIT 0x00
    Latency: 2
    Usage: 86052232
    Duration: 23540449392
    C1E:
    Flags/Description: MWAIT 0x01
    Latency: 10
    Usage: 175529836
    Duration: 30833353708
    C6:
    Flags/Description: MWAIT 0x20
    Latency: 133
    Usage: 493391927
    Duration: 7329855739002
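One possible way to disable C6 is through the cpuidle sysfs files, turning off every state whose exit latency is at or above a threshold. This is only a sketch: the function name, the CPU_SYSFS variable and the 100 µs default are assumptions, the setting does not survive a reboot, and root is required. On the host above, cpupower idle-set -D 100 should have a similar effect.

```shell
# Root of the per-CPU sysfs tree. CPU_SYSFS is a hypothetical override,
# handy for trying the function against a scratch directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Disable every idle state whose exit latency (microseconds) is at or
# above the threshold. With a threshold of 100 and the output above,
# only C6 (latency 133) would be disabled.
disable_deep_idle_states() {
    max_latency_us="${1:-100}"
    for state in "$CPU_SYSFS"/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -r "$state/latency" ] || continue
        if [ "$(cat "$state/latency")" -ge "$max_latency_us" ]; then
            echo 1 > "$state/disable"
        fi
    done
}
```

Writing 0 to the same disable files re-enables the states.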
- Investigate different filesystems on top of RBD:
  - ext4
  - btrfs
  - xfs
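To compare the filesystems, one option would be to run the same fio job on each mount point. The sketch below only prints the command lines. The mount points, the job name and all job parameters are assumptions: 16k synchronous random writes at queue depth 1, chosen to loosely resemble InnoDB page writes, not a validated DB benchmark.

```shell
# Print (rather than run) an fio command line that measures small
# synchronous write latency on the filesystem mounted at $1.
fio_latency_cmd() {
    mountpoint="$1"
    echo fio --name=db-latency \
        --directory="$mountpoint" \
        --rw=randwrite --bs=16k --size=1G \
        --ioengine=libaio --iodepth=1 --direct=1 --fsync=1 \
        --runtime=60 --time_based
}

# Hypothetical mount points, one RBD image per filesystem under test.
for mnt in /mnt/rbd-ext4 /mnt/rbd-btrfs /mnt/rbd-xfs; do
    fio_latency_cmd "$mnt"
done
```

Piping the output to sh would run the jobs one after another; the completion latency percentiles in the fio report are the numbers to compare.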
- Investigate tweaks for the ext4 filesystem and the MariaDB O_DIRECT setting (see https://www.percona.com/blog/2019/11/12/watch-out-for-disk-i-o-performance-issues-when-running-ext4/)
- Continue adding stuff