
[ceph] Tune performance for DB VMs
Closed, Declined · Public

Description

We want to be able to run DB workloads on Ceph, but the current performance does not seem good enough, especially the latency.

Here are some ideas that could help to improve the latency:

  • Set the CPU frequency governor to performance on both the clients and the Ceph nodes:
# show the current governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# to set it (using cpupower, from the linux-cpupower package)
cpupower frequency-set -g performance

# or directly through sysfs, for every CPU (as root)
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  • Disable the C6 idle state of the CPU (it takes 133 µs to exit from it):
dcaro@cloudcephosd2001-dev:~$ cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 4
Available idle states: POLL C1 C1E C6
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 271510
Duration: 26342578
C1:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 86052232
Duration: 23540449392
C1E:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 175529836
Duration: 30833353708
C6:
Flags/Description: MWAIT 0x20
Latency: 133
Usage: 493391927
Duration: 7329855739002
  • Investigate different filesystems on top of RBD:
    • ext4
    • btrfs
    • xfs
  • Add more ideas as they come up
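The two CPU tweaks above (governor and C6) could be scripted roughly as follows. This is a sketch that assumes the standard Linux cpufreq and cpuidle sysfs layout (cpuN/cpufreq/scaling_governor and cpuN/cpuidle/stateK/{name,latency,disable}, with latency in microseconds). The function names and the CPU_SYSFS override are hypothetical, added only so the script can be dry-run against a fake directory tree without root.

```shell
#!/usr/bin/env bash
# Sketch only. CPU_SYSFS defaults to the real sysfs tree; pointing it at a
# directory with the same layout (hypothetical override) allows dry runs.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Print "cpuN GOVERNOR" for each CPU whose governor is not "performance".
non_performance_cpus() {
  local f gov rel
  for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue            # glob did not match: no cpufreq
    gov=$(cat "$f")
    if [ "$gov" != "performance" ]; then
      rel=${f#"$CPU_SYSFS"/}           # -> cpuN/cpufreq/scaling_governor
      echo "${rel%%/*} $gov"
    fi
  done
}

# Set the governor to "performance" on every CPU (needs root on real sysfs).
set_performance_governor() {
  local f
  for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue
    echo performance > "$f"
  done
}

# Disable every idle state whose exit latency (µs) is >= $1 on every CPU,
# printing "cpuN STATE_NAME" for each state disabled. With the idle-info
# output above, a threshold of 133 disables only C6.
disable_idle_states_from() {
  local threshold_us=$1 d rel
  for d in "$CPU_SYSFS"/cpu[0-9]*/cpuidle/state[0-9]*; do
    [ -d "$d" ] || continue
    if [ "$(cat "$d/latency")" -ge "$threshold_us" ]; then
      echo 1 > "$d/disable"
      rel=${d#"$CPU_SYSFS"/}
      echo "${rel%%/*} $(cat "$d/name")"
    fi
  done
}
```

On a real host this would be sourced in a root shell and followed by non_performance_cpus, set_performance_governor and disable_idle_states_from 133. Writes to sysfs do not survive a reboot. For a single state, cpupower idle-set -d 3 (state 3 being C6 in the listing above) should have the same effect.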

Event Timeline

dcaro triaged this task as High priority. Oct 27 2022, 3:48 PM
dcaro created this task.

It turns out that all the Ceph hosts already have the governor set to performance, so that change would only apply to the cloudvirts.

Ceph is now handling ToolsDB without any change to the Ceph configuration. The main turning point was setting innodb_flush_log_at_trx_commit=2 in MariaDB; more details in T301949.
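For reference, a sketch of how that setting can be applied, assuming a mysql client with sufficient privileges on the DB host. innodb_flush_log_at_trx_commit is a dynamic variable, so it can be changed at runtime, but it must also go into the server's option file to survive a restart. With a value of 2, InnoDB writes the redo log at each commit but flushes it to disk only about once per second. This avoids one synchronous flush per commit (plausibly the costly part on RBD, where every flush is a network round trip), at the cost of possibly losing roughly the last second of transactions if the host crashes; a crash of mysqld alone does not lose them.

```shell
# Apply at runtime (lost on the next server restart):
mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2;"

# Check the value currently in effect:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"

# To persist it, add this line to the [mysqld] section of the server's
# option file (which file that is depends on the deployment):
#   innodb_flush_log_at_trx_commit = 2
```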