
[ceph] Disable write cache on all osds
Closed, ResolvedPublic


On task T271417 we found that disabling the write cache on the disks would significantly decrease write latency (original source

This task is to puppetize the disabling of the caches on all osd devices.

How to manually check:

hdparm -W <path_to_device>

ex: hdparm -W /dev/sdc
 write-caching =  0 (off)
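To check every disk on a host at once, a loop like the following sketch works; the /dev/sd? glob is an assumption and should be adjusted to match the host's actual osd devices:

```shell
# Report the write-cache state of each SATA disk (run as root).
# /dev/sd? is a hypothetical device glob, not taken from the task.
for dev in /dev/sd?; do
    printf '%s: ' "$dev"
    hdparm -W "$dev" | awk '/write-caching/ {print $3, $4}'
done
```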

How to disable:

hdparm -W 0 <path_to_device>

ex: dcaro@cloudcephosd1001:~$ sudo hdparm -W 0 /dev/sdc
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
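Before the puppet change lands, the same setting can be applied by hand across all disks with a loop like this sketch (again, the /dev/sd? glob is an assumption; note that hdparm -W 0 does not persist across reboots, which is why puppetization is needed):

```shell
# Disable the write cache on each disk and verify it took effect
# (run as root; /dev/sd? is a hypothetical device glob).
for dev in /dev/sd?; do
    hdparm -W 0 "$dev"
    if hdparm -W "$dev" | grep -q 'write-caching =  0'; then
        echo "$dev: cache disabled"
    else
        echo "$dev: WARNING, cache still enabled" >&2
    fi
done
```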

Event Timeline

dcaro triaged this task as High priority.Jan 8 2021, 11:00 AM
dcaro created this task.

@dcaro did your latest research show that we don't actually want to do this? If so we can close it!

It showed that we would benefit from it: there's a huge (2x) gain on journal write latency and some gain (1.25x) on random writes. I'll work on the puppet patch next; it got delayed by writing the first tests for the openstack module on the other change I have.

Change 655923 had a related patch set uploaded (by David Caro; owner: David Caro):
[operations/puppet@production] wmcs.ceph.osd: disable write caches when possible

Change 655923 merged by David Caro:
[operations/puppet@production] wmcs.ceph.osd: disable write caches when possible

Mentioned in SAL (#wikimedia-cloud) [2021-01-15T08:19:52Z] <dcaro> Merging the patch to disable write caches on ceph osds (T271527)

Change 656371 had a related patch set uploaded (by David Caro; owner: David Caro):
[operations/puppet@production] wmcs.ceph.osd: actually disable write caches

Change 656371 merged by David Caro:
[operations/puppet@production] wmcs.ceph.osd: actually disable write caches

Though the disks give better speed on random writes, this does not seem to have much effect cluster-wise.

Just did some tests on codfw1, enabling and disabling the cache on all the osd nodes and running the command:

# for conf in 4M:16:write 4k:1:randwrite 4k:128:randwrite; do for i in 1 2 3; do bs="${conf%%:*}"; rw="${conf##*:}"; io="${conf#*:}"; io="${io%:*}"; echo "bs=$bs || rw=$rw || io=$io"; fio -ioengine=rbd -direct=1 -name=test -bs=$bs -iodepth=$io -rw=$rw -pool=codfw1dev-compute -runtime=60 -rbdname=fio_test 2>&1 | tee rbd.speed_test.bs$bs.iodepth$io.$rw.cache_disabled.$i; done; done

(got the commands from
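For anyone reading the loop above, the bs/io/rw values are pulled out of each "bs:iodepth:rw" triplet with standard shell parameter expansion; a minimal sketch of just that splitting:

```shell
# Split a "bs:iodepth:rw" triplet, same expansions as the fio loop above.
conf="4M:16:write"
bs="${conf%%:*}"   # drop longest suffix matching ':*'  -> "4M"
rw="${conf##*:}"   # drop longest prefix matching '*:'  -> "write"
io="${conf#*:}"    # drop shortest prefix matching '*:' -> "16:write"
io="${io%:*}"      # drop shortest suffix matching ':*' -> "16"
echo "bs=$bs io=$io rw=$rw"   # -> bs=4M io=16 rw=write
```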

And the results are:

  • P13801 rbd.speed_test.bs4k.iodepth128.randwrite.cache_disabled.1
  • P13802 rbd.speed_test.bs4k.iodepth128.randwrite.cache_disabled.2
  • P13803 rbd.speed_test.bs4k.iodepth128.randwrite.cache_disabled.3
  • P13804 rbd.speed_test.bs4k.iodepth128.randwrite.cache_enabled.1
  • P13805 rbd.speed_test.bs4k.iodepth128.randwrite.cache_enabled.2
  • P13806 rbd.speed_test.bs4k.iodepth128.randwrite.cache_enabled.3
  • P13807 rbd.speed_test.bs4k.iodepth1.randwrite.cache_disabled.1
  • P13808 rbd.speed_test.bs4k.iodepth1.randwrite.cache_disabled.2
  • P13809 rbd.speed_test.bs4k.iodepth1.randwrite.cache_disabled.3
  • P13810 rbd.speed_test.bs4k.iodepth1.randwrite.cache_enabled.1
  • P13811 rbd.speed_test.bs4k.iodepth1.randwrite.cache_enabled.2
  • P13812 rbd.speed_test.bs4k.iodepth1.randwrite.cache_enabled.3
  • P13813 rbd.speed_test.bs4M.iodepth16.write.cache_disabled.1
  • P13814 rbd.speed_test.bs4M.iodepth16.write.cache_disabled.2
  • P13815 rbd.speed_test.bs4M.iodepth16.write.cache_disabled.3
  • P13816 rbd.speed_test.bs4M.iodepth16.write.cache_enabled.1
  • P13817 rbd.speed_test.bs4M.iodepth16.write.cache_enabled.2
  • P13818 rbd.speed_test.bs4M.iodepth16.write.cache_enabled.3

Those results do not show any noticeable change between the two setups. Maybe I'm not running the right tests the right way, but so far there is no noticeable difference.