Description

To improve performance, it looks like the kernel's "none" (noop) I/O scheduler might be better suited than the default mq-deadline; see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-the-disk-scheduler_monitoring-and-managing-system-status-and-performance

Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
ceph.osd: Allow setting the io scheduler of the osd disks | operations/puppet | production | +131 -50
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | dcaro | T273649 Improve ceph performance
Resolved | | dcaro | T273791 [ceph] Change the io scheduler none/noop
Event Timeline
Did a test on one of the osds:
```
root@cloudcephosd1006:~# for i in {1..10}; do ceph tell osd.47 bench >> bench.mq-deadline; done
root@cloudcephosd1006:~# ceph daemon osd.47 list_devices
{
    "device": "/dev/sdj"
}
root@cloudcephosd1006:~# cat /sys/block/sdj/queue/scheduler
[mq-deadline] none
root@cloudcephosd1006:~# echo "none" > /sys/block/sdj/queue/scheduler
root@cloudcephosd1006:~# for i in {1..10}; do ceph tell osd.47 bench >> bench.none; done
root@cloudcephosd1006:~# cat bench.mq-deadline | phaste
https://phabricator.wikimedia.org/P14215
root@cloudcephosd1006:~# cat bench.none | phaste
https://phabricator.wikimedia.org/P14216
```
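For comparing the two result files, a minimal sketch, assuming `jq` is installed and that `ceph tell osd.N bench` prints one JSON object per run with a `bytes_per_sec` field (as current Ceph releases do):

```
# Average bytes_per_sec over the 10 runs for each scheduler;
# jq -s slurps the concatenated JSON objects into a single array.
for f in bench.mq-deadline bench.none; do
    printf '%s: ' "$f"
    jq -s 'map(.bytes_per_sec) | add / length' "$f"
done
```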
Also manually checked the output of `iostat -x 1 sdj` for 60 seconds; with the
mq-deadline scheduler there were several usage peaks:
```
Device   r/s     w/s     rkB/s    wkB/s     rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sdj      18.00   189.00  108.00   12612.00  0.00    151.00  0.00   44.41  49.56    49.11    13.81   6.00      66.73     3.57   74.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          0.92   0.00   0.98     11.01    0.00    87.08

Device   r/s     w/s     rkB/s    wkB/s     rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sdj      52.00   345.00  272.00   13064.00  1.00    103.00  1.89   22.99  117.63   72.83    18.53   5.23      37.87     2.52   100.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.00   0.00   1.23     4.90     0.00    92.87

Device   r/s     w/s     rkB/s    wkB/s     rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sdj      242.00  451.00  8824.00  6664.00   55.00   12.00   18.52  2.59   11.13    11.93    5.87    36.46     14.78     1.06   73.20
```
With the none scheduler, on the other hand, there were no peaks at all.
It's worth noting though that this is an osd that's being used, so this is
just circumstantial data.
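A non-interactive way to repeat the 60-second observation, as a sketch (assumes `iostat` from the sysstat package; the output file name is made up here):

```
# Take 60 one-second extended samples of sdj and keep the per-device
# lines, then list the five samples with the highest %util (field 16).
iostat -x sdj 1 60 | awk '/^sdj /' > iostat-sdj.log
sort -k16 -g iostat-sdj.log | tail -n 5
```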
The next step will be to enable the none scheduler on all osds (through puppet).
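For reference, one common way to make such a setting persistent is a udev rule; the sketch below is hypothetical and not necessarily what the puppet patch in change 662689 actually does (the rule file path and disk match are made up):

```
# Hypothetical rule: apply the "none" scheduler to whole sd[a-z] disks
# (partitions are excluded since they have no queue/scheduler attribute;
# hosts with more than 26 disks would need a broader match).
cat > /etc/udev/rules.d/60-osd-io-scheduler.rules <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"
EOF
# Reload the rules and re-trigger block devices so the setting applies now.
udevadm control --reload
udevadm trigger --subsystem-match=block --action=change
```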
Change 662689 had a related patch set uploaded (by David Caro; owner: David Caro):
[operations/puppet@production] ceph.osd: Allow setting the io scheduler of the osd disks
Change 662689 merged by David Caro:
[operations/puppet@production] ceph.osd: Allow setting the io scheduler of the osd disks
Mentioned in SAL (#wikimedia-cloud) [2021-02-09T11:14:37Z] <dcaro> Merged the osd scheduler change for all osds, applying on all cloudcephosd* (T273791)