To improve performance, the kernel's `none` (formerly `noop`) I/O scheduler might be better suited than the default `mq-deadline`; see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-the-disk-scheduler_monitoring-and-managing-system-status-and-performance
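As a sketch of the mechanics involved (the device name `sdj` is illustrative): the sysfs scheduler file lists the available schedulers with the active one in brackets, and writing another listed name to the same file switches it at runtime.

```shell
#!/bin/sh
# The sysfs file lists available schedulers; the active one is in brackets,
# e.g. "[mq-deadline] none". (Device path shown is illustrative.)
SCHED_FILE=/sys/block/sdj/queue/scheduler

# Extract the active scheduler name from a scheduler-file line.
active_scheduler() {
    # Print only the token inside the square brackets.
    echo "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

active_scheduler "[mq-deadline] none"   # -> mq-deadline

# Switching (requires root; takes effect immediately, not persistent):
# echo none > "$SCHED_FILE"
```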
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| ceph.osd: Allow setting the io scheduler of the osd disks | operations/puppet | production | +131 -50 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | dcaro | T273649 Improve ceph performance |
| Resolved | | dcaro | T273791 [ceph] Change the io scheduler none/noop |
Event Timeline
Did a test on one of the osds:
```
root@cloudcephosd1006:~# for i in {1..10}; do ceph tell osd.47 bench >> bench.mq-deadline; done
root@cloudcephosd1006:~# ceph daemon osd.47 list_devices
{
    "device": "/dev/sdj"
}
root@cloudcephosd1006:~# cat /sys/block/sdj/queue/scheduler
[mq-deadline] none
root@cloudcephosd1006:~# echo "none" > /sys/block/sdj/queue/scheduler
root@cloudcephosd1006:~# for i in {1..10}; do ceph tell osd.47 bench >> bench.none; done
root@cloudcephosd1006:~# cat bench.mq-deadline | phaste
https://phabricator.wikimedia.org/P14215
root@cloudcephosd1006:~# cat bench.none | phaste
https://phabricator.wikimedia.org/P14216
```

Also manually checked the output of `iostat -x 1 sdj` for 60 seconds; with the mq-deadline scheduler there were several peaks of usage:
```
Device   r/s     w/s     rkB/s    wkB/s     rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sdj      18.00   189.00  108.00   12612.00  0.00    151.00  0.00   44.41  49.56    49.11    13.81   6.00      66.73     3.57   74.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          0.92   0.00   0.98     11.01    0.00    87.08

Device   r/s     w/s     rkB/s    wkB/s     rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sdj      52.00   345.00  272.00   13064.00  1.00    103.00  1.89   22.99  117.63   72.83    18.53   5.23      37.87     2.52   100.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          1.00   0.00   1.23     4.90     0.00    92.87

Device   r/s     w/s     rkB/s    wkB/s     rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sdj      242.00  451.00  8824.00  6664.00   55.00   12.00   18.52  2.59   11.13    11.93    5.87    36.46     14.78     1.06   73.20
```

With the none scheduler there were no peaks at all.
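Peaks like the ones above can be flagged automatically. The awk filter below is a sketch that assumes the `iostat -x` layout shown, where `%util` is the last field of each device line; the sample input is taken from the capture above.

```shell
#!/bin/sh
# Flag iostat -x device samples whose %util (last column) exceeds a threshold.
flag_peaks() {
    awk -v limit="$1" '$1 == "sdj" && $NF + 0 > limit { print $1, $NF }'
}

# Sample lines from the capture above; only the saturated one is printed.
flag_peaks 90 <<'EOF'
sdj 18.00 189.00 108.00 12612.00 0.00 151.00 0.00 44.41 49.56 49.11 13.81 6.00 66.73 3.57 74.00
sdj 52.00 345.00 272.00 13064.00 1.00 103.00 1.89 22.99 117.63 72.83 18.53 5.23 37.87 2.52 100.00
sdj 242.00 451.00 8824.00 6664.00 55.00 12.00 18.52 2.59 11.13 11.93 5.87 36.46 14.78 1.06 73.20
EOF
# -> sdj 100.00
```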
It's worth noting, though, that this is an OSD that is actively in use, so this is just circumstantial data.
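To compare the two bench captures numerically, something like the following could average the throughput figures. This is a sketch: it assumes each `ceph tell osd.N bench` run appended a JSON result containing a `bytes_per_sec` field (as recent Ceph versions emit); adjust the field name to the actual bench output if it differs.

```shell
#!/bin/sh
# Average the bytes_per_sec values found in a file of accumulated
# "ceph tell osd.N bench" JSON outputs (field name is an assumption).
avg_bytes_per_sec() {
    grep -o '"bytes_per_sec": *[0-9.]*' "$1" \
        | awk -F': *' '{ sum += $2; n++ } END { if (n) printf "%.0f\n", sum / n }'
}

# Usage:
# avg_bytes_per_sec bench.mq-deadline
# avg_bytes_per_sec bench.none
```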
The next step will be to set the none scheduler on all OSDs (through Puppet).
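A write to the sysfs file does not survive a reboot. One common way to persist the setting (which a Puppet rule could manage as a file resource) is a udev rule; the fragment below is a sketch, and the `sd[a-z]*` match is illustrative — a real rule would need to target only the OSD disks:

```
# /etc/udev/rules.d/60-osd-scheduler.rules (hypothetical path)
# Set the "none" io scheduler on matching block devices at add/change time.
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/scheduler}="none"
```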
Change 662689 had a related patch set uploaded (by David Caro; owner: David Caro):
[operations/puppet@production] ceph.osd: Allow setting the io scheduler of the osd disks
Change 662689 merged by David Caro:
[operations/puppet@production] ceph.osd: Allow setting the io scheduler of the osd disks
Mentioned in SAL (#wikimedia-cloud) [2021-02-09T11:14:37Z] <dcaro> Merged the osd scheduler change for all osds, applying on all cloudcephosd* (T273791)