
[ceph] test how disabling the disk cache affects the io throughput
Closed, Resolved (Public)

Description

This is a simple test that could have a big impact on the performance of the cluster. It's worth spending some time to check it out.

To test:

  • Take one osd out
  • Do fio tests on the disk
  • Disable write cache (see the hdparm sketch after this list)
  • Do fio tests on the disk
  • Reprovision that osd (the disk and process, not the host)
  • Add it back to the cluster
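As a reference for the "disable write cache" step, this is roughly how the drive write cache can be checked and toggled with hdparm (the device /dev/sdc is only an example, use the OSD's actual data disk):

hdparm -W /dev/sdc    # show the current write-caching state
hdparm -W 0 /dev/sdc  # turn the drive write cache off (runtime only, not persistent)
hdparm -W 1 /dev/sdc  # turn it back on after the tests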

Event Timeline

dcaro triaged this task as High priority.Jan 7 2021, 12:30 PM
dcaro created this task.

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T12:53:58Z] <dcaro> Taking osd.0 down on codfw ceph cluster to try the disk performance testing process (T271417)

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T14:39:51Z] <dcaro> Starting speed tests on cloudcephosd2001-dev sdc (T271417)

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T15:19:39Z] <dcaro> Finished speed tests on cloudcephosd2001-dev, reprovisioning the osd.0 sdc (T271417)

How'd it go? I had thought we tested this before, but we probably didn't 😁

Finished the tests on the codfw cluster, it's now re-shuffling data back. This is the process + results:

Chose osd.0, hosted on cloudcephosd2001-dev

Taking the osd down

ll /var/lib/ceph/osd/ceph-0
..
lrwxrwxrwx 1 ceph ceph   93 Dec  1 18:02 block -> /dev/ceph-38e63119-ff99-490e-a97a-e7525bca9c53/osd-block-ed4d301a-cfac-484a-8442-62809093265b
...
# gives you the device it's using
pvscan
...
PV /dev/sdc   VG ceph-38e63119-ff99-490e-a97a-e7525bca9c53   lvm2 [894.00 GiB / 0    free]
...
# gives you the device -> hard drive
ceph osd out osd.0  # makes the data move out of the osd
ceph osd down osd.0  # not sure it's needed given the out, but it marks the osd as 'down' so the replica takes over traffic
--- wait for the data to re-shuffle
service ceph-osd@0 stop
umount /var/lib/ceph/osd/ceph-0
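Before stopping the daemon it's worth double-checking that the data has actually finished moving off the OSD, something along these lines (a sketch, not necessarily the exact commands used here):

ceph -s                         # wait until no PGs show as remapped/backfilling anymore
ceph osd safe-to-destroy osd.0  # reports whether osd.0 can be removed without risking data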

Tests

Script used: https://phabricator.wikimedia.org/P13663
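The script itself is in the paste; from the test names in the results below it is essentially a series of fio runs, roughly like this (job names and parameters here are illustrative assumptions, not the actual P13663 contents):

DISK=/dev/sdc
# linear (sequential) read and write
fio --name=linear-read  --filename=$DISK --direct=1 --rw=read  --bs=4M --iodepth=32 --runtime=60 --time_based
fio --name=linear-write --filename=$DISK --direct=1 --rw=write --bs=4M --iodepth=32 --runtime=60 --time_based
# peak parallel random read/write
fio --name=rand-read  --filename=$DISK --direct=1 --rw=randread  --bs=4k --iodepth=128 --numjobs=4 --group_reporting --runtime=60 --time_based
fio --name=rand-write --filename=$DISK --direct=1 --rw=randwrite --bs=4k --iodepth=128 --numjobs=4 --group_reporting --runtime=60 --time_based
# single-threaded latency and journal-style writes (sync/fsync after each write)
fio --name=read-latency --filename=$DISK --direct=1 --rw=randread  --bs=4k --iodepth=1 --runtime=60 --time_based
fio --name=sync-write   --filename=$DISK --direct=1 --rw=randwrite --bs=4k --iodepth=1 --sync=1  --runtime=60 --time_based
fio --name=fsync-write  --filename=$DISK --direct=1 --rw=randwrite --bs=4k --iodepth=1 --fsync=1 --runtime=60 --time_based

Note that writing to the raw device like this is destructive, which is why the OSD is zapped and re-created afterwards.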

root@cloudcephosd2001-dev:~# hdparm -W /dev/sdc
/dev/sdc:
 write-caching =  0 (off)
root@cloudcephosd2001-dev:~# ./speedtest.sh | tee without_cache.txt

Result: https://phabricator.wikimedia.org/P13662

root@cloudcephosd2001-dev:~# hdparm -W 1 /dev/sdc
/dev/sdc:
 setting drive write-caching to 1 (on)
 write-caching =  1 (on)
root@cloudcephosd2001-dev:~# ./speedtest.sh | tee with_cache.txt

Result: https://phabricator.wikimedia.org/P13664

Bringing the osd back

ceph osd destroy osd.0
# workaround https://tracker.ceph.com/issues/24793
dmsetup ls

# search for the same id the device had, in this case 38e63119...
# ex: ceph--38e63119--ff99--490e--a97a--e7525bca9c53-osd--block--e24f80d5--4ab2--48e1--98ce--df28cca36836
dmsetup remove ceph--38e63119--ff99--490e--a97a--e7525bca9c53-osd--block--e24f80d5--4ab2--48e1--98ce--df28cca36836
ceph-volume lvm zap /dev/sdc

ceph-volume lvm create --osd-id 0 --data /dev/sdc
# this was enough, just wait for the cluster to re-shift data
watch ceph status
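To confirm the OSD rejoined properly, a couple of quick checks (again a sketch, not a record of what was run):

ceph osd tree   # osd.0 should show as 'up' again under cloudcephosd2001-dev
ceph osd df     # its PG count and used space should grow back as data moves in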

Side by side results: https://phabricator.wikimedia.org/P13665

Test                                 |with cache    |without cache
Linear read:                         |ios=31491/0   |ios=29949/0
Linear write:                        |ios=47/31020  |ios=16/30960
Peak parallel random read:           |ios=3245618/0 |ios=5779926/0
Single-threaded read latency:        |ios=472836/0  |ios=1232638/0
Peak parallel random write:          |ios=43/5023780|ios=43/5040033
Journal write latency (with sync):   |ios=43/1183974|ios=43/2262624
Journal write latency (with fsync):  |ios=43/2215944|ios=43/2157667
Single-threaded random write latency:|ios=43/1176272|ios=43/2269595
NOTE: I got the columns wrong the first time, as I depended on the order in which diff --side-by-side with* picked the files xd

Mentioned in SAL (#wikimedia-cloud) [2021-01-08T09:41:04Z] <dcaro> Taking osd.48 from eqiad ceph cluster out to do performance tests (T271417)

Mentioned in SAL (#wikimedia-cloud) [2021-01-08T09:59:55Z] <dcaro> Started performance tests on sdc (osd.48) for eqiad ceph cluster (T271417)

Mentioned in SAL (#wikimedia-cloud) [2021-01-08T10:40:03Z] <dcaro> Finished tests, bringing osd online (osd.48) for eqiad ceph cluster (T271417)

For cloudcephosd1001 (eqiad), which has different hardware, the benefit is not as high, but there is still a big increase for write latency:

dcaro@cloudcephosd1001:~$ sudo smartctl -a /dev/sdc | grep Model
Device Model:     SSDSC2KG019T8R

# 1st run (columns: with cache | without cache, as above)
Linear read:                         |ios=117655/0  |ios=120206/0
Linear write:                        |ios=16/97639  |ios=43/97561
Peak parallel random read:           |ios=4170185/0 |ios=3298122/0
Single-threaded read latency:        |ios=578001/0  |ios=565411/0
Peak parallel random write:          |ios=99/3693180|ios=128/3718437
Journal write latency (with sync):   |ios=99/1183600|ios=128/1616904
Journal write latency (with fsync):  |ios=43/936917 |ios=166/1565817
Single-threaded random write latency:|ios=43/1137671|ios=43/1573119

# 2nd run (columns: with cache | without cache, as above)
Linear read:                         |ios=120038/0  |ios=119443/0
Linear write:                        |ios=43/97574  |ios=43/97749
Peak parallel random read:           |ios=3236441/0 |ios=3403247/0
Single-threaded read latency:        |ios=564938/0  |ios=581872/0
Peak parallel random write:          |ios=99/3690768|ios=99/3712488
Journal write latency (with sync):   |ios=99/1276329|ios=99/1621712
Journal write latency (with fsync):  |ios=43/956141 |ios=43/1578310
Single-threaded random write latency:|ios=43/1251877|ios=43/1596920

Raw tests output:
  • with_cache_disabled.2.txt
  • with_cache_disabled.txt
  • with_cache_enabled.2.txt
  • with_cache_enabled.txt

I think this is enough tests, let's try to disable the cache on all ceph osds :)
I'll open a new task for it (I'll try to puppetize it)
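One possible shape for that puppetization (purely a sketch; the file path, the udev match and restricting it to non-rotational sdX devices are all assumptions): since hdparm -W 0 only changes the runtime setting, it has to be re-applied on boot/hotplug, e.g. with a puppet-managed udev rule:

# /etc/udev/rules.d/99-ceph-osd-no-write-cache.rules (hypothetical path)
# turn the drive write cache off whenever a non-rotational sdX device appears
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", RUN+="/sbin/hdparm -W 0 /dev/%k"

# then reload and re-trigger udev so it applies without a reboot
udevadm control --reload
udevadm trigger --subsystem-match=block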