
Labstore1006/7 profile for meltdown kernel
Closed, Resolved · Public

Description

Previously we had an issue, noted in T169290: New anti-stackclash (4.9.25-1~bpo8+3) kernel super bad for NFS, where upgrading the labstore1004/5 servers to 4.9.25-1~bpo8+3 caused a significant performance regression. We have come back around in the update cycle, and this time we have labstore1006/7, which are Jessie hosts also running NFSd. labstore1006/7 do not run DRBD, so this will not be a perfect test, but since Ganeti in production uses DRBD with the 4.9.0-0.bpo.5-amd64 kernel we have some data there. Note that DRBD on ganeti1001 appears to be version 8.4.7 (api:1/proto:86-101), whereas labstore1005 is on version 8.4.5 (api:1/proto:86-101); we may want to look at upgrading DRBD in the labstore case. As it stands, profiling performance on labstore1006/7 needs to happen anyway, so we decided to do so with an eye toward understanding the impact on labstore1004/5.
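
For reference, a quick way to confirm the DRBD version on a given host is to read /proc/drbd (its first line is the same "version: 8.4.x (api:.../proto:...)" string quoted above) or to ask drbdadm; a minimal sketch:

  # First line of /proc/drbd reports the loaded DRBD kernel module version
  head -1 /proc/drbd

  # drbdadm reports the userspace tool and kernel module versions, which can differ
  drbdadm --version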

  • Install the current labstore1004/5 kernel on labstore1006/7
  • Profile performance under load for NFSd on labstore1006/7 ro (normal use case)
  • Profile performance under load for NFSd with an ad-hoc rw NFSd share (to replicate the labstore1004/5 use case)
  • Run each data point 3x (see the scripted sketch after this list)
  • Install the Meltdown candidate kernel on labstore1006/7
  • Profile performance under load for NFSd on labstore1006/7 ro (normal use case) -- same as before
  • Profile performance under load for NFSd with an ad-hoc rw NFSd share (to replicate the labstore1004/5 use case) -- same as before
  • Run each data point 3x
  • Reflect on next steps
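
A minimal sketch of how the repeated runs could be scripted (the exact harness used is not recorded here; the mount point and dd parameters mirror the commands quoted below, and averaging was done over the three reported throughputs):

  #!/bin/bash
  # Run the synchronous dd write test three times against the NFS mount and
  # print dd's throughput summary line for each run.
  echo "kernel: $(uname -r)"
  for run in 1 2 3; do
      echo "run ${run}:"
      dd bs=64G count=100 if=/dev/zero of=/mnt/testmount/testfile oflag=sync 2>&1 | tail -1
      rm -f /mnt/testmount/testfile
  done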

Event Timeline

chasemp created this task.

Reporting back here on what I found.

So I ran some dd-based tests to get baseline numbers:

  1. dd synchronous write test with a file larger than client RAM

Ran: dd bs=64G count=100 if=/dev/zero of=/mnt/testmount/testfile oflag=sync

Results:

  • 4.4 kernel (raw disk, average over 3 runs) - 911 MB/s
  • 4.9 kernel (raw disk, average over 3 runs) - 803 MB/s
  • 4.4 kernel (nfsd, average over 3 runs) - 108 MB/s
  • 4.9 kernel (nfsd, average over 3 runs) - 130 MB/s
  2. dd synchronous O_DIRECT write test with a file larger than client RAM

Ran: dd bs=64G count=100 if=/dev/zero of=/mnt/testmount/testfile oflag=sync,direct

Results:

  • 4.4 kernel (raw disk, average over 3 runs) - 921 MB/s
  • 4.9 kernel (raw disk, average over 3 runs) - 826 MB/s
  • 4.4 kernel (nfsd, average over 3 runs) - 115 MB/s
  • 4.9 kernel (nfsd, average over 3 runs) - 119 MB/s
  3. dd synchronous read test with a file larger than client RAM (see the page-cache note after the read results below)

Ran: dd bs=64G count=100 of=/dev/zero if=/mnt/testmount/testfile iflag=sync

Results:

  • 4.4 kernel (raw disk, average over 3 runs) - 1.13 GB/s
  • 4.9 kernel (raw disk, average over 3 runs) - 1.1 GB/s
  • 4.4 kernel (nfsd, average over 3 runs) - 182 MB/s
  • 4.9 kernel (nfsd, average over 3 runs) - 161 MB/s
  4. dd synchronous O_DIRECT read test with a file larger than client RAM

Ran: dd bs=64G count=100 of=/dev/zero if=/mnt/testmount/testfile iflag=sync,direct

Results:

  • 4.4 kernel (raw disk, average over 3 runs) - 1.46 GB/s
  • 4.9 kernel (raw disk, average over 3 runs) - 1.43 GB/s
  • 4.4 kernel (nfsd, average over 3 runs) - 171 MB/s
  • 4.9 kernel (nfsd, average over 3 runs) - 123 MB/s
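
One note on methodology (the exact cache handling between runs is not recorded above, so treat this as an assumption): reading a file larger than client RAM already limits page-cache reuse, but the client cache can also be dropped explicitly between read runs, e.g.:

  # On the NFS client, flush dirty data and drop the page cache, dentries and
  # inodes so repeated reads of testfile are not served from memory.
  sync
  echo 3 > /proc/sys/vm/drop_caches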

I also ran various tests using fio across the two kernels over NFSd; the results are at https://tools.wmflabs.org/labstore-profiling/
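
For context, a representative fio invocation of the kind used for this sort of NFS profiling (a sketch only; the actual job parameters behind the linked graphs are not recorded in this comment):

  # Random 4k writes against the NFS mount, bypassing the client page cache.
  fio --name=nfs-randwrite \
      --directory=/mnt/testmount \
      --rw=randwrite --bs=4k --size=2G \
      --ioengine=libaio --direct=1 \
      --numjobs=4 --runtime=60 --time_based \
      --group_reporting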

From comparing the two kernels for NFSd and raw disk performance, I can see a small loss in performance on both reads and writes with the new Spectre/Meltdown kernel. Looking at the load graphs from the fio tests, there is no significant difference in how the kernels perform under heavy load. These patterns don't resemble what we saw when we upgraded labstore1004/5 to the 4.9 kernel, and my suspicion is that NFS isn't the issue there.