labtestvirt2003: test different power management / CPU setups for faster kvm
Closed, ResolvedPublic
Assigned To

Authored By

	hashar
	Jun 5 2019, 9:25 AM

Description

Some cloudvirt machines have very slow CPUs, for example cloudvirt1004, cloudvirt1005 and cloudvirt1012. Their CPUs are not very recent, but that alone does not really explain why they would be two, if not three, times slower in raw CPU power.

On investigation, this seems to affect HP ProLiant machines, and @hashar suspects it could be due to the HP power management system, see e.g. https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c03031625&docLocale=en_US

labtestvirt2003.codfw.wmnet is a test machine and could be used to measure CPU performance. The tests to conduct would be to run the bash one-liner below directly on the machine, and possibly under KVM. Then check the HP BIOS settings for power management, try a different profile and rerun the benchmark?

The server is a ProLiant DL360 Gen9 with Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz.

Some tools @hashar used to run math on 1 thread; the first one should be sufficient:

time $(i=1; while (( i < 2000000 )); do (( i ++ )); done)
sysbench --test=cpu run
stress-ng --cpu 1 --cpu-ops=4000

That CPU is found on a lot of MediaWiki application servers. On mw1307.eqiad.wmnet the shell one-liner takes 7.5 to 8 seconds.

Related Objects

		Status	Subtype	Assigned	Task
		Declined		None	T223971 Old cloudvirt (with Intel Xeon) are half the speed of newer ones (Intel Sky Lake)
		Resolved		• JHedden	T225067 labtestvirt2003: test different power management / CPU setups for faster kvm

Event Timeline

hashar created this task. Jun 5 2019, 9:25 AM

Restricted Application added a subscriber: Aklapper. Jun 5 2019, 9:25 AM

• JHedden claimed this task. Jun 5 2019, 1:52 PM

Some initial baseline data:

host	type	time test avg	stress-ng	sysbench
labtestvirt2003	baremetal	6.76s	26m 4.38s	10.07s
labtestvirt2003	qemu/kvm	6.85s	26m 8.83s	10.14s
cloudvirt1004	baremetal	10.80s	n/a	n/a
cloudvirt1004	qemu/kvm [1]	11.34s	40m 13.80s	15.04s

(Note that labtestvirt2003 is completely idle, while cloudvirt1004 is under sustained load with ~15 running virtual machines. Since we're not pinning system resources, the additional noise on the hypervisors will affect the test results.)
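The "time test avg" column implies averaging several runs of the counting loop. The task does not say how the average was taken, so the following is only a minimal sketch of one way to do it. The helper names `bench_loop` and `bench_avg` are hypothetical, and the timing assumes GNU `date` (for `%N`) plus `awk` and `bash` being available.

```shell
# Hypothetical helpers; names and defaults are assumptions, not from the task.

# Run the counting loop from the task description once, up to limit $1.
# The loop is run by an explicit bash, since (( )) is a bash construct;
# the measured time therefore includes bash startup (a few milliseconds).
bench_loop() {
    bash -c 'i=1; while (( i < $1 )); do (( i++ )); done' _ "${1:-2000000}"
}

# Run bench_loop $1 times (loop limit $2) and print the mean wall time in seconds.
# Assumes GNU date for sub-second resolution (%N).
bench_avg() {
    runs=${1:-5}; limit=${2:-2000000}; total=0; n=0
    while [ "$n" -lt "$runs" ]; do
        start=$(date +%s.%N)
        bench_loop "$limit"
        end=$(date +%s.%N)
        # Accumulate elapsed time; awk handles the floating point arithmetic.
        total=$(awk -v t="$total" -v s="$start" -v e="$end" 'BEGIN { printf "%.6f", t + e - s }')
        n=$((n + 1))
    done
    awk -v t="$total" -v r="$runs" 'BEGIN { printf "%.2f\n", t / r }'
}
```

Calling `bench_avg 5` would run the 2,000,000-iteration loop five times and print the mean. On a loaded hypervisor the spread between runs may matter as much as the mean, which this sketch does not report.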

Once I get access to labtestvirt2003's IPMI/iLO I'll collect some performance metrics for the different power profiles.

hashar updated the task description. Jun 6 2019, 7:08 AM

stress-ng --cpu 1 --cpu-ops=400000 <- turns out that 400k is way too many operations, I am not sure why I indicated that. Anyway, let's skip that command; the others are enough to estimate the raw CPU power.

I have run the time benchmark again on a few hosts. Instances on cloudvirt1005 are no longer affected, but the ones on cloudvirt1008 / cloudvirt1012 are. So that might "just" be CPU saturation or some contention when too many VMs are running. What puzzles me is that on the parent task T223971, the cloudvirt1005 that was showing slow CPU apparently had low load/CPU usage :-\

Results of different power regulator settings on labtestvirt2003.codfw.wmnet.

type	regulator	time test (s)	sysbench (s)
baremetal	min	16.41	24.22
kvm	min	16.42	24.29
baremetal	max	6.73	10.00
kvm	max	6.80	10.01
baremetal	*dynamic	6.80	10.14
kvm	*dynamic	6.87	10.17

( * dynamic is the default value.)

The high performance (max) profile is only slightly better than the default dynamic one: roughly 1 to 2% faster on both benchmarks.
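To make the comparison concrete, the ratios can be worked out from the regulator table above. This is a small sketch using `awk`; the numbers are copied from that table (with the `*` marker dropped from "dynamic"), and the function name `slowdown_vs_max` is hypothetical.

```shell
# Data copied from the regulator table: type, regulator, time test (s), sysbench (s).
results='baremetal min 16.41 24.22
kvm min 16.42 24.29
baremetal max 6.73 10.00
kvm max 6.80 10.01
baremetal dynamic 6.80 10.14
kvm dynamic 6.87 10.17'

# Print how much slower each row is than the "max" regulator of the same type.
slowdown_vs_max() {
    printf '%s\n' "$results" | awk '
        # Store both measurements keyed by (type, regulator), remembering row order.
        { t[$1, $2] = $3; s[$1, $2] = $4; order[NR] = $1 SUBSEP $2 }
        END {
            for (n = 1; n <= NR; n++) {
                split(order[n], k, SUBSEP)
                printf "%s %s time=%.2fx sysbench=%.2fx\n", k[1], k[2],
                    t[k[1], k[2]] / t[k[1], "max"],
                    s[k[1], k[2]] / s[k[1], "max"]
            }
        }'
}
```

Running `slowdown_vs_max` prints one line per row, for example `baremetal min time=2.44x sysbench=2.42x` and `kvm dynamic time=1.01x sysbench=1.02x`: the minimum setting costs roughly a factor of 2.4, while dynamic stays within about 2% of max.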

Closing this task. The default (dynamic) power regulator settings are not impacting the virtual machine performance.

• JHedden closed this task as Resolved. Jul 1 2019, 4:00 PM

Sorry, I lost track of this task and the other one. At least we have some raw metrics that definitely show that setting the regulator to minimum causes the CPU to be way slower (roughly 2.4 times on both benchmarks). There is another task about auditing the kernel CPU governor, T225713, which might be related.
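For reference, a governor check on a single host could look roughly like the sketch below. It is not taken from T225713; it assumes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`), and the function name `governor_summary` is hypothetical. Whether these hosts expose cpufreq to the kernel at all is not stated in this task.

```shell
# Summarize the cpufreq scaling governors found under a sysfs CPU directory.
# Prints "<count> <governor>" per distinct governor, or a notice (and returns 1)
# when no governor file is readable, e.g. when cpufreq is not exposed to the OS.
governor_summary() {
    base=${1:-/sys/devices/system/cpu}
    governors=$(cat "$base"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null)
    if [ -z "$governors" ]; then
        echo "no cpufreq governor exposed under $base"
        return 1
    fi
    # Count how many CPUs use each governor; awk strips uniq's padding.
    printf '%s\n' "$governors" | sort | uniq -c | awk '{ print $1, $2 }'
}
```

Calling `governor_summary` with no argument reads the live sysfs tree; passing a directory makes it possible to try the function against a copied or mocked tree.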

Thank you @JHedden !

hashar mentioned this in T223971: Old cloudvirt (with Intel Xeon) are half the speed of newer ones (Intel Sky Lake). Jul 5 2019, 8:50 AM

hashar mentioned this in T225713: CPU scaling governor audit. Jul 5 2019, 9:17 AM

fgiunchedi subscribed. Jul 5 2019, 9:25 AM
