
Old cloudvirt (with Intel Xeon) are half the speed of newer ones (Intel Sky Lake)
Closed, Declined · Public

Description

The job mediawiki-core-code-coverage-docker usually takes between 2h and 2h30, based on https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/buildTimeTrend

On integration-slave-docker-1054 it usually takes less than two hours. The job times out after four hours on 1055 and 1056 :(

After some investigation below, the slow instances turn out to be on cloudvirt1005 / cloudvirt1006, which have an Intel Xeon E3-12xx v2 (Ivy Bridge) at 2.7 GHz. The fastest builds are on instances backed by a Skylake processor at 2.3 GHz.

I can imagine a newer architecture offers improvements, but for a single-threaded CPU-bound task doing simple maths, I would expect the Ivy Bridge Xeon at 2.7 GHz to be faster than the Skylake one at 2.3 GHz.

I tried a couple small CPU benchmarks which I ran on instances:

  • Compute 10k prime numbers using 64-bit integers with the sysbench package, running: sysbench --test=cpu run
  • Do some basic math with stress-ng, running: stress-ng --cpu 1 --cpu-ops=400000
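For convenience, the two commands above can be wrapped in a small script that skips any tool that is not installed and reports wall-clock seconds. This is just a sketch I put together around the exact commands quoted above, not an official benchmark harness:

```shell
#!/bin/sh
# Run both micro-benchmarks back to back and report wall-clock seconds.
# Tools that are not installed are skipped, so this can be pasted anywhere.
for cmd in 'sysbench --test=cpu run' 'stress-ng --cpu 1 --cpu-ops=400000'; do
  tool=${cmd%% *}
  if command -v "$tool" >/dev/null 2>&1; then
    echo "== $cmd =="
    start=$(date +%s)
    $cmd || echo "($tool exited non-zero)"
    echo "== took $(( $(date +%s) - start ))s"
  else
    echo "skipping: $tool is not installed"
  fi
done
```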

Results:

| Model | Intel Xeon E3-12xx v2 (Ivy Bridge) | Intel Core Processor (Skylake) |
| cpu MHz | 2.7 GHz | 2.3 GHz |
| bogoMips | 5390 | 4590 |
| sysbench duration | ~16 seconds | ~9 seconds |
| stress-ng | ~18 seconds | ~10 seconds |

Rearranged, with max turbo:

| Host | Model | Base speed | Max turbo | bogoMips | sysbench |
| cloudvirt1006 | Xeon E5-2697 v2 | 2,700 MHz | 3,500 MHz | 5,386 | 15.13s |
| cloudvirt1025 | Xeon Gold 6140 | 2,300 MHz | 3,700 MHz | 4,590 | 9.24s |
| cobalt | Xeon E5-2623 v3 | 3,000 MHz | 3,500 MHz | 6,000 | 9.30s |
| contint1001 | Xeon E5-2640 v3 | 2,600 MHz | 3,400 MHz | 5,200 | 9.37s |
| @hashar | i7-8550U | 1,800 MHz | 4,000 MHz | 4,000 | 7.43s |
| @hashar #2 | i5-4250U | 1,300 MHz | 2,600 MHz | 3,800 | 11.8s |

Note how, despite the bogoMips and CPU speed being higher on the Xeon Ivy Bridge, it performs at half the speed.

I really don't get why the Intel Xeon is so slow :-/ Maybe it is an oddity due to KVM or a BIOS / hardware configuration issue. One would have to run the same benchmarks on the real servers for comparison.

Event Timeline

https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/4262/ started at May 21, 2019 3:00:00 AM and took 4 hours

It ran on integration-slave-docker-1056 which is on cloudvirt1005.eqiad.wmnet

Via https://grafana-labs.wikimedia.org/dashboard/db/cloud-vps-project-board , I can see the instance had 12.5% CPU usage for the duration of the job, which corresponds to a single CPU being used at 100% for the whole duration of the job. There is no steal CPU indicated.
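The 12.5% figure is consistent with one vCPU pegged at 100% on an 8-vCPU instance (the flavor size of 8 vCPUs is an assumption on my part):

```shell
# One busy core out of N vCPUs shows up as 100/N percent overall usage.
# 8 vCPUs is assumed here; adjust to the actual instance flavor.
awk 'BEGIN { printf "%.1f%%\n", 100 / 8 }'
```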

There is no indication of CPU/io saturation on cloudvirt1005.


https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/4263/ started at May 21, 2019 7:10:43 AM and took two and a half hours.

It ran on integration-slave-docker-1043 which is on cloudvirt1022.eqiad.wmnet

Everything seems fine there as well, but it is magically faster :-(

Which leaves the CPU as the thing that would be slower on 1055 and 1056?

hashar@integration-cumin:~$ sudo cumin --trace --force 'name:docker' 'egrep "(model)" /proc/cpuinfo|sort|uniq'
===== NODE GROUP =====                                                                                                                                                                                               
(8) integration-slave-docker-[1021,1048-1054].integration.eqiad.wmflabs                                                                                                                                              
model           : 94                                                                                                                                                                                                 
model name      : Intel Core Processor (Skylake)
===== NODE GROUP =====                                                                                                                                                                                               
(2) integration-slave-docker-[1055-1056].integration.eqiad.wmflabs                                                                                                                                                   
model           : 58                                                                                                                                                                                                 
model name      : Intel Xeon E3-12xx v2 (Ivy Bridge)
===== NODE GROUP =====                                                                                                                                                                                               
(4) integration-slave-docker-[1034,1040-1041,1043].integration.eqiad.wmflabs                                                                                                                                         
model           : 61                                                                                                                                                                                                 
model name      : Intel Core Processor (Broadwell)

And looking at the build times:

| model | model name | time | cloudvirt |
| 94 | Intel Core Processor (Skylake) | 1h50m | 1023 1025 1026 1027 1028 1029 |
| 58 | Intel Xeon E3-12xx v2 (Ivy Bridge) | 4h00 (build times out) | 1005 1006 |
| 61 | Intel Core Processor (Broadwell) | 2h30m | 1016 1017 1022 |

A few weeks ago, I deleted integration-slave-docker-1037 since it was notoriously slow when running the job wmf-quibble-vendor-mysql-hhvm-docker (T222023). But I do not know on which cloudvirt it happened to be scheduled :-(

Looking at https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-hhvm-docker/buildTimeTrend the build takes roughly 13 - 18 minutes. But those scheduled on 1055 or 1056 take 25-30 minutes.

Mentioned in SAL (#wikimedia-releng) [2019-05-21T11:23:18Z] <hashar> Depooling integration-slave-docker-1055 and integration-slave-docker-1056 : CPU is too slow # T223971

I tried a small CPU benchmark using the sysbench package, running sysbench --test=cpu run (it computes 10k prime numbers using 64-bit integers), and stress-ng with 400k operations (stress-ng --cpu 1 --cpu-ops=400000).

| Model | Intel Xeon E3-12xx v2 (Ivy Bridge) | Intel Core Processor (Skylake) |
| cpu MHz | 2.7 GHz | 2.3 GHz |
| bogoMips | 5390 | 4590 |
| sysbench duration | ~16 seconds | ~9 seconds |
| stress-ng | ~18 seconds | ~10 seconds |

I really don't get why the Intel Xeon is so slow :-/ Maybe it is an oddity due to kvm/qemu. One would have to run the same benchmarks on the real servers instead.

hashar renamed this task from Investigate slow down of mediawiki-core-code-coverage-docker on some Jenkins instances to Old cloudvirt (with Intel Xeon) are twice slower than new ones (Intel Sky Lake). May 21 2019, 12:59 PM
hashar updated the task description.

Today I have noticed that instances scheduled on our oldest cloudvirt machines are most probably slower than expected. They use Intel Xeon E3-12xx v2 (Ivy Bridge) and I suspect there is either a BIOS option missing or KVM is not finely tuned.

When an instance is scheduled on a recent cloudvirt machine, tasks end up running twice as fast, which doesn't make sense since those CPUs are not twice as fast.

Years ago, when I first migrated Jenkins jobs to WMCS, I did notice the jobs took noticeably longer to run. I just assumed at the time it was some openstack/kvm/whatever overhead. So potentially the CPU slowdown has been there forever.

I guess what would help is to run the same benchmark directly on the cloudvirt hosts:

apt -y install sysbench
sysbench --test=cpu run | grep 'total time'

And report the results of a few runs. Maybe the Intel Xeon really is way slower than a Sky Lake processor, but that would really surprise me. Candidate cloudvirt/CPU pairs would be:

| Cloud virt | CPU | Instance |
| cloudvirt1006.eqiad.wmnet | Intel Xeon | integration-slave-docker-1055 |
| cloudvirt1025.eqiad.wmnet | Sky Lake | integration-slave-docker-1054 |

If running the benchmark command on the host is faster on cloudvirt1006, that would probably mean we have a configuration issue in kvm/libvirt etc. Else it would mean the Intel Xeon is slower despite having more bogoMips ?!

I am also interested in the exact specs of the processor. They are not fully exposed to the guest VM.

@hashar what is your hoped for outcome here? I'm not sure I understand why you are confused that 5 year old servers (cloudvirt1006) are slower than servers purchased in the last year (cloudvirt1025).

I wonder if there should be two separate sets of flavours, one for each type of host. Probably wouldn't want an instance set up on one type migrated to the other. It sounds like right now if you see docs/examples that say a particular flavour should be used (perhaps on the basis of VCPUs), it's useless due to it actually coming down to the luck of what host you get scheduled on?

> I wonder if there should be two separate sets of flavours, one for each type of host. Probably wouldn't want an instance set up on one type migrated to the other. It sounds like right now if you see docs/examples that say a particular flavour should be used (perhaps on the basis of VCPUs), it's useless due to it actually coming down to the luck of what host you get scheduled on?

A while back Andrew added some logic to OpenStack so that newly created instances are created on the least loaded compute nodes.

OpenStack seems to have a way to partition hosts into aggregates, each aggregate carrying a specific property (eg: ssd=true). A new flavor can then be created that will only be scheduled on an aggregate carrying that property (eg: an m1.large-ssd flavor). https://docs.openstack.org/nova/rocky/admin/configuration/schedulers.html#host-aggregates But that is arguably a lot of configuration tweaking and would put more burden on dispatching the VMs across servers.
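A rough sketch of what that host-aggregate setup would look like with the standard openstack CLI (the aggregate name, the fastcpu property, and the flavor name/sizes below are invented for illustration):

```shell
# Create an aggregate for fast hosts and tag it with a custom property.
openstack aggregate create --property fastcpu=true fast-cpu
openstack aggregate add host fast-cpu cloudvirt1025

# Create a flavor that the AggregateInstanceExtraSpecsFilter will only
# schedule on hosts belonging to an aggregate with that property.
openstack flavor create --vcpus 4 --ram 8192 --disk 80 m1.large-fastcpu
openstack flavor set \
  --property aggregate_instance_extra_specs:fastcpu=true m1.large-fastcpu
```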

> @hashar what is your hoped for outcome here? I'm not sure I understand why you are confused that 5 year old servers (cloudvirt1006) are slower than servers purchased in the last year (cloudvirt1025).

My issue is that when doing simple maths (purely CPU bound) on a single thread, I would expect the old CPU to be faster since it runs at 2.7GHz while the newer one runs at 2.3GHz. That is a very naive way to look at it, since it ignores a lot of how CPUs behave nowadays compared to 30 years ago. One sure thing though: for a single-threaded load, I would not expect the older CPU to be half the speed.
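To put numbers on that expectation (clock speeds and sysbench times taken from the table earlier in this task):

```shell
# If performance tracked clock speed alone, the 2.7 GHz Ivy Bridge should
# be ~17% faster than the 2.3 GHz Skylake; sysbench shows it ~78% slower.
awk 'BEGIN {
  printf "clock ratio (old/new): %.2f\n", 2.7 / 2.3
  printf "sysbench time ratio (old/new): %.2f\n", 16 / 9
}'
```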

The sysbench CPU test is very straightforward; it iterates from 3 to 10000 and does:

for (c = 3; c < max_prime; c++)
{
  t = sqrt(c);
  for (l = 2; l <= t; l++)
    if (c % l == 0)
      break;
  if (l > t)
    n++;
}
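The same trial-division loop, reproduced in awk for a quick local sanity check (a stand-in of mine, not sysbench itself, so the timings are not comparable):

```shell
# Count primes in [3, max) by trial division, mirroring the C loop above.
count_primes() {
  awk -v max="$1" 'BEGIN {
    n = 0
    for (c = 3; c < max; c++) {
      t = int(sqrt(c))
      for (l = 2; l <= t; l++)
        if (c % l == 0)
          break
      if (l > t)
        n++
    }
    print n
  }'
}
count_primes 10000
```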

T223971#5200409 shows the above code takes 16 seconds on the old CPU versus 9 seconds on a newer one, hence my utter confusion about the raw CPU performance.

If I look up Intel Xeon E5-2697 v2 @ 2.70GHz (old) versus Intel Xeon Gold 6140 @ 2.30GHz (new) in a benchmark database, they end up with roughly the same single thread rating: https://www.cpubenchmark.net/compare/Intel-Xeon-E5-2697-v2-vs-Intel-Xeon-Gold-6140/2009vs3132 (respectively scores of 1732 and 1736).

Thus my expected outcome would be for sysbench --test=cpu run to perform roughly the same on both CPUs, not show a two-fold difference.

Would it be possible to run the benchmark directly on the servers to rule out kvm/qemu? A oneliner would be:

apt -y install sysbench && sysbench --test=cpu run && apt -y purge sysbench

On cloudvirt1006.eqiad.wmnet and cloudvirt1025.eqiad.wmnet - or, if you feel brave, on all cloudvirt hosts by using cumin :-]
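Something like this could run it fleet-wide through cumin (an untested sketch of mine; the host query and the DRY_RUN guard are made up for illustration):

```shell
# Run the sysbench one-liner on a set of hosts via cumin.
# DRY_RUN=1 only prints the command, so this sketch is safe to try first.
bench_hosts() {
  query="$1"
  cmd="apt -y install sysbench && sysbench --test=cpu run && apt -y purge sysbench"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo cumin '$query' \"$cmd\""
  else
    sudo cumin "$query" "$cmd"
  fi
}

DRY_RUN=1 bench_hosts 'cloudvirt1*'
```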

If the benchmark is way faster on cloudvirt1006.eqiad.wmnet than on the instance it hosts (integration-slave-docker-1055), that would indicate a potential issue with kvm/qemu/libvirt etc. Else I would blame some BIOS settings, but at that point I would be willing to give up.

Here is the result:

root@cloudvirt1006:~# sysbench --test=cpu run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          15.1276s
    total number of events:              10000
    total time taken by event execution: 15.1261
    per-request statistics:
         min:                                  1.19ms
         avg:                                  1.51ms
         max:                                  8.33ms
         approx.  95 percentile:               2.44ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   15.1261/0.00
root@cloudvirt1025:~# sysbench --test=cpu run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          9.2490s
    total number of events:              10000
    total time taken by event execution: 9.2481
    per-request statistics:
         min:                                  0.78ms
         avg:                                  0.92ms
         max:                                  5.63ms
         approx.  95 percentile:               1.01ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   9.2481/0.00

Note that kernel versions are a bit different:

root@cloudvirt1025:~# uname -a
Linux cloudvirt1025 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
root@cloudvirt1006:/home/aborrero# uname -a
Linux cloudvirt1006 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

Thanks! So the benchmarks on the real hardware look similar to what I get on the instances. I guess that rules out kvm/qemu/labvirt etc.

Ran it on a couple of production hosts and two of my machines at home:

| Host | Model | Base speed | Max turbo | bogoMips | sysbench |
| cloudvirt1006 | Xeon E5-2697 v2 | 2,700 MHz | 3,500 MHz | 5,386 | 15.13s |
| cloudvirt1025 | Xeon Gold 6140 | 2,300 MHz | 3,700 MHz | 4,590 | 9.24s |
| cobalt | Xeon E5-2623 v3 | 3,000 MHz | 3,500 MHz | 6,000 | 9.30s |
| contint1001 | Xeon E5-2640 v3 | 2,600 MHz | 3,400 MHz | 5,200 | 9.37s |
| @hashar | i7-8550U | 1,800 MHz | 4,000 MHz | 4,000 | 7.43s |
| @hashar #2 | i5-4250U | 1,300 MHz | 2,600 MHz | 3,800 | 11.8s |

Eventually I found a MediaWiki application server with a CPU older than the one on cloudvirt1006: mw2139.codfw.wmnet has a Xeon CPU E5-2450 0 based on Sandy Bridge.

Comparison:

| Host | mw2139 | cloudvirt1006 |
| CPU | E5-2450 | E5-2697 v2 |
| Speed | 2.10 GHz | 2.70 GHz |
| Turbo | 2.9 GHz | 3.5 GHz |
| Time | 10s | 17s |

On the benchmark page, the single thread rating for the mw2139 CPU is 1074 while the cloudvirt1006 one scores 1732 (higher is better).

contint1001 has a more recent Xeon E5-2640 v3 and runs the busy loop in 8.5s

So I am not sure what is happening on the old cloudvirt servers, but given their CPU they should perform better than the old mw2139.codfw.wmnet, not worse.
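Putting numbers on that mismatch (rating figures from the benchmark database quoted above, times from the comparison table):

```shell
# cpubenchmark.net single-thread ratings predict cloudvirt1006 ~1.6x faster
# than mw2139, yet the measured busy-loop time is ~1.7x slower.
awk 'BEGIN {
  printf "predicted speedup (1732/1074): %.2f\n", 1732 / 1074
  printf "observed slowdown (17s/10s):   %.2f\n", 17 / 10
}'
```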

Mentioned in SAL (#wikimedia-releng) [2019-06-03T15:57:44Z] <hashar> Deleting integration-slave-docker-1055 and integration-slave-docker-1056 . CPU is way too slow T223971

I have deleted the affected instances and created two new ones, hoping for them to be scheduled on cloudvirts not cursed with slow CPUs. No luck: I got integration-slave-docker-1058 on cloudvirt1004 and integration-slave-docker-1059 on cloudvirt1005. Both are slow :-\

Could I get them moved to some faster cloudvirt servers please? Seems they would fit on cloudvirt1012 and later ids.

The old cloudvirts are apparently HP ProLiant DL380p Gen8 (https://wikitech.wikimedia.org/wiki/HP_DL380p). HP has a built-in power management system and I have found a few reports that it might cause the CPU to be slower than expected:

https://v-strange.de/index.php/19-hp-hardware/200-hp-power-management-better-switch-it-off
https://helgeklein.com/blog/2013/05/the-effects-of-power-savings-mode-on-vcpu-performance/

The HP doc on power management ( https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c03031625&docLocale=en_US ) recommends setting it to "high performance".

For Gen9, our install doc states to disable the power management. In the BIOS:

  • Select Service Options
  • Set Processor Power Monitoring and choose Disabled
  • Press enter; ignore the warning message regarding modification by pressing enter again. Select Disabled and press enter again.

https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/HP_DL3N0_Gen9#Setting_proper_power_option

So, very naively, I am wondering if it could just be a matter of tweaking/disabling the HP power management in the BIOS and leaving its management to the OS.

Mentioned in SAL (#wikimedia-cloud) [2019-06-04T08:56:52Z] <arturo> reallocating integration-slave-docker-1059 and integration-slave-docker-1058 to cloudvirt1012 (T223971)

So on cloudvirt1012 the instances show up in cpuinfo as: Intel Core Processor (Haswell, no TSX). They are now even slower than they were on cloudvirt1004 or cloudvirt1005.

> Mentioned in SAL (#wikimedia-cloud) [2019-06-04T08:56:52Z] <arturo> reallocating integration-slave-docker-1059 and integration-slave-docker-1058 to cloudvirt1012 (T223971)

Sorry, I messed up. I would need those instances on another, later cloudvirt since cloudvirt1012 has slow CPUs as well :-\

hashar triaged this task as High priority. Jun 5 2019, 7:32 AM

Maybe it can be reproduced on the test machine labtestvirt2003.codfw.wmnet, which is an HP as well, although it is Gen9 (ProLiant DL360 Gen9). It has an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz. Being a test machine, I would guess it is easier to try different power management settings there to confirm or refute my theory.

Also raising priority because that prevents us from reinstalling instances :\

Mentioned in SAL (#wikimedia-cloud) [2019-06-05T08:56:52Z] <arturo> move integration-slave-docker-1059 and integration-slave-docker-1058 to cloudvirt1028 (T223971)

Could this difference be caused by the mitigations for those design flaws called Spectre & Meltdown?

Result of testing BIOS settings on labtestvirt2003.codfw.wmnet, which is a ProLiant DL360 Gen9 with an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz. Running:

  • time $(i=1; while (( i < 2000000 )); do (( i ++ )); done)
  • sysbench --test=cpu run
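The first command works because bash times the command substitution as part of the (empty) pipeline; a portable equivalent of that busy loop, sketched for clarity, is:

```shell
# Busy-loop ~2 million increments and report elapsed wall-clock seconds;
# a portable rewrite of the `time $( ... )` one-liner above.
start=$(date +%s)
i=1
while [ "$i" -lt 2000000 ]; do i=$((i + 1)); done
echo "busy loop took $(( $(date +%s) - start ))s (i=$i)"
```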

Results of different power regulator settings on labtestvirt2003.codfw.wmnet.

| environment | regulator | time test | sysbench |
| baremetal | min | 16.41 | 24.22 |
| kvm | min | 16.42 | 24.29 |
| baremetal | max | 6.73 | 10.00 |
| kvm | max | 6.80 | 10.01 |
| baremetal | dynamic* | 6.80 | 10.14 |
| kvm | dynamic* | 6.87 | 10.17 |

( * dynamic is the default value.)

The high performance max profile is only slightly better than the default dynamic.

There is no meaningful difference between running on bare metal or inside kvm. Changing the BIOS CPU regulator to minimum does dramatically affect performance though.
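The size of that effect, computed from the bare-metal rows of the table above:

```shell
# Speedup from switching the power regulator from "min" to "max".
awk 'BEGIN {
  printf "time test: %.2fx\n", 16.41 / 6.73
  printf "sysbench:  %.2fx\n", 24.22 / 10.00
}'
```

That ~2.4x gap is in the same ballpark as the difference observed between the old and new cloudvirts.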

I have also noticed T225713: CPU scaling governor audit which is about auditing the CPU governor on our machines and might shed some light on this trouble. I would probably write some script to collect metrics over a long run and see whether we can identify a pattern.

And from the list of benchmarks, the cloudvirt1006 machine performs way worse than my old/cheap Intel NUC at home:

| Host | Model | Base speed | Max turbo | bogoMips | sysbench |
| cloudvirt1006 | Xeon E5-2697 v2 | 2,700 MHz | 3,500 MHz | 5,386 | 15.13s |
| @hashar #2 | i5-4250U | 1,300 MHz | 2,600 MHz | 3,800 | 11.8s |

https://www.cpu-monkey.com/en/compare_cpu-intel_xeon_e5_2697_v2-86-vs-intel_core_i5_4250u-3 . Though cloudvirt1006 is on Ivy Bridge and my machine is on Haswell, it is really unclear why it would be THAT much slower :-\ Guess we can wait for the outcome of the CPU scaling audit from T225713.

Isn't sysbench that tool for measuring high load on MySQL? I bought my box in May 2009 and sysbench (version 1.0.11) says "total time: 10.0012s" and "execution time (avg/stddev): 9.9984" (2 Quad-Core AMD Opteron(tm) Processor 2382 at 2613 MHz, 5230.44 bogomips).

Seems that I have no need for new hardware …

re: T225713: CPU scaling governor audit — what was uncovered is that, at least on HP boxes, when "power control" is set to anything other than "os control", pcc-cpufreq is loaded as the scaling driver; however that driver doesn't really scale with >4 CPUs. This is currently the case on cloudvirt1006:

cloudvirt1006:~$ cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver 
pcc-cpufreq
cloudvirt1006:~$ cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor 
ondemand

Whereas when "os control" is set, the intel_pstate scaling driver is loaded and things should be significantly better. In the other task we're not at the "change bios settings" stage yet, but we'll get there soon; at any rate, let's coordinate on this!
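The per-host check above can be generalised into a quick audit of every cpufreq policy (a small loop I sketched; it exits cleanly on hosts or containers without cpufreq sysfs):

```shell
# List the scaling driver and governor for every cpufreq policy.
for p in /sys/devices/system/cpu/cpufreq/policy*; do
  [ -d "$p" ] || continue
  printf '%s: %s / %s\n' "${p##*/}" \
    "$(cat "$p/scaling_driver")" "$(cat "$p/scaling_governor")"
done
```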

Mentioned in SAL (#wikimedia-releng) [2019-09-25T19:36:27Z] <hashar> Deleting integration-agent-docker-1007 it is too slow ( T223971 )

Mentioned in SAL (#wikimedia-releng) [2019-09-27T20:01:13Z] <hashar> Marking integration-agent-docker-1015 offline due to cloudvirt1004 being wayyyyy too slow T223971

Bstorm lowered the priority of this task from High to Medium. Feb 25 2020, 5:25 PM
Reedy renamed this task from Old cloudvirt (with Intel Xeon) are twice slower than new ones (Intel Sky Lake) to Old cloudvirt (with Intel Xeon) are half the speed of newer ones (Intel Sky Lake). Feb 25 2020, 5:26 PM

This is likely to be fixed by the introduction of a number of things. One is the change to the performance governor, but in testing @JHedden found that in many cases resource contention is a stronger determinant of such an issue. We are also going to be able to do better load balancing soon with Ceph.

> This is likely to be fixed with the introduction of a number of things. One is to change to the performance governor, but in testing @JHedden found that in many cases resource contention is a stronger determinant of such an issue. We also are going to be able to do better load balancing soon with ceph.

I understand contention can be an issue. Though in this case there are strong indications that the issue is with the underlying hardware and/or T225713.

The raw CPU performance is worse than that of my old Intel NUC at home, even though those old cloudvirts have processors that largely outperform my machine. At least on paper.

We're in the process of replacing most of these hosts. Everything is slowed down by COVID but at least we have some orders in.

That most probably comes from the CPU scaling BIOS setting described at T225713. And since the hosts are being replaced, there is no incentive to get this fixed, so declining.