= The problem =
As discovered by @faidon in {T210723}, at least on the ms-be HP hosts the `ondemand` scaling governor performs poorly: load average is high, and so is the reported CPU utilization %.
On further investigation, the problem is that the default BIOS power control setting on HP Gen9 ("dynamic") leads Linux to load the `pcc-cpufreq` driver, which doesn't scale beyond 4 CPUs with the `ondemand` governor. Switching power control to "os control" lets Linux take full control of frequency scaling. The end result is that the `intel_pstate` driver is loaded and `powersave` is the default governor. This configuration apparently also lowers temperature, and it matches what happens on both Dell and HP Gen10 hosts across the rest of the fleet (see below for a full audit). The `performance` governor also seems to be doing better.
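A minimal bash sketch for spotting hosts in the situation described above, assuming the standard cpufreq sysfs layout (`scaling_driver` in each `cpuN/cpufreq/` directory). The function name `pcc_affected` and its optional sysfs-root argument (useful for testing against a fake tree) are hypothetical, not existing tooling; the "more than 4 CPUs" threshold is taken from the description above.

```bash
# Hypothetical helper: succeed (exit 0) when cpu0 uses the pcc-cpufreq
# driver on a host with more than 4 CPUs, i.e. the HP Gen9 "dynamic"
# power control configuration that scales poorly with ondemand.
pcc_affected() {
    local root="${1:-/sys}"                      # sysfs root, overridable for testing
    local cpudir="$root/devices/system/cpu"
    local driver ncpu=0 d

    # No cpufreq interface at all: not the pcc-cpufreq case.
    driver=$(cat "$cpudir/cpu0/cpufreq/scaling_driver" 2>/dev/null) || return 1

    # Count cpuN directories (cpu0, cpu1, ...), ignoring cpufreq/, cpuidle/ etc.
    for d in "$cpudir"/cpu[0-9]*; do
        [ -d "$d" ] && ncpu=$((ncpu + 1))
    done

    [ "$driver" = "pcc-cpufreq" ] && [ "$ncpu" -gt 4 ]
}
```

On a host, `pcc_affected && echo 'needs BIOS fix'` would flag the problematic configuration.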
In terms of similar issues / precedents, AFAICS we're forcing the `performance` governor via `class { '::cpufrequtils': }` (related {T98203}) in puppet for lvs and cache hosts only ATM.
= The fix =
Issuing `set /system1/oemhp_power1 oemhp_powerreg=os` from the iLO SSH interface on HP Gen9 hosts and then rebooting switches the host to the `intel_pstate` driver and the `powersave` governor.
As far as I (Filippo) can tell, the governor we want is `powersave` and the driver `intel_pstate`, which is what Dell hosts use out of the box, and so do HP Gen10 (though not Gen9).

When a reboot is invasive or time-consuming (e.g. database hosts), a **temporary fix** is to set the governor to `performance` (setting `powersave` isn't possible: the governors available without a reboot are `ondemand performance schedutil`) and change the iLO settings; `powersave` will then be loaded on the next reboot. While temporary, the fix should give a pretty close preview of what will happen to CPU utilization after the next reboot.
= performance vs powersave =
We force some hosts (e.g. lvs/cp) to use the `performance` governor via the puppet class `cpufrequtils`. Choosing between `performance` and `powersave` for a particular class of hosts is outside the scope of this task, though: the goal here is to get the fleet to a standard baseline (i.e. `intel_pstate` + `powersave`).
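Once a host has been switched over (BIOS change + reboot), a check like the following could confirm it reached the baseline on every CPU, not just on `cpu0` as the audit commands below read. This is a bash sketch assuming the standard cpufreq sysfs files; the function name `check_baseline` and its optional sysfs-root argument are illustrative, not existing tooling.

```bash
# Hypothetical helper: succeed only if every CPU reports the fleet baseline
# (intel_pstate driver + powersave governor); print each CPU that doesn't.
check_baseline() {
    local root="${1:-/sys}"                      # sysfs root, overridable for testing
    local cf driver gov rc=0 found=0

    for cf in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -d "$cf" ] || continue
        found=1
        driver=$(cat "$cf/scaling_driver" 2>/dev/null)
        gov=$(cat "$cf/scaling_governor" 2>/dev/null)
        if [ "$driver" != "intel_pstate" ] || [ "$gov" != "powersave" ]; then
            # e.g. "cpu3: driver=pcc-cpufreq governor=ondemand"
            echo "$(basename "$(dirname "$cf")"): driver=${driver:-none} governor=${gov:-none}"
            rc=1
        fi
    done

    # No cpufreq directories at all (the "No governor" case) also fails.
    [ "$found" -eq 1 ] || { echo "no cpufreq interface found"; return 1; }
    return "$rc"
}
```

Printing the offending CPUs rather than just failing makes a partially applied fix (e.g. a governor set by hand on some CPUs only) easy to spot.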
= Audit =
Fleet-wide audit below (Dell hosts with `intel_pstate` + `powersave` are skipped, since that's already the desired/default state).
== Dell ==
`cumin -b100 'F:virtual ~ physical and F:manufacturer ~ Dell' 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor || true'`
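The audit commands read only the governor, so "No governor" hosts show up as a `cat` error. A hypothetical bash variant that also reports the driver (relevant e.g. for the `acpi_cpufreq` hosts below) and prints `none` instead of an error could look like this; the function name is illustrative and the paths assume the standard cpufreq sysfs layout.

```bash
# Hypothetical helper: print "<driver> <governor>" for cpu0, using "none"
# when a file is missing, so hosts without a governor get their own bucket
# in the grouped output instead of an error message.
cpu0_cpufreq() {
    local cf="${1:-/sys}/devices/system/cpu/cpu0/cpufreq"   # sysfs root overridable for testing
    local driver gov
    driver=$(cat "$cf/scaling_driver" 2>/dev/null) || driver=none
    gov=$(cat "$cf/scaling_governor" 2>/dev/null) || gov=none
    echo "$driver $gov"
}
```

On a host already at the baseline this would print `intel_pstate powersave`.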
=== ondemand ===
`eeden.wikimedia.org`: old host in esams, unused
`labsdb[1006-1007].eqiad.wmnet`: the `acpi_cpufreq` module has been loaded, I'm guessing due to BIOS settings. The hosts are being decommissioned in T220144, so we can leave them be.
=== No governor ===
```
bast3002.wikimedia.org,cp1008.wikimedia.org,db2114.codfw.wmnet,db1138.eqiad.wmnet,dbproxy2001.codfw.wmnet,dbproxy[1001-1011].eqiad.wmnet,dns1002.wikimedia.org,es[2001-2004].codfw.wmnet,helium.eqiad.wmnet,iron.wikimedia.org,labstore[2001-2004].codfw.wmnet,lvs[1001-1006].wikimedia.org,maerlant.wikimedia.org,multatuli.wikimedia.org,nescio.wikimedia.org,rhenium.wikimedia.org,rhodium.eqiad.wmnet,tungsten.eqiad.wmnet
```
perhaps disabled via BIOS settings; will need to be audited
=== performance ===
```
cp[2001-2002,2004-2008,2010-2014,2016-2020,2022-2026].codfw.wmnet,cp[1075-1090].eqiad.wmnet,cp[5001-5012].eqsin.wmnet,cp[3030,3032-3036,3038-3047,3049].esams.wmnet,cp[4021-4032].ulsfo.wmnet
lvs[1013-1016].eqiad.wmnet,lvs[5001-5003].eqsin.wmnet,lvs[3001-3004].esams.wmnet,lvs[4005-4007].ulsfo.wmnet
```
expected
`analytics1070.eqiad.wmnet,kafka-main[2001-2003].codfw.wmnet,labstore[1004-1005].eqiad.wmnet`
manually set for tests or due to bios settings
== HP ==
`cumin -b100 'F:virtual ~ physical and F:manufacturer ~ HP' 'cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor || true'`
=== powersave ===
`db[2097-2102].codfw.wmnet,db[1139-1140].eqiad.wmnet`: DL360 Gen10; it looks like this generation already works out of the box (i.e. `intel_pstate` is the driver) even when power control is set to `dynamic` in the BIOS.
`labsdb1012.eqiad.wmnet`: as above; the host is Gen10, but a DL380 rather than a DL360 (with default settings, i.e. power control is `dynamic`).
`ms-be2037.codfw.wmnet`: DL380 Gen9, but its BIOS settings were changed to "os control" as part of this task.
=== No governor ===
`mc[1022,1031].eqiad.wmnet` likely due to bios settings?
=== performance ===
`lvs[2001-2006].codfw.wmnet` expected
`ms-be[2016,2031,2033,2034-2035,2038].codfw.wmnet,ms-be1036.eqiad.wmnet` due to tests, will be fixed with bios settings + reboot
=== ondemand ===
Will need to be fixed via BIOS settings (i.e. `set /system1/oemhp_power1 oemhp_powerreg=os` from iLO over SSH) and a reboot.
If a reboot is problematic or requires coordination (e.g. databases), then setting the governor to `performance` as root via `for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > "$i" ; done` will give similar (or higher) performance until the next reboot, when `powersave` will be used instead.
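A slightly more defensive bash sketch of the loop above: the hypothetical `set_governor_all` function (name and optional sysfs-root argument are illustrative) writes the governor only on CPUs whose `scaling_available_governors` lists it, so asking for `powersave` on a host still on `pcc-cpufreq` fails with one clear message per CPU instead of a bare write error.

```bash
# Hypothetical helper: set GOVERNOR on every CPU, but only where the kernel
# lists it in scaling_available_governors; report CPUs where it isn't.
set_governor_all() {
    local gov="$1" root="${2:-/sys}"             # sysfs root, overridable for testing
    local cf rc=0 found=0
    [ -n "$gov" ] || { echo "usage: set_governor_all GOVERNOR [SYSFS_ROOT]" >&2; return 2; }

    for cf in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -d "$cf" ] || continue
        found=1
        # Pad with spaces so a governor name can't match as a substring.
        case " $(cat "$cf/scaling_available_governors" 2>/dev/null) " in
            *" $gov "*)
                echo "$gov" > "$cf/scaling_governor" || rc=1 ;;
            *)
                echo "$cf: governor '$gov' not available" >&2
                rc=1 ;;
        esac
    done

    [ "$found" -eq 1 ] || { echo "no cpufreq interface found" >&2; return 1; }
    return "$rc"
}
```

Run as root, e.g. `set_governor_all performance`; per the description above, `set_governor_all powersave` should keep failing on these hosts until the BIOS change and reboot.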
- [ ] aqs[1004-1006].eqiad.wmnet
- [ ] cloudcontrol2003-dev.wikimedia.org,cloudcontrol[1003-1004].wikimedia.org,cloudweb2001-dev.wikimedia.org
- [ ] clouddb2001-dev.codfw.wmnet
- [ ] cloudnet2002-dev.codfw.wmnet,cloudnet[1003-1004].eqiad.wmnet
- [ ] cloudservices2002-dev.wikimedia.org,cloudservices1003.wikimedia.org
- [ ] cloudvirt[1001-1009,1012-1014,1019-1020].eqiad.wmnet
- [ ] conf[1004-1006].eqiad.wmnet
- [ ] db[2034-2038,2040-2063,2065-2070].codfw.wmnet,db[1074-1095].eqiad.wmnet,dbstore[2001-2002].codfw.wmnet
- [ ] druid[1001-1003].eqiad.wmnet
- [ ] elastic1041.eqiad.wmnet,elastic[1032-1040,1042-1052].eqiad.wmnet,elastic[2025-2036].codfw.wmnet
- [ ] labmon[1001-1002].eqiad.wmnet
- [ ] labpuppetmaster[1001-1002].wikimedia.org
- [ ] labsdb[1009-1011].eqiad.wmnet
- [ ] labstore[1006-1007].wikimedia.org
- [ ] labtestpuppetmaster2001.wikimedia.org,labtestservices2003.wikimedia.org,labtestvirt2003.codfw.wmnet
- [ ] maps2002.codfw.wmnet,maps[1001-1004].eqiad.wmnet,maps[2001,2003-2004].codfw.wmnet
- [ ] mc[1019-1021,1023-1030,1032-1036].eqiad.wmnet,mc[2019-2036].codfw.wmnet
- [ ] ms-be[1016-1035,1037-1039].eqiad.wmnet
- [ ] mwmaint2001.codfw.wmnet
- [ ] netmon2001.wikimedia.org
- [ ] oresrdb2002.codfw.wmnet
- [ ] rdb[2005-2006].codfw.wmnet
- [ ] relforge[1001-1002].eqiad.wmnet
- [ ] restbase2009.codfw.wmnet,restbase[1010-1015].eqiad.wmnet
- [ ] restbase-dev[1004-1006].eqiad.wmnet
- [ ] snapshot[1005-1007].eqiad.wmnet
- [ ] stat1006.eqiad.wmnet
- [ ] wdqs2003.codfw.wmnet,wdqs1003.eqiad.wmnet
- [ ] wezen.codfw.wmnet