Record per-server power usage
Closed, Resolved, Public

Authored By

	tstarling
	Nov 7 2019, 4:52 AM

Description

The 2019 Wikimedia Foundation Sustainability Assessment states that server electricity usage accounts for 54.6% of Wikimedia's carbon footprint. It would be nice to be able to break that figure down further, for example by Prometheus cluster. This would improve our ability to identify potential efficiency projects.

For Dell servers there is dellhw_exporter, although it requires OMSA (OpenManage Server Administrator) to be installed. That is no easy task, since no recent Debian packages are available. However, the same information is apparently available via IPMI:

$ ipmitool -c -I lanplus -H mw1333.mgmt.eqiad.wmnet -U root -E delloem powermonitor powerconsumptionhistory
Power Consumption History

Statistic                   Last Minute     Last Hour     Last Day     Last Week

Average Power Consumption   155 W           155 W         160 W        165 W   
Max Power Consumption       199 W           199 W         220 W        244 W   
Min Power Consumption       117 W           117 W         105 W         97 W   

Max Power Time
Last Minute     : Thu Nov  7 02:05:33 2019
Last Hour       : Thu Nov  7 02:05:33 2019
Last Day        : Wed Nov  6 12:27:57 2019
Last Week       : Tue Nov  5 22:25:05 2019
Min Power Time
Last Minute     : Thu Nov  7 01:36:15 2019
Last Hour       : Thu Nov  7 01:36:15 2019
Last Day        : Wed Nov  6 19:55:55 2019
Last Week       : Sun Nov  3 02:05:35 2019

The idea would be to write a Prometheus plugin that runs this command and parses the response to extract the one-minute average power consumption. The resolution is only 1 W, but the iDRAC web UI shows the same resolution, so it is probably the best that is physically available.
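A minimal sketch of that parsing step, assuming the powerconsumptionhistory output keeps the column layout shown above. The function names and the metric name server_power_average_watts are hypothetical, not part of any existing exporter; the output line follows the Prometheus text exposition format, so it could be fed to something like node_exporter's textfile collector.

```python
import re

# Sample copied from the powerconsumptionhistory output above (mw1333).
SAMPLE = """\
Statistic                   Last Minute     Last Hour     Last Day     Last Week

Average Power Consumption   155 W           155 W         160 W        165 W
Max Power Consumption       199 W           199 W         220 W        244 W
Min Power Consumption       117 W           117 W         105 W         97 W
"""

# The "Average Power Consumption" row has four "<n> W" columns:
# last minute, last hour, last day, last week.
_AVG_ROW = re.compile(
    r"^Average Power Consumption\s+(\d+)\s*W\s+(\d+)\s*W\s+(\d+)\s*W\s+(\d+)\s*W",
    re.MULTILINE,
)
WINDOWS = ("last_minute", "last_hour", "last_day", "last_week")


def parse_average_power(output: str) -> dict:
    """Return the average power (watts) for each window in the output."""
    match = _AVG_ROW.search(output)
    if match is None:
        raise ValueError("no 'Average Power Consumption' row found")
    return {window: int(watts) for window, watts in zip(WINDOWS, match.groups())}


def to_exposition(instance: str, watts: int) -> str:
    """Format one sample as a Prometheus text-format line (hypothetical metric name)."""
    return f'server_power_average_watts{{instance="{instance}",window="1m"}} {watts}\n'


if __name__ == "__main__":
    averages = parse_average_power(SAMPLE)
    print(to_exposition("mw1333", averages["last_minute"]), end="")
```

Later comments in this task report that the "Last Minute" figure tracked the hourly average on at least one host, so which column is worth exporting probably needs checking per hardware model.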

For HP ProLiant, there is ilo-exporter, which consumes the iLO RESTful API.

Related Objects

Mentioned Here: T214183: Setup graphs for power usage readings in Grafana

Event Timeline

tstarling created this task. Nov 7 2019, 4:52 AM

Restricted Application added a subscriber: Aklapper. Nov 7 2019, 4:52 AM

tstarling updated the task description. Nov 7 2019, 4:58 AM

Peachey88 subscribed. Nov 7 2019, 4:59 AM

MZMcBride subscribed. Nov 7 2019, 5:12 AM

I have some concerns about proceeding with this. In our experience the BMCs are not that stable, and excessive interaction with them seems to aggravate the situation, statistically causing more BMCs to become unresponsive and require a reset.
For this reason we've kept our checks of BMCs to a minimum, and I'd rather not add something that queries the BMC so often.

For what you're looking for, I think a one-shot collection of data, repeated maybe once a month or so, might be enough. Also take into account that any power consumption data is strongly related to how heavily the host is used overall, which makes it harder to draw conclusions based only on power consumption and perhaps traffic data (e.g. a change in globally installed daemons or a different kernel might lead to different data).

FWIW, you don't need remote IPMI for the Dells; you can gather the data directly on the host with ipmi-oem. The related commands available are:

get-power-consumption-data
get-instantaneous-power-consumption-data [power_supply_instance]
get-power-head-room
get-power-consumption-statistics <average|max|min>

To be used like:

ipmi-oem Dell get-power-consumption-statistics average

In my experience, the get-power-consumption-statistics average is not reliable: the one-minute average doesn't change if I stress the host for a minute, while the instantaneous value seems accurate.
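For the local route suggested here, a sketch that runs ipmi-oem Dell get-instantaneous-power-consumption-data on the host itself. The exact output layout of that command is an assumption: the parser simply takes the first integer followed by "W", and the sample string below is hypothetical, not captured output.

```python
import re
import subprocess

# Assumption: the reading appears as "<integer> W" somewhere in the output;
# the first such match is taken to be the instantaneous power.
_WATTS = re.compile(r"(\d+)\s*W\b")


def first_watts(output: str) -> int:
    """Return the first '<n> W' value found in the command output."""
    match = _WATTS.search(output)
    if match is None:
        raise ValueError("no wattage found in ipmi-oem output")
    return int(match.group(1))


def read_instantaneous_power() -> int:
    """Run ipmi-oem locally over in-band IPMI (typically needs root)."""
    result = subprocess.run(
        ["ipmi-oem", "Dell", "get-instantaneous-power-consumption-data"],
        capture_output=True,
        text=True,
        check=True,
    )
    return first_watts(result.stdout)


if __name__ == "__main__":
    # Hypothetical output line, used only to exercise the parser.
    print(first_watts("Instantaneous Power : 143 W\n"))  # 143
```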

AFAIK ipmi-oem doesn't support HP, judging by ipmi-oem -L, but I didn't look deeper.

There is also T214183: Setup graphs for power usage readings in Grafana, for per-rack, per-row, and per-PDU power stats.

I like the overall idea. On balancing data-gathering frequency against accuracy: since power usage has at least a daily cycle (i.e. matching traffic), I think starting by sampling four times a day should give us representative figures without being a problem for iLO/iDRAC. Thoughts?
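A rough check of the four-samples-a-day idea, under a deliberately simplified assumption: power follows a pure 24-hour sinusoid around a base load. All numbers here (150 W base, 20 W swing, 15:00 peak) are made up for illustration. With equally spaced samples the sinusoid cancels out, so the sampled mean equals the daily mean regardless of when sampling starts, whereas a single daily sample depends on the time of day.

```python
import math


def daily_profile(hour: float, base: float = 150.0, swing: float = 20.0,
                  peak_hour: float = 15.0) -> float:
    """Hypothetical power draw in watts: a 24-hour cosine around `base`."""
    return base + swing * math.cos(2 * math.pi * (hour - peak_hour) / 24)


def sampled_mean(samples_per_day: int, offset_hour: float = 0.0) -> float:
    """Mean of `samples_per_day` equally spaced samples starting at `offset_hour`."""
    step = 24 / samples_per_day
    readings = [daily_profile(offset_hour + i * step) for i in range(samples_per_day)]
    return sum(readings) / len(readings)


if __name__ == "__main__":
    for offset in (0.0, 2.5, 5.0):
        # Four samples a day vs. one sample a day, for several start times.
        print(offset, round(sampled_mean(4, offset), 6), round(sampled_mean(1, offset), 1))
```

More generally, N equally spaced samples per day cancel every daily harmonic except those whose index is a multiple of N, so four samples would still alias any component with a 6-hour period.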

In T237604#5647091, @Volans wrote:

I have some concerns about proceeding with this. In our experience the BMCs are not that stable, and excessive interaction with them seems to aggravate the situation, statistically causing more BMCs to become unresponsive and require a reset.
For this reason we've kept our checks of BMCs to a minimum, and I'd rather not add something that queries the BMC so often.

Is there any bug report about this? Are you sure it affects the components we would be using? I understand ipmi-oem does not use the network stack.

In my experience, the get-power-consumption-statistics average is not reliable: the one-minute average doesn't change if I stress the host for a minute, while the instantaneous value seems accurate.

I tested this on scandium and found that the one-minute power consumption reported by this method is always the same as the one-hour power consumption. So the machine is evidently not collecting a one-minute average and is mislabelling the one-hour average as a one-minute average. I loaded it for 22 minutes. The attached plot (F31454655: ipmi-power.png) shows the data collected from the minute/hour averages (blue) versus a model (pink) that assumes an hourly average with a step from 70 W to 143 W at t=0.

It's a bit weird and glitchy, but maybe it is converging on the right answer.
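The pink model described above can be written down directly: if the BMC reports a trailing one-hour average, a step from 70 W to 143 W at t=0 should appear as a linear ramp that reaches 143 W only after 60 minutes. This sketch encodes that assumption (t in minutes; the window length is a parameter), and it only covers the step up, not what happens after the load is removed.

```python
def trailing_average_of_step(t_minutes: float, before: float = 70.0,
                             after: float = 143.0, window: float = 60.0) -> float:
    """Trailing `window`-minute average of a load that steps from `before`
    to `after` watts at t=0 and stays there."""
    # Fraction of the averaging window that lies after the step.
    fraction = min(max(t_minutes, 0.0), window) / window
    return before + (after - before) * fraction


if __name__ == "__main__":
    for t in (0, 22, 60, 90):
        print(t, round(trailing_average_of_step(t), 1))
```

Under this model, a reading taken 22 minutes into the load would be about 97 W (70 + 73 × 22/60), while a genuine one-minute average would already sit near 143 W, which is the gap that distinguishes the two interpretations.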

Instantaneous power consumption is noisy, and sampling it once a month would leave little opportunity to average out that noise. I would say it's better to collect the daily or weekly average than to use instantaneous power. We could collect both and verify that they converge to the same thing.
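One way to collect both and check that they converge is to compare the mean of the instantaneous readings gathered over a period with the BMC's reported average for that same period. A minimal sketch; the 5% tolerance is an arbitrary placeholder and the example readings are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_gap(instantaneous: Sequence[float], reported_average: float) -> float:
    """Relative difference between the mean of instantaneous readings and
    the average reported by the BMC for the same period."""
    if not instantaneous:
        raise ValueError("need at least one instantaneous reading")
    if reported_average <= 0:
        raise ValueError("reported average must be positive")
    return abs(mean(instantaneous) - reported_average) / reported_average


def converged(instantaneous: Sequence[float], reported_average: float,
              tolerance: float = 0.05) -> bool:
    """True if the two estimates agree within `tolerance` (placeholder: 5%)."""
    return relative_gap(instantaneous, reported_average) <= tolerance


if __name__ == "__main__":
    readings = [148.0, 171.0, 139.0, 166.0, 158.0]  # hypothetical samples
    print(converged(readings, 160.0))
```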

In T237604#5707761, @tstarling wrote:

Is there any bug report about this? Are you sure it affects the components we would be using? I understand ipmi-oem does not use the network stack.

@tstarling I don't have a specific URL at hand, sorry, but it's empirical team knowledge from various past and present experiences. I agree that most of the time the remote IPMI stack was involved, so maybe we're safer with the local ipmi-oem.

I propose that, whatever we end up deciding, we first set it up only on the canary hosts of some clusters, and then expand after a while if there have been no issues. For ballpark numbers, we could even use the data from one host per cluster for those clusters where the load is evenly distributed.
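For the ballpark figures mentioned here, the arithmetic is the reading from one representative host multiplied by the number of hosts in its cluster. A sketch, assuming load and hardware are uniform within each cluster (the condition stated above); the cluster names and numbers in the example are hypothetical.

```python
from typing import Dict, Mapping


def estimate_cluster_power(per_host_watts: Mapping[str, float],
                           host_counts: Mapping[str, int]) -> Dict[str, float]:
    """Extrapolate total cluster power from one sampled host per cluster.

    Clusters without a sampled host are skipped rather than guessed.
    """
    return {
        cluster: per_host_watts[cluster] * count
        for cluster, count in host_counts.items()
        if cluster in per_host_watts
    }


if __name__ == "__main__":
    sampled = {"cluster_a": 155.0, "cluster_b": 240.0}  # hypothetical
    counts = {"cluster_a": 100, "cluster_b": 20, "cluster_c": 8}
    print(estimate_cluster_power(sampled, counts))
```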

fgiunchedi moved this task from Inbox to Backlog on the observability board. Jul 6 2020, 11:37 AM

lmata edited projects, added SRE Observability; removed observability. Jul 12 2021, 2:22 AM

lmata moved this task from Inbox to Backlog on the SRE Observability board. Jul 15 2021, 4:09 AM

lmata edited projects, added Observability-Metrics; removed SRE Observability. Aug 9 2021, 3:41 AM

Nice to see the progress on this task. We now have the Prometheus IPMI exporter on 585 servers.

FYI, I added prometheus-ipmi-exporter to buster hosts as well: https://gerrit.wikimedia.org/r/c/operations/puppet/+/824193. I don't think it would be too hard to add it to stretch as well, but it will require a bit more than a straight copy, as I get the following. However, for now I didn't think it was worth it, as we are phasing them out.

I'm going to call this done, since about 90% of PDU power usage now appears in server power usage, in both eqiad and codfw.
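The completion criterion above is a coverage ratio: the summed per-server readings for a site divided by the power measured at that site's PDUs. A minimal sketch with hypothetical inputs.

```python
from typing import Iterable


def coverage(server_watts: Iterable[float], pdu_watts: float) -> float:
    """Fraction of PDU-measured power accounted for by per-server readings."""
    if pdu_watts <= 0:
        raise ValueError("PDU power must be positive")
    return sum(server_watts) / pdu_watts


if __name__ == "__main__":
    # Hypothetical figures for a single site.
    print(round(coverage([450.0, 300.0, 150.0], 1000.0), 2))
```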

	F31454655: ipmi-power.png
	Dec 3 2019, 2:39 AM
