
Monitor resource usage on a per-cgroup basis
Closed, Resolved · Public

Description

We currently monitor and plot CPU, memory and I/O usage on a per-host basis.

To monitor resource usage on a per-service basis, we should enable cgroup accounting in systemd, either globally or on specific units.

DefaultCPUAccounting, DefaultBlockIOAccounting and DefaultMemoryAccounting can be used to enable cgroup accounting system-wide; see systemd-system.conf(5). The corresponding per-unit settings (CPUAccounting and friends) can be used to enable the feature on selected units; see systemd.resource-control(5).
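As a rough illustration of the two options above, the following sketch renders the text of a global and a per-unit configuration fragment. The setting names come from the man pages cited above. The helper name `render_accounting_conf` is hypothetical, and placing the output in drop-in files (for example under /etc/systemd/system.conf.d/ or a unit's .d directory) is an assumption, not something this task prescribes.

```python
# Minimal sketch: build systemd config fragments that turn on
# CPU, block I/O and memory accounting.
# render_accounting_conf is a hypothetical helper used only for illustration.

ACCOUNTING_KEYS = ("CPUAccounting", "BlockIOAccounting", "MemoryAccounting")


def render_accounting_conf(section, prefix=""):
    """Return an INI-style fragment enabling accounting.

    section: "Manager" for system.conf, "Service" for a service unit.
    prefix:  "Default" for the system-wide variants, "" for per-unit ones.
    """
    lines = [f"[{section}]"]
    lines += [f"{prefix}{key}=yes" for key in ACCOUNTING_KEYS]
    return "\n".join(lines) + "\n"


# System-wide: DefaultCPUAccounting and friends, see systemd-system.conf(5).
global_conf = render_accounting_conf("Manager", prefix="Default")

# Selected unit: CPUAccounting and friends, see systemd.resource-control(5).
unit_conf = render_accounting_conf("Service")

print(global_conf)
print(unit_conf)
```

The global variant changes the default for every unit, while the per-unit variant limits any accounting overhead to the services of interest.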

cAdvisor can be used to export all available metrics to Prometheus, using the id label to distinguish between cgroups:

container_cpu_system_seconds_total{id="/system.slice/rsyslog.service"} 0.04
[...]
container_memory_rss{id="/system.slice/ssh.service"} 40960
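To show how the id label separates cgroups, here is a minimal sketch that groups samples like the ones above by cgroup. It is not a full parser for the Prometheus text format: it assumes label values contain no `}` or escaped quotes, and the function name `metrics_by_cgroup` is hypothetical.

```python
import re
from collections import defaultdict

# Two of the sample lines from the task description.
SAMPLE = """\
container_cpu_system_seconds_total{id="/system.slice/rsyslog.service"} 0.04
container_memory_rss{id="/system.slice/ssh.service"} 40960
"""

# metric_name{labels} value [optional timestamp]
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def metrics_by_cgroup(text):
    """Map each cgroup id to a dict of {metric name: value}."""
    result = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified pattern cannot handle
        labels = dict(LABEL_RE.findall(match.group("labels")))
        cgroup = labels.get("id")
        if cgroup is None:
            continue  # sample is not tied to a cgroup
        result[cgroup][match.group("name")] = float(match.group("value"))
    return dict(result)


by_cgroup = metrics_by_cgroup(SAMPLE)
for cgroup, metrics in sorted(by_cgroup.items()):
    print(cgroup, metrics)
```

In practice the grouping would more likely be done in Prometheus itself by selecting or aggregating on the id label; the sketch only makes the structure of the exported samples explicit.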

Event Timeline

ema created this task. · Dec 18 2017, 2:30 PM
Restricted Application added a subscriber: Aklapper. · Dec 18 2017, 2:30 PM
ema triaged this task as Medium priority. · Dec 18 2017, 2:30 PM
fgiunchedi moved this task from Inbox to Backlog on the observability board. · Dec 10 2019, 2:19 PM
ema raised the priority of this task from Medium to High. · Jan 8 2020, 4:45 PM
ema added a subscriber: MoritzMuehlenhoff.
CDanis added a subscriber: CDanis. · Jan 8 2020, 4:47 PM

Change 565006 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] systemd: allow setting global accounting options

https://gerrit.wikimedia.org/r/565006

Change 565019 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: enable systemd resources accounting on cp4026

https://gerrit.wikimedia.org/r/565019

Change 565032 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: add systemd config to profile::cache::base

https://gerrit.wikimedia.org/r/565032

Change 565006 merged by Ema:
[operations/puppet@production] systemd: allow setting global accounting options

https://gerrit.wikimedia.org/r/565006

Change 565032 merged by Ema:
[operations/puppet@production] cache: add systemd config to profile::cache::base

https://gerrit.wikimedia.org/r/565032

Change 565019 merged by Ema:
[operations/puppet@production] cache: enable systemd resources accounting on cp4026

https://gerrit.wikimedia.org/r/565019

Change 565236 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: enable systemd resources accounting on cp4027

https://gerrit.wikimedia.org/r/565236

Change 565236 merged by Ema:
[operations/puppet@production] cache: enable systemd resources accounting on cp4027

https://gerrit.wikimedia.org/r/565236

ema added a comment. · Jan 16 2020, 9:58 AM

I have enabled CPU, memory, and block I/O cgroup accounting on cp4026 (Jan 16 09:19:59) and cp4027 (Jan 16 09:37:00).

We can now observe whether the change has any impact on resource utilization. So far there is no difference; see for example cp4027.

It will be interesting to observe the impact on busier nodes (esams).

Change 572669 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: enable cgroup accounting on two esams nodes

https://gerrit.wikimedia.org/r/572669

ema added a comment. · Feb 17 2020, 2:09 PM

cAdvisor is not in buster; see the package tracker. I tried building it on boron, but the package depends on containerd, which is also not in buster and which in turn depends on several golang libraries that are not available. The binary package currently in testing and unstable (0.35.0+ds1-4) does, however, work on buster out of the box; I have verified that on cp4027.

Having discussed the matter with @MoritzMuehlenhoff and @fgiunchedi on IRC, we concluded that, although far from optimal, the course of action to take for now is simply to import the binary into buster-wikimedia.

Mentioned in SAL (#wikimedia-operations) [2020-02-17T14:17:21Z] <ema> reprepro includedeb buster-wikimedia ~ema/cadvisor_0.35.0+ds1-4_amd64.deb T183146

Change 572669 merged by Ema:
[operations/puppet@production] cache: enable cgroup accounting on two esams nodes

https://gerrit.wikimedia.org/r/572669

Change 572682 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Add module to configure cadvisor

https://gerrit.wikimedia.org/r/572682

Change 572693 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: add cadvisor exporter

https://gerrit.wikimedia.org/r/572693

Change 573267 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: enable cgroup accounting

https://gerrit.wikimedia.org/r/573267

Change 572682 merged by Ema:
[operations/puppet@production] prometheus: add cadvisor_exporter module and profile

https://gerrit.wikimedia.org/r/572682

Change 573267 merged by Ema:
[operations/puppet@production] cache: enable cgroup accounting

https://gerrit.wikimedia.org/r/573267

Change 573272 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus: add cadvisor jobs

https://gerrit.wikimedia.org/r/573272

Change 572693 merged by Ema:
[operations/puppet@production] cache: add cadvisor exporter

https://gerrit.wikimedia.org/r/572693

Change 573272 merged by Ema:
[operations/puppet@production] prometheus: add cadvisor jobs

https://gerrit.wikimedia.org/r/573272

Change 584553 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] systemd: add support for network accounting

https://gerrit.wikimedia.org/r/584553

Change 584553 merged by Ema:
[operations/puppet@production] systemd: add support for network accounting

https://gerrit.wikimedia.org/r/584553