node-exporter collector.diskstats.ignored-devices underescaped
Closed, Resolved · Public

Description

Noticed this while investigating something else. We supposedly instruct prometheus-node-exporter not to export disk statistics for partitions, using -collector.diskstats.ignored-devices=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\\d+n\\d+p)\\d+$. Unfortunately this string is under-escaped: the resulting argv passed to node-exporter contains no \ at all, so the regular expression is essentially a no-op, as seen below (partitions such as sda1 are still exported). The solution is to use \\\\ so that a single \ ends up in argv.

deploy1001:~$ curl -s localhost:9100/metrics | grep -i node_disk_writes_completed
# HELP node_disk_writes_completed The total number of writes completed successfully.
# TYPE node_disk_writes_completed counter
node_disk_writes_completed{device="dm-0"} 2.4013669e+07
node_disk_writes_completed{device="md0"} 4.9097061e+07
node_disk_writes_completed{device="md1"} 2.4173871e+07
node_disk_writes_completed{device="sda"} 3.1098514e+07
node_disk_writes_completed{device="sda1"} 2.6160813e+07
node_disk_writes_completed{device="sda2"} 4.937701e+06
node_disk_writes_completed{device="sdb"} 3.1098956e+07
node_disk_writes_completed{device="sdb1"} 2.6161031e+07
node_disk_writes_completed{device="sdb2"} 4.937925e+06
deploy1001:~$ ps fwwwaux | grep -i ignored-device
filippo  16852  0.0  0.0  12784   996 pts/1    S+   15:57   0:00              \_ grep -i ignored-device
prometh+ 20522  0.2  0.0 1665956 19084 ?       Ssl   2018 141:12 /usr/bin/prometheus-node-exporter -collector.diskstats.ignored-devices=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvmed+nd+p)d+$ -collector.filesystem.ignored-fs-types=^(overlay|autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|nsfs|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$ -collector.filesystem.ignored-mount-points=^/(sys|proc|dev|var/lib/docker|var/lib/kubelet)($|/) -collector.textfile.directory=/var/lib/prometheus/node.d -collectors.enabled=buddyinfo,conntrack,diskstats,edac,entropy,filefd,filesystem,hwmon,loadavg,mdadm,meminfo,netdev,netstat,sockstat,stat,tcpstat,textfile,time,uname,vmstat -web.listen-address=:9100
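
To make the no-op concrete, here is a minimal sketch that runs both forms of the pattern against the device names from the curl output above. It assumes Python's re module as a stand-in for the Go regexp engine that node-exporter uses (the two agree on this pattern for plain ASCII device names), and the helper name ignored is made up for illustration.

```python
import re

# The pattern as intended, with the backslashes intact.
INTENDED = r"^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$"

# The pattern as it appears in the ps output: every backslash is gone,
# so "\d+" has turned into "d+" (one or more literal letters "d").
AS_SEEN_IN_ARGV = "^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvmed+nd+p)d+$"

# Device labels taken from the node_disk_writes_completed output.
DEVICES = ["dm-0", "md0", "md1", "sda", "sda1", "sda2", "sdb", "sdb1", "sdb2"]


def ignored(pattern, devices):
    """Return the devices that the given ignore pattern would filter out."""
    rx = re.compile(pattern)
    return [d for d in devices if rx.search(d)]


print("intended pattern ignores:", ignored(INTENDED, DEVICES))
print("argv pattern ignores:    ", ignored(AS_SEEN_IN_ARGV, DEVICES))
```

With the backslashes in place the four partitions are filtered; without them nothing on this host matches, which is consistent with sda1, sda2, sdb1 and sdb2 showing up in the metrics.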

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 24 2019, 4:02 PM
colewhite added a subscriber: colewhite. · Edited · Jan 24 2019, 5:28 PM

@fgiunchedi found that systemd is the likely culprit: https://github.com/systemd/systemd/issues/10659 https://github.com/systemd/systemd/pull/11427

The fix appears to be in systemd 241.

In the meantime, one option might be to replace \d with [0-9] and circumvent backslashes altogether.
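
As a rough check of that idea, the sketch below compares the original pattern with a backslash-free rewrite. The rewritten regex is an assumption derived from the suggestion above, not necessarily the exact string that was later merged, and the sample device names are hypothetical. Python's re module is used here in place of Go's regexp; for ASCII names \d and [0-9] behave the same in both.

```python
import re

# Original pattern, which relies on backslash escapes surviving into argv.
WITH_ESCAPES = r"^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$"

# Assumed workaround: every \d replaced by the character class [0-9],
# so the string contains no backslashes for anything to strip.
WORKAROUND = "^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme[0-9]+n[0-9]+p)[0-9]+$"

# Hypothetical device names covering whole disks, partitions and others.
SAMPLES = [
    "sda", "sda1", "xvda2", "nvme0n1", "nvme0n1p1",
    "loop0", "ram15", "fd0", "md0", "dm-0",
]


def matches(pattern, name):
    """True if the ignore pattern matches the device name."""
    return re.search(pattern, name) is not None


# Names on which the two patterns disagree; expected to be empty.
disagreements = [
    n for n in SAMPLES if matches(WITH_ESCAPES, n) != matches(WORKAROUND, n)
]

print("contains a backslash:", "\\" in WORKAROUND)
print("disagreements:", disagreements)
```

Because the workaround string has no escapes, it reaches node-exporter unchanged regardless of how many unescaping passes happen along the way.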

Change 486192 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] prometheus: upgrade to node-exporter 0.17 in backports

https://gerrit.wikimedia.org/r/486192

fgiunchedi moved this task from Backlog to Up next on the observability board. · Feb 8 2019, 11:36 AM
colewhite triaged this task as Normal priority. · Mar 4 2019, 4:10 PM
colewhite claimed this task.

Change 494262 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] prometheus: change escaped to character classes to work around systemd bug

https://gerrit.wikimedia.org/r/494262

Change 494262 merged by Cwhite:
[operations/puppet@production] prometheus: change escaped to character classes to work around systemd bug

https://gerrit.wikimedia.org/r/494262

colewhite closed this task as Resolved. · Mar 6 2019, 6:34 PM