toolforge: prometheus-node-exporter not working on tools-proxy-06
Closed, Resolved (Public)


tools-prometheus can't fetch metrics from tools-proxy-06. Something appears to be blocking the traffic, but I'm not sure where.

aborrero@tools-prometheus-01:~$ curl http://tools-proxy-06:9100/metrics
aborrero@tools-prometheus-01:~$ curl http://tools-proxy-05:9100/metrics | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  2  187k    2  3960    0     0  12627      0  0:00:15 --:--:--  0:00:15 12611
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.3399e-05
go_gc_duration_seconds{quantile="0.25"} 5.6367e-05
go_gc_duration_seconds{quantile="0.5"} 8.812e-05
go_gc_duration_seconds{quantile="0.75"} 0.000190065
go_gc_duration_seconds{quantile="1"} 0.005104519
go_gc_duration_seconds_sum 176.261069009
go_gc_duration_seconds_count 394292
# HELP go_goroutines Number of goroutines that currently exist.
curl: (23) Failed writing body (136 != 16384)

Both servers, tools-proxy-05 and tools-proxy-06, have the exact same ferm configuration and the same OpenStack security groups. The major difference is that tools-proxy-05 has a floating IP while tools-proxy-06 doesn't.

Network packets arrive at the server and prometheus-node-exporter is listening:

aborrero@tools-proxy-06:~$ sudo tcpdump -n -i any port 9100
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
12:35:41.026752 IP > Flags [S], seq 3720217858, win 29200, options [mss 1460,nop,nop,TS val 736369904 ecr 0,nop,wscale 9], length 0
12:35:42.050322 IP > Flags [S], seq 3720217858, win 29200, options [mss 1460,nop,nop,TS val 736370160 ecr 0,nop,wscale 9], length 0
2 packets captured
2 packets received by filter
0 packets dropped by kernel
aborrero@tools-proxy-06:~$ sudo ss -putanl | grep 9100
tcp     LISTEN   0        1024                   *:9100                 *:*      users:(("prometheus-node",pid=18131,fd=3))                       

But nobody accepts those tcp/9100 packets. No errors are reported for prometheus-node-exporter.service.
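To make the "SYNs arrive but nobody answers" symptom easier to spot, here is a minimal sketch that counts bare SYNs (`Flags [S]`) and SYN-ACKs (`Flags [S.]`) in tcpdump's text output. The helper name `syn_summary` is hypothetical and not part of any tool used in this task; it only parses text fed to it on stdin.

```shell
# Hypothetical helper: summarise tcpdump text output read from stdin.
# Counts bare SYNs ("Flags [S],") and SYN-ACKs ("Flags [S.],").
syn_summary() {
    awk '
        /Flags \[S\],/   { syn++ }      # connection attempt from the client
        /Flags \[S\.\],/ { synack++ }   # reply from the listening side
        END { printf "syn=%d synack=%d\n", syn, synack }
    '
}
```

One possible use, assuming a capture limited to a handful of packets: `sudo tcpdump -n -i any -c 4 port 9100 | syn_summary`. Repeated SYNs with the same sequence number and a SYN-ACK count of zero, as in the capture above, suggest the packets are being dropped somewhere between the interface and the listening socket.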

Event Timeline

aborrero triaged this task as Medium priority. Nov 12 2019, 12:38 PM
aborrero created this task.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

Mentioned in SAL (#wikimedia-cloud) [2019-11-12T12:52:38Z] <arturo> reboot tools-proxy-06 to reset iptables setup T238058

aborrero claimed this task.

That was it. The server wasn't rebooted after the initial puppet run in which we set up iptables alternatives, leading to a mix of iptables-nft and iptables-legacy rules.
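For future reference, a rough sketch of how one might detect this mixed state before resorting to a reboot. It assumes the `iptables-legacy-save` and `iptables-nft-save` commands are both installed (as on a Debian host with iptables alternatives); the function names `count_rules` and `backend_state` are hypothetical helpers, not existing tools.

```shell
# Hypothetical helper: count rule lines ("-A ...") in iptables-save style
# output read from stdin. grep -c prints 0 but exits non-zero when nothing
# matches, hence the "|| true".
count_rules() {
    grep -c '^-A' || true
}

# Hypothetical helper: classify the host given the two rule counts
# (legacy backend first, nft backend second).
backend_state() {
    legacy=$1
    nft=$2
    if [ "$legacy" -gt 0 ] && [ "$nft" -gt 0 ]; then
        echo "mixed"
    elif [ "$legacy" -gt 0 ]; then
        echo "legacy-only"
    elif [ "$nft" -gt 0 ]; then
        echo "nft-only"
    else
        echo "empty"
    fi
}
```

Possible usage: `backend_state "$(sudo iptables-legacy-save | count_rules)" "$(sudo iptables-nft-save | count_rules)"`. A result of `mixed` would match the situation described here, where rules loaded through both backends coexist in the kernel.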