dropped packets to conf1004/5/6 2379/tcp
Closed, Resolved · Public

Description

From https://logstash.wikimedia.org/goto/1fb05b0016ac422d45087d73b2979ead

It seems like prometheus1003/4.eqiad.wmnet are trying to reach conf1004/5/6 on 2379/tcp, but iptables is dropping the packets.

Not sure if it's expected, but pointing it out in case it's not.

Event Timeline

ayounsi created this task. Nov 20 2019, 8:34 PM
Restricted Application added a subscriber: Aklapper. Nov 20 2019, 8:34 PM

Indeed, it looks like prometheus is trying to fetch conf1004.eqiad.wmnet:2379/metrics with no success. Locally on conf1004, even past the firewall, the endpoint doesn't seem to work:

root@conf1004:~# curl -v conf1004.eqiad.wmnet:2379/metrics
*   Trying 10.64.0.23...
* TCP_NODELAY set
* Connected to conf1004.eqiad.wmnet (10.64.0.23) port 2379 (#0)
> GET /metrics HTTP/1.1
> Host: conf1004.eqiad.wmnet:2379
> User-Agent: curl/7.52.1
> Accept: */*
> 

* Curl_http_done: called premature == 0
* Connection #0 to host conf1004.eqiad.wmnet left intact
root@conf1004:~#
Joe added a subscriber: Joe. Nov 21 2019, 9:25 AM

@fgiunchedi you need to use https, and it works locally.

If you want to access metrics remotely, you need to switch to port 4001, which is publicly exposed. We have different ports in codfw and eqiad until we've migrated codfw to etcd3 as well.
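
As a rough illustration of the two access paths described above, here is a minimal sketch of a helper that builds the metrics URL for an eqiad conf host. The function name `metrics_url` and its `local`/`remote` modes are hypothetical, and using https on port 4001 is an assumption: the thread only confirms https for the local check on 2379.

```shell
#!/bin/sh
# Hypothetical helper: build the etcd metrics URL for a conf host.
# Usage: metrics_url HOST [local|remote]
#   local  -> port 2379 (firewalled off; usable only on the host itself)
#   remote -> port 4001 (the publicly exposed port mentioned above)
# ASSUMPTION: https is also used on 4001; the thread only states it for 2379.
metrics_url() {
    host="$1"
    mode="${2:-remote}"
    case "$mode" in
        local)  port=2379 ;;
        remote) port=4001 ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf 'https://%s:%s/metrics\n' "$host" "$port"
}

# Example, run on conf1004 itself:
#   curl -sv "$(metrics_url conf1004.eqiad.wmnet local)"
# Example, run from a Prometheus host:
#   curl -sv "$(metrics_url conf1004.eqiad.wmnet remote)"
```

The port mapping only reflects the eqiad hosts discussed here; per the comment above, codfw uses different ports until it is migrated to etcd3.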

Ah! thanks, I'm guessing the prometheus config never got updated to use 4001 instead of 2379

Joe added a comment. Nov 21 2019, 9:54 AM

yup, my bad. I just tracked it back to when I firewalled off 2379 *after* installing the new cluster.

Change 552765 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: support configcluster and configcluster_stretch

https://gerrit.wikimedia.org/r/552765

Change 552765 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: support configcluster and configcluster_stretch

https://gerrit.wikimedia.org/r/552765

fgiunchedi closed this task as Resolved. Nov 25 2019, 1:45 PM
fgiunchedi claimed this task.
fgiunchedi moved this task from Inbox to In progress on the observability board.

Fixed!