spare restbase servers sending traffic to non-listening port
Closed, Resolved | Public | 0 Estimated Story Points

Description

I noticed an increase in ICMP destination unreachable messages on the eqiad:spare cluster:
https://grafana.wikimedia.org/d/000000366/network-performances-global?panelId=20&fullscreen&orgId=1&from=1561686984227&to=1562039140502

Drilling down on it, it seems like this change caused the issue:
https://github.com/wikimedia/puppet/commit/331ded70742ebc36e5c08ec1b129da72d367ff6b#diff-99ec8ea2d6f28c30f9a0d57bb66577e6

A tcpdump shows a lot of loopback traffic:

restbase1007:~$ sudo tcpdump -p icmp -i lo
03:48:38.275443 IP localhost > localhost: ICMP localhost udp port 10514 unreachable, length 399
ayounsi@restbase1007:~$ sudo tcpdump -p -i lo port 10514
03:52:22.255194 IP localhost.40195 > localhost.10514: UDP, length 328
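
To see which ports account for the unreachable messages over a longer capture, the tcpdump text output can be tallied with a short script. This is a minimal sketch, assuming the output format matches the two lines shown above; the function name `count_unreachable_ports` and the regular expression are illustrative and not part of any existing tooling.

```python
import re
from collections import Counter

# Matches tcpdump lines such as:
#   "... ICMP localhost udp port 10514 unreachable, length 399"
# The pattern is an assumption based on the capture shown in this task.
UNREACHABLE = re.compile(r"ICMP (\S+) udp port (\d+) unreachable")


def count_unreachable_ports(lines):
    """Count ICMP 'udp port N unreachable' messages per destination port."""
    counts = Counter()
    for line in lines:
        match = UNREACHABLE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


sample = [
    "03:48:38.275443 IP localhost > localhost: ICMP localhost udp port 10514 unreachable, length 399",
    "03:52:22.255194 IP localhost.40195 > localhost.10514: UDP, length 328",
]
print(count_unreachable_ports(sample))
```

Only the first sample line is an ICMP error, so the second (the UDP datagram itself) is ignored by the tally.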

A quick guess at a fix: either stop the process sending the syslog messages, or make sure the process that receives them doesn't get removed when the server role is changed to spare.
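
Either fix can be checked by probing whether anything is listening on the local UDP port. The sketch below is illustrative only: it assumes Linux behaviour, where a connected UDP socket reports an incoming ICMP port unreachable as a "connection refused" error, and the helper name `udp_port_refuses` is hypothetical. A result of `False` only means no ICMP error came back within the timeout, which is what a silent listener such as a syslog receiver would produce.

```python
import socket


def udp_port_refuses(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a datagram to host:port triggers ICMP port unreachable."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Connecting the UDP socket lets the kernel report ICMP errors
        # back to this socket on the next send/recv call.
        sock.connect((host, port))
        try:
            sock.send(b"probe")
            sock.recv(1)
        except ConnectionRefusedError:
            return True   # nothing is listening on the port
        except socket.timeout:
            return False  # no ICMP error: something accepted the datagram
        return False      # a reply arrived, so a listener exists
```

For the port seen in the capture, this would be called as `udp_port_refuses("127.0.0.1", 10514)` on the affected host.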

Event Timeline

ayounsi triaged this task as High priority. Jul 2 2019, 3:56 AM
ayounsi created this task.

Mentioned in SAL (#wikimedia-operations) [2019-07-02T08:10:23Z] <godog> restbase spare hosts, mask and stop restbase - T227054

fgiunchedi claimed this task.

restbase was still running on these hosts, should be fixed now! Resolving