
Do we need ping offload servers at all POPs?
Closed, DeclinedPublic

Description

Background

The question came up, in the context of the recent esams rebuild, of whether we were going to deploy a ping offload VM at the rebuilt POP.

Discussion on IRC led to a wider question about what our policy for ping offload hosts should be in general. They were introduced under T190090 to reduce load on LVS hosts by ensuring that most of the ICMP they had to deal with related to actual traffic (rather than spurious echoes coming from the internet).

It seems they were not deployed at POPs originally, as we had no Ganeti/VM capability there at that time. Ping hosts were added to esams when Ganeti was installed there, but apparently were never rolled out to ulsfo or eqsin. Ping VMs were not set up at drmrs when it was built.

Questions

  1. Do we need ping hosts at all?
    • Has anything in the calculation changed since they were introduced which makes it more/less desirable to have them today?
  2. Do we need ping hosts at every POP?
    • It seems to me if we think they're needed we should have them deployed consistently at all locations?
  3. Should our config be adjusted to also deal with IPv6 echo requests?
    • Assuming people are pinging Wikipedia, a good chunk of them will end up using v6, so it seems to me that if it's beneficial for v4 it would also help with v6?

Creating this task to enable discussion and try to reach consensus on a way forward.

Details

Event Timeline

cmooney triaged this task as Low priority.

to reduce load on LVS hosts

My recollection is that it wasn't really about raw load or PPS at the LVSes. It was that our Linux kernel settings have some tunable icmp ratelimiting built in to avoid certain DDoS or reflection attacks, and that the volume of random echo pings was high enough that it was causing some legitimate ICMPs (like PTB for MTU probing) to be dropped by the shared ratelimiter. So this was to offload the echoes and make room within the ratelimit for the important cases.
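To illustrate the crowding-out effect described above, here is a toy token-bucket model in which echo replies and other ICMP replies share one budget. It is only a sketch under stated assumptions: the function name `simulate`, the rate, the burst size, and the traffic pattern are all made up for illustration, and this is not the kernel's actual algorithm or our real tuning.

```python
def simulate(events, rate_per_sec, burst):
    """Run a shared token bucket over (time_ms, kind) events.

    Returns {kind: (allowed, dropped)}. Tokens are tracked in
    thousandths so that all arithmetic stays in integers.
    """
    tokens = burst * 1000          # start with a full bucket
    last_ms = 0
    stats = {}
    for time_ms, kind in sorted(events):
        # Refill: rate_per_sec tokens per 1000 ms, capped at the burst size.
        tokens = min(burst * 1000, tokens + (time_ms - last_ms) * rate_per_sec)
        last_ms = time_ms
        allowed, dropped = stats.get(kind, (0, 0))
        if tokens >= 1000:
            tokens -= 1000
            allowed += 1
        else:
            dropped += 1
        stats[kind] = (allowed, dropped)
    return stats


# Hypothetical traffic over 10 seconds:
# 5 "other" ICMP replies per second, and an echo flood of 100 per second.
other = [(t, "other") for t in range(0, 10_000, 200)]
echo = [(t, "echo") for t in range(0, 10_000, 10)]

quiet = simulate(other, rate_per_sec=10, burst=10)
flooded = simulate(other + echo, rate_per_sec=10, burst=10)

print("without echo flood:", quiet)
print("with echo flood:   ", flooded)
```

In this model the "other" replies are never dropped on their own, but start losing out once the echo traffic consumes the shared budget. Moving the echoes elsewhere restores the first case.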

All of this (both the behavior of 3rd parties and what those icmp tunables look like in modern kernels and/or our sysctl tuning) may have changed since then, of course.

Reading into the code above and the history more and self-correcting: the ratelimiter doesn't apply to PTB packets, just some other informational packets. Apparently we bumped the ratelimiter first as a short-term mitigation (for all the sites), I guess primarily to avoid what looks like ping loss to our monitoring and/or users, then deployed the ping offloader in some places as well as a better way to deal with it (and I guess at thousands per second, the pps reduction probably is useful, although I don't know to what degree).
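For anyone wanting to check which ICMP types a given rate mask covers, here is a small sketch that decodes the Linux `net.ipv4.icmp_ratemask` bitmask (bit N set means ICMP type N is rate limited). The helper name `decode_ratemask` is made up for this example, and 6168 is the upstream kernel default as documented, not necessarily what our sysctl tuning sets.

```python
# ICMP type numbers as listed in the kernel's ip-sysctl documentation.
ICMP_TYPE_NAMES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    4: "Source Quench",
    5: "Redirect",
    8: "Echo Request",
    11: "Time Exceeded",
    12: "Parameter Problem",
    13: "Timestamp Request",
    14: "Timestamp Reply",
    15: "Info Request",
    16: "Info Reply",
    17: "Address Mask Request",
    18: "Address Mask Reply",
}


def decode_ratemask(mask):
    """Return the names of the ICMP types whose bit is set in the mask."""
    return [
        name
        for icmp_type, name in sorted(ICMP_TYPE_NAMES.items())
        if mask & (1 << icmp_type)
    ]


# Upstream default; a host's real value can be read from
# /proc/sys/net/ipv4/icmp_ratemask.
LINUX_DEFAULT_RATEMASK = 6168

print(decode_ratemask(LINUX_DEFAULT_RATEMASK))
```

Running the same function against the value actually configured on an LVS host would show whether echo replies (bit 0) are included in the limited set there.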

Do we have any way to measure its impact? I had a quick look at available Prometheus metrics and didn't see much corresponding to ICMP (but may have missed them).

My thinking was we could possibly disable the offload in eqiad or codfw and measure the impact? Certainly DDoS patterns are constantly changing and perhaps there is less ICMP flooding than before?
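One rough way to get numbers for a before/after comparison, assuming nothing better is already exported, would be to sample the kernel's ICMP counters from `/proc/net/snmp` on a host. The sketch below parses the `Icmp:` header/value line pair from that file's text; the function name `parse_icmp_counters` and the sample values are hypothetical.

```python
def parse_icmp_counters(snmp_text):
    """Parse the 'Icmp:' header and value lines from /proc/net/snmp text.

    Returns a dict mapping counter name to integer value, or an empty
    dict if the section is missing.
    """
    header = None
    for line in snmp_text.splitlines():
        # 'IcmpMsg:' lines are a separate section and are skipped here.
        if not line.startswith("Icmp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields            # first Icmp: line carries the names
        else:
            return dict(zip(header, (int(v) for v in fields)))
    return {}


# Made-up sample with only a few of the real columns, for illustration.
SAMPLE = """\
Icmp: InMsgs InErrors InEchos OutMsgs OutEchoReps
Icmp: 1200 3 1000 1100 950
IcmpMsg: InType8 OutType0
IcmpMsg: 1000 950
"""

counters = parse_icmp_counters(SAMPLE)
print(counters["InEchos"], counters["OutEchoReps"])
```

On a real host the input would be the contents of `/proc/net/snmp`, and the difference between `InEchos` and `OutEchoReps` across two samples gives a rough count of echo requests that went unanswered in that interval.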

Right now we seem to be getting away with it at the POPs anyway. Presumably we're not hitting those rate limits or we'd be getting the spurious "host down" alerts we want to avoid.

Lastly we could potentially have some sort of rate-limiting configured on the switch-side for ICMP echo, which was IP-aware and didn't count packets from our own internal systems. There is a bit of work in that, however: the QoS stuff we've been testing doesn't have complex network-side classification, and is only designed to kick in when the overall rate exceeds link speed.

some sort of rate-limiting configured on the switch-side for ICMP echo, which was IP-aware and didn't count packets from our own internal systems

Personally I'd rather we be reliable on external pings (within reason, anyways), otherwise users will assume it's evidence we're down or having problems when we're not, based on their unreliable pings.

https://grafana.wikimedia.org/d/000000513/ping-offload might be a good starting point (might need some updates/tweaking to get the exact data you want, though)

some sort of rate-limiting configured on the switch-side for ICMP echo, which was IP-aware and didn't count packets from our own internal systems

Personally I'd rather we be reliable on external pings (within reason, anyways), otherwise users will assume it's evidence we're down or having problems when we're not, based on their unreliable pings.

Thanks @BBlack, that does make sense.

Do we have any way to measure its impact? I had a quick look at available Prometheus metrics and didn't see much corresponding to ICMP (but may have missed them).

My thinking was we could possibly disable the offload in eqiad or codfw and measure the impact? Certainly DDoS patterns are constantly changing and perhaps there is less ICMP flooding than before?

If I remember correctly, the initial drive for the ping* hosts wasn't trouble caused by the day-to-day, but some Google cloud image or service that started to use "ping wikipedia" as an ongoing health or connectivity check. That caused notable impact to us, but eventually we managed to reach someone at Google to fix that. But such a situation might reappear at any time, even though we don't currently see it in the metrics.

Change 964521 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/alerts@master] netops: remove pingoffload alert from esams

https://gerrit.wikimedia.org/r/964521

Change 964521 merged by Filippo Giunchedi:

[operations/alerts@master] netops: remove pingoffload alert from esams

https://gerrit.wikimedia.org/r/964521

Thanks for the task and feedback. If the issue is abuse from a limited number of providers (like in T163312: lvs2001: intermittent packet loss from Icinga checks), it seems better to filter out that kind of traffic in an incident-response kind of way than to maintain permanent infrastructure for it.
esams, however, has a constant "background noise" of ICMP an order of magnitude higher (and kind of always had), which is why the other POPs didn't have any ping offload VMs. That said, it doesn't seem to be causing any issue, probably thanks to the kernel tuning done previously.

Based on that and the fact that we will later on have a new LB system, I'd be more inclined to decommission them everywhere than provision new ones.

Closing this task as afaik we haven't seen any issue in esams, and the proper path forward is tracked in T367973: Replace ping offload servers with eBPF.
Please re-open if you disagree.