
Rethink anti-flooding protections
Closed, Resolved · Public · Feature

Description

With multiple input feeds (gerrit, phorge, hopefully gitlab soon) and potentially multiple output messages per input because of diversity of irc channels and audiences, I think we need to rethink the anti-flood protections currently in use by the bot.

The basic protection applied today started from T112032: wikibugs - throttle output, don't get kicked for flooding. The decision there was to throttle the rate at which new events are pushed into the redis queue. This makes the message producer wait as the mechanism for implementing the delay. This delay however is per-producer, or rather per wikibugs2.rqueue.RedisQueue instance, so adding a new producer or a new connection to the backing Redis server increases the potential outbound message rate.
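As a rough illustration of why a per-producer throttle does not bound the total rate, here is a minimal sketch. `ThrottledProducer` and `aggregate_max_rate` are hypothetical names invented for this example, and the interval values are made up; this is not the wikibugs2.rqueue.RedisQueue code.

```python
from dataclasses import dataclass


@dataclass
class ThrottledProducer:
    """Hypothetical stand-in for one producer that waits between queue puts."""

    min_interval: float  # seconds the producer waits between puts (assumed value)

    def max_rate(self) -> float:
        """Upper bound on messages per second from this producer alone."""
        return 1.0 / self.min_interval


def aggregate_max_rate(producers):
    """Each producer throttles independently, so the upper bounds simply add up."""
    return sum(p.max_rate() for p in producers)


# One producer limited to a message every 2 seconds, then three of them.
one = [ThrottledProducer(min_interval=2.0)]
three = [ThrottledProducer(min_interval=2.0) for _ in range(3)]
print(aggregate_max_rate(one), aggregate_max_rate(three))
```

The worst-case outbound rate grows linearly with the number of producers, which is the concern raised above.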

Since 2024-03-03 we have a ZNC instance between the irc bot and libera.chat. This instance is running with default FloodBurst and FloodRate settings, which should limit the total output rate for lines headed towards libera.chat. Is this enough protection in a practical sense? If not, what can we do to tune our irc3.IrcBot and/or the rate at which we take from the queue to defend against potential flooding?

Event Timeline

bd808 triaged this task as High priority. (Mar 9 2024, 7:05 PM)
bd808 added subscribers: TheresNoTime, valhallasw, Dzahn and 2 others.

I am working to make the gerrit producer async which also means moving away from wikibugs2.rqueue.RedisQueue. The past discussions of flooding I have seen are related to Phorge bulk edit actions triggering floods, so maybe this is not an immediate cause for concern? The bouncer really should absorb bursts for us now and trickle them out to libera.chat in a nicer way than producer input throttling does anyway.

Pinging @Legoktm, @valhallasw, @TheresNoTime, @greg, and @Dzahn into this thread for their thoughts as folks who have tried to help keep the bot from getting in trouble for line rate violations in the past.

I think indeed the current system was primarily a "well this seems good enough" solution rather than taking a principled approach.

I don't remember whether the current approach is a 'buffer' or 'drop' - there is value in just giving up at some point (allowing more recent events to be emitted rather than old ones). At the very least it makes more sense to handle this on the *irc* end of the RedisQueue rather than the individual producers.
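For comparison, a "drop" policy that favors recent events could look roughly like the sketch below. It uses a bounded `collections.deque` as a stand-in for the queue; `DropOldestBuffer` is a hypothetical name and nothing here reflects how the bot currently behaves.

```python
from collections import deque


class DropOldestBuffer:
    """Bounded buffer that silently discards the oldest event when full."""

    def __init__(self, max_pending: int):
        # deque(maxlen=N) drops items from the left once N items are held.
        self._events = deque(maxlen=max_pending)

    def put(self, event) -> None:
        """Never blocks; an old event may be dropped to make room."""
        self._events.append(event)

    def drain(self) -> list:
        """Return the pending events, oldest first, and clear the buffer."""
        pending = list(self._events)
        self._events.clear()
        return pending


buf = DropOldestBuffer(max_pending=3)
for n in range(1, 6):
    buf.put(f"event {n}")
print(buf.drain())
```

With a limit of three, only the three most recent of the five events survive. A blocking buffer would instead deliver all five, just later.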

> I don't remember whether the current approach is a 'buffer' or 'drop' - there is value in just giving up at some point (allowing more recent events to be emitted rather than old ones).

The current delay mechanism is to block on RedisQueue.put, so it is functionally a buffering approach. Additional events will not be read from the producer's origin until the put returns.

> At the very least it makes more sense to handle this on the *irc* end of the RedisQueue rather than the individual producers.

This would be my inclination as well if we decide we still need to ratelimit outbound from the python side. I just built an AsyncRedisQueue for the gerrit task changes I'm making to use. When the irc task is updated to use AsyncRedisQueue in the future we could look at adding a read delay to AsyncRedisQueue.get or in the bot right after it returns.
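A consumer-side read delay of that kind could be sketched as follows. This uses a plain `asyncio.Queue` as a stand-in because the AsyncRedisQueue interface is not shown here; `drain_with_delay`, `demo`, the `None` sentinel, and the interval value are all assumptions made for the example.

```python
import asyncio


async def drain_with_delay(queue, send, min_interval):
    """Forward queued items to `send`, at least `min_interval` seconds apart."""
    loop = asyncio.get_running_loop()
    next_allowed = loop.time()
    while True:
        item = await queue.get()
        if item is None:  # sentinel used only by this sketch to stop the loop
            break
        wait = next_allowed - loop.time()
        if wait > 0:
            await asyncio.sleep(wait)  # the delay lives on the consumer side
        await send(item)
        next_allowed = loop.time() + min_interval


async def demo(messages, min_interval=0.01):
    """Push `messages` through the limiter and return (timestamp, line) pairs."""
    queue = asyncio.Queue()
    for message in messages:
        queue.put_nowait(message)
    queue.put_nowait(None)

    sent = []

    async def send(line):
        sent.append((asyncio.get_running_loop().time(), line))

    await drain_with_delay(queue, send, min_interval)
    return sent


if __name__ == "__main__":
    for stamp, line in asyncio.run(demo(["a", "b", "c"])):
        print(f"{stamp:.3f} {line}")
```

Because the delay sits after the single point where messages leave the queue, the limit holds no matter how many producers feed it.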

Alternately, irc3.IrcBot has flood_burst, flood_rate, and flood_rate_delay config knobs we can try turning to control the irc message production rate at the python outbound edge. The defaults look to be basically the same as the ZNC config defaults: flood_burst=4, flood_rate=1, flood_rate_delay=1. flood_burst here is the max number of lines to allow through before considering adding delay on each loop through the pending messages. After sending the available 1..flood_burst messages in a batch the bot checks to see if there are still pending messages in the queue. If there are it will sleep (flood_rate_delay / flood_rate) seconds, send the next message, and restart the loop. With the default settings then it seems that the bot should sustain at most a rate of 5 messages per second: 4 messages + 1 second sleep + 1 message & repeat.
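To make that loop concrete, here is a small simulation of it. It models only the behavior as described in this comment, not the actual irc3 implementation, and `simulate_send_times` is a hypothetical helper written for this example.

```python
def simulate_send_times(n_messages, flood_burst=4, flood_rate=1, flood_rate_delay=1):
    """Return the simulated send time (in seconds) of each of n_messages lines."""
    times = []
    now = 0.0
    pending = n_messages
    while pending > 0:
        # Send up to flood_burst lines back to back.
        batch = min(flood_burst, pending)
        times.extend([now] * batch)
        pending -= batch
        if pending > 0:
            # Still backlogged: sleep, send one more line, then restart the loop.
            now += flood_rate_delay / flood_rate
            times.append(now)
            pending -= 1
    return times


times = simulate_send_times(72)
print(len(times), times[-1])
```

Under this model, with the default settings, each pass through the loop emits 5 lines and costs one sleep of 1 second, so a backlog of 72 lines would leave the bot in about 14 seconds. Raising flood_rate_delay or lowering flood_rate stretches the sleep and lowers the sustained rate accordingly.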

I just did an accidental test of flood protections using only the irc3 bot and znc limits from the tools.wikibugs-testing deployment. The irc bot there had lost connectivity with redis overnight. When I restarted the bot it pumped 72 messages towards the libera.chat servers. Those messages arrived at my client watching the ##wikibugs2 testing channel over a 4 minute interval from 15:24 to 15:28. That's approximately 18 messages per minute, or one every 3.33 seconds as a sustained rate.

bd808 claimed this task.

The bot has been running for a couple of weeks with only the default flood protections from irc3.IrcBot + ZNC mentioned in T359753#9618035. There have not been any issues in this time which I think reasonably shows that this is working. If we need to tune ZNC or irc3.IrcBot in the future as more message sources and sinks are added we can do so as desired.