cluebotng: IRC freenode activity causes flood warnings, but there is a way to stop that
Closed, Resolved (Public)

Description

This was reported on IRC by freenode staff member @Jesopo:

11:48 <jess> does anyone know much about CBNGRelay
11:50 <jess> there's a bot on freenode coming out of nat.openstack.eqiad1.wikimediacloud.org
11:50 <Majavah> that sounds like something related to ClueBot NG
11:50 <jess> and it's creating flood warnings
11:50 <jess> (a LOT of flood warnings...)
11:50 <arturo> oh, that could make sense
11:51 <jess> there's a way to stop it making those flood warnings without making the bot stop what it's doing, but i think i'll need to talk to whoever runs it

Potential fix:

11:57 <jess> the info is: that bot will stop making flood warnings on freenode if it has voice (+v) in the channel it's talking in. the channel already is configred to give voice to users logged in as CBNGRelay, but the bot is currently CBNGRelay1 and isn't logged in as CBNGRelay

Tool maintainers: @Cobi, @DamianZaremba, @RichSmith.

Event Timeline

aborrero updated the task description.
aborrero moved this task from Inbox to Watching on the cloud-services-team (Kanban) board.
taavi raised the priority of this task from High to Needs Triage. (Feb 16 2021, 10:59 AM)
taavi triaged this task as High priority.
taavi updated the task description.

also worth considering whether this relay is at all necessary; its channels appear mostly abandoned.

The bot is configured to identify and get voice to avoid this issue, but it appears that at some point it timed out and did not re-identify on reconnect, so it ended up unvoiced.

I restarted it, which put it back on the right username:

16 Feb 11:39:53 - GOT NOTICE from "NickServ": "You are now identified for ^BCBNGRelay^B."
16 Feb 11:39:53 - MODE: #wikipedia-en-cbngfeed sets mode: +v
16 Feb 11:39:53 - MODE: #wikipedia-en-cbngdebug sets mode: +v
16 Feb 11:39:53 - MODE: #wikipedia-en-cbngrevertfeed sets mode: +v

Effectively this was only being consumed by STiki, so it could indeed likely be turned off, as we did previously with the REDIS relay.

if it's sending

PRIVMSG NickServ :IDENTIFY <password>

you might want to change that to

PRIVMSG NickServ :IDENTIFY CBNGRelay <password>

otherwise it will fail to identify if its current nickname isn't CBNGRelay.
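
For reference, the difference between the two forms can be sketched in Python. This is only an illustration and not the relay's actual code: the helper name nickserv_identify is hypothetical, and the "IDENTIFY <account> <password>" form is the one suggested in this comment for freenode's NickServ.

```python
from typing import Optional


def nickserv_identify(password: str, account: Optional[str] = None) -> str:
    """Build the raw IRC line used to identify to NickServ (hypothetical helper).

    Without an account, NickServ checks the password against the current
    nickname, which fails when the bot has fallen back to e.g. CBNGRelay1.
    """
    parts = [account, password] if account else [password]
    for part in parts:
        # Reject anything that could split the argument or inject a second IRC line.
        if not part or any(ch in part for ch in " \r\n"):
            raise ValueError("account and password must be non-empty and contain no whitespace")
    return "PRIVMSG NickServ :IDENTIFY " + " ".join(parts)


# Passing the account explicitly makes the result independent of the current nick.
print(nickserv_identify("<password>", account="CBNGRelay"))
```

With the account given, the line identifies to CBNGRelay even while the connection is using a fallback nickname.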

think it might be best to just turn it off if it isn't being used though. awful lot of data transfer for something not needed.

If I'm not mistaken, Huggle still uses it as well...

The suggested change to PRIVMSG NickServ :IDENTIFY CBNGRelay <password> has been made, regardless of what is decided about the relay.

bot still failing to identify frequently

also noticed that the bot reconnects a lot, which is probably not helping its identifying issues. how come?

it's been solidly making flood warnings for hours this time around

Mentioned in SAL (#wikimedia-cloud) [2021-07-23T16:40:16Z] <majavah> stop cbng_relay grid job, still having issues with irc connection - T274871

This was turned off for a while, but apparently got turned back on again without actually being fixed.

The previous implementation has been replaced with one that is better behaved about sending PRIVMSGs before it is properly logged in to the server. It now uses SASL to authenticate during the earliest possible phase, which should avoid any IP-related issues.
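
As a rough illustration of what authenticating with SASL during registration involves, here is a minimal Python sketch. It assumes the PLAIN mechanism as described by the IRCv3 SASL extension; this comment does not say which mechanism or library the new implementation uses, and the function names are hypothetical.

```python
import base64
from typing import List


def sasl_plain_payload(account: str, password: str) -> str:
    """Encode a SASL PLAIN message: authzid NUL authcid NUL password, in base64."""
    raw = "\0".join([account, account, password]).encode("utf-8")
    payload = base64.b64encode(raw).decode("ascii")
    if len(payload) >= 400:
        # Payloads of 400 bytes or more must be split into chunks; not handled here.
        raise ValueError("payload too long for a single AUTHENTICATE line")
    return payload


def registration_lines(nick: str, account: str, password: str) -> List[str]:
    """Client lines in the order they are sent; a real client waits for the
    server reply noted beside each step before sending the next one."""
    return [
        "CAP REQ :sasl",                      # opening CAP negotiation holds registration open
        "NICK " + nick,
        "USER {0} 0 * :{0}".format(nick),
        "AUTHENTICATE PLAIN",                 # after the server ACKs the sasl capability
        "AUTHENTICATE " + sasl_plain_payload(account, password),  # after "AUTHENTICATE +"
        "CAP END",                            # after numeric 903 (SASL success)
    ]


for line in registration_lines("CBNGRelay", "CBNGRelay", "<password>"):
    print(line)
```

Because the login completes before CAP END, the client is already identified by the time it joins channels, so no PRIVMSG to NickServ is needed after connecting.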

It has currently been working without issue for around 3 hours; I suggest this can be closed later today.

DamianZaremba claimed this task.

In the past 2 days the bot got killed 5 times for excess flood.

Looking at the overall message rate (messages per minute), it was only slightly higher in the minute of the disconnect than in the surrounding minutes, e.g.

44: 67
45: 59
46: 59
47: 60
48: 90 < disconnected here
49: 81
50: 66
51: 76
52: 61

or

10: 73
11: 66
12: 72
13: 52
14: 108 < disconnected here
15: 58
16: 80
17: 64
18: 76
19: 91
20: 91

However there have been other times with higher rates that did not result in a disconnection e.g.

01: 113
04: 100
14: 108

Looking at the IRCD source code, this limit is based on the number of lines in the client-related queue, which I assume is also influenced by the overall usage of the IRC server.

If required I can look at implementing some form of token bucket rate limiting, more filtering, or multiple clients. However, I think 2-3 disconnects per day is not that dire (a new grid process is spawned within a couple of minutes).
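
For context, the token bucket option mentioned above could look roughly like the following Python sketch. It is not part of the relay; the class name, the rate and the capacity are illustrative assumptions, and the server's real threshold is not known from this task.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to `capacity` messages, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; return False if the message must wait."""
        self._refill()
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token becomes available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1.0 - self._tokens) / self.rate)


# Illustrative values only: about 60 messages per minute with bursts of up to 10.
bucket = TokenBucket(rate=1.0, capacity=10)
if bucket.try_acquire():
    pass  # send the next PRIVMSG here
else:
    time.sleep(bucket.wait_time())
```

A sender would call try_acquire() before each PRIVMSG and sleep for wait_time() when it returns False. This smooths out spikes such as the 90 and 108 messages-per-minute peaks listed above, at the cost of delaying some feed lines.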