Freenode sometimes throttles bot connections from tools
Open, LowPublic

Description

Right now we have public IPs assigned to some but not all exec nodes. Freenode loves bots running on the nodes with public IPs, and rejects most/all bots that connect from the nodes w/out public IPs.

In the short run we just need public IPs on all exec nodes. In the long run we should get freenode to lift that throttle for us if possible.

Event Timeline

Andrew created this task. Nov 27 2016, 7:19 AM
Restricted Application added a project: Cloud-Services. Nov 27 2016, 7:19 AM
Restricted Application added a subscriber: Aklapper.
Az1568 added a subscriber: Az1568. Nov 28 2016, 4:02 AM

One of the WMFGCs for IRC:

<AlexZ> If someone tells me what IPs or range need a whitelist I can email the folks who deal with that at freenode.

We're going to fix it both ways. I added floating IPs to the remaining 10 exec nodes. I've also emailed ilines@freenode.net to ask for a lift on the connection limit. In most cases, traffic from Labs comes from a single IP: 208.80.155.255

In a perfect world we would ask IRC to lift the throttle from all public Labs IPs as well, but that might be a big ask. The exact set of public IPs assigned only to tools exec nodes is hard to predict.

With most of the IPs in the labs public /25 it's at least possible to determine which underlying instance is the source. I would expect the .255 NAT IP to be hard to get whitelisted.
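A rough sketch of that distinction, using Python's standard `ipaddress` module. It assumes the public /25 is the block containing the NAT address quoted above (that is an inference, not something stated in this task), and the category labels are made up for illustration.

```python
import ipaddress

# The shared NAT address quoted earlier in this task.
NAT_IP = ipaddress.ip_address("208.80.155.255")

# ASSUMPTION: the Labs public /25 is the block that contains the NAT address.
# strict=False lets ipaddress derive the network from a host address.
PUBLIC_NET = ipaddress.ip_network("208.80.155.255/25", strict=False)


def classify(addr: str) -> str:
    """Return a hypothetical label for how traceable a source address is."""
    ip = ipaddress.ip_address(addr)
    if ip == NAT_IP:
        # Many instances share this address, so the source is ambiguous.
        return "shared-nat"
    if ip in PUBLIC_NET:
        # A public address normally maps back to a single instance.
        return "floating-ip"
    return "outside-range"
```

An address labelled `floating-ip` can be tied to one instance, while anything arriving as `shared-nat` could be any tool on any node behind the NAT.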

Andrew changed the task status from Open to Stalled. Dec 20 2016, 3:26 PM
Andrew removed Andrew as the assignee of this task.

No response from freenode

scfc triaged this task as Low priority. Feb 16 2017, 9:30 PM
scfc moved this task from Triage to Backlog on the Toolforge board.

@Andrew I guess there is a response now?

Nope, I never heard anything back.

After speaking with a staffer today: there is no issue adding an iline, but the box needs to run an ident daemon so that each individual user with access has a unique identity for themselves or their bots. If staff see refusals they will readily raise the limit for the host, but with a workaround in place they aren't likely to see any.

Luke081515 changed the task status from Stalled to Open. Jul 10 2017, 10:36 PM
bd808 added a subscriber: bd808. Jul 24 2017, 7:22 PM

We had some chats about this task in irc today after looking at public IP usage generally in Cloud Services related to our Neutron SDN networking plans. It looks like there may be a couple of identd services that are NAT aware:

In theory, running either of these services on our outbound NAT host(s) and all of the grid engine exec nodes would allow an ident request to the NAT'ed ip to find its way to the appropriate grid node to determine the tool account that is actually opening the irc connection.
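The forwarding idea can be sketched roughly as follows. This is only an illustration of the RFC 1413 query and reply text format plus a NAT lookup step, not how either daemon is actually implemented. The `NatTable` type, the `lookup_on_node` callback, and the node name in the usage note are hypothetical, and the sketch ignores the fact that NAT may rewrite the source port.

```python
from typing import Callable, Dict, Optional, Tuple

# HYPOTHETICAL: maps (port on our side, port on the IRC server) to the
# internal grid node that owns the connection. A real gateway would get
# this from its connection-tracking state.
NatTable = Dict[Tuple[int, int], str]


def parse_ident_query(line: str) -> Tuple[int, int]:
    """Parse an RFC 1413 query of the form '<our-port> , <their-port>'."""
    local, remote = (int(part.strip()) for part in line.strip().split(","))
    for port in (local, remote):
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return local, remote


def format_ident_reply(local: int, remote: int, user: Optional[str]) -> str:
    """Build an RFC 1413 USERID reply, or a NO-USER error if user is None."""
    if user is None:
        return f"{local} , {remote} : ERROR : NO-USER\r\n"
    return f"{local} , {remote} : USERID : UNIX : {user}\r\n"


def answer_query(
    line: str,
    nat_table: NatTable,
    lookup_on_node: Callable[[str, int, int], Optional[str]],
) -> str:
    """Answer a query on the gateway by asking the node behind the NAT."""
    local, remote = parse_ident_query(line)
    node = nat_table.get((local, remote))
    # lookup_on_node stands in for forwarding the request to the ident
    # service running on the grid node that opened the connection.
    user = lookup_on_node(node, local, remote) if node else None
    return format_ident_reply(local, remote, user)
```

For example, with `nat_table = {(40000, 6667): "exec-node-1"}` and a callback that returns `tools.stashbot` for that node, a query of `40000, 6667` would be answered with a `USERID` reply naming `tools.stashbot`, and any unknown port pair would get `ERROR : NO-USER`.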

I am more familiar with oident but I believe either would be suitable

if connecting clients start connecting without the ~ in the username field it's working perfectly and hopefully we shouldn't see any further connection errors. if so we can push freenode again for that iline.

So is this still an ongoing issue for anyone?

Figured it's been a year since the last response here, so I'd give it a poke and see!

bd808 added a comment. Oct 5 2018, 10:53 PM

We have not had any new complaints that I am aware of, but the public IPv4 addresses from T151704#2827832 are still in place on the grid exec nodes. I believe they largely solved the problem by spreading the irc bots run from Toolforge across a larger pool of IPv4 addresses as seen by Freenode. We have not done any work towards the NAT aware identd service idea, which would in theory let us re-apply for the iline change and remove the public IPv4 usage across the grid. This may be something that we work on in the coming months as a preparation step for other changes to Toolforge, including migrating the exec nodes to a new network.

T216370: IP address list for grid nodes / Freenode iline request has put a bandaid over this problem for now, but I'm going to work on getting oidentd set up so that a public service runs on the network gateway nodes that handle our public IPs and clients run on all of the Toolforge grid engine nodes. This should make it easier to discuss and adjust iline limits with Freenode staff/admins.

bd808 claimed this task. Feb 26 2019, 6:25 PM

Change 493767 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] wmcs: Add profiles for oidentd proxy and client modes

https://gerrit.wikimedia.org/r/493767

Change 493767 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Bryan Davis):
[operations/puppet@production] wmcs: Add profiles for oidentd proxy and client modes

https://gerrit.wikimedia.org/r/493767

Change 493767 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] wmcs: Add profiles for oidentd proxy and client modes

https://gerrit.wikimedia.org/r/493767

Mentioned in SAL (#wikimedia-cloud) [2019-04-11T12:01:44Z] <arturo> T151704 deploying oidentd

@Az1568 we have deployed oidentd in proxy mode for the Toolforge job grid nodes. Can you check and see if Freenode can properly get ident lookup responses now?

@charitwo can you check with Freenode staff to see if they are getting proper ident responses from Toolforge irc bots now? They should be seeing something like tools.stashbot as the response for connections from the ~stashbot@wikimedia/bot/stashbot userhost for example.

1559347684 00:08:04 [card] -!- stashbot [~stashbot@wikimedia/bot/stashbot]

the ~ means no response
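A small sketch of how that check could be automated against client output like the line above. It assumes the interpretation given in this comment (a leading `~` on the username means the server got no usable ident reply). The regular expression is only fitted to the one log line quoted in this task, and `UserHost` and `parse_userhost` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class UserHost(NamedTuple):
    nick: str
    user: str
    host: str
    ident_verified: bool


# Matches 'nick [user@host]' as it appears in the quoted client output.
_USERHOST = re.compile(r"(?P<nick>\S+) \[(?P<user>[^@\]]+)@(?P<host>[^\]]+)\]")


def parse_userhost(line: str) -> Optional[UserHost]:
    """Extract nick, user and host, and flag whether the user has no '~'."""
    match = _USERHOST.search(line)
    if match is None:
        return None
    user = match.group("user")
    # ASSUMPTION: a leading '~' means the server did not get an ident reply.
    verified = not user.startswith("~")
    if not verified:
        user = user[1:]
    return UserHost(match.group("nick"), user, match.group("host"), verified)
```

Run against the stashbot line above, this would report `ident_verified=False`. Once ident lookups work, the same connection should show up with no `~` and, per the earlier comment, a username like `tools.stashbot`.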

bd808 added a comment. Sat, Jun 1, 12:13 AM

I thought the ~ just meant that the response does not match the registered account name. Hmm... I'll see what debugging I can do to find out where things are breaking down.

the IRC account name on freenode doesn't matter, the ssh user is what must match that field