Restbase: traffic to 3050/udp dropped by iptables
Open, Needs Triage, Public

Description

See https://logstash.wikimedia.org/goto/801b7b7a10de0c2d356775b806b2e5a9

On April 2nd, restbase hosts started seeing a large increase in discarded UDP traffic to and from port 3050/udp, almost exclusively from restbase2022.

It's not urgent, but:
1/ It might be a sign of a misconfiguration or a larger issue
2/ It floods the logs :)

Please investigate and either stop or reconfigure whatever is sending the dropped packets, or update Ferm to permit them.

Thanks!

Event Timeline

Port 3050 is the restbase ratelimiter and has been opened in ferm for TCP only. I was concerned this might be related to restbase2022 being a newly added host, but the issue is also present on older hosts (restbase2019, restbase1021).
It looks like restbase listens only on UDP port 3050 on _all_ restbase nodes, though. I'm curious whether a code change triggered this (it doesn't look like it: the limitation library on GitHub hasn't been updated since 2018) or whether access to the limiter service has been broken for much longer.

Snippet from restbase config.yaml:

ratelimiter:
  type: kademlia
  listen:
    address: 10.192.32.190
    port: 3050

root@restbase2022:~# netstat -tunlp| grep 3050
udp        0      0 10.192.32.190:3050      0.0.0.0:*                           110397/nodejs
root@restbase2022:~# ps axu | grep 110397
restbase 110397  3.3  0.0 1250724 42172 ?       Sl   Apr02 278:01 /usr/bin/nodejs restbase/server.js -c /etc/restbase/config.yaml

Happy to open this port on UDP if necessary, but I'd like some insight into why we haven't noticed this until now, or whether I'm misunderstanding something.

It's possible it has not been noticed. RESTBase rate-limiting is done per-host, and then per-host counters are distributed across all the hosts over UDP via a distributed hash table. So, if the counters were not syncing, we'd see rate-limiting still working, just being less accurate, which might have easily been overlooked.
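To make the "still working, just less accurate" point concrete, here is a minimal sketch in Python. It is a hypothetical model, not RESTBase's or the limitation library's actual code: `allowed_requests` and its parameters are invented for illustration, and "synced" is idealized as every host instantly seeing the cluster-wide total, which a real DHT over UDP would only approximate.

```python
# Hypothetical model (not RESTBase's implementation): each host enforces the
# same limit against whatever request count it can currently see.

def allowed_requests(num_hosts, limit, requests_per_host, counters_synced):
    """Return how many requests the whole cluster admits in one window."""
    local_counts = [0] * num_hosts
    admitted = 0
    # Requests arrive round-robin across the hosts.
    for _ in range(requests_per_host):
        for host in range(num_hosts):
            # With sync, a host sees the cluster-wide total;
            # without it, a host sees only its own counter.
            seen = sum(local_counts) if counters_synced else local_counts[host]
            if seen < limit:
                local_counts[host] += 1
                admitted += 1
    return admitted


print(allowed_requests(3, 10, 20, counters_synced=True))   # 10
print(allowed_requests(3, 10, 20, counters_synced=False))  # 30
```

Under these assumptions, a client spread across three hosts gets three times the intended limit when the counters do not sync, yet each host still rejects traffic past its own threshold, which is why the breakage could go unnoticed.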

Which raises a question: do we even need this feature if nobody noticed it hasn't been working for quite some time? These packages bring in a lot of dependencies, and given that the implementation is a fork and quite hard to understand, maybe it's time to retire limitation and stick to per-host rate-limiting in restbase? Anything requiring fancier limiting can be deployed behind api-gateway.

From the outside, it seems to me that we don't need this feature. Given how long it has been non-functional, I would almost be scared to enable it, as it might cause an unfamiliar series of effects and change how rate limiting is done.

Aklapper removed a subscriber: Pchelolo.

Removing task assignee due to inactivity as this open task has been assigned for more than two years. See the email sent to the task assignee on August 22nd, 2022.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome!
If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator. Thanks!

@hnowlan Is this something that still needs to happen and if yes, who would own the next step?

I think it's still a relevant question, if we can get to this work before RESTBase deprecation takes hold. Rather than opening the port, it makes sense to test disabling the service. The person to own the next steps is me.