restbase heavily alerts about hitting a HTTP 429 ratelimit for internal IP
Open, Low, Public

Description

Restbase started hitting ratelimits as a result of clients hitting the transform endpoint:

2021-02-28.log:19:40 <+icinga-wm> PROBLEM - restbase endpoints health on restbase1029 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html/{title} (Transform wikitext to html) is CRITICAL: Test Transform wikitext to html returned the unexpected status 429 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase

The spikes in log messages are visible in the 7-day view here: https://logstash.wikimedia.org/goto/d40f82dc85d6053c4483fcce357e9236

The volume of log messages caused issues with logstash around the same time:

17:58 <+icinga-wm> PROBLEM - Logstash Elasticsearch indexing errors #o11y on alert1001 is CRITICAL: 65.36 ge 8 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/3283cc1372b7df18f26128163125cf45 https://grafana.wikimedia.org/dashboard/db/logstash

When we look at the message from restbase, there is some additional weirdness - x-client-ip is for the IP of the restbase instance itself, which is possibly related to how the serviceproxy behaves: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=H0tT1XcBsCn0xdb8Djvf

There's a further weirdness in this issue: the messages contain different wikis in root_req.uri and api_path. I have no idea what that might be related to.

Event Timeline

The RESTbase change we spoke of has been merged but still needs to be deployed: https://github.com/wikimedia/restbase/pull/1288

BPirkle moved this task from Inbox to Tracking/Watching on the Platform Engineering board.
BPirkle subscribed.

@hnowlan , are you planning to deploy this yourself, or does something else need to be done?

When we look at the message from restbase, there is some additional weirdness - x-client-ip is for the IP of the restbase instance itself, which is possibly related to how the serviceproxy behaves: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=H0tT1XcBsCn0xdb8Djvf

Note that we are tackling this in T246348 for mediawiki; we could possibly use the exact same solution for restbase.

No wait, now that I am looking into this again, this isn't the services proxy. Scratch the above comment.

This is restbase connecting to itself. The services proxy doesn't listen on IPv6 yet, and the address in the log above is ::ffff:10.64.0.100, which is an IPv4-mapped IPv6 address.
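
For anyone who wants to check addresses like this from other log entries, here is a minimal sketch using Python's standard ipaddress module. The helper name unwrap_ipv4_mapped is made up for illustration and is not part of restbase or the services proxy; it only shows how ::ffff:10.64.0.100 unwraps to a plain internal IPv4 address.

```python
import ipaddress


def unwrap_ipv4_mapped(addr: str):
    """Return the embedded IPv4 address if addr is IPv4-mapped IPv6,
    otherwise return the parsed address unchanged.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    # IPv6Address.ipv4_mapped is None unless the address is in ::ffff:0:0/96
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


# Address taken from the log entry discussed above
client = unwrap_ipv4_mapped("::ffff:10.64.0.100")
print(client)             # the embedded IPv4 address
print(client.is_private)  # whether it falls in a private (RFC 1918) range
```

This only identifies the address family and range; it does not say anything about why restbase is connecting to itself.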