Ideally, the API gateway would only handle requests coming from the outside. However, it appears to be receiving a large number of internal requests (>500 req/sec) from internal IPs, primarily from the 172.16 and 10.192 ranges. We should find the source of these requests and, if possible, change them to use the service mesh instead.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T399291 Epic: API Rate Limiting Architecture |
| Open | | None | T412585 Epic: Enforce API rate limits (WE5.1.3c) |
| Open | | daniel | T398919 Epic: API rate limiting dry run (WE5.1.3b) |
| Open | | None | T410198 Determine the source of internal requests going through the API gateway. |
| Declined | | None | T410273 api rate limiting: Assign ratelimit class based on IP range |
| Open | | Jgiannelos | T411769 Migrate wikifeeds backend calls away from rest-gateway |
| Open | | None | T411771 Migrate PageViewInfo calls away from rest-gateway |
| Invalid | | None | T411770 Migrate GrowthExperiments away from rest-gateway |
Event Timeline
Could you supply some of these IP addresses for investigation? My gut feeling is that these are going to be health checks. Is this the api-gateway or the rest-gateway?
I haven't checked the API gateway. Here are the top IPs for the REST gateway (naive sort to find duplicates, not ordered by req/sec):
10.192.12.30 10.192.14.10 10.192.29.14 10.192.36.6 10.192.4.20 10.192.40.11 10.192.41.6 10.192.5.30 10.192.8.26 172.16.1.41 172.16.19.172 172.16.4.75
From 172.16.19.172 alone we see 40 req/sec. That seems a lot for health checks...
Also I'd assume that most health checks shouldn't go through the gateway, right?
All of the 10.192 addresses are wikikube workers, which would correspond to either health checks or something else in wikikube making direct calls (which we actively discourage). The 172.16.* hosts are from WMCS. Most likely we need to investigate further what kind of traffic we're seeing from these hosts.
> The 172.16.* hosts are from WMCS. We need to investigate further what kind of traffic we're seeing from these hosts.
These are tools and bots running on Toolforge and Cloud VPS VMs. You'll also see edits from this IP range.
According to the rest-gateway logs, both MediaWiki itself and Wikifeeds are making direct calls to rest-gateway for the page-analytics_pageviews route.
wikifeeds is doing direct rest-gateway calls for a couple of functions in pageviews.js:
- https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/wikifeeds/+/refs/heads/master/lib/pageviews.js#61
- https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/wikifeeds/+/refs/heads/master/lib/pageviews.js#75
MediaWiki probably has a similar call somewhere.
This should be changed to direct mw-api-int calls, shouldn't it?
In GrowthExperiments (hardcoded) and PageViewInfo (uses $wgPageViewInfoWikimediaEndpoint).
Tagging:
- Content-Transform-Team for Wikifeeds
- Growth-Team for GrowthExperiments
- PageViewInfo for PageViewInfo
If these teams/projects could please weigh in on feasibility, I can create subtasks for each of them. It would be really great to get this squared away.
Seems like these cases should be changed to query the page-analytics service directly on https://page-analytics.discovery.wmnet:30443/metrics/pageviews/[...]
@Clement_Goubert shared the following snippet with me:
    service_cluster_cidr:
      v4: "10.192.72.0/24"
      v6: "2620:0:860:306::/116"
    cluster_cidr:
      v4: "10.194.128.0/17"
      v6: "2620:0:860:cabe::/64"
If I understand correctly, "service_cluster_cidr" means "pods", and "cluster_cidr" means "nodes". So the requests are coming from pods... is that consistent with the idea that these are health probes?
service_cluster_cidr is the IP space allocated to Kubernetes Services. It's an IP range internal to Kubernetes; it's improbable that you are actually seeing these as source addresses.
cluster_cidr is indeed the pod IP range. Those are the IPs of workloads hosted on WikiKube. Those could show up as source addresses, but since you are talking about 10.192.X.X above and that range is 10.194.128.0/17, it's not this.
From T410198#11378640 I see:
10.192.12.30 10.192.14.10 10.192.29.14 10.192.36.6 10.192.4.20 10.192.40.11 10.192.41.6 10.192.5.30 10.192.8.26
None of these IPs belongs to either 10.192.72.0/24 or 10.194.128.0/17, so they are unrelated.
However, the IP addresses listed previously are indeed wikikube-worker IPs, i.e. node IPs. That is indeed consistent with health checks (which deliberately originate from the IP address of the host running the pods).
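For the record, the containment claim can be checked mechanically. A quick sketch with Python's ipaddress module, using the ranges and source IPs quoted above:

```python
import ipaddress

# Ranges from the wikikube snippet shared above
service_cluster_cidr = ipaddress.ip_network("10.192.72.0/24")  # Kubernetes Services
cluster_cidr = ipaddress.ip_network("10.194.128.0/17")         # pod IPs

# Top source IPs observed at the rest-gateway
observed = [
    "10.192.12.30", "10.192.14.10", "10.192.29.14", "10.192.36.6",
    "10.192.4.20", "10.192.40.11", "10.192.41.6", "10.192.5.30",
    "10.192.8.26",
]

for addr in map(ipaddress.ip_address, observed):
    # None of the observed sources fall in the Service or pod ranges...
    assert addr not in service_cluster_cidr
    assert addr not in cluster_cidr
# ...which is consistent with them being node (wikikube-worker) IPs.
```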
Thank you for clarifying!
Is there a way to distinguish the following (by address or headers):
- requests from workers
- requests directly from other pods in the cluster
- requests from the mesh
- requests coming in through the CDN
- other (whatever that may be?)
It depends on your end goal. Do you want to remove them so that they don't skew your data statistically, or do you want to be absolutely certain?
For example, regarding health checks:
- Requests from nodes/workers will be for a well-defined endpoint in a readinessProbe. The definitions will be in the chart of the workload, possibly overridden in helmfile.d/services. For the API gateway this is /healthz per https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/services/api-gateway/values.yaml#L395. Requests from the outside can make it to this endpoint, but it's improbable that they will, so statistically speaking you are probably OK.
- Or amend the httpGet definition of the readinessProbe, which allows adding headers: add a unique header in this section and rely solely on that.
- Or rely on https://github.com/wikimedia/operations-puppet/blob/production/modules/network/data/data.yaml#L76 and grab just the private1-.-(eqiad|codfw) stanzas to cross-check that the IPs are indeed internal (though this tells you nothing more about the originator).
Or a combination of the above, depending on how sure you want to be. Statistically speaking, unless something weird happens, any of the above, even in isolation, should be sufficient.
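Putting those options together, the combined check might look like the following sketch. The header name and value are made up for illustration; the real probe header would be whatever gets added to the readinessProbe, and the real internal ranges come from the puppet data linked above:

```python
import ipaddress

HEALTH_PATH = "/healthz"          # from the api-gateway chart
PROBE_HEADER = "x-probe-token"    # hypothetical agreed-upon header name
PROBE_VALUE = "example-secret"    # hypothetical shared value
INTERNAL_RANGES = [               # stand-in for the private1-* stanzas
    ipaddress.ip_network("10.0.0.0/8"),
]

def is_probe(path: str, headers: dict, source_ip: str) -> bool:
    """Combine the signals above: health endpoint, probe header, internal source IP."""
    if path != HEALTH_PATH:
        return False
    if headers.get(PROBE_HEADER) == PROBE_VALUE:
        return True
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in INTERNAL_RANGES)
```

As noted above, any one of the three signals may already be sufficient statistically; combining them just buys more certainty.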
> - requests directly from other pods in the cluster

Yes. The cluster_cidr IP range from above fits your purpose perfectly.
> - requests from the mesh
I see no listeners stanza defined in any chart with api-gateway as a value, nor the equivalent stanza in the puppet repo. I could be wrong, but currently you shouldn't be able to see any such type of traffic. This is on purpose.
I'll also point out that part of the task is about having internal workloads talk to the API Gateway (via the mesh or directly, doesn't matter). I've always been against such a change. The API Gateway should be an entry point from the outside, not something used internally. That's what the mesh is for (and what Clement is pointing out above for some workloads).
> - requests coming in through the CDN

Headers. They will have x-forwarded-for set and, more importantly, X-Client-IP.
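As a sketch, that distinction could be applied like this (header names from the comment above; the helper names and the treat-it-as-a-heuristic framing are my own):

```python
def came_through_cdn(headers: dict) -> bool:
    """Heuristic: CDN-forwarded requests carry x-forwarded-for and, more
    importantly, x-client-ip; direct internal requests have neither."""
    lowered = {k.lower() for k in headers}
    return "x-client-ip" in lowered

def effective_client_ip(headers: dict, source_ip: str) -> str:
    """Prefer X-Client-IP when present, else fall back to the socket address."""
    for k, v in headers.items():
        if k.lower() == "x-client-ip":
            return v
    return source_ip
```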
> - other (whatever that may be?)
Health checks from the LVS load balancers. Those are defined here: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/service.yaml#L3469, so they will look very much like the healthchecks from the nodes themselves. The differentiating factor will be the source IP address, which will be slightly different. You will need a reverse DNS lookup to be sure here.
Note that there are both IPv6 and IPv4 addresses in many of the data structures I linked above. Include both, to be future-proof.
> I'll also point out that part of the task is about having internal workloads talk to the API Gateway (via the mesh or directly, doesn't matter). I've always been against such a change. The API Gateway should be an entry point from the outside, not something used internally. That's what the mesh is for (and what Clement is pointing out above for some workloads).
I agree. The plan is to identify that kind of caller and make them use the mesh to talk to the respective backend service.
If it turns out we need a gateway between internal clients and API gateway, it should be a separate instance with different configuration.
Ideally, I should be certain enough to exempt certain requests from rate limiting. It doesn't have to be 100%, but it shouldn't be exploitable from the outside.
> - Requests from nodes/workers will be for a well-defined endpoint in a readinessProbe. The definitions will be in the chart of the workload, possibly overridden in helmfile.d/services. For the API gateway this is /healthz
Right... I can filter out health checks for the gateway itself easily enough I suppose. I was thinking of k8s health checks in general, but they wouldn't go through the gateway - why would they? Silly me!
> - requests directly from other pods in the cluster
> Yes. The cluster_cidr IP range from above fits your purpose perfectly.
Ideally, there shouldn't be any such requests, right? All requests to the gateway (with the exception of health checks and Prometheus scrapes) should come from the CDN? But we do see a lot of them...
I should probably check whether they come from nodes directly, or from nodes but via the CDN - that is, am I looking at the actual client address, or at x-client-ip. Currently these two sources of information are conflated.
> - requests from the mesh
> I see no listeners stanza defined in any chart with api-gateway as a value, nor the equivalent stanza in the puppet repo. I could be wrong, but currently you shouldn't be able to see any such type of traffic. This is on purpose.
Thank you for checking. I was hoping that would be the case, but I wasn't sure.
> - requests coming in through the CDN
> Headers. They will have x-forwarded-for set and, more importantly, X-Client-IP.
Right. I'll see how I can make better use of that information.
> - other (whatever that may be?)
> Health checks from the LVS load balancers. Those are defined here: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/service.yaml#L3469, so they will look very much like the healthchecks from the nodes themselves. The differentiating factor will be the source IP address, which will be slightly different. You will need a reverse DNS lookup to be sure here.
Hm... this is a problem... I don't see an easy or efficient way to do that in Envoy with Lua.
Would it be possible to include a magic header in these requests? One that cannot come from the outside?
> Note that there are both IPv6 and IPv4 addresses in many of the data structures I linked above. Include both, to be future-proof.

I'll need to figure out how to do proper CIDR matching in Lua, I guess. Another bit of trouble...
/healthz is so fast (it's designed to be) that I would argue excluding it entirely from rate limiting, regardless of origin, would be OK. Should someone decide to abuse that endpoint, we'll block them as an abuser, not rate-limit them.
> - Requests from nodes/workers will be for a well-defined endpoint in a readinessProbe. The definitions will be in the chart of the workload, possibly overridden in helmfile.d/services. For the API gateway this is /healthz
> Right... I can filter out health checks for the gateway itself easily enough I suppose. I was thinking of k8s health checks in general, but they wouldn't go through the gateway - why would they? Silly me!
Exactly, health checks go directly to the workloads, never through any kind of proxy/gateway.
> - requests directly from other pods in the cluster
> Yes. The cluster_cidr IP range from above fits your purpose perfectly.
> Ideally, there shouldn't be any such requests, right? All requests to the gateway (with the exception of health checks and Prometheus scrapes) should come from the CDN? But we do see a lot of them...
To the API gateway? The one powering exclusively api.wikimedia.org? Yes, no requests should show from the pods. If you do see some, there's some misconfiguration and we need to fix it. For the REST Gateway, I see Clément has already added 3 tags for the items mentioned in T410198#11387937, so this is already identified and being chased down.
Down the line, we can enforce this at the network layer if you want, disallowing other pods from reaching the api-gateway and rest-gateway. It might sound like overkill, but it will save us from similar misconfigurations in the future.
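A hedged sketch of what that network-layer enforcement could look like as a Kubernetes NetworkPolicy; the label names and the 10.0.0.0/8 range are assumptions for illustration, not the actual chart values:

```yaml
# Hypothetical: only allow ingress to the gateway from outside the pod range,
# so other pods cannot call it directly. Node IPs (health checks) still work.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-gateway-no-pod-ingress
spec:
  podSelector:
    matchLabels:
      app: api-gateway            # assumed label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/8      # stand-in for internal ranges
            except:
              - 10.194.128.0/17   # cluster_cidr (pods) from above
```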
> I should probably check whether they come from nodes directly, or from nodes but via the CDN - that is, am I looking at the actual client address, or at x-client-ip. Currently these two sources of information are conflated.
> - requests from the mesh
> I see no listeners stanza defined in any chart with api-gateway as a value, nor the equivalent stanza in the puppet repo. I could be wrong, but currently you shouldn't be able to see any such type of traffic. This is on purpose.
> Thank you for checking. I was hoping that would be the case, but I wasn't sure.
> - requests coming in through the CDN
> Headers. They will have x-forwarded-for set and, more importantly, X-Client-IP.
> Right. I'll see how I can make better use of that information.
> - other (whatever that may be?)
> Health checks from the LVS load balancers. Those are defined here: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/service.yaml#L3469, so they will look very much like the healthchecks from the nodes themselves. The differentiating factor will be the source IP address, which will be slightly different. You will need a reverse DNS lookup to be sure here.
> Hm... this is a problem... I don't see an easy or efficient way to do that in Envoy with Lua.
Oh, I hadn't realized you want to do it in the path of the request. Yeah, certainly don't do a reverse DNS lookup during serving a request.
> Would it be possible to include a magic header in these requests? One that cannot come from the outside?
Revisiting this with a fresh mind: I was wrong. LVS hosts don't do HTTP fetches anymore for Kubernetes workloads. They only keep an idle TCP connection open, without issuing an HTTP request. This is configured identically for api-gateway and rest-gateway.
What WILL do /healthz HTTP checks, however, and is defined for both api-gateway and rest-gateway, is Prometheus.
It looks like we can amend the configuration a bit and send some mutually agreed-upon HTTP header with a shared (between Prometheus and the gateway) secret value so that you can identify those.
However, this is /healthz we are talking about. It's probably cheaper and faster to just exclude the entire endpoint from rate limiting as I pointed out above.
> Note that there are both IPv6 and IPv4 addresses in many of the data structures I linked above. Include both, to be future-proof.
> I'll need to figure out how to do proper CIDR matching in Lua, I guess. Another bit of trouble...
https://github.com/Humbedooh/lua-cidr/blob/main/cidr.lua looks pretty simple code-wise, and it has a single require, which is bundled with Lua.
I tested it quickly in a container and it appears to work. One caveat: the documentation in the comment at the top of the file disagrees with the signature of the cidr.network function, which takes a self argument that the comment doesn't pass. The tests (which do pass a 1) appear to agree; I'm not sure why they implemented that self parameter though.
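For reference, the check the Lua filter needs (and that cidr.lua provides) is equivalent to the following Python sketch, which also handles the mixed IPv4/IPv6 data mentioned earlier; the function name is my own:

```python
import ipaddress

def in_any_cidr(addr: str, cidrs: list) -> bool:
    """Return True if addr falls within any of the given CIDR ranges.
    Mixed-family comparisons (v4 address vs v6 range) are simply skipped."""
    ip = ipaddress.ip_address(addr)
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        if ip.version == net.version and ip in net:
            return True
    return False

# cluster_cidr (pod) ranges from the snippet earlier in the thread
pod_ranges = ["10.194.128.0/17", "2620:0:860:cabe::/64"]
```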
As far as I can tell from Logstash, calls identified as internal (that get no ratelimit_key) are definitely from Wikifeeds and MediaWiki itself, and all go to either page-analytics or device-analytics. The device-analytics calls seem to also originate from PageViewInfo.